* [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches
@ 2023-05-02  0:16 Matthew Brost
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init Matthew Brost
                   ` (32 more replies)
  0 siblings, 33 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:16 UTC (permalink / raw)
  To: intel-xe

Series includes:

- DRM scheduler changes for firmware backends (1 to 1 entity to scheduler)
- LR workload story
- VM LRU handling
- GuC doorbell submission
- Basic GPUVA
- Sparse binding support
- GPUVA + extobj + drm exec (collaboration with dakr + Francois Dugast)
- GPUVA + userptr (minimal, more can be done once Nouveau has userptr)
- Fix fencing rules for compute / fault mode
- Remove async worker for VM + error handling updates
- Kernel doc for VM bind

The series is not fully ready for upstream and some of these pieces need
to be merged upstream first, but overall it is largely correct and
certainly a step in the right direction. Given its size, and the fact
that it took me 8 hours to rebase this today, I'd say let's get this
into the tree and fix up everything else in place.

Minor uAPI breakage, IGT series:
https://patchwork.freedesktop.org/series/117177/

gitlab link:
https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/344

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Christian König (1):
  drm: execution context for GEM buffers v3

Danilo Krummrich (2):
  maple_tree: split up MA_STATE() macro
  drm: manager to keep track of GPUs VA mappings

Matthew Brost (28):
  drm/sched: Add run_wq argument to drm_sched_init
  drm/sched: Move schedule policy to scheduler
  drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
  drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
  drm/xe: Long running job update
  drm/xe: Ensure LR engines are not persistent
  drm/xe: Only try to lock external BOs in VM bind
  drm/xe: VM LRU bulk move
  drm/xe/guc: Read HXG fields from DW1 of G2H response
  drm/xe/guc: Return the lower part of blocking H2G message
  drm/xe/guc: Use doorbells for submission if possible
  drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
  maple_tree: Export mas_preallocate
  drm/xe: Port Xe to GPUVA
  drm/xe: NULL binding implementation
  drm/xe: Avoid doing rebinds
  drm/xe: Reduce the number list links in xe_vma
  drm/xe: Optimize size of xe_vma allocation
  drm/gpuva: Add drm device to GPUVA manager
  drm/gpuva: Move dma-resv to GPUVA manager
  drm/gpuva: Add support for extobj
  drm/xe: Userptr refactor
  drm/exec: Always compile drm_exec
  drm/xe: Use drm_exec for locking rather than TTM exec helpers
  drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
  drm/xe: Allow compute VMs to output dma-fences on binds
  drm/xe: remove async worker, sync binds, new error handling
  drm/xe/uapi: Add some VM bind kernel doc

 Documentation/gpu/drm-mm.rst                 |   43 +
 drivers/gpu/drm/Kconfig                      |    6 +
 drivers/gpu/drm/Makefile                     |    4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   |    3 +-
 drivers/gpu/drm/drm_debugfs.c                |   41 +
 drivers/gpu/drm/drm_exec.c                   |  248 ++
 drivers/gpu/drm/drm_gem.c                    |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c              | 1779 ++++++++++++
 drivers/gpu/drm/etnaviv/etnaviv_sched.c      |    5 +-
 drivers/gpu/drm/i915/display/intel_display.c |    6 +-
 drivers/gpu/drm/lima/lima_sched.c            |    5 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c         |    5 +-
 drivers/gpu/drm/panfrost/panfrost_job.c      |    5 +-
 drivers/gpu/drm/scheduler/sched_entity.c     |   84 +-
 drivers/gpu/drm/scheduler/sched_fence.c      |    2 +-
 drivers/gpu/drm/scheduler/sched_main.c       |   88 +-
 drivers/gpu/drm/v3d/v3d_sched.c              |   25 +-
 drivers/gpu/drm/xe/Kconfig                   |    1 +
 drivers/gpu/drm/xe/regs/xe_guc_regs.h        |    1 +
 drivers/gpu/drm/xe/tests/xe_bo.c             |   26 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c        |    6 +-
 drivers/gpu/drm/xe/xe_bo.c                   |  100 +-
 drivers/gpu/drm/xe/xe_bo.h                   |   13 +-
 drivers/gpu/drm/xe/xe_bo_evict.c             |   24 +-
 drivers/gpu/drm/xe/xe_bo_types.h             |    1 -
 drivers/gpu/drm/xe/xe_device.c               |    2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c              |    2 +-
 drivers/gpu/drm/xe/xe_engine.c               |   50 +-
 drivers/gpu/drm/xe/xe_engine.h               |    4 +
 drivers/gpu/drm/xe/xe_engine_types.h         |    1 +
 drivers/gpu/drm/xe/xe_exec.c                 |  117 +-
 drivers/gpu/drm/xe/xe_execlist.c             |    3 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c         |   84 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c  |   14 +-
 drivers/gpu/drm/xe/xe_guc.c                  |    6 +
 drivers/gpu/drm/xe/xe_guc_ct.c               |   12 +-
 drivers/gpu/drm/xe/xe_guc_engine_types.h     |    9 +
 drivers/gpu/drm/xe/xe_guc_pc.c               |    6 +-
 drivers/gpu/drm/xe/xe_guc_submit.c           |  398 ++-
 drivers/gpu/drm/xe/xe_guc_submit.h           |    1 +
 drivers/gpu/drm/xe/xe_guc_types.h            |    4 +
 drivers/gpu/drm/xe/xe_huc.c                  |    2 +-
 drivers/gpu/drm/xe/xe_lrc.c                  |    8 +-
 drivers/gpu/drm/xe/xe_migrate.c              |   31 +-
 drivers/gpu/drm/xe/xe_pt.c                   |  198 +-
 drivers/gpu/drm/xe/xe_sync.c                 |   26 +-
 drivers/gpu/drm/xe/xe_sync.h                 |    2 +-
 drivers/gpu/drm/xe/xe_trace.h                |   20 +-
 drivers/gpu/drm/xe/xe_vm.c                   | 2567 +++++++-----------
 drivers/gpu/drm/xe/xe_vm.h                   |  135 +-
 drivers/gpu/drm/xe/xe_vm_madvise.c           |  125 +-
 drivers/gpu/drm/xe/xe_vm_types.h             |  324 ++-
 drivers/gpu/drm/xe/xe_wait_user_fence.c      |   43 +-
 include/drm/drm_debugfs.h                    |   24 +
 include/drm/drm_drv.h                        |    7 +
 include/drm/drm_exec.h                       |  115 +
 include/drm/drm_gem.h                        |   75 +
 include/drm/drm_gpuva_mgr.h                  |  759 ++++++
 include/drm/gpu_scheduler.h                  |   29 +-
 include/linux/maple_tree.h                   |    7 +-
 include/uapi/drm/xe_drm.h                    |  128 +-
 lib/maple_tree.c                             |    1 +
 62 files changed, 5543 insertions(+), 2320 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 include/drm/drm_exec.h
 create mode 100644 include/drm/drm_gpuva_mgr.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
@ 2023-05-02  0:16 ` Matthew Brost
  2023-05-03 12:03   ` Thomas Hellström
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler Matthew Brost
                   ` (31 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:16 UTC (permalink / raw)
  To: intel-xe

We will have this argument upstream, so let's pull it into the Xe repo now.
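
For illustration only (not part of this patch), a caller could pass a
dedicated workqueue via the new argument; passing NULL falls back to
system_wq, matching the callers converted below. Driver names here are
placeholders:

	/* Hedged sketch: hypothetical driver passing its own run_wq */
	struct workqueue_struct *run_wq;
	int err;

	run_wq = alloc_ordered_workqueue("foo-sched-run", 0);
	if (!run_wq)
		return -ENOMEM;

	err = drm_sched_init(&foo->sched, &foo_sched_ops, run_wq,
			     hw_submission, hang_limit,
			     msecs_to_jiffies(timeout_ms),
			     NULL /* timeout_wq */, NULL /* score */,
			     "foo-ring", foo->dev);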

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  2 +-
 drivers/gpu/drm/lima/lima_sched.c          |  2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c       |  2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c    |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c     |  4 +++-
 drivers/gpu/drm/v3d/v3d_sched.c            | 10 +++++-----
 drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
 drivers/gpu/drm/xe/xe_guc_submit.c         |  2 +-
 include/drm/gpu_scheduler.h                |  1 +
 10 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 902f9b5ff82c..fe28f6b71fe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2364,7 +2364,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 			break;
 		}
 
-		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
 				   ring->num_hw_submission, amdgpu_job_hang_limit,
 				   timeout, adev->reset_domain->wq,
 				   ring->sched_score, ring->name,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 1ae87dfd19c4..8486a2923f1b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -133,7 +133,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 {
 	int ret;
 
-	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
+	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
 			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
 			     msecs_to_jiffies(500), NULL, NULL,
 			     dev_name(gpu->dev), gpu->dev);
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index ff003403fbbc..54f53bece27c 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
 
 	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
 
-	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
+	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
 			      lima_job_hang_limit,
 			      msecs_to_jiffies(timeout), NULL,
 			      NULL, name, pipe->ldev->dev);
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 57a8e9564540..5879fc262047 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -95,7 +95,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 	 /* currently managing hangcheck ourselves: */
 	sched_timeout = MAX_SCHEDULE_TIMEOUT;
 
-	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
+	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
 			num_hw_submissions, 0, sched_timeout,
 			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
 	if (ret) {
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index dbc597ab46fb..f48b07056a16 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -815,7 +815,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 		js->queue[j].fence_context = dma_fence_context_alloc(1);
 
 		ret = drm_sched_init(&js->queue[j].sched,
-				     &panfrost_sched_ops,
+				     &panfrost_sched_ops, NULL,
 				     nentries, 0,
 				     msecs_to_jiffies(JOB_TIMEOUT_MS),
 				     pfdev->reset.wq,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index cfd8a838e283..e79b9c760efe 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1182,6 +1182,7 @@ static void drm_sched_main(struct work_struct *w)
  *
  * @sched: scheduler instance
  * @ops: backend operations for this scheduler
+ * @run_wq: workqueue to use for run work. If NULL, the system_wq is used
  * @hw_submission: number of hw submissions that can be in flight
  * @hang_limit: number of times to allow a job to hang before dropping it
  * @timeout: timeout value in jiffies for the scheduler
@@ -1195,6 +1196,7 @@ static void drm_sched_main(struct work_struct *w)
  */
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
+		   struct workqueue_struct *run_wq,
 		   unsigned hw_submission, unsigned hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev)
@@ -1203,9 +1205,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	sched->ops = ops;
 	sched->hw_submission_limit = hw_submission;
 	sched->name = name;
+	sched->run_wq = run_wq ? : system_wq;
 	sched->timeout = timeout;
 	sched->timeout_wq = timeout_wq ? : system_wq;
-	sched->run_wq = system_wq;	/* FIXME: Let user pass this in */
 	sched->hang_limit = hang_limit;
 	sched->score = score ? score : &sched->_score;
 	sched->dev = dev;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 06238e6d7f5c..38e092ea41e6 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 	int ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
-			     &v3d_bin_sched_ops,
+			     &v3d_bin_sched_ops, NULL,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
 			     NULL, "v3d_bin", v3d->drm.dev);
@@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		return ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
-			     &v3d_render_sched_ops,
+			     &v3d_render_sched_ops, NULL,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
 			     NULL, "v3d_render", v3d->drm.dev);
@@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		goto fail;
 
 	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
-			     &v3d_tfu_sched_ops,
+			     &v3d_tfu_sched_ops, NULL,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
 			     NULL, "v3d_tfu", v3d->drm.dev);
@@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 
 	if (v3d_has_csd(v3d)) {
 		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
-				     &v3d_csd_sched_ops,
+				     &v3d_csd_sched_ops, NULL,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
 				     NULL, "v3d_csd", v3d->drm.dev);
@@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 			goto fail;
 
 		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
-				     &v3d_cache_clean_sched_ops,
+				     &v3d_cache_clean_sched_ops, NULL,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
 				     NULL, "v3d_cache_clean", v3d->drm.dev);
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index de4f0044b211..d6d60ebf3d5f 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -336,7 +336,7 @@ static int execlist_engine_init(struct xe_engine *e)
 
 	exl->engine = e;
 
-	err = drm_sched_init(&exl->sched, &drm_sched_ops,
+	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
 			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
 			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
 			     NULL, NULL, e->hwe->name,
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index e857013070b9..735f31257f3a 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1081,7 +1081,7 @@ static int guc_engine_init(struct xe_engine *e)
 	init_waitqueue_head(&ge->suspend_wait);
 
 	timeout = xe_vm_no_dma_fences(e->vm) ? MAX_SCHEDULE_TIMEOUT : HZ * 5;
-	err = drm_sched_init(&ge->sched, &drm_sched_ops,
+	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
 			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
 			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
 			     e->name, gt_to_xe(e->gt)->drm.dev);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index cf85f93218fc..09bc39840dc8 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -539,6 +539,7 @@ struct drm_gpu_scheduler {
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
+		   struct workqueue_struct *run_wq,
 		   uint32_t hw_submission, unsigned hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init Matthew Brost
@ 2023-05-02  0:16 ` Matthew Brost
  2023-05-03 12:13   ` Thomas Hellström
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
                   ` (30 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:16 UTC (permalink / raw)
  To: intel-xe

Rather than selecting the scheduling policy with a global modparam, move
the scheduling policy into the scheduler so each scheduler's policy can
be controlled individually.
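
As a hedged sketch (driver names hypothetical, not from this series), a
driver that wants round robin on one scheduler while leaving others on
the modparam default could now do:

	/* Explicit per-scheduler policy via the new parameter */
	err = drm_sched_init(&foo->sched, &foo_sched_ops, NULL,
			     hw_submission, hang_limit,
			     msecs_to_jiffies(timeout_ms), NULL,
			     NULL, "foo-ring", DRM_SCHED_POLICY_RR,
			     foo->dev);

	/* DRM_SCHED_POLICY_DEFAULT keeps the behaviour selected by the
	 * existing sched_policy modparam. */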

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
 drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  3 ++-
 drivers/gpu/drm/lima/lima_sched.c          |  3 ++-
 drivers/gpu/drm/msm/msm_ringbuffer.c       |  3 ++-
 drivers/gpu/drm/panfrost/panfrost_job.c    |  3 ++-
 drivers/gpu/drm/scheduler/sched_entity.c   | 24 ++++++++++++++++++----
 drivers/gpu/drm/scheduler/sched_main.c     | 21 ++++++++++++++-----
 drivers/gpu/drm/v3d/v3d_sched.c            | 15 +++++++++-----
 drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
 drivers/gpu/drm/xe/xe_guc_submit.c         |  3 ++-
 include/drm/gpu_scheduler.h                | 20 ++++++++++++------
 11 files changed, 72 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fe28f6b71fe3..577ea5b98cd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 				   ring->num_hw_submission, amdgpu_job_hang_limit,
 				   timeout, adev->reset_domain->wq,
 				   ring->sched_score, ring->name,
+				   DRM_SCHED_POLICY_DEFAULT,
 				   adev->dev);
 		if (r) {
 			DRM_ERROR("Failed to create scheduler on ring %s.\n",
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 8486a2923f1b..61204a3f8b0b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
 			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
 			     msecs_to_jiffies(500), NULL, NULL,
-			     dev_name(gpu->dev), gpu->dev);
+			     dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
+			     gpu->dev);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 54f53bece27c..33042ba6ae93 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
 	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
 			      lima_job_hang_limit,
 			      msecs_to_jiffies(timeout), NULL,
-			      NULL, name, pipe->ldev->dev);
+			      NULL, name, DRM_SCHED_POLICY_DEFAULT,
+			      pipe->ldev->dev);
 }
 
 void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 5879fc262047..f408a9097315 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 
 	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
 			num_hw_submissions, 0, sched_timeout,
-			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
+			NULL, NULL, to_msm_bo(ring->bo)->name,
+			DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
 	if (ret) {
 		goto fail;
 	}
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index f48b07056a16..effa48b33dce 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 				     nentries, 0,
 				     msecs_to_jiffies(JOB_TIMEOUT_MS),
 				     pfdev->reset.wq,
-				     NULL, "pan_js", pfdev->dev);
+				     NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
+				     pfdev->dev);
 		if (ret) {
 			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
 			goto err_sched;
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 15d04a0ec623..2300b2fc06ab 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -33,6 +33,20 @@
 #define to_drm_sched_job(sched_job)		\
 		container_of((sched_job), struct drm_sched_job, queue_node)
 
+static bool bad_policies(struct drm_gpu_scheduler **sched_list,
+			 unsigned int num_sched_list)
+{
+	enum drm_sched_policy sched_policy = sched_list[0]->sched_policy;
+	unsigned int i;
+
+	/* All schedule policies must match */
+	for (i = 1; i < num_sched_list; ++i)
+		if (sched_policy != sched_list[i]->sched_policy)
+			return true;
+
+	return false;
+}
+
 /**
  * drm_sched_entity_init - Init a context entity used by scheduler when
  * submit to HW ring.
@@ -62,7 +76,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 			  unsigned int num_sched_list,
 			  atomic_t *guilty)
 {
-	if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
+	if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])) ||
+	    bad_policies(sched_list, num_sched_list))
 		return -EINVAL;
 
 	memset(entity, 0, sizeof(struct drm_sched_entity));
@@ -75,8 +90,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->last_scheduled = NULL;
 	RB_CLEAR_NODE(&entity->rb_tree_node);
 
-	if(num_sched_list)
+	if(num_sched_list) {
 		entity->rq = &sched_list[0]->sched_rq[entity->priority];
+	}
 
 	init_completion(&entity->entity_idle);
 
@@ -440,7 +456,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	 * Update the entity's location in the min heap according to
 	 * the timestamp of the next job, if any.
 	 */
-	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
+	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
 		struct drm_sched_job *next;
 
 		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -528,7 +544,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
 
-		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+		if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
 			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
 
 		drm_sched_wakeup(entity->rq->sched);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e79b9c760efe..6777a2db554f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,14 +62,14 @@
 #define to_drm_sched_job(sched_job)		\
 		container_of((sched_job), struct drm_sched_job, queue_node)
 
-int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
+int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 
 /**
  * DOC: sched_policy (int)
  * Used to override default entities scheduling policy in a run queue.
  */
 MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
-module_param_named(sched_policy, drm_sched_policy, int, 0444);
+module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
 
 static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
 							    const struct rb_node *b)
@@ -173,7 +173,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 	if (rq->current_entity == entity)
 		rq->current_entity = NULL;
 
-	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+	if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
 		drm_sched_rq_remove_fifo_locked(entity);
 
 	spin_unlock(&rq->lock);
@@ -956,7 +956,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 
 	/* Kernel run queue has higher priority than normal run queue*/
 	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
-		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
+		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
 			drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
 			drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
 		if (entity)
@@ -1190,6 +1190,7 @@ static void drm_sched_main(struct work_struct *w)
  *		used
  * @score: optional score atomic shared with other schedulers
  * @name: name used for debugging
+ * @sched_policy: schedule policy
  * @dev: target &struct device
  *
  * Return 0 on success, otherwise error code.
@@ -1199,9 +1200,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   struct workqueue_struct *run_wq,
 		   unsigned hw_submission, unsigned hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
-		   atomic_t *score, const char *name, struct device *dev)
+		   atomic_t *score, const char *name,
+		   enum drm_sched_policy sched_policy,
+		   struct device *dev)
 {
 	int i;
+
+	if (sched_policy >= DRM_SCHED_POLICY_COUNT)
+		return -EINVAL;
+
 	sched->ops = ops;
 	sched->hw_submission_limit = hw_submission;
 	sched->name = name;
@@ -1211,6 +1218,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	sched->hang_limit = hang_limit;
 	sched->score = score ? score : &sched->_score;
 	sched->dev = dev;
+	if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
+		sched->sched_policy = default_drm_sched_policy;
+	else
+		sched->sched_policy = sched_policy;
 	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
 		drm_sched_rq_init(sched, &sched->sched_rq[i]);
 
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 38e092ea41e6..5e3fe77fa991 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -391,7 +391,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 			     &v3d_bin_sched_ops, NULL,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
-			     NULL, "v3d_bin", v3d->drm.dev);
+			     NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
+			     v3d->drm.dev);
 	if (ret)
 		return ret;
 
@@ -399,7 +400,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 			     &v3d_render_sched_ops, NULL,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
-			     NULL, "v3d_render", v3d->drm.dev);
+			     NULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
+			     v3d->drm.dev);
 	if (ret)
 		goto fail;
 
@@ -407,7 +409,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 			     &v3d_tfu_sched_ops, NULL,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
-			     NULL, "v3d_tfu", v3d->drm.dev);
+			     NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
+			     v3d->drm.dev);
 	if (ret)
 		goto fail;
 
@@ -416,7 +419,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 				     &v3d_csd_sched_ops, NULL,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
-				     NULL, "v3d_csd", v3d->drm.dev);
+				     NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
+				     v3d->drm.dev);
 		if (ret)
 			goto fail;
 
@@ -424,7 +428,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 				     &v3d_cache_clean_sched_ops, NULL,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
-				     NULL, "v3d_cache_clean", v3d->drm.dev);
+				     NULL, "v3d_cache_clean",
+				     DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
 		if (ret)
 			goto fail;
 	}
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index d6d60ebf3d5f..48060d14547a 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -339,7 +339,7 @@ static int execlist_engine_init(struct xe_engine *e)
 	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
 			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
 			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
-			     NULL, NULL, e->hwe->name,
+			     NULL, NULL, e->hwe->name, DRM_SCHED_POLICY_DEFAULT,
 			     gt_to_xe(e->gt)->drm.dev);
 	if (err)
 		goto err_free;
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 735f31257f3a..9d3fadca43be 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1084,7 +1084,8 @@ static int guc_engine_init(struct xe_engine *e)
 	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
 			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
 			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
-			     e->name, gt_to_xe(e->gt)->drm.dev);
+			     e->name, DRM_SCHED_POLICY_DEFAULT,
+			     gt_to_xe(e->gt)->drm.dev);
 	if (err)
 		goto err_free;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 09bc39840dc8..3df801401028 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -63,11 +63,15 @@ enum drm_sched_priority {
 	DRM_SCHED_PRIORITY_UNSET = -2
 };
 
-/* Used to chose between FIFO and RR jobs scheduling */
-extern int drm_sched_policy;
-
-#define DRM_SCHED_POLICY_RR    0
-#define DRM_SCHED_POLICY_FIFO  1
+/* Used to choose the default scheduling policy */
+extern int default_drm_sched_policy;
+
+enum drm_sched_policy {
+	DRM_SCHED_POLICY_DEFAULT,
+	DRM_SCHED_POLICY_RR,
+	DRM_SCHED_POLICY_FIFO,
+	DRM_SCHED_POLICY_COUNT,
+};
 
 /**
  * struct drm_sched_entity - A wrapper around a job queue (typically
@@ -505,6 +509,7 @@ struct drm_sched_backend_ops {
  *              guilty and it will no longer be considered for scheduling.
  * @score: score to help loadbalancer pick a idle sched
  * @_score: score used when the driver doesn't provide one
+ * @sched_policy: Schedule policy for scheduler
  * @ready: marks if the underlying HW is ready to work
  * @free_guilty: A hit to time out handler to free the guilty job.
  * @pause_run_wq: pause queuing of @work_run on @run_wq
@@ -531,6 +536,7 @@ struct drm_gpu_scheduler {
 	int				hang_limit;
 	atomic_t                        *score;
 	atomic_t                        _score;
+	enum drm_sched_policy		sched_policy;
 	bool				ready;
 	bool				free_guilty;
 	bool				pause_run_wq;
@@ -542,7 +548,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   struct workqueue_struct *run_wq,
 		   uint32_t hw_submission, unsigned hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
-		   atomic_t *score, const char *name, struct device *dev);
+		   atomic_t *score, const char *name,
+		   enum drm_sched_policy sched_policy,
+		   struct device *dev);
 
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init Matthew Brost
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler Matthew Brost
@ 2023-05-02  0:16 ` Matthew Brost
  2023-05-08 12:40   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 04/31] drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode Matthew Brost
                   ` (29 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:16 UTC (permalink / raw)
  To: intel-xe

DRM_SCHED_POLICY_SINGLE_ENTITY creates a 1 to 1 relationship between
scheduler and entity. No priorities or run queues are used in this mode.
It is intended for devices with firmware schedulers.
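
A minimal sketch of the intended pairing (names are placeholders, not
from this patch): one scheduler is created with the new policy and then
bound to exactly one entity, since drm_sched_entity_init() rejects more
than one scheduler in the list for this mode:

	struct drm_gpu_scheduler *sched;
	int err;

	/* Firmware-backed queue: one scheduler per entity */
	err = drm_sched_init(&q->sched, &fw_sched_ops, NULL,
			     hw_submission, hang_limit, timeout, NULL,
			     NULL, "fw-queue",
			     DRM_SCHED_POLICY_SINGLE_ENTITY, dev);
	if (err)
		return err;

	sched = &q->sched;
	/* num_sched_list must be exactly 1 in single entity mode */
	err = drm_sched_entity_init(&q->entity, DRM_SCHED_PRIORITY_NORMAL,
				    &sched, 1, NULL);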

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 64 +++++++++++++++++++-----
 drivers/gpu/drm/scheduler/sched_fence.c  |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c   | 63 ++++++++++++++++++++---
 include/drm/gpu_scheduler.h              |  8 +++
 4 files changed, 115 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 2300b2fc06ab..8b70900c54cc 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -83,6 +83,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	memset(entity, 0, sizeof(struct drm_sched_entity));
 	INIT_LIST_HEAD(&entity->list);
 	entity->rq = NULL;
+	entity->single_sched = NULL;
 	entity->guilty = guilty;
 	entity->num_sched_list = num_sched_list;
 	entity->priority = priority;
@@ -91,7 +92,15 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	RB_CLEAR_NODE(&entity->rb_tree_node);
 
 	if(num_sched_list) {
-		entity->rq = &sched_list[0]->sched_rq[entity->priority];
+		if (sched_list[0]->sched_policy !=
+		    DRM_SCHED_POLICY_SINGLE_ENTITY) {
+			entity->rq = &sched_list[0]->sched_rq[entity->priority];
+		} else {
+			if (num_sched_list != 1 || sched_list[0]->single_entity)
+				return -EINVAL;
+			sched_list[0]->single_entity = entity;
+			entity->single_sched = sched_list[0];
+		}
 	}
 
 	init_completion(&entity->entity_idle);
@@ -125,7 +134,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
 				    unsigned int num_sched_list)
 {
-	WARN_ON(!num_sched_list || !sched_list);
+	WARN_ON(!num_sched_list || !sched_list ||
+		!!entity->single_sched);
 
 	entity->sched_list = sched_list;
 	entity->num_sched_list = num_sched_list;
@@ -195,13 +205,15 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *job;
 	struct dma_fence *prev;
+	bool single_entity = !!entity->single_sched;
 
-	if (!entity->rq)
+	if (!entity->rq && !single_entity)
 		return;
 
 	spin_lock(&entity->rq_lock);
 	entity->stopped = true;
-	drm_sched_rq_remove_entity(entity->rq, entity);
+	if (!single_entity)
+		drm_sched_rq_remove_entity(entity->rq, entity);
 	spin_unlock(&entity->rq_lock);
 
 	/* Make sure this entity is not used by the scheduler at the moment */
@@ -223,6 +235,20 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
 	dma_fence_put(prev);
 }
 
+/**
+ * drm_sched_entity_to_scheduler - Schedule entity to GPU scheduler
+ * @entity: scheduler entity
+ *
+ * Returns GPU scheduler for the entity
+ */
+struct drm_gpu_scheduler *
+drm_sched_entity_to_scheduler(struct drm_sched_entity *entity)
+{
+	bool single_entity = !!entity->single_sched;
+
+	return single_entity ? entity->single_sched : entity->rq->sched;
+}
+
 /**
  * drm_sched_entity_flush - Flush a context entity
  *
@@ -240,11 +266,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 	struct drm_gpu_scheduler *sched;
 	struct task_struct *last_user;
 	long ret = timeout;
+	bool single_entity = !!entity->single_sched;
 
-	if (!entity->rq)
+	if (!entity->rq && !single_entity)
 		return 0;
 
-	sched = entity->rq->sched;
+	sched = drm_sched_entity_to_scheduler(entity);
 	/**
 	 * The client will not queue more IBs during this fini, consume existing
 	 * queued IBs or discard them on SIGKILL
@@ -337,7 +364,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 		container_of(cb, struct drm_sched_entity, cb);
 
 	drm_sched_entity_clear_dep(f, cb);
-	drm_sched_wakeup(entity->rq->sched);
+	drm_sched_wakeup(drm_sched_entity_to_scheduler(entity));
 }
 
 /**
@@ -351,6 +378,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
 				   enum drm_sched_priority priority)
 {
+	WARN_ON(!!entity->single_sched);
+
 	spin_lock(&entity->rq_lock);
 	entity->priority = priority;
 	spin_unlock(&entity->rq_lock);
@@ -363,7 +392,7 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
  */
 static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
 {
-	struct drm_gpu_scheduler *sched = entity->rq->sched;
+	struct drm_gpu_scheduler *sched = drm_sched_entity_to_scheduler(entity);
 	struct dma_fence *fence = entity->dependency;
 	struct drm_sched_fence *s_fence;
 
@@ -456,7 +485,8 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	 * Update the entity's location in the min heap according to
 	 * the timestamp of the next job, if any.
 	 */
-	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
+	if (drm_sched_entity_to_scheduler(entity)->sched_policy ==
+	    DRM_SCHED_POLICY_FIFO) {
 		struct drm_sched_job *next;
 
 		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -473,6 +503,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_rq *rq;
 
+	WARN_ON(!!entity->single_sched);
+
 	/* single possible engine and already selected */
 	if (!entity->sched_list)
 		return;
@@ -522,16 +554,21 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
+	bool single_entity = !!entity->single_sched;
 	bool first;
 
 	trace_drm_sched_job(sched_job, entity);
-	atomic_inc(entity->rq->sched->score);
+	if (!single_entity)
+		atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
 	sched_job->submit_ts = ktime_get();
 
 	/* first job wakes up scheduler */
 	if (first) {
+		struct drm_gpu_scheduler *sched =
+			drm_sched_entity_to_scheduler(entity);
+
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
 		if (entity->stopped) {
@@ -541,13 +578,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			return;
 		}
 
-		drm_sched_rq_add_entity(entity->rq, entity);
+		if (!single_entity)
+			drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
 
-		if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
+		if (sched->sched_policy == DRM_SCHED_POLICY_FIFO)
 			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
 
-		drm_sched_wakeup(entity->rq->sched);
+		drm_sched_wakeup(sched);
 	}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 7fd869520ef2..1ba5056851dd 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -167,7 +167,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
 {
 	unsigned seq;
 
-	fence->sched = entity->rq->sched;
+	fence->sched = drm_sched_entity_to_scheduler(entity);
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
 		       &fence->lock, entity->fence_context, seq);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 6777a2db554f..870568d94f1f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -32,7 +32,8 @@
  * backend operations to the scheduler like submitting a job to hardware run queue,
  * returning the dependencies of a job etc.
  *
- * The organisation of the scheduler is the following:
+ * The organisation of the scheduler is the following for scheduling policies
+ * DRM_SCHED_POLICY_RR and DRM_SCHED_POLICY_FIFO:
  *
  * 1. Each hw run queue has one scheduler
  * 2. Each scheduler has multiple run queues with different priorities
@@ -41,7 +42,22 @@
  * 4. Entities themselves maintain a queue of jobs that will be scheduled on
  *    the hardware.
  *
- * The jobs in a entity are always scheduled in the order that they were pushed.
+ * The organisation of the scheduler is the following for scheduling policy
+ * DRM_SCHED_POLICY_SINGLE_ENTITY:
+ *
+ * 1. One to one relationship between scheduler and entity
+ * 2. No priorities implemented per scheduler (single job queue)
+ * 3. No run queues in scheduler; jobs are dequeued directly from the entity
+ * 4. The entity maintains a queue of jobs that will be scheduled on the
+ * hardware
+ *
+ * The jobs in an entity are always scheduled in the order that they were
+ * pushed, regardless of scheduling policy.
+ *
+ * A policy of DRM_SCHED_POLICY_RR or DRM_SCHED_POLICY_FIFO is expected to be
+ * used when the KMD is scheduling directly on the hardware, while a scheduling
+ * policy of DRM_SCHED_POLICY_SINGLE_ENTITY is expected to be used when there
+ * is a firmware scheduler.
  */
 
 #include <linux/wait.h>
@@ -92,6 +108,8 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
 
 void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
 {
+	WARN_ON(!!entity->single_sched);
+
 	/*
 	 * Both locks need to be grabbed, one to protect from entity->rq change
 	 * for entity from within concurrent drm_sched_entity_select_rq and the
@@ -122,6 +140,8 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
 static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
 			      struct drm_sched_rq *rq)
 {
+	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
+
 	spin_lock_init(&rq->lock);
 	INIT_LIST_HEAD(&rq->entities);
 	rq->rb_tree_root = RB_ROOT_CACHED;
@@ -140,6 +160,8 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
 void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 			     struct drm_sched_entity *entity)
 {
+	WARN_ON(!!entity->single_sched);
+
 	if (!list_empty(&entity->list))
 		return;
 
@@ -162,6 +184,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 				struct drm_sched_entity *entity)
 {
+	WARN_ON(!!entity->single_sched);
+
 	if (list_empty(&entity->list))
 		return;
 
@@ -691,7 +715,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
 		       void *owner)
 {
-	if (!entity->rq)
+	if (!entity->rq && !entity->single_sched)
 		return -ENOENT;
 
 	job->entity = entity;
@@ -724,13 +748,16 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_entity *entity = job->entity;
+	bool single_entity = !!entity->single_sched;
 
 	BUG_ON(!entity);
-	drm_sched_entity_select_rq(entity);
-	sched = entity->rq->sched;
+	if (!single_entity)
+		drm_sched_entity_select_rq(entity);
+	sched = drm_sched_entity_to_scheduler(entity);
 
 	job->sched = sched;
-	job->s_priority = entity->rq - sched->sched_rq;
+	if (!single_entity)
+		job->s_priority = entity->rq - sched->sched_rq;
 	job->id = atomic64_inc_return(&sched->job_id_count);
 
 	drm_sched_fence_init(job->s_fence, job->entity);
@@ -954,6 +981,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 	if (!drm_sched_ready(sched))
 		return NULL;
 
+	if (sched->single_entity) {
+		if (drm_sched_entity_is_ready(sched->single_entity))
+			return sched->single_entity;
+
+		return NULL;
+	}
+
 	/* Kernel run queue has higher priority than normal run queue*/
 	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
@@ -1210,6 +1244,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		return -EINVAL;
 
 	sched->ops = ops;
+	sched->single_entity = NULL;
 	sched->hw_submission_limit = hw_submission;
 	sched->name = name;
 	sched->run_wq = run_wq ? : system_wq;
@@ -1222,7 +1257,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		sched->sched_policy = default_drm_sched_policy;
 	else
 		sched->sched_policy = sched_policy;
-	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
+	for (i = DRM_SCHED_PRIORITY_MIN; sched_policy !=
+	     DRM_SCHED_POLICY_SINGLE_ENTITY && i < DRM_SCHED_PRIORITY_COUNT;
+	     i++)
 		drm_sched_rq_init(sched, &sched->sched_rq[i]);
 
 	init_waitqueue_head(&sched->job_scheduled);
@@ -1255,7 +1292,15 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 
 	drm_sched_run_wq_stop(sched);
 
-	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
+	if (sched->single_entity) {
+		spin_lock(&sched->single_entity->rq_lock);
+		sched->single_entity->stopped = true;
+		spin_unlock(&sched->single_entity->rq_lock);
+	}
+
+	for (i = DRM_SCHED_PRIORITY_COUNT - 1; sched->sched_policy !=
+	     DRM_SCHED_POLICY_SINGLE_ENTITY && i >= DRM_SCHED_PRIORITY_MIN;
+	     i--) {
 		struct drm_sched_rq *rq = &sched->sched_rq[i];
 
 		if (!rq)
@@ -1299,6 +1344,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
 	struct drm_sched_entity *entity;
 	struct drm_gpu_scheduler *sched = bad->sched;
 
+	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
+
 	/* don't change @bad's karma if it's from KERNEL RQ,
 	 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
 	 * corrupt but keep in mind that kernel jobs always considered good.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 3df801401028..669d6520cd3a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -70,6 +70,7 @@ enum drm_sched_policy {
 	DRM_SCHED_POLICY_DEFAULT,
 	DRM_SCHED_POLICY_RR,
 	DRM_SCHED_POLICY_FIFO,
+	DRM_SCHED_POLICY_SINGLE_ENTITY,
 	DRM_SCHED_POLICY_COUNT,
 };
 
@@ -103,6 +104,9 @@ struct drm_sched_entity {
 	 */
 	struct drm_sched_rq		*rq;
 
+	/** @single_sched: Single scheduler */
+	struct drm_gpu_scheduler	*single_sched;
+
 	/**
 	 * @sched_list:
 	 *
@@ -488,6 +492,7 @@ struct drm_sched_backend_ops {
  * struct drm_gpu_scheduler - scheduler instance-specific data
  *
  * @ops: backend operations provided by the driver.
+ * @single_entity: Single entity for the scheduler
  * @hw_submission_limit: the max size of the hardware queue.
  * @timeout: the time after which a job is removed from the scheduler.
  * @name: name of the ring for which this scheduler is being used.
@@ -519,6 +524,7 @@ struct drm_sched_backend_ops {
  */
 struct drm_gpu_scheduler {
 	const struct drm_sched_backend_ops	*ops;
+	struct drm_sched_entity		*single_entity;
 	uint32_t			hw_submission_limit;
 	long				timeout;
 	const char			*name;
@@ -604,6 +610,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 			  struct drm_gpu_scheduler **sched_list,
 			  unsigned int num_sched_list,
 			  atomic_t *guilty);
+struct drm_gpu_scheduler *
+drm_sched_entity_to_scheduler(struct drm_sched_entity *entity);
 long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
 void drm_sched_entity_fini(struct drm_sched_entity *entity);
 void drm_sched_entity_destroy(struct drm_sched_entity *entity);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 04/31] drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (2 preceding siblings ...)
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-08 12:41   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update Matthew Brost
                   ` (28 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

In Xe we create one GPU scheduler per entity, so use the
DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy, which is designed for
that paradigm. Also drop the drm_sched_entity_set_priority() call, as
single entity schedulers have no run queues or priorities to update.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_execlist.c   | 3 ++-
 drivers/gpu/drm/xe/xe_guc_submit.c | 3 +--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index 48060d14547a..79fb951c2965 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -339,7 +339,8 @@ static int execlist_engine_init(struct xe_engine *e)
 	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
 			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
 			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
-			     NULL, NULL, e->hwe->name, DRM_SCHED_POLICY_DEFAULT,
+			     NULL, NULL, e->hwe->name,
+			     DRM_SCHED_POLICY_SINGLE_ENTITY,
 			     gt_to_xe(e->gt)->drm.dev);
 	if (err)
 		goto err_free;
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 9d3fadca43be..68d09e7a4cc0 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1084,7 +1084,7 @@ static int guc_engine_init(struct xe_engine *e)
 	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
 			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
 			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
-			     e->name, DRM_SCHED_POLICY_DEFAULT,
+			     e->name, DRM_SCHED_POLICY_SINGLE_ENTITY,
 			     gt_to_xe(e->gt)->drm.dev);
 	if (err)
 		goto err_free;
@@ -1185,7 +1185,6 @@ static int guc_engine_set_priority(struct xe_engine *e,
 	if (!msg)
 		return -ENOMEM;
 
-	drm_sched_entity_set_priority(e->entity, priority);
 	guc_engine_add_msg(e, msg, SET_SCHED_PROPS);
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (3 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 04/31] drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 18:36   ` Rodrigo Vivi
  2023-05-08 13:14   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent Matthew Brost
                   ` (27 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Do flow control and write the ring in exec, return NULL in run_job,
signal the xe_hw_fence immediately, and override the TDR for LR jobs.
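
The flow control is seqno based; roughly (illustrative only, mirroring
the helpers added below):

	/* Jobs in flight = seqnos emitted but not yet signalled */
	s32 inflight = e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
	s32 max_job = e->lrc->ring.size / MAX_JOB_SIZE_BYTES;

	if (inflight >= max_job)
		return -EWOULDBLOCK;	/* ring full, exec IOCTL backs off */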

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
 drivers/gpu/drm/xe/xe_engine.h           |  4 +
 drivers/gpu/drm/xe/xe_exec.c             |  8 ++
 drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
 drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
 drivers/gpu/drm/xe/xe_trace.h            |  5 ++
 6 files changed, 137 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
index 094ec17d3004..d1e84d7adbd4 100644
--- a/drivers/gpu/drm/xe/xe_engine.c
+++ b/drivers/gpu/drm/xe/xe_engine.c
@@ -18,6 +18,7 @@
 #include "xe_macros.h"
 #include "xe_migrate.h"
 #include "xe_pm.h"
+#include "xe_ring_ops_types.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
 
@@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
 	up_write(&e->vm->lock);
 }
 
+/**
+ * xe_engine_is_lr() - Whether an engine is long-running
+ * @e: The engine
+ *
+ * Return: True if the engine is long-running, false otherwise.
+ */
+bool xe_engine_is_lr(struct xe_engine *e)
+{
+	return e->vm && xe_vm_no_dma_fences(e->vm) &&
+		!(e->flags & ENGINE_FLAG_VM);
+}
+
+static s32 xe_engine_num_job_inflight(struct xe_engine *e)
+{
+	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
+}
+
+/**
+ * xe_engine_ring_full() - Whether an engine's ring is full
+ * @e: The engine
+ *
+ * Return: True if the engine's ring is full, false otherwise.
+ */
+bool xe_engine_ring_full(struct xe_engine *e)
+{
+	struct xe_lrc *lrc = e->lrc;
+	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
+
+	return xe_engine_num_job_inflight(e) >= max_job;
+}
+
 /**
  * xe_engine_is_idle() - Whether an engine is idle.
  * @engine: The engine
diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
index a49cf2ab405e..2e60f6d90226 100644
--- a/drivers/gpu/drm/xe/xe_engine.h
+++ b/drivers/gpu/drm/xe/xe_engine.h
@@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
 	return engine->width > 1;
 }
 
+bool xe_engine_is_lr(struct xe_engine *e);
+
+bool xe_engine_ring_full(struct xe_engine *e);
+
 bool xe_engine_is_idle(struct xe_engine *engine);
 
 void xe_engine_kill(struct xe_engine *e);
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index ea869f2452ef..44ea9bcd0066 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -13,6 +13,7 @@
 #include "xe_device.h"
 #include "xe_engine.h"
 #include "xe_macros.h"
+#include "xe_ring_ops_types.h"
 #include "xe_sched_job.h"
 #include "xe_sync.h"
 #include "xe_vm.h"
@@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto err_engine_end;
 	}
 
+	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
+		err = -EWOULDBLOCK;
+		goto err_engine_end;
+	}
+
 	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
 				  addresses : &args->address);
 	if (IS_ERR(job)) {
@@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		xe_sync_entry_signal(&syncs[i], job,
 				     &job->drm.s_fence->finished);
 
+	if (xe_engine_is_lr(engine))
+		engine->ring_ops->emit_job(job);
 	xe_sched_job_push(job);
 	xe_vm_reactivate_rebind(vm);
 
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
index cbfb13026ec1..5d83132034a6 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
@@ -31,6 +31,8 @@ struct xe_guc_engine {
 	 */
 #define MAX_STATIC_MSG_TYPE	3
 	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
+	/** @lr_tdr: long running TDR worker */
+	struct work_struct lr_tdr;
 	/** @fini_async: do final fini async from this worker */
 	struct work_struct fini_async;
 	/** @resume_time: time of last resume */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 68d09e7a4cc0..0a41f5d04f6d 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
 		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
 	}
 
+	/*
+	 * We must keep a reference for LR engines if engine is registered with
+	 * the GuC as jobs signal immediately and can't destroy an engine if the
+	 * GuC has a reference to it.
+	 */
+	if (xe_engine_is_lr(e))
+		xe_engine_get(e);
+
 	set_engine_registered(e);
 	trace_xe_engine_register(e);
 	if (xe_engine_is_parallel(e))
@@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
 {
 	struct xe_sched_job *job = to_xe_sched_job(drm_job);
 	struct xe_engine *e = job->engine;
+	bool lr = xe_engine_is_lr(e);
 
 	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
 		  !engine_banned(e) && !engine_suspended(e));
@@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
 	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
 		if (!engine_registered(e))
 			register_engine(e);
-		e->ring_ops->emit_job(job);
+		if (!lr)	/* Written in IOCTL */
+			e->ring_ops->emit_job(job);
 		submit_engine(e);
 	}
 
-	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
+	if (lr) {
+		xe_sched_job_set_error(job, -ENOTSUPP);
+		return NULL;
+	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
 		return job->fence;
-	else
+	} else {
 		return dma_fence_get(job->fence);
+	}
 }
 
 static void guc_engine_free_job(struct drm_sched_job *drm_job)
@@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
 }
 #endif
 
+static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
+{
+	struct xe_guc *guc = engine_to_guc(e);
+
+	if (xe_engine_is_lr(e))
+		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
+	else
+		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
+}
+
+static void xe_guc_engine_lr_cleanup(struct work_struct *w)
+{
+	struct xe_guc_engine *ge =
+		container_of(w, struct xe_guc_engine, lr_tdr);
+	struct xe_engine *e = ge->engine;
+	struct drm_gpu_scheduler *sched = &ge->sched;
+
+	XE_BUG_ON(!xe_engine_is_lr(e));
+	trace_xe_engine_lr_cleanup(e);
+
+	/* Kill the run_job / process_msg entry points */
+	drm_sched_run_wq_stop(sched);
+
+	/* Engine state now stable, disable scheduling / deregister if needed */
+	if (engine_registered(e)) {
+		struct xe_guc *guc = engine_to_guc(e);
+		int ret;
+
+		set_engine_banned(e);
+		xe_engine_get(e);
+		disable_scheduling_deregister(guc, e);
+
+		/*
+		 * Must wait for scheduling to be disabled before signalling any
+		 * fences; if the GT is broken, the GT reset code should signal us.
+		 */
+		smp_rmb();
+		ret = wait_event_timeout(guc->ct.wq,
+					 !engine_pending_disable(e) ||
+					 guc_read_stopped(guc), HZ * 5);
+		if (!ret) {
+			XE_WARN_ON("Schedule disable failed to respond");
+			drm_sched_run_wq_start(sched);
+			xe_gt_reset_async(e->gt);
+			return;
+		}
+	}
+
+	drm_sched_run_wq_start(sched);
+}
+
 static enum drm_gpu_sched_stat
 guc_engine_timedout_job(struct drm_sched_job *drm_job)
 {
@@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
 			err = -EIO;
 		set_engine_banned(e);
 		xe_engine_get(e);
-		disable_scheduling_deregister(engine_to_guc(e), e);
+		disable_scheduling_deregister(guc, e);
 
 		/*
 		 * Must wait for scheduling to be disabled before signalling
@@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
 	 */
 	list_add(&drm_job->list, &sched->pending_list);
 	drm_sched_run_wq_start(sched);
-	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
+	xe_guc_engine_trigger_cleanup(e);
 
 	/* Mark all outstanding jobs as bad, thus completing them */
 	spin_lock(&sched->job_list_lock);
@@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
 
 	trace_xe_engine_destroy(e);
 
+	if (xe_engine_is_lr(e))
+		cancel_work_sync(&ge->lr_tdr);
 	if (e->flags & ENGINE_FLAG_PERSISTENT)
 		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
 	release_guc_id(guc, e);
@@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
 	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
 
 	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
-	queue_work(system_unbound_wq, &e->guc->fini_async);
+	queue_work(system_wq, &e->guc->fini_async);
 
 	/* We must block on kernel engines so slabs are empty on driver unload */
 	if (kernel) {
@@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
 	if (err)
 		goto err_free;
 
+
 	sched = &ge->sched;
 	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
 				    &sched, 1, NULL);
 	if (err)
 		goto err_sched;
 
+	if (xe_engine_is_lr(e))
+		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
+
 	mutex_lock(&guc->submission_state.lock);
 
 	err = alloc_guc_id(guc, e);
@@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
 {
 	trace_xe_engine_kill(e);
 	set_engine_killed(e);
-	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
+	xe_guc_engine_trigger_cleanup(e);
 }
 
 static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
@@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
 	/* Stop scheduling + flush any DRM scheduler operations */
 	drm_sched_run_wq_stop(sched);
 
+	if (engine_registered(e) && xe_engine_is_lr(e))
+		xe_engine_put(e);
+
 	/* Clean up lost G2H + reset engine state */
 	if (engine_destroyed(e) && engine_registered(e)) {
 		if (engine_banned(e))
@@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	trace_xe_engine_deregister_done(e);
 
 	clear_engine_registered(e);
+	if (xe_engine_is_lr(e))
+		xe_engine_put(e);
+
 	if (engine_banned(e))
 		xe_engine_put(e);
 	else
@@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	 */
 	set_engine_reset(e);
 	if (!engine_banned(e))
-		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
+		xe_guc_engine_trigger_cleanup(e);
 
 	return 0;
 }
@@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
 	/* Treat the same as engine reset */
 	set_engine_reset(e);
 	if (!engine_banned(e))
-		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
+		xe_guc_engine_trigger_cleanup(e);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 2f8eb7ebe9a7..02861c26e145 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
 	     TP_ARGS(e)
 );
 
+DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
+	     TP_PROTO(struct xe_engine *e),
+	     TP_ARGS(e)
+);
+
 DECLARE_EVENT_CLASS(xe_sched_job,
 		    TP_PROTO(struct xe_sched_job *job),
 		    TP_ARGS(job),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (4 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 18:38   ` Rodrigo Vivi
  2023-05-09 12:21   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind Matthew Brost
                   ` (26 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

With our reference counting scheme, LR engines only close properly if
they are not persistent, so ensure that LR engines are created as
non-persistent.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_engine.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
index d1e84d7adbd4..91600b1e8249 100644
--- a/drivers/gpu/drm/xe/xe_engine.c
+++ b/drivers/gpu/drm/xe/xe_engine.c
@@ -596,7 +596,9 @@ int xe_engine_create_ioctl(struct drm_device *dev, void *data,
 			return -ENOENT;
 
 		e = xe_engine_create(xe, vm, logical_mask,
-				     args->width, hwe, ENGINE_FLAG_PERSISTENT);
+				     args->width, hwe,
+				     xe_vm_no_dma_fences(vm) ? 0 :
+				     ENGINE_FLAG_PERSISTENT);
 		xe_vm_put(vm);
 		if (IS_ERR(e))
 			return PTR_ERR(e);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (5 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 18:40   ` Rodrigo Vivi
  2023-05-08  1:17   ` Christopher Snowhill
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move Matthew Brost
                   ` (25 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost

Locking BOs that are private to the VM is not needed in the VM bind
path, since they share the VM's dma-resv, and doing so causes some
issues with bulk LRU moves.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 272f0f7f24fe..6c427ff92c44 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
 		 */
 		xe_bo_get(vbo);
 
-		tv_bo.bo = &vbo->ttm;
-		tv_bo.num_shared = 1;
-		list_add(&tv_bo.head, &objs);
+		if (!vbo->vm) {
+			tv_bo.bo = &vbo->ttm;
+			tv_bo.num_shared = 1;
+			list_add(&tv_bo.head, &objs);
+		}
 	}
 
 again:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (6 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-08 21:39   ` Rodrigo Vivi
  2023-05-09 12:47   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response Matthew Brost
                   ` (24 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk move's
LRU position on every exec.
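
Roughly, the lifecycle this adds looks like the sketch below (not the
exact diff; 'xe', 'vm', 'bo' and 'ww' stand for the usual locals, and the
calls are the TTM/xe helpers used in the patch):

	/* BO creation: attach a user BO to its VM's bulk move */
	ttm_bo_set_bulk_move(&bo->ttm, &vm->lru_bulk_move);

	/* exec IOCTL: bump all of the VM's BOs to the LRU tail in one go */
	spin_lock(&xe->ttm.lru_lock);
	ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
	spin_unlock(&xe->ttm.lru_lock);

	/* GEM close: detach from the bulk move under the BO's lock */
	xe_bo_lock(bo, &ww, 0, false);
	ttm_bo_set_bulk_move(&bo->ttm, NULL);
	xe_bo_unlock(bo, &ww);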

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
 drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
 drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
 drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
 drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
 5 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 3ab404e33fae..da99ee53e7d7 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
 	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
 }
 
+static void xe_gem_object_close(struct drm_gem_object *obj,
+				struct drm_file *file_priv)
+{
+	struct xe_bo *bo = gem_to_xe_bo(obj);
+
+	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
+		struct ww_acquire_ctx ww;
+
+		XE_BUG_ON(!xe_bo_is_user(bo));
+
+		xe_bo_lock(bo, &ww, 0, false);
+		ttm_bo_set_bulk_move(&bo->ttm, NULL);
+		xe_bo_unlock(bo, &ww);
+	}
+}
+
+
 static bool should_migrate_to_system(struct xe_bo *bo)
 {
 	struct xe_device *xe = xe_bo_device(bo);
@@ -1040,6 +1057,7 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
 
 static const struct drm_gem_object_funcs xe_gem_object_funcs = {
 	.free = xe_gem_object_free,
+	.close = xe_gem_object_close,
 	.mmap = drm_gem_ttm_mmap,
 	.export = xe_gem_prime_export,
 	.vm_ops = &xe_gem_vm_ops,
@@ -1081,8 +1099,8 @@ void xe_bo_free(struct xe_bo *bo)
 
 struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				    struct xe_gt *gt, struct dma_resv *resv,
-				    size_t size, enum ttm_bo_type type,
-				    u32 flags)
+				    struct ttm_lru_bulk_move *bulk, size_t size,
+				    enum ttm_bo_type type, u32 flags)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -1149,7 +1167,10 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 		return ERR_PTR(err);
 
 	bo->created = true;
-	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
+	if (bulk)
+		ttm_bo_set_bulk_move(&bo->ttm, bulk);
+	else
+		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
 
 	return bo;
 }
@@ -1219,7 +1240,10 @@ xe_bo_create_locked_range(struct xe_device *xe,
 		}
 	}
 
-	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL, size,
+	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL,
+				   vm && !xe_vm_no_dma_fences(vm) &&
+				   flags & XE_BO_CREATE_USER_BIT ?
+				   &vm->lru_bulk_move : NULL, size,
 				   type, flags);
 	if (IS_ERR(bo))
 		return bo;
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 8354d05ccdf3..25457b3c757b 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -81,8 +81,8 @@ void xe_bo_free(struct xe_bo *bo);
 
 struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				    struct xe_gt *gt, struct dma_resv *resv,
-				    size_t size, enum ttm_bo_type type,
-				    u32 flags);
+				    struct ttm_lru_bulk_move *bulk, size_t size,
+				    enum ttm_bo_type type, u32 flags);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_gt *gt, struct xe_vm *vm,
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 9b252cc782b7..975dee1f770f 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -199,7 +199,7 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 	int ret;
 
 	dma_resv_lock(resv, NULL);
-	bo = __xe_bo_create_locked(xe, storage, NULL, resv, dma_buf->size,
+	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
 				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
 	if (IS_ERR(bo)) {
 		ret = PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 44ea9bcd0066..21a9c2fddf86 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -374,6 +374,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	xe_sched_job_push(job);
 	xe_vm_reactivate_rebind(vm);
 
+	if (!err && !xe_vm_no_dma_fences(vm)) {
+		spin_lock(&xe->ttm.lru_lock);
+		ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
+		spin_unlock(&xe->ttm.lru_lock);
+	}
+
 err_repin:
 	if (!xe_vm_no_dma_fences(vm))
 		up_read(&vm->userptr.notifier_lock);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index fada7896867f..d3e99f22510d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -164,6 +164,9 @@ struct xe_vm {
 	/** Protects @rebind_list and the page-table structures */
 	struct dma_resv resv;
 
+	/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
+	struct ttm_lru_bulk_move lru_bulk_move;
+
 	u64 size;
 	struct rb_root vmas;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (7 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 18:50   ` Rodrigo Vivi
  2023-05-09 12:49   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message Matthew Brost
                   ` (23 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

The HXG fields are in DW1 of the G2H response, not DW0; fix this.
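
For reference, a sketch of the framing this relies on ('parse_failure' is
a made-up helper name; the FIELD_GET macros are the ones used in the diff):

	static void parse_failure(struct g2h_fence *g2h_fence, const u32 *msg)
	{
		/*
		 * msg[0] is the CTB message header; the HXG message starts at
		 * msg[1], so the FAILURE error/hint (and RETRY reason) fields
		 * must be read from msg[1].
		 */
		g2h_fence->error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]);
		g2h_fence->hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1]);
	}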

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 9055ff133a7c..6abf1dee95af 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -782,13 +782,13 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 	if (type == GUC_HXG_TYPE_RESPONSE_FAILURE) {
 		g2h_fence->fail = true;
 		g2h_fence->error =
-			FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[0]);
+			FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]);
 		g2h_fence->hint =
-			FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[0]);
+			FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1]);
 	} else if (type == GUC_HXG_TYPE_NO_RESPONSE_RETRY) {
 		g2h_fence->retry = true;
 		g2h_fence->reason =
-			FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[0]);
+			FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[1]);
 	} else if (g2h_fence->response_buffer) {
 		g2h_fence->response_len = response_len;
 		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (8 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 18:52   ` Rodrigo Vivi
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible Matthew Brost
                   ` (22 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

The upper layers may need this data; one example is allocating a
distributed (DIST) doorbell.
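
A sketch of how a caller is expected to use the new convention (negative
errno on failure, otherwise the DATA0 dword of the response);
'use_response_status' is just a placeholder for the consumer:

	int ret;

	ret = xe_guc_ct_send_block(ct, action, ARRAY_SIZE(action));
	if (ret < 0)
		return ret;	/* CTB/HXG failure, errno as before */

	/* ret now carries GUC_HXG_RESPONSE_MSG_0_DATA0 of the reply */
	use_response_status(ret);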

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++++-
 drivers/gpu/drm/xe/xe_guc_pc.c | 6 ++++--
 drivers/gpu/drm/xe/xe_huc.c    | 2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 6abf1dee95af..60b69fcfac9f 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -25,6 +25,7 @@
 struct g2h_fence {
 	u32 *response_buffer;
 	u32 seqno;
+	u32 status;
 	u16 response_len;
 	u16 error;
 	u16 hint;
@@ -727,7 +728,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
 		ret = -EIO;
 	}
 
-	return ret > 0 ? 0 : ret;
+	return ret > 0 ? g2h_fence.status : ret;
 }
 
 int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
@@ -793,6 +794,9 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 		g2h_fence->response_len = response_len;
 		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,
 		       response_len * sizeof(u32));
+	} else {
+		g2h_fence->status =
+			FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, msg[1]);
 	}
 
 	g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index 72d460d5323b..3d2ea723a4a7 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -204,11 +204,13 @@ static int pc_action_query_task_state(struct xe_guc_pc *pc)
 
 	/* Blocking here to ensure the results are ready before reading them */
 	ret = xe_guc_ct_send_block(ct, action, ARRAY_SIZE(action));
-	if (ret)
+	if (ret < 0) {
 		drm_err(&pc_to_xe(pc)->drm,
 			"GuC PC query task state failed: %pe", ERR_PTR(ret));
+		return ret;
+	}
 
-	return ret;
+	return 0;
 }
 
 static int pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value)
diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c
index 55dcaab34ea4..9c48c3075410 100644
--- a/drivers/gpu/drm/xe/xe_huc.c
+++ b/drivers/gpu/drm/xe/xe_huc.c
@@ -39,7 +39,7 @@ int xe_huc_init(struct xe_huc *huc)
 
 	huc->fw.type = XE_UC_FW_TYPE_HUC;
 	ret = xe_uc_fw_init(&huc->fw);
-	if (ret)
+	if (ret < 0)
 		goto out;
 
 	xe_uc_fw_change_status(&huc->fw, XE_UC_FIRMWARE_LOADABLE);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (9 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-08 21:42   ` Rodrigo Vivi
                     ` (2 more replies)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry Matthew Brost
                   ` (21 subsequent siblings)
  32 siblings, 3 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Faith Ekstrand

We have 256 doorbells (on most platforms) that we can allocate and use
to bypass the H2G channel for submission. This avoids contention on the
CT mutex.
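
The submission fast path this enables looks roughly like the sketch below
('need_enable' stands in for the 'enable' local in the diff; the helpers
are the ones added by this patch):

	if (need_enable || !engine_doorbell_registered(e))
		/* slow path: H2G message, takes the CT mutex */
		xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
	else
		/* fast path: MMIO write or memory doorbell cookie bump */
		ring_doorbell(guc, e);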

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
---
 drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
 drivers/gpu/drm/xe/xe_guc.c              |   6 +
 drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
 drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
 drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
 drivers/gpu/drm/xe/xe_trace.h            |   5 +
 7 files changed, 315 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
index 37e0ac550931..11b117293a62 100644
--- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
@@ -109,6 +109,7 @@ struct guc_doorbell_info {
 
 #define DIST_DBS_POPULATED			XE_REG(0xd08)
 #define   DOORBELLS_PER_SQIDI_MASK		REG_GENMASK(23, 16)
+#define	  DOORBELLS_PER_SQIDI_SHIFT		16
 #define   SQIDIS_DOORBELL_EXIST_MASK		REG_GENMASK(15, 0)
 
 #define GUC_BCS_RCS_IER				XE_REG(0xC550)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 89d20faced19..0c87f78a868b 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
  */
 int xe_guc_init_post_hwconfig(struct xe_guc *guc)
 {
+	int ret;
+
+	ret = xe_guc_submit_init_post_hwconfig(guc);
+	if (ret)
+		return ret;
+
 	return xe_guc_ads_init_post_hwconfig(&guc->ads);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
index 5d83132034a6..420b7f53e649 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
@@ -12,6 +12,7 @@
 #include <drm/gpu_scheduler.h>
 
 struct dma_fence;
+struct xe_bo;
 struct xe_engine;
 
 /**
@@ -37,6 +38,10 @@ struct xe_guc_engine {
 	struct work_struct fini_async;
 	/** @resume_time: time of last resume */
 	u64 resume_time;
+	/** @doorbell_bo: BO for memory doorbell */
+	struct xe_bo *doorbell_bo;
+	/** @doorbell_offset: MMIO doorbell offset */
+	u32 doorbell_offset;
 	/** @state: GuC specific state for this xe_engine */
 	atomic_t state;
 	/** @wqi_head: work queue item tail */
@@ -45,6 +50,8 @@ struct xe_guc_engine {
 	u32 wqi_tail;
 	/** @id: GuC id for this xe_engine */
 	u16 id;
+	/** @doorbell_id: doorbell id */
+	u16 doorbell_id;
 	/** @suspend_wait: wait queue used to wait on pending suspends */
 	wait_queue_head_t suspend_wait;
 	/** @suspend_pending: a suspend of the engine is pending */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 0a41f5d04f6d..1b6f36b04cd1 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -13,7 +13,10 @@
 
 #include <drm/drm_managed.h>
 
+#include "regs/xe_guc_regs.h"
 #include "regs/xe_lrc_layout.h"
+
+#include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_engine.h"
 #include "xe_force_wake.h"
@@ -26,12 +29,22 @@
 #include "xe_lrc.h"
 #include "xe_macros.h"
 #include "xe_map.h"
+#include "xe_mmio.h"
 #include "xe_mocs.h"
 #include "xe_ring_ops_types.h"
 #include "xe_sched_job.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
 
+#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
+#define HAS_GUC_DIST_DB(xe) \
+	(GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
+
+#define GUC_NUM_HW_DOORBELLS 256
+
+#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
+#define GUC_MMIO_DB_BAR_SIZE SZ_4M
+
 static struct xe_gt *
 guc_to_gt(struct xe_guc *guc)
 {
@@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
 #define ENGINE_STATE_SUSPENDED		(1 << 5)
 #define ENGINE_STATE_RESET		(1 << 6)
 #define ENGINE_STATE_KILLED		(1 << 7)
+#define ENGINE_STATE_DB_REGISTERED	(1 << 8)
 
 static bool engine_registered(struct xe_engine *e)
 {
@@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
 	atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
 }
 
+static bool engine_doorbell_registered(struct xe_engine *e)
+{
+	return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
+}
+
+static void set_engine_doorbell_registered(struct xe_engine *e)
+{
+	atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
+}
+
 static bool engine_killed_or_banned(struct xe_engine *e)
 {
 	return engine_killed(e) || engine_banned(e);
@@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
 
 	xa_destroy(&guc->submission_state.engine_lookup);
 	ida_destroy(&guc->submission_state.guc_ids);
+	ida_destroy(&guc->submission_state.doorbell_ids);
 	bitmap_free(guc->submission_state.guc_ids_bitmap);
 }
 
@@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
 	mutex_init(&guc->submission_state.lock);
 	xa_init(&guc->submission_state.engine_lookup);
 	ida_init(&guc->submission_state.guc_ids);
+	ida_init(&guc->submission_state.doorbell_ids);
 
 	spin_lock_init(&guc->submission_state.suspend.lock);
 	guc->submission_state.suspend.context = dma_fence_context_alloc(1);
@@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
 	return 0;
 }
 
+int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
+{
+	if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
+		u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
+					       DIST_DBS_POPULATED.reg);
+		u32 num_sqidi =
+			hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
+		u32 doorbells_per_sqidi =
+			((distdbreg & DOORBELLS_PER_SQIDI_MASK) >>
+			 DOORBELLS_PER_SQIDI_SHIFT) + 1;
+
+		guc->submission_state.num_doorbells =
+			num_sqidi * doorbells_per_sqidi;
+	} else {
+		guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
+	}
+
+	return 0;
+}
+
+static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
+{
+	int ret;
+
+	lockdep_assert_held(&guc->submission_state.lock);
+
+	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
+	ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
+			     guc->submission_state.num_doorbells, GFP_NOWAIT);
+	if (ret < 0)
+		return false;
+
+	e->guc->doorbell_id = ret;
+
+	return true;
+}
+
+static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
+{
+	mutex_lock(&guc->submission_state.lock);
+	ida_simple_remove(&guc->submission_state.doorbell_ids,
+			  e->guc->doorbell_id);
+	mutex_unlock(&guc->submission_state.lock);
+
+	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
+}
+
+static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
+			     u64 gpa, u32 gtt_addr)
+{
+	u32 action[] = {
+		XE_GUC_ACTION_ALLOCATE_DOORBELL,
+		guc_id,
+		doorbell_id,
+		lower_32_bits(gpa),
+		upper_32_bits(gpa),
+		gtt_addr
+	};
+
+	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
+}
+
+static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
+{
+	u32 action[] = {
+		XE_GUC_ACTION_DEALLOCATE_DOORBELL,
+		guc_id
+	};
+
+	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
+}
+
+static bool has_doorbell(struct xe_engine *e)
+{
+	return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
+}
+
+#define doorbell_read(guc_, e_, field_) ({			\
+	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
+	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
+	xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,		\
+				  struct guc_doorbell_info, field_); \
+	})
+#define doorbell_write(guc_, e_, field_, val_) ({		\
+	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
+	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
+	xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,		\
+				  struct guc_doorbell_info, field_, val_); \
+	})
+
+static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
+{
+	struct xe_device *xe = guc_to_xe(guc);
+
+	/* GuC does the initialization with distributed and MMIO doorbells */
+	if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
+		doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
+		doorbell_write(guc, e, cookie, 0);
+	}
+}
+
+static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
+{
+	if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
+	    xe_device_mem_access_ongoing(guc_to_xe(guc)))
+		doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
+}
+
+static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
+{
+	if (has_doorbell(e)) {
+		release_doorbell_id(guc, e);
+		xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
+	}
+}
+
+static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
+{
+	u32 cookie;
+
+	cookie = doorbell_read(guc, e, cookie);
+	doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
+
+	XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
+}
+
+#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
+#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF
+static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
+{
+	u32 db_value;
+
+	db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
+				  doorbell_offset);
+
+	/*
+	 * The read from the doorbell page will return ack/nack. We don't remove
+	 * doorbells from active clients so we don't expect to ever get a nack.
+	 * XXX: if doorbell is lost, re-acquire it?
+	 */
+	XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
+	XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
+}
+
+static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
+{
+	XE_BUG_ON(!has_doorbell(e));
+
+	if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
+		ring_mmio_doorbell(guc, e->guc->doorbell_offset);
+	else
+		ring_memory_doorbell(guc, e);
+
+	trace_xe_engine_ring_db(e);
+}
+
+static void register_engine(struct xe_engine *e);
+
+static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
+{
+	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_device *xe = gt_to_xe(gt);
+	u64 gpa;
+	u32 gtt_addr;
+	int ret;
+
+	XE_BUG_ON(!has_doorbell(e));
+
+	if (HAS_GUC_MMIO_DB(xe)) {
+		e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
+		gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
+		gtt_addr = 0;
+	} else {
+		struct xe_bo *bo;
+
+		if (!e->guc->doorbell_bo) {
+			bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
+						  ttm_bo_type_kernel,
+						  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
+						  XE_BO_CREATE_GGTT_BIT);
+			if (IS_ERR(bo))
+				return PTR_ERR(bo);
+
+			e->guc->doorbell_bo = bo;
+		} else {
+			bo = e->guc->doorbell_bo;
+		}
+
+		init_doorbell(guc, e);
+		gpa = xe_bo_main_addr(bo, PAGE_SIZE);
+		gtt_addr = xe_bo_ggtt_addr(bo);
+	}
+
+	if (init && e->flags & ENGINE_FLAG_KERNEL)
+		return 0;
+
+	register_engine(e);
+	ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
+				gtt_addr);
+	if (ret < 0) {
+		fini_doorbell(guc, e);
+		return ret;
+	}
+
+	/*
+	 * In distributed doorbells, the GuC returns the cacheline selected
+	 * by HW as part of the 7-bit data from the allocate doorbell command:
+	 *  bit [22]   - Cacheline allocated
+	 *  bit [21:16] - Cacheline offset address
+	 * (bit 21 must be zero, or our assumption of only using half a page is
+	 * no longer correct).
+	 */
+	if (HAS_GUC_DIST_DB(xe)) {
+		u32 dd_cacheline_info;
+
+		XE_WARN_ON(!(ret & BIT(22)));
+		XE_WARN_ON(ret & BIT(21));
+
+		dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
+		e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
+
+		/* and verify db status was updated correctly by the guc fw */
+		XE_WARN_ON(doorbell_read(guc, e, db_status) !=
+			   GUC_DOORBELL_ENABLED);
+	}
+
+	set_engine_doorbell_registered(e);
+
+	return 0;
+}
+
 static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
 {
 	int ret;
@@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
 	u32 num_g2h = 0;
 	int len = 0;
 	bool extra_submit = false;
+	bool enable = false;
 
 	XE_BUG_ON(!engine_registered(e));
 
@@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
 		num_g2h = 1;
 		if (xe_engine_is_parallel(e))
 			extra_submit = true;
+		enable = true;
 
 		e->guc->resume_time = RESUME_PENDING;
 		set_engine_pending_enable(e);
@@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
 		trace_xe_engine_submit(e);
 	}
 
-	xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
+	if (enable || !engine_doorbell_registered(e))
+		xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
+	else
+		ring_doorbell(guc, e);
 
 	if (extra_submit) {
 		len = 0;
@@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
 	trace_xe_sched_job_run(job);
 
 	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
-		if (!engine_registered(e))
-			register_engine(e);
+		if (!engine_registered(e)) {
+			if (has_doorbell(e)) {
+				int err = create_doorbell(engine_to_guc(e), e,
+							  false);
+
+				/* Not fatal, but let's warn */
+				XE_WARN_ON(err);
+			} else {
+				register_engine(e);
+			}
+		}
 		if (!lr)	/* Written in IOCTL */
 			e->ring_ops->emit_job(job);
 		submit_engine(e);
@@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
 	MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
 	int ret;
 
+	if (has_doorbell(e)) {
+		fini_doorbell(guc, e);
+		deallocate_doorbell(guc, e->guc->id);
+	}
+
 	set_min_preemption_timeout(guc, e);
 	smp_rmb();
 	ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
@@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
 		cancel_work_sync(&ge->lr_tdr);
 	if (e->flags & ENGINE_FLAG_PERSISTENT)
 		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
+	destroy_doorbell(guc, e);
 	release_guc_id(guc, e);
 	drm_sched_entity_fini(&ge->entity);
 	drm_sched_fini(&ge->sched);
@@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
 	struct xe_guc_engine *ge;
 	long timeout;
 	int err;
+	bool create_db = false;
 
 	XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
 
@@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
 	if (guc_read_stopped(guc))
 		drm_sched_stop(sched, NULL);
 
+	create_db = alloc_doorbell_id(guc, e);
+
 	mutex_unlock(&guc->submission_state.lock);
 
+	if (create_db) {
+		/* Error isn't fatal as we don't need a doorbell */
+		err = create_doorbell(guc, e, true);
+		if (err)
+			release_doorbell_id(guc, e);
+	}
+
 	switch (e->class) {
 	case XE_ENGINE_CLASS_RENDER:
 		sprintf(e->name, "rcs%d", e->guc->id);
@@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
 {
 	struct drm_gpu_scheduler *sched = &e->guc->sched;
 
-	XE_BUG_ON(engine_registered(e));
+	XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
 	XE_BUG_ON(engine_banned(e));
 	XE_BUG_ON(engine_killed(e));
 
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
index 8002734d6f24..bada6c02d6aa 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit.h
@@ -13,6 +13,7 @@ struct xe_engine;
 struct xe_guc;
 
 int xe_guc_submit_init(struct xe_guc *guc);
+int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
 void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
 
 int xe_guc_submit_reset_prepare(struct xe_guc *guc);
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index ac7eec28934d..9ee4d572f4e0 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -36,10 +36,14 @@ struct xe_guc {
 		struct xarray engine_lookup;
 		/** @guc_ids: used to allocate new guc_ids, single-lrc */
 		struct ida guc_ids;
+		/** @doorbell_ids: used to allocate new doorbells */
+		struct ida doorbell_ids;
 		/** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
 		unsigned long *guc_ids_bitmap;
 		/** @stopped: submissions are stopped */
 		atomic_t stopped;
+		/** @num_doorbells: number of doorbells */
+		int num_doorbells;
 		/** @lock: protects submission state */
 		struct mutex lock;
 		/** @suspend: suspend fence state */
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 02861c26e145..38e9d7c6197b 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
 	     TP_ARGS(e)
 );
 
+DEFINE_EVENT(xe_engine, xe_engine_ring_db,
+	     TP_PROTO(struct xe_engine *e),
+	     TP_ARGS(e)
+);
+
 DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
 	     TP_PROTO(struct xe_engine *e),
 	     TP_ARGS(e)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (10 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 18:55   ` Rodrigo Vivi
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro Matthew Brost
                   ` (20 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

This information is helpful for debugging, so print it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 1b6f36b04cd1..880f480c6d5f 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2016,6 +2016,8 @@ static void guc_engine_print(struct xe_engine *e, struct drm_printer *p)
 	drm_printf(p, "\tTimeslice: %u (us)\n", e->sched_props.timeslice_us);
 	drm_printf(p, "\tPreempt timeout: %u (us)\n",
 		   e->sched_props.preempt_timeout_us);
+	drm_printf(p, "\tDoorbell ID: %u\n",
+		   e->guc->doorbell_id);
 	for (i = 0; i < e->width; ++i ) {
 		struct xe_lrc *lrc = e->lrc + i;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (11 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 13:21   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate Matthew Brost
                   ` (19 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Danilo Krummrich

From: Danilo Krummrich <dakr@redhat.com>

Split up the MA_STATE() macro such that components using the maple tree
can easily inherit from struct ma_state and build custom tree walk
macros to hide their internals from users.

Example:

struct sample_iterator {
        struct ma_state mas;
        struct sample_mgr *mgr;
};

\#define SAMPLE_ITERATOR(name, __mgr, start)                    \
        struct sample_iterator name = {                         \
                .mas = MA_STATE_INIT(&(__mgr)->mt, start, 0),   \
                .mgr = __mgr,                                   \
        }

\#define sample_iter_for_each_range(it__, entry__, end__) \
        mas_for_each(&(it__).mas, entry__, end__)

--

struct sample *sample;
SAMPLE_ITERATOR(si, mgr, min);

sample_iter_for_each_range(si, sample, max) {
        frob(mgr, sample);
}

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 include/linux/maple_tree.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 1fadb5f5978b..87d55334f1c2 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -423,8 +423,8 @@ struct ma_wr_state {
 #define MA_ERROR(err) \
 		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
 
-#define MA_STATE(name, mt, first, end)					\
-	struct ma_state name = {					\
+#define MA_STATE_INIT(mt, first, end)					\
+	{								\
 		.tree = mt,						\
 		.index = first,						\
 		.last = end,						\
@@ -435,6 +435,9 @@ struct ma_wr_state {
 		.mas_flags = 0,						\
 	}
 
+#define MA_STATE(name, mt, first, end)					\
+	struct ma_state name = MA_STATE_INIT(mt, first, end)
+
 #define MA_WR_STATE(name, ma_state, wr_entry)				\
 	struct ma_wr_state name = {					\
 		.mas = ma_state,					\
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (12 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 13:33   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings Matthew Brost
                   ` (18 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

The DRM GPUVA implementation needs this function, so export it to modules.
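
A minimal sketch of the pattern that needs this export ('mt', 'start',
'last' and 'entry' are placeholders; the signature matches the hunk
below): pre-allocate the maple tree nodes up front so the later store can
neither fail nor allocate.

	MA_STATE(mas, mt, start, last);
	int err;

	/* pre-allocate while we are still allowed to sleep and to fail */
	err = mas_preallocate(&mas, GFP_KERNEL);
	if (err)
		return err;

	/* the store itself can then no longer fail or allocate */
	mas_store_prealloc(&mas, entry);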

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 lib/maple_tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 9e2735cbc2b4..ae37a167e25d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5726,6 +5726,7 @@ int mas_preallocate(struct ma_state *mas, gfp_t gfp)
 	mas_reset(mas);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(mas_preallocate);
 
 /*
  * mas_destroy() - destroy a maple state.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (13 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 13:49   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA Matthew Brost
                   ` (17 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Dave Airlie, Danilo Krummrich

From: Danilo Krummrich <dakr@redhat.com>

Add infrastructure to keep track of GPU virtual address (VA) mappings
with a dedicated VA space manager implementation.

New UAPIs, motivated by the Vulkan sparse memory bindings that graphics
drivers are starting to implement, allow userspace applications to
request multiple and
arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide infrastructure to track GPU VA allocations and mappings,
   making use of the maple_tree.

2) Generically connect GPU VA mappings to their backing buffers, in
   particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
   operations on the GPU VA space. In particular splitting and merging
   of GPU VA mappings, e.g. for intersecting mapping requests or partial
   unmap requests.
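
As a quick illustration of 1) above, walking the tracked mappings from a
driver looks roughly like the sketch below, using the iterator added by
this patch (the same pattern the new debugfs helper uses);
'driver_dump_va' is a placeholder:

	DRM_GPUVA_ITER(it, mgr, 0);
	struct drm_gpuva *va;

	drm_gpuva_iter_for_each(va, it) {
		if (va == &mgr->kernel_alloc_node)	/* skip the kernel reserved node */
			continue;

		driver_dump_va(va->va.addr, va->va.range,
			       va->gem.obj, va->gem.offset);
	}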

Suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 Documentation/gpu/drm-mm.rst    |   31 +
 drivers/gpu/drm/Makefile        |    1 +
 drivers/gpu/drm/drm_debugfs.c   |   41 +
 drivers/gpu/drm/drm_gem.c       |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c | 1686 +++++++++++++++++++++++++++++++
 include/drm/drm_debugfs.h       |   24 +
 include/drm/drm_drv.h           |    7 +
 include/drm/drm_gem.h           |   75 ++
 include/drm/drm_gpuva_mgr.h     |  681 +++++++++++++
 9 files changed, 2549 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 include/drm/drm_gpuva_mgr.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a79fd3549ff8..fe40ee686f6e 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
 .. kernel-doc:: drivers/gpu/drm/drm_mm.c
    :export:
 
+DRM GPU VA Manager
+==================
+
+Overview
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Overview
+
+Split and Merge
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Split and Merge
+
+Locking
+-------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Locking
+
+
+DRM GPU VA Manager Function References
+--------------------------------------
+
+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :export:
+
 DRM Buddy Allocator
 ===================
 
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 66dd2c48944a..ad6267273503 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -46,6 +46,7 @@ drm-y := \
 	drm_vblank.o \
 	drm_vblank_work.o \
 	drm_vma_manager.o \
+	drm_gpuva_mgr.o \
 	drm_writeback.o
 drm-$(CONFIG_DRM_LEGACY) += \
 	drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 4855230ba2c6..2191a00b080e 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -38,6 +38,7 @@
 #include <drm/drm_edid.h>
 #include <drm/drm_file.h>
 #include <drm/drm_gem.h>
+#include <drm/drm_gpuva_mgr.h>
 #include <drm/drm_managed.h>
 
 #include "drm_crtc_internal.h"
@@ -175,6 +176,46 @@ static const struct file_operations drm_debugfs_fops = {
 	.release = single_release,
 };
 
+/**
+ * drm_debugfs_gpuva_info - dump the given DRM GPU VA space
+ * @m: pointer to the &seq_file to write
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ *
+ * Dumps the GPU VA mappings of a given DRM GPU VA manager.
+ *
+ * For each DRM GPU VA space drivers should call this function from their
+ * &drm_info_list's show callback.
+ *
+ * Returns: 0 on success, -ENODEV if the &mgr is not initialized
+ */
+int drm_debugfs_gpuva_info(struct seq_file *m,
+			   struct drm_gpuva_manager *mgr)
+{
+	DRM_GPUVA_ITER(it, mgr, 0);
+	struct drm_gpuva *va, *kva = &mgr->kernel_alloc_node;
+
+	if (!mgr->name)
+		return -ENODEV;
+
+	seq_printf(m, "DRM GPU VA space (%s) [0x%016llx;0x%016llx]\n",
+		   mgr->name, mgr->mm_start, mgr->mm_start + mgr->mm_range);
+	seq_printf(m, "Kernel reserved node [0x%016llx;0x%016llx]\n",
+		   kva->va.addr, kva->va.addr + kva->va.range);
+	seq_puts(m, "\n");
+	seq_puts(m, " VAs | start              | range              | end                | object             | object offset\n");
+	seq_puts(m, "-------------------------------------------------------------------------------------------------------------\n");
+	drm_gpuva_iter_for_each(va, it) {
+		if (unlikely(va == &mgr->kernel_alloc_node))
+			continue;
+
+		seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
+			   va->va.addr, va->va.range, va->va.addr + va->va.range,
+			   (u64)va->gem.obj, va->gem.offset);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_debugfs_gpuva_info);
 
 /**
  * drm_debugfs_create_files - Initialize a given set of debugfs files for DRM
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index a6208e2c089b..15fe61856190 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	if (!obj->resv)
 		obj->resv = &obj->_resv;
 
+	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
+		drm_gem_gpuva_init(obj);
+
 	drm_vma_node_reset(&obj->vma_node);
 	INIT_LIST_HEAD(&obj->lru_node);
 }
diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
new file mode 100644
index 000000000000..bd7d27ee44bb
--- /dev/null
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -0,0 +1,1686 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <drm/drm_gem.h>
+#include <drm/drm_gpuva_mgr.h>
+
+/**
+ * DOC: Overview
+ *
+ * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
+ * of a GPU's virtual address (VA) space and manages the corresponding virtual
+ * mappings represented by &drm_gpuva objects. It also keeps track of the
+ * mapping's backing &drm_gem_object buffers.
+ *
+ * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
+ * &drm_gpuva objects representing all existent GPU VA mappings using this
+ * &drm_gem_object as backing buffer.
+ *
+ * GPU VAs can be flagged as sparse, such that drivers may use GPU VAs to also
+ * keep track of sparse PTEs in order to support Vulkan 'Sparse Resources'.
+ *
+ * The GPU VA manager internally uses a &maple_tree to manage the
+ * &drm_gpuva mappings within a GPU's virtual address space.
+ *
+ * The &drm_gpuva_manager contains a special &drm_gpuva representing the
+ * portion of VA space reserved by the kernel. This node is initialized together
+ * with the GPU VA manager instance and removed when the GPU VA manager is
+ * destroyed.
+ *
+ * In a typical application drivers would embed struct drm_gpuva_manager and
+ * struct drm_gpuva within their own driver specific structures; there won't be
+ * any memory allocations of its own nor memory allocations of &drm_gpuva
+ * entries.
+ *
+ * However, the &drm_gpuva_manager needs to allocate nodes for its internal
+ * tree structures when &drm_gpuva entries are inserted. In order to support
+ * inserting &drm_gpuva entries from dma-fence signalling critical sections the
+ * &drm_gpuva_manager provides struct drm_gpuva_prealloc. Drivers may create
+ * pre-allocated nodes with drm_gpuva_prealloc_create() and subsequently insert
+ * a new &drm_gpuva entry with drm_gpuva_insert_prealloc().
+ */
+
+/**
+ * DOC: Split and Merge
+ *
+ * The DRM GPU VA manager also provides an algorithm implementing splitting and
+ * merging of existent GPU VA mappings with the ones that are requested to be
+ * mapped or unmapped. This feature is required by the Vulkan API to implement
+ * Vulkan 'Sparse Memory Bindings' - drivers' UAPIs often refer to this as
+ * VM BIND.
+ *
+ * Drivers can call drm_gpuva_sm_map() to receive a sequence of callbacks
+ * containing map, unmap and remap operations for a given newly requested
+ * mapping. The sequence of callbacks represents the set of operations to
+ * execute in order to integrate the new mapping cleanly into the current state
+ * of the GPU VA space.
+ *
+ * Depending on how the new GPU VA mapping intersects with the existent mappings
+ * of the GPU VA space the &drm_gpuva_fn_ops callbacks contain an arbitrary
+ * amount of unmap operations, a maximum of two remap operations and a single
+ * map operation. The caller might receive no callback at all if no operation is
+ * required, e.g. if the requested mapping already exists in the exact same way.
+ *
+ * The single map operation represents the original map operation requested by
+ * the caller.
+ *
+ * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
+ * &drm_gpuva to unmap is physically contiguous with the original mapping
+ * request. Optionally, if 'keep' is set, drivers may keep the actual page table
+ * entries for this &drm_gpuva, adding the missing page table entries only and
+ * update the &drm_gpuva_manager's view of things accordingly.
+ *
+ * Drivers may do the same optimization, namely delta page table updates, also
+ * for remap operations. This is possible since &drm_gpuva_op_remap consists of
+ * one unmap operation and one or two map operations, such that drivers can
+ * derive the page table update delta accordingly.
+ *
+ * Note that there can't be more than two existent mappings to split up, one at
+ * the beginning and one at the end of the new mapping, hence there is a
+ * maximum of two remap operations.
+ *
+ * Analogous to drm_gpuva_sm_map() drm_gpuva_sm_unmap() uses &drm_gpuva_fn_ops
+ * to call back into the driver in order to unmap a range of GPU VA space. The
+ * logic behind this function is way simpler though: For all existent mappings
+ * enclosed by the given range unmap operations are created. For mappings which
+ * are only partially located within the given range, remap operations are
+ * created such that those mappings are split up and re-mapped partially.
+ *
+ * To update the &drm_gpuva_manager's view of the GPU VA space
+ * drm_gpuva_insert(), drm_gpuva_insert_prealloc(), and drm_gpuva_remove() may
+ * be used. Please note that these functions are not safe to be called from a
+ * &drm_gpuva_fn_ops callback originating from drm_gpuva_sm_map() or
+ * drm_gpuva_sm_unmap(). The drm_gpuva_map(), drm_gpuva_remap() and
+ * drm_gpuva_unmap() helpers should be used instead.
+ *
+ * The following diagram depicts the basic relationships of existent GPU VA
+ * mappings, a newly requested mapping and the resulting mappings as implemented
+ * by drm_gpuva_sm_map() - it doesn't cover any arbitrary combinations of these.
+ *
+ * 1) Requested mapping is identical. Replace it, but indicate the backing PTEs
+ *    could be kept.
+ *
+ *    ::
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ * 2) Requested mapping is identical, except for the BO offset, hence replace
+ *    the mapping.
+ *
+ *    ::
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	req: |-----------| (bo_offset=m)
+ *
+ *	     0     a     1
+ *	new: |-----------| (bo_offset=m)
+ *
+ *
+ * 3) Requested mapping is identical, except for the backing BO, hence replace
+ *    the mapping.
+ *
+ *    ::
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     b     1
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     b     1
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ * 4) Existent mapping is a left aligned subset of the requested one, hence
+ *    replace the existent one.
+ *
+ *    ::
+ *
+ *	     0  a  1
+ *	old: |-----|       (bo_offset=n)
+ *
+ *	     0     a     2
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *    .. note::
+ *       We expect to see the same result for a request with a different BO
+ *       and/or non-contiguous BO offset.
+ *
+ *
+ * 5) Requested mapping's range is a left aligned subset of the existent one,
+ *    but backed by a different BO. Hence, map the requested mapping and split
+ *    the existent one, adjusting its BO offset.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0  b  1
+ *	req: |-----|       (bo_offset=n)
+ *
+ *	     0  b  1  a' 2
+ *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
+ *
+ *    .. note::
+ *       We expect to see the same result for a request with a different BO
+ *       and/or non-contiguous BO offset.
+ *
+ *
+ * 6) Existent mapping is a superset of the requested mapping. Split it up, but
+ *    indicate that the backing PTEs could be kept.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0  a  1
+ *	req: |-----|       (bo_offset=n)
+ *
+ *	     0  a  1  a' 2
+ *	new: |-----|-----| (a.bo_offset=n, a'.bo_offset=n+1)
+ *
+ *
+ * 7) Requested mapping's range is a right aligned subset of the existent one,
+ *    but backed by a different BO. Hence, map the requested mapping and split
+ *    the existent one, without adjusting the BO offset.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	           1  b  2
+ *	req:       |-----| (bo_offset=m)
+ *
+ *	     0  a  1  b  2
+ *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
+ *
+ *
+ * 8) Existent mapping is a superset of the requested mapping. Split it up, but
+ *    indicate that the backing PTEs could be kept.
+ *
+ *    ::
+ *
+ *	      0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	           1  a  2
+ *	req:       |-----| (bo_offset=n+1)
+ *
+ *	     0  a' 1  a  2
+ *	new: |-----|-----| (a'.bo_offset=n, a.bo_offset=n+1)
+ *
+ *
+ * 9) Existent mapping is overlapped at the end by the requested mapping backed
+ *    by a different BO. Hence, map the requested mapping and split up the
+ *    existent one, without adjusting the BO offset.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------|       (bo_offset=n)
+ *
+ *	           1     b     3
+ *	req:       |-----------| (bo_offset=m)
+ *
+ *	     0  a  1     b     3
+ *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
+ *
+ *
+ * 10) Existent mapping is overlapped at the end by the requested mapping, both
+ *     having the same backing BO with a contiguous offset. Indicate the
+ *     backing PTEs of the old mapping could be kept.
+ *
+ *     ::
+ *
+ *	      0     a     2
+ *	 old: |-----------|       (bo_offset=n)
+ *
+ *	            1     a     3
+ *	 req:       |-----------| (bo_offset=n+1)
+ *
+ *	      0  a' 1     a     3
+ *	 new: |-----|-----------| (a'.bo_offset=n, a.bo_offset=n+1)
+ *
+ *
+ * 11) Requested mapping's range is a centered subset of the existent one
+ *     having a different backing BO. Hence, map the requested mapping and split
+ *     up the existent one in two mappings, adjusting the BO offset of the right
+ *     one accordingly.
+ *
+ *     ::
+ *
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
+ *
+ *	            1  b  2
+ *	 req:       |-----|       (bo_offset=m)
+ *
+ *	      0  a  1  b  2  a' 3
+ *	 new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
+ *
+ *
+ * 12) Requested mapping's range is a centered subset of the existent one,
+ *     backed by the same BO at a contiguous offset. Split it up, but indicate
+ *     that the backing PTEs could be kept.
+ *
+ *     ::
+ *
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
+ *
+ *	            1  a  2
+ *	 req:       |-----|       (bo_offset=n+1)
+ *
+ *	      0  a' 1  a  2 a'' 3
+ *	 new: |-----|-----|-----| (a'.bo_offset=n, a.bo_offset=n+1, a''.bo_offset=n+2)
+ *
+ *
+ * 13) Existent mapping is a right aligned subset of the requested one, hence
+ *     replace the existent one.
+ *
+ *     ::
+ *
+ *	            1  a  2
+ *	 old:       |-----| (bo_offset=n+1)
+ *
+ *	      0     a     2
+ *	 req: |-----------| (bo_offset=n)
+ *
+ *	      0     a     2
+ *	 new: |-----------| (bo_offset=n)
+ *
+ *     .. note::
+ *        We expect to see the same result for a request with a different BO
+ *        and/or non-contiguous BO offset.
+ *
+ *
+ * 14) Existent mapping is a centered subset of the requested one, hence
+ *     replace the existent one.
+ *
+ *     ::
+ *
+ *	            1  a  2
+ *	 old:       |-----| (bo_offset=n+1)
+ *
+ *	      0        a       3
+ *	 req: |----------------| (bo_offset=n)
+ *
+ *	      0        a       3
+ *	 new: |----------------| (bo_offset=n)
+ *
+ *     .. note::
+ *        We expect to see the same result for a request with a different BO
+ *        and/or non-contiguous BO offset.
+ *
+ *
+ * 15) Existent mapping is overlapped at the beginning by the requested mapping
+ *     backed by a different BO. Hence, map the requested mapping and split up
+ *     the existent one, adjusting its BO offset accordingly.
+ *
+ *     ::
+ *
+ *	            1     a     3
+ *	 old:       |-----------| (bo_offset=n)
+ *
+ *	      0     b     2
+ *	 req: |-----------|       (bo_offset=m)
+ *
+ *	      0     b     2  a' 3
+ *	 new: |-----------|-----| (b.bo_offset=m, a'.bo_offset=n+2)
+ */
+
+/**
+ * DOC: Locking
+ *
+ * Generally, the GPU VA manager does not take care of locking itself; it is
+ * the driver's responsibility to take care of locking. Drivers might want to
+ * protect the following operations: inserting, removing and iterating
+ * &drm_gpuva objects as well as generating all kinds of operations, such as
+ * split / merge or prefetch.
+ *
+ * The GPU VA manager also does not take care of locking the backing
+ * &drm_gem_object buffers' GPU VA lists by itself; drivers are responsible
+ * for enforcing mutual exclusion.
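+ *
+ * A minimal sketch of driver-side locking, assuming a hypothetical driver_vm
+ * structure wrapping the &drm_gpuva_manager with its own mutex::
+ *
+ *	int driver_vm_insert_va(struct driver_vm *vm, struct drm_gpuva *va)
+ *	{
+ *		int ret;
+ *
+ *		/* driver-private lock serializing tree modifications */
+ *		mutex_lock(&vm->lock);
+ *		ret = drm_gpuva_insert(&vm->mgr, va);
+ *		mutex_unlock(&vm->lock);
+ *
+ *		return ret;
+ *	}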
+ */
+
+ /*
+  * Maple Tree Locking
+  *
+  * The maple tree's advanced API requires the user of the API to protect
+  * certain tree operations with a lock (either the external or internal tree
+  * lock) for tree internal reasons.
+  *
+  * The actual rules (when to acquire/release the lock) are enforced by lockdep
+  * through the maple tree implementation.
+  *
+  * For this reason the DRM GPUVA manager takes the maple tree's internal
+  * spinlock according to the lockdep enforced rules.
+  *
+  * Please note that this lock is *only* meant to fulfill the maple tree's
+  * requirements and does not intentionally protect the DRM GPUVA manager
+  * against concurrent access.
+  *
+  * The following mail thread provides more details on why the maple tree
+  * has this requirement.
+  *
+  * https://lore.kernel.org/lkml/20230217134422.14116-5-dakr@redhat.com/
+  */
+
+static int __drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+			      struct drm_gpuva *va);
+static void __drm_gpuva_remove(struct drm_gpuva *va);
+
+/**
+ * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
+ * @mgr: pointer to the &drm_gpuva_manager to initialize
+ * @name: the name of the GPU VA space
+ * @start_offset: the start offset of the GPU VA space
+ * @range: the size of the GPU VA space
+ * @reserve_offset: the start of the kernel reserved GPU VA area
+ * @reserve_range: the size of the kernel reserved GPU VA area
+ * @ops: &drm_gpuva_fn_ops called on &drm_gpuva_sm_map / &drm_gpuva_sm_unmap
+ *
+ * The &drm_gpuva_manager must be initialized with this function before use.
+ *
+ * Note that @mgr must be cleared to 0 before calling this function. The given
+ * @name is expected to be managed by the surrounding driver structures.
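+ *
+ * A minimal initialization sketch; the VA space bounds, the reserved kernel
+ * range and the hypothetical driver_gpuva_ops are chosen for illustration
+ * only::
+ *
+ *	struct drm_gpuva_manager mgr = {};
+ *
+ *	drm_gpuva_manager_init(&mgr, "example", 0, 1ull << 48,
+ *			       0, 0x1000, &driver_gpuva_ops);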
+ */
+void
+drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+		       const char *name,
+		       u64 start_offset, u64 range,
+		       u64 reserve_offset, u64 reserve_range,
+		       struct drm_gpuva_fn_ops *ops)
+{
+	mt_init(&mgr->mtree);
+
+	mgr->mm_start = start_offset;
+	mgr->mm_range = range;
+
+	mgr->name = name ? name : "unknown";
+	mgr->ops = ops;
+
+	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
+
+	if (reserve_range) {
+		mgr->kernel_alloc_node.va.addr = reserve_offset;
+		mgr->kernel_alloc_node.va.range = reserve_range;
+
+		__drm_gpuva_insert(mgr, &mgr->kernel_alloc_node);
+	}
+}
+EXPORT_SYMBOL(drm_gpuva_manager_init);
+
+/**
+ * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
+ * @mgr: pointer to the &drm_gpuva_manager to clean up
+ *
+ * Note that it is a bug to call this function on a manager that still
+ * holds GPU VA mappings.
+ */
+void
+drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
+{
+	mgr->name = NULL;
+
+	if (mgr->kernel_alloc_node.va.range)
+		__drm_gpuva_remove(&mgr->kernel_alloc_node);
+
+	mtree_lock(&mgr->mtree);
+	WARN(!mtree_empty(&mgr->mtree),
+	     "GPUVA tree is not empty, potentially leaking memory.");
+	__mt_destroy(&mgr->mtree);
+	mtree_unlock(&mgr->mtree);
+}
+EXPORT_SYMBOL(drm_gpuva_manager_destroy);
+
+static inline bool
+drm_gpuva_in_mm_range(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	u64 end = addr + range;
+	u64 mm_start = mgr->mm_start;
+	u64 mm_end = mm_start + mgr->mm_range;
+
+	return addr < mm_end && mm_start < end;
+}
+
+static inline bool
+drm_gpuva_in_kernel_node(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	u64 end = addr + range;
+	u64 kstart = mgr->kernel_alloc_node.va.addr;
+	u64 krange = mgr->kernel_alloc_node.va.range;
+	u64 kend = kstart + krange;
+
+	return krange && addr < kend && kstart < end;
+}
+
+static inline bool
+drm_gpuva_range_valid(struct drm_gpuva_manager *mgr,
+		      u64 addr, u64 range)
+{
+	return drm_gpuva_in_mm_range(mgr, addr, range) &&
+	       !drm_gpuva_in_kernel_node(mgr, addr, range);
+}
+
+/**
+ * drm_gpuva_iter_remove - removes the iterator's current element
+ * @it: the &drm_gpuva_iterator
+ *
+ * This removes the element the iterator currently points to.
+ */
+void
+drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
+{
+	mas_lock(&it->mas);
+	mas_erase(&it->mas);
+	mas_unlock(&it->mas);
+}
+EXPORT_SYMBOL(drm_gpuva_iter_remove);
+
+/**
+ * drm_gpuva_prealloc_create - creates a preallocated node to store a
+ * &drm_gpuva entry.
+ *
+ * Returns: the &drm_gpuva_prealloc object on success, NULL on failure
+ */
+struct drm_gpuva_prealloc *
+drm_gpuva_prealloc_create(void)
+{
+	struct drm_gpuva_prealloc *pa;
+
+	pa = kzalloc(sizeof(*pa), GFP_KERNEL);
+	if (!pa)
+		return NULL;
+
+	if (mas_preallocate(&pa->mas, GFP_KERNEL)) {
+		kfree(pa);
+		return NULL;
+	}
+
+	return pa;
+}
+EXPORT_SYMBOL(drm_gpuva_prealloc_create);
+
+/**
+ * drm_gpuva_prealloc_destroy - destroys a preallocated node and frees the
+ * &drm_gpuva_prealloc
+ *
+ * @pa: the &drm_gpuva_prealloc to destroy
+ */
+void
+drm_gpuva_prealloc_destroy(struct drm_gpuva_prealloc *pa)
+{
+	mas_destroy(&pa->mas);
+	kfree(pa);
+}
+EXPORT_SYMBOL(drm_gpuva_prealloc_destroy);
+
+static int
+drm_gpuva_insert_state(struct drm_gpuva_manager *mgr,
+		       struct ma_state *mas,
+		       struct drm_gpuva *va)
+{
+	u64 addr = va->va.addr;
+	u64 range = va->va.range;
+	u64 last = addr + range - 1;
+
+	mas_set(mas, addr);
+
+	mas_lock(mas);
+	if (unlikely(mas_walk(mas))) {
+		mas_unlock(mas);
+		return -EEXIST;
+	}
+
+	if (unlikely(mas->last < last)) {
+		mas_unlock(mas);
+		return -EEXIST;
+	}
+
+	mas->index = addr;
+	mas->last = last;
+
+	mas_store_prealloc(mas, va);
+	mas_unlock(mas);
+
+	va->mgr = mgr;
+
+	return 0;
+}
+
+static int
+__drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		   struct drm_gpuva *va)
+{
+	MA_STATE(mas, &mgr->mtree, 0, 0);
+	int ret;
+
+	ret = mas_preallocate(&mas, GFP_KERNEL);
+	if (ret)
+		return ret;
+
+	return drm_gpuva_insert_state(mgr, &mas, va);
+}
+
+/**
+ * drm_gpuva_insert - insert a &drm_gpuva
+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
+ * @va: the &drm_gpuva to insert
+ *
+ * Insert a &drm_gpuva with a given address and range into a
+ * &drm_gpuva_manager.
+ *
+ * It is not allowed to use this function while iterating this GPU VA space,
+ * e.g. via drm_gpuva_iter_for_each().
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		 struct drm_gpuva *va)
+{
+	u64 addr = va->va.addr;
+	u64 range = va->va.range;
+
+	if (unlikely(!drm_gpuva_range_valid(mgr, addr, range)))
+		return -EINVAL;
+
+	return __drm_gpuva_insert(mgr, va);
+}
+EXPORT_SYMBOL(drm_gpuva_insert);
+
+/**
+ * drm_gpuva_insert_prealloc - insert a &drm_gpuva with a preallocated node
+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
+ * @va: the &drm_gpuva to insert
+ * @pa: the &drm_gpuva_prealloc node
+ *
+ * Insert a &drm_gpuva with a given address and range into a
+ * &drm_gpuva_manager.
+ *
+ * It is not allowed to use this function while iterating this GPU VA space,
+ * e.g. via drm_gpuva_iter_for_each().
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_insert_prealloc(struct drm_gpuva_manager *mgr,
+			  struct drm_gpuva_prealloc *pa,
+			  struct drm_gpuva *va)
+{
+	struct ma_state *mas = &pa->mas;
+	u64 addr = va->va.addr;
+	u64 range = va->va.range;
+
+	if (unlikely(!drm_gpuva_range_valid(mgr, addr, range)))
+		return -EINVAL;
+
+	mas->tree = &mgr->mtree;
+	return drm_gpuva_insert_state(mgr, mas, va);
+}
+EXPORT_SYMBOL(drm_gpuva_insert_prealloc);
+
+static void
+__drm_gpuva_remove(struct drm_gpuva *va)
+{
+	MA_STATE(mas, &va->mgr->mtree, va->va.addr, 0);
+
+	mas_lock(&mas);
+	mas_erase(&mas);
+	mas_unlock(&mas);
+}
+
+/**
+ * drm_gpuva_remove - remove a &drm_gpuva
+ * @va: the &drm_gpuva to remove
+ *
+ * This removes the given &va from the underlying tree.
+ *
+ * It is not allowed to use this function while iterating this GPU VA space,
+ * e.g. via drm_gpuva_iter_for_each(). Please use drm_gpuva_iter_remove()
+ * instead.
+ */
+void
+drm_gpuva_remove(struct drm_gpuva *va)
+{
+	struct drm_gpuva_manager *mgr = va->mgr;
+
+	if (unlikely(va == &mgr->kernel_alloc_node)) {
+		WARN(1, "Can't destroy kernel reserved node.\n");
+		return;
+	}
+
+	__drm_gpuva_remove(va);
+}
+EXPORT_SYMBOL(drm_gpuva_remove);
+
+/**
+ * drm_gpuva_link - link a &drm_gpuva
+ * @va: the &drm_gpuva to link
+ *
+ * This adds the given &va to the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * This function expects the caller to protect the GEM's GPUVA list against
+ * concurrent access.
+ */
+void
+drm_gpuva_link(struct drm_gpuva *va)
+{
+	if (likely(va->gem.obj))
+		list_add_tail(&va->gem.entry, &va->gem.obj->gpuva.list);
+}
+EXPORT_SYMBOL(drm_gpuva_link);
+
+/**
+ * drm_gpuva_unlink - unlink a &drm_gpuva
+ * @va: the &drm_gpuva to unlink
+ *
+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * This function expects the caller to protect the GEM's GPUVA list against
+ * concurrent access.
+ */
+void
+drm_gpuva_unlink(struct drm_gpuva *va)
+{
+	if (likely(va->gem.obj))
+		list_del_init(&va->gem.entry);
+}
+EXPORT_SYMBOL(drm_gpuva_unlink);
+
+/**
+ * drm_gpuva_find_first - find the first &drm_gpuva in the given range
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuva's address
+ * @range: the &drm_gpuva's range
+ *
+ * Returns: the first &drm_gpuva within the given range
+ */
+struct drm_gpuva *
+drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
+		     u64 addr, u64 range)
+{
+	MA_STATE(mas, &mgr->mtree, addr, 0);
+	struct drm_gpuva *va;
+
+	mas_lock(&mas);
+	va = mas_find(&mas, addr + range - 1);
+	mas_unlock(&mas);
+
+	return va;
+}
+EXPORT_SYMBOL(drm_gpuva_find_first);
+
+/**
+ * drm_gpuva_find - find a &drm_gpuva
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuva's address
+ * @range: the &drm_gpuva's range
+ *
+ * Returns: the &drm_gpuva at the given @addr with the given @range
+ */
+struct drm_gpuva *
+drm_gpuva_find(struct drm_gpuva_manager *mgr,
+	       u64 addr, u64 range)
+{
+	struct drm_gpuva *va;
+
+	va = drm_gpuva_find_first(mgr, addr, range);
+	if (!va)
+		goto out;
+
+	if (va->va.addr != addr ||
+	    va->va.range != range)
+		goto out;
+
+	return va;
+
+out:
+	return NULL;
+}
+EXPORT_SYMBOL(drm_gpuva_find);
+
+/**
+ * drm_gpuva_find_prev - find the &drm_gpuva before the given address
+ * @mgr: the &drm_gpuva_manager to search in
+ * @start: the given GPU VA's start address
+ *
+ * Find the adjacent &drm_gpuva before the GPU VA with the given @start address.
+ *
+ * Note that if there is any free space between the GPU VA mappings, no mapping
+ * is returned.
+ *
+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
+ */
+struct drm_gpuva *
+drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
+{
+	MA_STATE(mas, &mgr->mtree, start - 1, 0);
+	struct drm_gpuva *va;
+
+	if (start <= mgr->mm_start ||
+	    start > (mgr->mm_start + mgr->mm_range))
+		return NULL;
+
+	mas_lock(&mas);
+	va = mas_walk(&mas);
+	mas_unlock(&mas);
+
+	return va;
+}
+EXPORT_SYMBOL(drm_gpuva_find_prev);
+
+/**
+ * drm_gpuva_find_next - find the &drm_gpuva after the given address
+ * @mgr: the &drm_gpuva_manager to search in
+ * @end: the given GPU VA's end address
+ *
+ * Find the adjacent &drm_gpuva after the GPU VA with the given @end address.
+ *
+ * Note that if there is any free space between the GPU VA mappings, no mapping
+ * is returned.
+ *
+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
+ */
+struct drm_gpuva *
+drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
+{
+	MA_STATE(mas, &mgr->mtree, end, 0);
+	struct drm_gpuva *va;
+
+	if (end < mgr->mm_start ||
+	    end >= (mgr->mm_start + mgr->mm_range))
+		return NULL;
+
+	mas_lock(&mas);
+	va = mas_walk(&mas);
+	mas_unlock(&mas);
+
+	return va;
+}
+EXPORT_SYMBOL(drm_gpuva_find_next);
+
+/**
+ * drm_gpuva_interval_empty - indicate whether a given interval of the VA space
+ * is empty
+ * @mgr: the &drm_gpuva_manager to check the range for
+ * @addr: the start address of the range
+ * @range: the range of the interval
+ *
+ * Returns: true if the interval is empty, false otherwise
+ */
+bool
+drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	DRM_GPUVA_ITER(it, mgr, addr);
+	struct drm_gpuva *va;
+
+	drm_gpuva_iter_for_each_range(va, it, addr + range)
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL(drm_gpuva_interval_empty);
+
+/**
+ * drm_gpuva_map - helper to insert a &drm_gpuva from &drm_gpuva_fn_ops
+ * callbacks
+ *
+ * @mgr: the &drm_gpuva_manager
+ * @pa: the &drm_gpuva_prealloc
+ * @va: the &drm_gpuva to insert
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_map(struct drm_gpuva_manager *mgr,
+	      struct drm_gpuva_prealloc *pa,
+	      struct drm_gpuva *va)
+{
+	return drm_gpuva_insert_prealloc(mgr, pa, va);
+}
+EXPORT_SYMBOL(drm_gpuva_map);
+
+/**
+ * drm_gpuva_remap - helper to remap a &drm_gpuva from &drm_gpuva_fn_ops
+ * callbacks
+ *
+ * @state: the current &drm_gpuva_state
+ * @prev: the &drm_gpuva to remap when keeping the start of a mapping,
+ * may be NULL
+ * @next: the &drm_gpuva to remap when keeping the end of a mapping,
+ * may be NULL
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_remap(drm_gpuva_state_t state,
+		struct drm_gpuva *prev,
+		struct drm_gpuva *next)
+{
+	struct ma_state *mas = &state->mas;
+	u64 max = mas->last;
+
+	if (unlikely(!prev && !next))
+		return -EINVAL;
+
+	if (prev) {
+		u64 addr = prev->va.addr;
+		u64 last = addr + prev->va.range - 1;
+
+		if (unlikely(addr != mas->index))
+			return -EINVAL;
+
+		if (unlikely(last >= mas->last))
+			return -EINVAL;
+	}
+
+	if (next) {
+		u64 addr = next->va.addr;
+		u64 last = addr + next->va.range - 1;
+
+		if (unlikely(last != mas->last))
+			return -EINVAL;
+
+		if (unlikely(addr <= mas->index))
+			return -EINVAL;
+	}
+
+	if (prev && next) {
+		u64 p_last = prev->va.addr + prev->va.range - 1;
+		u64 n_addr = next->va.addr;
+
+		if (unlikely(p_last > n_addr))
+			return -EINVAL;
+
+		if (unlikely(n_addr - p_last <= 1))
+			return -EINVAL;
+	}
+
+	mas_lock(mas);
+	if (prev) {
+		mas_store(mas, prev);
+		mas_next(mas, max);
+		if (!next)
+			mas_store(mas, NULL);
+	}
+
+	if (next) {
+		mas->last = next->va.addr - 1;
+		mas_store(mas, NULL);
+		mas_next(mas, max);
+		mas_store(mas, next);
+	}
+	mas_unlock(mas);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_gpuva_remap);
+
+/**
+ * drm_gpuva_unmap - helper to remove a &drm_gpuva from &drm_gpuva_fn_ops
+ * callbacks
+ *
+ * @state: the current &drm_gpuva_state
+ *
+ * The entry associated with the current state is removed.
+ */
+void
+drm_gpuva_unmap(drm_gpuva_state_t state)
+{
+	drm_gpuva_iter_remove(state);
+}
+EXPORT_SYMBOL(drm_gpuva_unmap);
+
+static int
+op_map_cb(struct drm_gpuva_fn_ops *fn, void *priv,
+	  u64 addr, u64 range,
+	  struct drm_gem_object *obj, u64 offset)
+{
+	struct drm_gpuva_op op = {};
+
+	op.op = DRM_GPUVA_OP_MAP;
+	op.map.va.addr = addr;
+	op.map.va.range = range;
+	op.map.gem.obj = obj;
+	op.map.gem.offset = offset;
+
+	return fn->sm_step_map(&op, priv);
+}
+
+static int
+op_remap_cb(struct drm_gpuva_fn_ops *fn,
+	    drm_gpuva_state_t state, void *priv,
+	    struct drm_gpuva_op_map *prev,
+	    struct drm_gpuva_op_map *next,
+	    struct drm_gpuva_op_unmap *unmap)
+{
+	struct drm_gpuva_op op = {};
+	struct drm_gpuva_op_remap *r;
+
+	op.op = DRM_GPUVA_OP_REMAP;
+	r = &op.remap;
+	r->prev = prev;
+	r->next = next;
+	r->unmap = unmap;
+
+	return fn->sm_step_remap(&op, state, priv);
+}
+
+static int
+op_unmap_cb(struct drm_gpuva_fn_ops *fn,
+	    drm_gpuva_state_t state, void *priv,
+	    struct drm_gpuva *va, bool merge)
+{
+	struct drm_gpuva_op op = {};
+
+	op.op = DRM_GPUVA_OP_UNMAP;
+	op.unmap.va = va;
+	op.unmap.keep = merge;
+
+	return fn->sm_step_unmap(&op, state, priv);
+}
+
+static int
+__drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
+		   struct drm_gpuva_fn_ops *ops, void *priv,
+		   u64 req_addr, u64 req_range,
+		   struct drm_gem_object *req_obj, u64 req_offset)
+{
+	DRM_GPUVA_ITER(it, mgr, req_addr);
+	struct drm_gpuva *va, *prev = NULL;
+	u64 req_end = req_addr + req_range;
+	int ret;
+
+	if (unlikely(!drm_gpuva_in_mm_range(mgr, req_addr, req_range)))
+		return -EINVAL;
+
+	if (unlikely(drm_gpuva_in_kernel_node(mgr, req_addr, req_range)))
+		return -EINVAL;
+
+	drm_gpuva_iter_for_each_range(va, it, req_end) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->va.addr;
+		u64 range = va->va.range;
+		u64 end = addr + range;
+		bool merge = !!va->gem.obj;
+
+		if (addr == req_addr) {
+			merge &= obj == req_obj &&
+				 offset == req_offset;
+
+			if (end == req_end) {
+				ret = op_unmap_cb(ops, &it, priv, va, merge);
+				if (ret)
+					return ret;
+				break;
+			}
+
+			if (end < req_end) {
+				ret = op_unmap_cb(ops, &it, priv, va, merge);
+				if (ret)
+					return ret;
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = range - req_range,
+					.gem.obj = obj,
+					.gem.offset = offset + req_range,
+				};
+				struct drm_gpuva_op_unmap u = {
+					.va = va,
+					.keep = merge,
+				};
+
+				ret = op_remap_cb(ops, &it, priv, NULL, &n, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+		} else if (addr < req_addr) {
+			u64 ls_range = req_addr - addr;
+			struct drm_gpuva_op_map p = {
+				.va.addr = addr,
+				.va.range = ls_range,
+				.gem.obj = obj,
+				.gem.offset = offset,
+			};
+			struct drm_gpuva_op_unmap u = { .va = va };
+
+			merge &= obj == req_obj &&
+				 offset + ls_range == req_offset;
+			u.keep = merge;
+
+			if (end == req_end) {
+				ret = op_remap_cb(ops, &it, priv, &p, NULL, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+
+			if (end < req_end) {
+				ret = op_remap_cb(ops, &it, priv, &p, NULL, &u);
+				if (ret)
+					return ret;
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = end - req_end,
+					.gem.obj = obj,
+					.gem.offset = offset + ls_range +
+						      req_range,
+				};
+
+				ret = op_remap_cb(ops, &it, priv, &p, &n, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+		} else if (addr > req_addr) {
+			merge &= obj == req_obj &&
+				 offset == req_offset +
+					   (addr - req_addr);
+
+			if (end == req_end) {
+				ret = op_unmap_cb(ops, &it, priv, va, merge);
+				if (ret)
+					return ret;
+				break;
+			}
+
+			if (end < req_end) {
+				ret = op_unmap_cb(ops, &it, priv, va, merge);
+				if (ret)
+					return ret;
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = end - req_end,
+					.gem.obj = obj,
+					.gem.offset = offset + req_end - addr,
+				};
+				struct drm_gpuva_op_unmap u = {
+					.va = va,
+					.keep = merge,
+				};
+
+				ret = op_remap_cb(ops, &it, priv, NULL, &n, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+		}
+next:
+		prev = va;
+	}
+
+	return op_map_cb(ops, priv,
+			 req_addr, req_range,
+			 req_obj, req_offset);
+}
+
+static int
+__drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr,
+		     struct drm_gpuva_fn_ops *ops, void *priv,
+		     u64 req_addr, u64 req_range)
+{
+	DRM_GPUVA_ITER(it, mgr, req_addr);
+	struct drm_gpuva *va;
+	u64 req_end = req_addr + req_range;
+	int ret;
+
+	if (unlikely(drm_gpuva_in_kernel_node(mgr, req_addr, req_range)))
+		return -EINVAL;
+
+	drm_gpuva_iter_for_each_range(va, it, req_end) {
+		struct drm_gpuva_op_map prev = {}, next = {};
+		bool prev_split = false, next_split = false;
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->va.addr;
+		u64 range = va->va.range;
+		u64 end = addr + range;
+
+		if (addr < req_addr) {
+			prev.va.addr = addr;
+			prev.va.range = req_addr - addr;
+			prev.gem.obj = obj;
+			prev.gem.offset = offset;
+
+			prev_split = true;
+		}
+
+		if (end > req_end) {
+			next.va.addr = req_end;
+			next.va.range = end - req_end;
+			next.gem.obj = obj;
+			next.gem.offset = offset + (req_end - addr);
+
+			next_split = true;
+		}
+
+		if (prev_split || next_split) {
+			struct drm_gpuva_op_unmap unmap = { .va = va };
+
+			ret = op_remap_cb(ops, &it, priv,
+					  prev_split ? &prev : NULL,
+					  next_split ? &next : NULL,
+					  &unmap);
+			if (ret)
+				return ret;
+		} else {
+			ret = op_unmap_cb(ops, &it, priv, va, false);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * drm_gpuva_sm_map - creates the &drm_gpuva_op split/merge steps
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the new mapping
+ * @req_range: the range of the new mapping
+ * @req_obj: the &drm_gem_object to map
+ * @req_offset: the offset within the &drm_gem_object
+ * @priv: pointer to a driver private data structure
+ *
+ * This function iterates the given range of the GPU VA space. It utilizes the
+ * &drm_gpuva_fn_ops to call back into the driver providing the split and merge
+ * steps.
+ *
+ * Drivers may use these callbacks to update the GPU VA space right away within
+ * the callback. In case the driver decides to copy and store the operations
+ * for later processing, neither this function nor &drm_gpuva_sm_unmap may be
+ * called before the &drm_gpuva_manager's view of the GPU VA space has been
+ * updated with the previous set of operations. To update the
+ * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * A sequence of callbacks can contain map, unmap and remap operations, but
+ * the sequence of callbacks might also be empty if no operation is required,
+ * e.g. if the requested mapping already exists in the exact same way.
+ *
+ * There can be an arbitrary number of unmap operations, a maximum of two remap
+ * operations and a single map operation. The latter one represents the original
+ * map operation requested by the caller.
+ *
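+ * A minimal sketch of requesting the split/merge steps for a new mapping;
+ * driver_priv stands in for whatever context the driver's &drm_gpuva_fn_ops
+ * callbacks expect::
+ *
+ *	ret = drm_gpuva_sm_map(mgr, driver_priv,
+ *			       addr, range, obj, offset);
+ *	if (ret)
+ *		return ret;
+ *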
+ * Returns: 0 on success or a negative error code
+ */
+int
+drm_gpuva_sm_map(struct drm_gpuva_manager *mgr, void *priv,
+		 u64 req_addr, u64 req_range,
+		 struct drm_gem_object *req_obj, u64 req_offset)
+{
+	struct drm_gpuva_fn_ops *ops = mgr->ops;
+
+	if (unlikely(!(ops && ops->sm_step_map &&
+		       ops->sm_step_remap &&
+		       ops->sm_step_unmap)))
+		return -EINVAL;
+
+	return __drm_gpuva_sm_map(mgr, ops, priv,
+				  req_addr, req_range,
+				  req_obj, req_offset);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_map);
+
+/**
+ * drm_gpuva_sm_unmap - creates the &drm_gpuva_ops to split on unmap
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @priv: pointer to a driver private data structure
+ * @req_addr: the start address of the range to unmap
+ * @req_range: the range of the mappings to unmap
+ *
+ * This function iterates the given range of the GPU VA space. It utilizes the
+ * &drm_gpuva_fn_ops to call back into the driver providing the operations to
+ * unmap and, if required, split existent mappings.
+ *
+ * Drivers may use these callbacks to update the GPU VA space right away within
+ * the callback. In case the driver decides to copy and store the operations
+ * for later processing, neither this function nor &drm_gpuva_sm_map may be
+ * called before the &drm_gpuva_manager's view of the GPU VA space has been
+ * updated with the previous set of operations. To update the
+ * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * A sequence of callbacks can contain unmap and remap operations, depending on
+ * whether there are actual overlapping mappings to split.
+ *
+ * There can be an arbitrary number of unmap operations and a maximum of two
+ * remap operations.
+ *
+ * Returns: 0 on success or a negative error code
+ */
+int
+drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr, void *priv,
+		   u64 req_addr, u64 req_range)
+{
+	struct drm_gpuva_fn_ops *ops = mgr->ops;
+
+	if (unlikely(!(ops && ops->sm_step_remap &&
+		       ops->sm_step_unmap)))
+		return -EINVAL;
+
+	return __drm_gpuva_sm_unmap(mgr, ops, priv,
+				    req_addr, req_range);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_unmap);
+
+static struct drm_gpuva_op *
+gpuva_op_alloc(struct drm_gpuva_manager *mgr)
+{
+	struct drm_gpuva_fn_ops *fn = mgr->ops;
+	struct drm_gpuva_op *op;
+
+	if (fn && fn->op_alloc)
+		op = fn->op_alloc();
+	else
+		op = kzalloc(sizeof(*op), GFP_KERNEL);
+
+	if (unlikely(!op))
+		return NULL;
+
+	return op;
+}
+
+static void
+gpuva_op_free(struct drm_gpuva_manager *mgr,
+	      struct drm_gpuva_op *op)
+{
+	struct drm_gpuva_fn_ops *fn = mgr->ops;
+
+	if (fn && fn->op_free)
+		fn->op_free(op);
+	else
+		kfree(op);
+}
+
+static int
+drm_gpuva_sm_step(struct drm_gpuva_op *__op,
+		  drm_gpuva_state_t state,
+		  void *priv)
+{
+	struct {
+		struct drm_gpuva_manager *mgr;
+		struct drm_gpuva_ops *ops;
+	} *args = priv;
+	struct drm_gpuva_manager *mgr = args->mgr;
+	struct drm_gpuva_ops *ops = args->ops;
+	struct drm_gpuva_op *op;
+
+	op = gpuva_op_alloc(mgr);
+	if (unlikely(!op))
+		goto err;
+
+	memcpy(op, __op, sizeof(*op));
+
+	if (op->op == DRM_GPUVA_OP_REMAP) {
+		struct drm_gpuva_op_remap *__r = &__op->remap;
+		struct drm_gpuva_op_remap *r = &op->remap;
+
+		r->unmap = kmemdup(__r->unmap, sizeof(*r->unmap),
+				   GFP_KERNEL);
+		if (unlikely(!r->unmap))
+			goto err_free_op;
+
+		if (__r->prev) {
+			r->prev = kmemdup(__r->prev, sizeof(*r->prev),
+					  GFP_KERNEL);
+			if (unlikely(!r->prev))
+				goto err_free_unmap;
+		}
+
+		if (__r->next) {
+			r->next = kmemdup(__r->next, sizeof(*r->next),
+					  GFP_KERNEL);
+			if (unlikely(!r->next))
+				goto err_free_prev;
+		}
+	}
+
+	list_add_tail(&op->entry, &ops->list);
+
+	return 0;
+
+err_free_prev:
+	kfree(op->remap.prev);
+err_free_unmap:
+	kfree(op->remap.unmap);
+err_free_op:
+	gpuva_op_free(mgr, op);
+err:
+	return -ENOMEM;
+}
+
+static int
+drm_gpuva_sm_step_map(struct drm_gpuva_op *__op, void *priv)
+{
+	return drm_gpuva_sm_step(__op, NULL, priv);
+}
+
+static struct drm_gpuva_fn_ops gpuva_list_ops = {
+	.sm_step_map = drm_gpuva_sm_step_map,
+	.sm_step_remap = drm_gpuva_sm_step,
+	.sm_step_unmap = drm_gpuva_sm_step,
+};
+
+/**
+ * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the new mapping
+ * @req_range: the range of the new mapping
+ * @req_obj: the &drm_gem_object to map
+ * @req_offset: the offset within the &drm_gem_object
+ *
+ * This function creates a list of operations to perform splitting and merging
+ * of existent mapping(s) with the newly requested one.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain map, unmap and remap operations, but it
+ * also can be empty if no operation is required, e.g. if the requested mapping
+ * already exists in the exact same way.
+ *
+ * There can be an arbitrary number of unmap operations, a maximum of two remap
+ * operations and a single map operation. The latter one represents the original
+ * map operation requested by the caller.
+ *
+ * Note that before calling this function again with another mapping request it
+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space. The
+ * previously obtained operations must be either processed or abandoned. To
+ * update the &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
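+ * A minimal usage sketch; driver_handle_op() is a hypothetical helper that
+ * processes a single &drm_gpuva_op::
+ *
+ *	struct drm_gpuva_ops *ops;
+ *	struct drm_gpuva_op *op;
+ *
+ *	ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
+ *	if (IS_ERR(ops))
+ *		return PTR_ERR(ops);
+ *
+ *	drm_gpuva_for_each_op(op, ops)
+ *		driver_handle_op(op);
+ *
+ *	drm_gpuva_ops_free(mgr, ops);
+ *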
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
+			    u64 req_addr, u64 req_range,
+			    struct drm_gem_object *req_obj, u64 req_offset)
+{
+	struct drm_gpuva_ops *ops;
+	struct {
+		struct drm_gpuva_manager *mgr;
+		struct drm_gpuva_ops *ops;
+	} args;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (unlikely(!ops))
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	args.mgr = mgr;
+	args.ops = ops;
+
+	ret = __drm_gpuva_sm_map(mgr, &gpuva_list_ops, &args,
+				 req_addr, req_range,
+				 req_obj, req_offset);
+	if (ret)
+		goto err_free_ops;
+
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(mgr, ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
+
+/**
+ * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the range to unmap
+ * @req_range: the range of the mappings to unmap
+ *
+ * This function creates a list of operations to perform unmapping and, if
+ * required, splitting of the mappings overlapping the unmap range.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain unmap and remap operations, depending on
+ * whether there are actual overlapping mappings to split.
+ *
+ * There can be an arbitrary number of unmap operations and a maximum of two
+ * remap operations.
+ *
+ * Note that before calling this function again with another range to unmap it
+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space. The
+ * previously obtained operations must be processed or abandoned. To update the
+ * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 req_addr, u64 req_range)
+{
+	struct drm_gpuva_ops *ops;
+	struct {
+		struct drm_gpuva_manager *mgr;
+		struct drm_gpuva_ops *ops;
+	} args;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (unlikely(!ops))
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	args.mgr = mgr;
+	args.ops = ops;
+
+	ret = __drm_gpuva_sm_unmap(mgr, &gpuva_list_ops, &args,
+				   req_addr, req_range);
+	if (ret)
+		goto err_free_ops;
+
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(mgr, ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
+
+/**
+ * drm_gpuva_prefetch_ops_create - creates the &drm_gpuva_ops to prefetch
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @addr: the start address of the range to prefetch
+ * @range: the range of the mappings to prefetch
+ *
+ * This function creates a list of operations to perform prefetching.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain prefetch operations.
+ *
+ * There can be an arbitrary number of prefetch operations.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_prefetch_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 addr, u64 range)
+{
+	DRM_GPUVA_ITER(it, mgr, addr);
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva_op *op;
+	struct drm_gpuva *va;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	drm_gpuva_iter_for_each_range(va, it, addr + range) {
+		op = gpuva_op_alloc(mgr);
+		if (!op) {
+			ret = -ENOMEM;
+			goto err_free_ops;
+		}
+
+		op->op = DRM_GPUVA_OP_PREFETCH;
+		op->prefetch.va = va;
+		list_add_tail(&op->entry, &ops->list);
+	}
+
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(mgr, ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_prefetch_ops_create);
+
+/**
+ * drm_gpuva_gem_unmap_ops_create - creates the &drm_gpuva_ops to unmap a GEM
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @obj: the &drm_gem_object to unmap
+ *
+ * This function creates a list of operations to perform unmapping for every
+ * GPUVA attached to a GEM.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and consists of an
+ * arbitrary number of unmap operations.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * It is the caller's responsibility to protect the GEM's GPUVA list against
+ * concurrent access.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			       struct drm_gem_object *obj)
+{
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva_op *op;
+	struct drm_gpuva *va;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	drm_gem_for_each_gpuva(va, obj) {
+		op = gpuva_op_alloc(mgr);
+		if (!op) {
+			ret = -ENOMEM;
+			goto err_free_ops;
+		}
+
+		op->op = DRM_GPUVA_OP_UNMAP;
+		op->unmap.va = va;
+		list_add_tail(&op->entry, &ops->list);
+	}
+
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(mgr, ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_gem_unmap_ops_create);
+
+/**
+ * drm_gpuva_ops_free - free the given &drm_gpuva_ops
+ * @mgr: the &drm_gpuva_manager the ops were created for
+ * @ops: the &drm_gpuva_ops to free
+ *
+ * Frees the given &drm_gpuva_ops structure including all the ops associated
+ * with it.
+ */
+void
+drm_gpuva_ops_free(struct drm_gpuva_manager *mgr,
+		   struct drm_gpuva_ops *ops)
+{
+	struct drm_gpuva_op *op, *next;
+
+	drm_gpuva_for_each_op_safe(op, next, ops) {
+		list_del(&op->entry);
+
+		if (op->op == DRM_GPUVA_OP_REMAP) {
+			kfree(op->remap.prev);
+			kfree(op->remap.next);
+			kfree(op->remap.unmap);
+		}
+
+		gpuva_op_free(mgr, op);
+	}
+
+	kfree(ops);
+}
+EXPORT_SYMBOL(drm_gpuva_ops_free);
diff --git a/include/drm/drm_debugfs.h b/include/drm/drm_debugfs.h
index 7616f457ce70..3031fcb96b39 100644
--- a/include/drm/drm_debugfs.h
+++ b/include/drm/drm_debugfs.h
@@ -34,6 +34,22 @@
 
 #include <linux/types.h>
 #include <linux/seq_file.h>
+
+#include <drm/drm_gpuva_mgr.h>
+
+/**
+ * DRM_DEBUGFS_GPUVA_INFO - &drm_info_list entry to dump a GPU VA space
+ * @show: the &drm_info_list's show callback
+ * @data: driver private data
+ *
+ * Drivers should use this macro to define a &drm_info_list entry to provide a
+ * debugfs file for dumping the GPU VA space regions and mappings.
+ *
+ * For each DRM GPU VA space drivers should call drm_debugfs_gpuva_info() from
+ * their @show callback.
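+ *
+ * A minimal sketch of wiring this up; driver_debugfs_gpuva_show() and
+ * driver_mgr (a pointer to the driver's &drm_gpuva_manager) are purely
+ * illustrative::
+ *
+ *	static int driver_debugfs_gpuva_show(struct seq_file *m, void *data)
+ *	{
+ *		struct drm_info_node *node = m->private;
+ *		struct drm_gpuva_manager *mgr = node->info_ent->data;
+ *
+ *		return drm_debugfs_gpuva_info(m, mgr);
+ *	}
+ *
+ *	static const struct drm_info_list driver_debugfs_list[] = {
+ *		DRM_DEBUGFS_GPUVA_INFO(driver_debugfs_gpuva_show, driver_mgr),
+ *	};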
+ */
+#define DRM_DEBUGFS_GPUVA_INFO(show, data) {"gpuvas", show, DRIVER_GEM_GPUVA, data}
+
 /**
  * struct drm_info_list - debugfs info list entry
  *
@@ -134,6 +150,8 @@ void drm_debugfs_add_file(struct drm_device *dev, const char *name,
 
 void drm_debugfs_add_files(struct drm_device *dev,
 			   const struct drm_debugfs_info *files, int count);
+int drm_debugfs_gpuva_info(struct seq_file *m,
+			   struct drm_gpuva_manager *mgr);
 #else
 static inline void drm_debugfs_create_files(const struct drm_info_list *files,
 					    int count, struct dentry *root,
@@ -155,6 +173,12 @@ static inline void drm_debugfs_add_files(struct drm_device *dev,
 					 const struct drm_debugfs_info *files,
 					 int count)
 {}
+
+static inline int drm_debugfs_gpuva_info(struct seq_file *m,
+					 struct drm_gpuva_manager *mgr)
+{
+	return 0;
+}
 #endif
 
 #endif /* _DRM_DEBUGFS_H_ */
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index b419c59c4bef..9e2ec2da0685 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -105,6 +105,13 @@ enum drm_driver_feature {
 	 */
 	DRIVER_COMPUTE_ACCEL            = BIT(7),
 
+	/**
+	 * @DRIVER_GEM_GPUVA:
+	 *
+	 * Driver supports user defined GPU VA bindings for GEM objects.
+	 */
+	DRIVER_GEM_GPUVA		= BIT(8),
+
 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
 	/**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 7bd8e2bbbb36..3514a93df850 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -36,6 +36,8 @@
 
 #include <linux/kref.h>
 #include <linux/dma-resv.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
 
 #include <drm/drm_vma_manager.h>
 
@@ -347,6 +349,17 @@ struct drm_gem_object {
 	 */
 	struct dma_resv _resv;
 
+	/**
+	 * @gpuva:
+	 *
+	 * Provides the list and list mutex of GPU VAs attached to this
+	 * GEM object.
+	 */
+	struct {
+		struct list_head list;
+		struct mutex mutex;
+	} gpuva;
+
 	/**
 	 * @funcs:
 	 *
@@ -493,4 +506,66 @@ unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru,
 
 int drm_gem_evict(struct drm_gem_object *obj);
 
+/**
+ * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
+ * @obj: the &drm_gem_object
+ *
+ * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
+ * protecting it.
+ *
+ * Calling this function is only necessary for drivers intending to support the
+ * &drm_driver_feature DRIVER_GEM_GPUVA.
+ */
+static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
+{
+	INIT_LIST_HEAD(&obj->gpuva.list);
+	mutex_init(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
+ * @obj: the &drm_gem_object
+ *
+ * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
+ */
+static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
+{
+	mutex_lock(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
+ * @obj: the &drm_gem_object
+ *
+ * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
+ */
+static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
+{
+	mutex_unlock(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
+ * @entry__: &drm_gpuva structure to assign to in each iteration step
+ * @obj__: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gem_object.
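+ *
+ * A minimal sketch; driver_handle_va() is a hypothetical per-mapping helper::
+ *
+ *	struct drm_gpuva *va;
+ *
+ *	drm_gem_gpuva_lock(obj);
+ *	drm_gem_for_each_gpuva(va, obj)
+ *		driver_handle_va(va);
+ *	drm_gem_gpuva_unlock(obj);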
+ */
+#define drm_gem_for_each_gpuva(entry__, obj__) \
+	list_for_each_entry(entry__, &(obj__)->gpuva.list, gem.entry)
+
+/**
+ * drm_gem_for_each_gpuva_safe - iterator to safely walk over a list of gpuvas
+ * @entry__: &drm_gpuva structure to assign to in each iteration step
+ * @next__: the &drm_gpuva to store the next iteration step
+ * @obj__: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
+ * it is safe against removal of elements.
+ */
+#define drm_gem_for_each_gpuva_safe(entry__, next__, obj__) \
+	list_for_each_entry_safe(entry__, next__, &(obj__)->gpuva.list, gem.entry)
+
 #endif /* __DRM_GEM_H__ */
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
new file mode 100644
index 000000000000..62169d850098
--- /dev/null
+++ b/include/drm/drm_gpuva_mgr.h
@@ -0,0 +1,681 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __DRM_GPUVA_MGR_H__
+#define __DRM_GPUVA_MGR_H__
+
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/maple_tree.h>
+#include <linux/mm.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+struct drm_gpuva_manager;
+struct drm_gpuva_fn_ops;
+struct drm_gpuva_prealloc;
+
+/**
+ * enum drm_gpuva_flags - flags for struct drm_gpuva
+ */
+enum drm_gpuva_flags {
+	/**
+	 * @DRM_GPUVA_EVICTED:
+	 *
+	 * Flag indicating that the &drm_gpuva's backing GEM is evicted.
+	 */
+	DRM_GPUVA_EVICTED = (1 << 0),
+
+	/**
+	 * @DRM_GPUVA_SPARSE:
+	 *
+	 * Flag indicating that the &drm_gpuva is a sparse mapping.
+	 */
+	DRM_GPUVA_SPARSE = (1 << 1),
+
+	/**
+	 * @DRM_GPUVA_USERBITS: user defined bits
+	 */
+	DRM_GPUVA_USERBITS = (1 << 2),
+};
+
+/**
+ * struct drm_gpuva - structure to track a GPU VA mapping
+ *
+ * This structure represents a GPU VA mapping and is associated with a
+ * &drm_gpuva_manager.
+ *
+ * Typically, this structure is embedded in bigger driver structures.
+ */
+struct drm_gpuva {
+	/**
+	 * @mgr: the &drm_gpuva_manager this object is associated with
+	 */
+	struct drm_gpuva_manager *mgr;
+
+	/**
+	 * @flags: the &drm_gpuva_flags for this mapping
+	 */
+	enum drm_gpuva_flags flags;
+
+	/**
+	 * @va: structure containing the address and range of the &drm_gpuva
+	 */
+	struct {
+		/**
+		 * @addr: the start address
+		 */
+		u64 addr;
+
+		/**
+		 * @range: the range
+		 */
+		u64 range;
+	} va;
+
+	/**
+	 * @gem: structure containing the &drm_gem_object and its offset
+	 */
+	struct {
+		/**
+		 * @offset: the offset within the &drm_gem_object
+		 */
+		u64 offset;
+
+		/**
+		 * @obj: the mapped &drm_gem_object
+		 */
+		struct drm_gem_object *obj;
+
+		/**
+		 * @entry: the &list_head to attach this object to a &drm_gem_object
+		 */
+		struct list_head entry;
+	} gem;
+};
+
+void drm_gpuva_link(struct drm_gpuva *va);
+void drm_gpuva_unlink(struct drm_gpuva *va);
+
+int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		     struct drm_gpuva *va);
+int drm_gpuva_insert_prealloc(struct drm_gpuva_manager *mgr,
+			      struct drm_gpuva_prealloc *pa,
+			      struct drm_gpuva *va);
+void drm_gpuva_remove(struct drm_gpuva *va);
+
+struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
+				 u64 addr, u64 range);
+struct drm_gpuva *drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
+				       u64 addr, u64 range);
+struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
+struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
+
+bool drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range);
+
+/**
+ * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted
+ * @va: the &drm_gpuva to set the evict flag for
+ * @evict: indicates whether the &drm_gpuva is evicted
+ */
+static inline void drm_gpuva_evict(struct drm_gpuva *va, bool evict)
+{
+	if (evict)
+		va->flags |= DRM_GPUVA_EVICTED;
+	else
+		va->flags &= ~DRM_GPUVA_EVICTED;
+}
+
+/**
+ * drm_gpuva_evicted - indicates whether the backing BO of this &drm_gpuva
+ * is evicted
+ * @va: the &drm_gpuva to check
+ */
+static inline bool drm_gpuva_evicted(struct drm_gpuva *va)
+{
+	return va->flags & DRM_GPUVA_EVICTED;
+}
+
+/**
+ * struct drm_gpuva_manager - DRM GPU VA Manager
+ *
+ * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
+ * &maple_tree structures. Typically, this structure is embedded in bigger
+ * driver structures.
+ *
+ * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
+ * pages.
+ *
+ * There should be one manager instance per GPU virtual address space.
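+ *
+ * A typical embedding sketch; driver_vm is purely illustrative::
+ *
+ *	struct driver_vm {
+ *		struct drm_gpuva_manager mgr;
+ *		/* driver-specific state follows */
+ *	};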
+ */
+struct drm_gpuva_manager {
+	/**
+	 * @name: the name of the DRM GPU VA space
+	 */
+	const char *name;
+
+	/**
+	 * @mm_start: start of the VA space
+	 */
+	u64 mm_start;
+
+	/**
+	 * @mm_range: length of the VA space
+	 */
+	u64 mm_range;
+
+	/**
+	 * @mtree: the &maple_tree to track GPU VA mappings
+	 */
+	struct maple_tree mtree;
+
+	/**
+	 * @kernel_alloc_node:
+	 *
+	 * &drm_gpuva representing the address space cutout reserved for
+	 * the kernel
+	 */
+	struct drm_gpuva kernel_alloc_node;
+
+	/**
+	 * @ops: &drm_gpuva_fn_ops providing the split/merge steps to drivers
+	 */
+	struct drm_gpuva_fn_ops *ops;
+};
+
+void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+			    const char *name,
+			    u64 start_offset, u64 range,
+			    u64 reserve_offset, u64 reserve_range,
+			    struct drm_gpuva_fn_ops *ops);
+void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
+
+/**
+ * struct drm_gpuva_prealloc - holds a preallocated node for the
+ * &drm_gpuva_manager to insert a single new entry
+ */
+struct drm_gpuva_prealloc {
+	/**
+	 * @mas: the maple tree advanced state
+	 */
+	struct ma_state mas;
+};
+
+struct drm_gpuva_prealloc *drm_gpuva_prealloc_create(void);
+void drm_gpuva_prealloc_destroy(struct drm_gpuva_prealloc *pa);
+
+/**
+ * struct drm_gpuva_iterator - iterator for walking the internal (maple) tree
+ */
+struct drm_gpuva_iterator {
+	/**
+	 * @mas: the maple tree advanced state
+	 */
+	struct ma_state mas;
+
+	/**
+	 * @mgr: the &drm_gpuva_manager to iterate
+	 */
+	struct drm_gpuva_manager *mgr;
+};
+typedef struct drm_gpuva_iterator *drm_gpuva_state_t;
+
+void drm_gpuva_iter_remove(struct drm_gpuva_iterator *it);
+int drm_gpuva_iter_va_replace(struct drm_gpuva_iterator *it,
+			      struct drm_gpuva *va);
+
+static inline struct drm_gpuva *
+drm_gpuva_iter_find(struct drm_gpuva_iterator *it, unsigned long max)
+{
+	struct drm_gpuva *va;
+
+	mas_lock(&it->mas);
+	va = mas_find(&it->mas, max);
+	mas_unlock(&it->mas);
+
+	return va;
+}
+
+/**
+ * DRM_GPUVA_ITER - create an iterator structure to iterate the &drm_gpuva tree
+ * @name: the name of the &drm_gpuva_iterator to create
+ * @mgr__: the &drm_gpuva_manager to iterate
+ * @start: starting offset, the first entry will overlap this
+ */
+#define DRM_GPUVA_ITER(name, mgr__, start)				\
+	struct drm_gpuva_iterator name = {				\
+		.mas = MA_STATE_INIT(&(mgr__)->mtree, start, 0),	\
+		.mgr = mgr__,						\
+	}
+
+/**
+ * drm_gpuva_iter_for_each_range - iterator to walk over a range of entries
+ * @va__: the &drm_gpuva to assign in each iteration step
+ * @it__: the &drm_gpuva_iterator to walk
+ * @end__: ending offset, the last entry will start before this (but may overlap)
+ *
+ * This function can be used to iterate &drm_gpuva objects.
+ *
+ * It is safe against the removal of elements using &drm_gpuva_iter_remove,
+ * however it is not safe against the removal of elements using
+ * &drm_gpuva_remove.
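+ *
+ * A minimal sketch dumping all mappings overlapping a given interval; addr
+ * and range are assumed to be provided by the caller::
+ *
+ *	DRM_GPUVA_ITER(it, mgr, addr);
+ *	struct drm_gpuva *va;
+ *
+ *	drm_gpuva_iter_for_each_range(va, it, addr + range)
+ *		pr_info("mapping: addr=0x%llx range=0x%llx\n",
+ *			va->va.addr, va->va.range);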
+ */
+#define drm_gpuva_iter_for_each_range(va__, it__, end__) \
+	while (((va__) = drm_gpuva_iter_find(&(it__), (end__) - 1)))
+
+/**
+ * drm_gpuva_iter_for_each - iterator to walk over all existing entries
+ * @va__: the &drm_gpuva to assign in each iteration step
+ * @it__: the &drm_gpuva_iterator to walk
+ *
+ * This function can be used to iterate &drm_gpuva objects.
+ *
+ * In order to walk over all potentially existing entries, the
+ * &drm_gpuva_iterator must be initialized to start at
+ * &drm_gpuva_manager->mm_start or simply 0.
+ *
+ * It is safe against the removal of elements using &drm_gpuva_iter_remove,
+ * however it is not safe against the removal of elements using
+ * &drm_gpuva_remove.
+ */
+#define drm_gpuva_iter_for_each(va__, it__) \
+	drm_gpuva_iter_for_each_range(va__, it__, (it__).mgr->mm_start + (it__).mgr->mm_range)
+
+/**
+ * enum drm_gpuva_op_type - GPU VA operation type
+ *
+ * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager.
+ */
+enum drm_gpuva_op_type {
+	/**
+	 * @DRM_GPUVA_OP_MAP: the map op type
+	 */
+	DRM_GPUVA_OP_MAP,
+
+	/**
+	 * @DRM_GPUVA_OP_REMAP: the remap op type
+	 */
+	DRM_GPUVA_OP_REMAP,
+
+	/**
+	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
+	 */
+	DRM_GPUVA_OP_UNMAP,
+
+	/**
+	 * @DRM_GPUVA_OP_PREFETCH: the prefetch op type
+	 */
+	DRM_GPUVA_OP_PREFETCH,
+};
+
+/**
+ * struct drm_gpuva_op_map - GPU VA map operation
+ *
+ * This structure represents a single map operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_map {
+	/**
+	 * @va: structure containing address and range of a map
+	 * operation
+	 */
+	struct {
+		/**
+		 * @addr: the base address of the new mapping
+		 */
+		u64 addr;
+
+		/**
+		 * @range: the range of the new mapping
+		 */
+		u64 range;
+	} va;
+
+	/**
+	 * @gem: structure containing the &drm_gem_object and its offset
+	 */
+	struct {
+		/**
+		 * @offset: the offset within the &drm_gem_object
+		 */
+		u64 offset;
+
+		/**
+		 * @obj: the &drm_gem_object to map
+		 */
+		struct drm_gem_object *obj;
+	} gem;
+};
+
+/**
+ * struct drm_gpuva_op_unmap - GPU VA unmap operation
+ *
+ * This structure represents a single unmap operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_unmap {
+	/**
+	 * @va: the &drm_gpuva to unmap
+	 */
+	struct drm_gpuva *va;
+
+	/**
+	 * @keep:
+	 *
+	 * Indicates whether this &drm_gpuva is physically contiguous with the
+	 * original mapping request.
+	 *
+	 * Optionally, if @keep is set, drivers may keep the actual page table
+	 * mappings for this &drm_gpuva, only add the missing page table entries
+	 * and update the &drm_gpuva_manager accordingly.
+	 */
+	bool keep;
+};
+
+/**
+ * struct drm_gpuva_op_remap - GPU VA remap operation
+ *
+ * This represents a single remap operation generated by the DRM GPU VA manager.
+ *
+ * A remap operation is generated when an existing GPU VA mapping is split up
+ * by inserting a new GPU VA mapping or by partially unmapping existent
+ * mapping(s), hence it consists of a maximum of two map and one unmap
+ * operation.
+ *
+ * The @unmap operation takes care of removing the original existing mapping.
+ * @prev is used to remap the preceding part, @next the subsequent part.
+ *
+ * If either a new mapping's start address is aligned with the start address
+ * of the old mapping or the new mapping's end address is aligned with the
+ * end address of the old mapping, either @prev or @next is NULL.
+ *
+ * Note, the reason for a dedicated remap operation, rather than arbitrary
+ * unmap and map operations, is to give drivers the chance of extracting driver
+ * specific data for creating the new mappings from the unmap operation's
+ * &drm_gpuva structure which typically is embedded in larger driver specific
+ * structures.
+ */
+struct drm_gpuva_op_remap {
+	/**
+	 * @prev: the preceding part of a split mapping
+	 */
+	struct drm_gpuva_op_map *prev;
+
+	/**
+	 * @next: the subsequent part of a split mapping
+	 */
+	struct drm_gpuva_op_map *next;
+
+	/**
+	 * @unmap: the unmap operation for the original existing mapping
+	 */
+	struct drm_gpuva_op_unmap *unmap;
+};
+
+/**
+ * struct drm_gpuva_op_prefetch - GPU VA prefetch operation
+ *
+ * This structure represents a single prefetch operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_prefetch {
+	/**
+	 * @va: the &drm_gpuva to prefetch
+	 */
+	struct drm_gpuva *va;
+};
+
+/**
+ * struct drm_gpuva_op - GPU VA operation
+ *
+ * This structure represents a single generic operation.
+ *
+ * The particular type of the operation is defined by @op.
+ */
+struct drm_gpuva_op {
+	/**
+	 * @entry:
+	 *
+	 * The &list_head used to distribute instances of this struct within
+	 * &drm_gpuva_ops.
+	 */
+	struct list_head entry;
+
+	/**
+	 * @op: the type of the operation
+	 */
+	enum drm_gpuva_op_type op;
+
+	union {
+		/**
+		 * @map: the map operation
+		 */
+		struct drm_gpuva_op_map map;
+
+		/**
+		 * @remap: the remap operation
+		 */
+		struct drm_gpuva_op_remap remap;
+
+		/**
+		 * @unmap: the unmap operation
+		 */
+		struct drm_gpuva_op_unmap unmap;
+
+		/**
+		 * @prefetch: the prefetch operation
+		 */
+		struct drm_gpuva_op_prefetch prefetch;
+	};
+};
+
+/**
+ * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
+ */
+struct drm_gpuva_ops {
+	/**
+	 * @list: the &list_head
+	 */
+	struct list_head list;
+};
+
+/**
+ * drm_gpuva_for_each_op - iterator to walk over &drm_gpuva_ops
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations.
+ */
+#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_for_each_op_safe - iterator to safely walk over &drm_gpuva_ops
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @next: &drm_gpuva_op to store the next step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations. It is
+ * implemented with list_for_each_entry_safe(), hence safe against element removal.
+ */
+#define drm_gpuva_for_each_op_safe(op, next, ops) \
+	list_for_each_entry_safe(op, next, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_for_each_op_from_reverse - iterate backwards from the given point
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations beginning
+ * from the given operation in reverse order.
+ */
+#define drm_gpuva_for_each_op_from_reverse(op, ops) \
+	list_for_each_entry_from_reverse(op, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_first_op - returns the first &drm_gpuva_op from &drm_gpuva_ops
+ * @ops: the &drm_gpuva_ops to get the first &drm_gpuva_op from
+ */
+#define drm_gpuva_first_op(ops) \
+	list_first_entry(&(ops)->list, struct drm_gpuva_op, entry)
+
+/**
+ * drm_gpuva_last_op - returns the last &drm_gpuva_op from &drm_gpuva_ops
+ * @ops: the &drm_gpuva_ops to get the last &drm_gpuva_op from
+ */
+#define drm_gpuva_last_op(ops) \
+	list_last_entry(&(ops)->list, struct drm_gpuva_op, entry)
+
+/**
+ * drm_gpuva_prev_op - previous &drm_gpuva_op in the list
+ * @op: the current &drm_gpuva_op
+ */
+#define drm_gpuva_prev_op(op) list_prev_entry(op, entry)
+
+/**
+ * drm_gpuva_next_op - next &drm_gpuva_op in the list
+ * @op: the current &drm_gpuva_op
+ */
+#define drm_gpuva_next_op(op) list_next_entry(op, entry)
+
+struct drm_gpuva_ops *
+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
+			    u64 addr, u64 range,
+			    struct drm_gem_object *obj, u64 offset);
+struct drm_gpuva_ops *
+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 addr, u64 range);
+
+struct drm_gpuva_ops *
+drm_gpuva_prefetch_ops_create(struct drm_gpuva_manager *mgr,
+				 u64 addr, u64 range);
+
+struct drm_gpuva_ops *
+drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			       struct drm_gem_object *obj);
+
+void drm_gpuva_ops_free(struct drm_gpuva_manager *mgr,
+			struct drm_gpuva_ops *ops);
+
+/**
+ * struct drm_gpuva_fn_ops - callbacks for split/merge steps
+ *
+ * This structure defines the callbacks used by &drm_gpuva_sm_map and
+ * &drm_gpuva_sm_unmap to provide the split/merge steps for map and unmap
+ * operations to drivers.
+ */
+struct drm_gpuva_fn_ops {
+	/**
+	 * @op_alloc: called when the &drm_gpuva_manager allocates
+	 * a struct drm_gpuva_op
+	 *
+	 * Some drivers may want to embed struct drm_gpuva_op into driver
+	 * specific structures. By implementing this callback drivers can
+	 * allocate memory accordingly.
+	 *
+	 * This callback is optional.
+	 */
+	struct drm_gpuva_op *(*op_alloc)(void);
+
+	/**
+	 * @op_free: called when the &drm_gpuva_manager frees a
+	 * struct drm_gpuva_op
+	 *
+	 * Some drivers may want to embed struct drm_gpuva_op into driver
+	 * specific structures. By implementing this callback drivers can
+	 * free the previously allocated memory accordingly.
+	 *
+	 * This callback is optional.
+	 */
+	void (*op_free)(struct drm_gpuva_op *op);
+
+	/**
+	 * @sm_step_map: called from &drm_gpuva_sm_map to finally insert the
+	 * mapping once all previous steps were completed
+	 *
+	 * The &priv pointer matches the one the driver passed to
+	 * &drm_gpuva_sm_map or &drm_gpuva_sm_unmap, respectively.
+	 *
+	 * Can be NULL if &drm_gpuva_sm_map is not used.
+	 */
+	int (*sm_step_map)(struct drm_gpuva_op *op, void *priv);
+
+	/**
+	 * @sm_step_remap: called from &drm_gpuva_sm_map and
+	 * &drm_gpuva_sm_unmap to split up an existent mapping
+	 *
+	 * This callback is called when an existent mapping needs to be split
+	 * up. This is the case when either a newly requested mapping overlaps
+	 * with or is enclosed by an existent mapping, or when a partial unmap
+	 * of an existent mapping is requested.
+	 *
+	 * From within this callback, drivers must not modify the GPUVA space
+	 * with accessors that do not take a &drm_gpuva_state as argument.
+	 *
+	 * The &priv pointer matches the one the driver passed to
+	 * &drm_gpuva_sm_map or &drm_gpuva_sm_unmap, respectively.
+	 *
+	 * Can be NULL if neither &drm_gpuva_sm_map nor &drm_gpuva_sm_unmap is
+	 * used.
+	 */
+	int (*sm_step_remap)(struct drm_gpuva_op *op,
+			     drm_gpuva_state_t state,
+			     void *priv);
+
+	/**
+	 * @sm_step_unmap: called from &drm_gpuva_sm_map and
+	 * &drm_gpuva_sm_unmap to unmap an existent mapping
+	 *
+	 * This callback is called when an existent mapping needs to be
+	 * unmapped. This is the case when a newly requested mapping encloses
+	 * an existent mapping or an unmap of an existent mapping is requested.
+	 *
+	 * From within this callback, drivers must not modify the GPUVA space
+	 * with accessors that do not take a &drm_gpuva_state as argument.
+	 *
+	 * The &priv pointer matches the one the driver passed to
+	 * &drm_gpuva_sm_map or &drm_gpuva_sm_unmap, respectively.
+	 *
+	 * Can be NULL if neither &drm_gpuva_sm_map nor &drm_gpuva_sm_unmap is
+	 * used.
+	 */
+	int (*sm_step_unmap)(struct drm_gpuva_op *op,
+			     drm_gpuva_state_t state,
+			     void *priv);
+};
+
+int drm_gpuva_sm_map(struct drm_gpuva_manager *mgr, void *priv,
+		     u64 addr, u64 range,
+		     struct drm_gem_object *obj, u64 offset);
+
+int drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr, void *priv,
+		       u64 addr, u64 range);
+
+int drm_gpuva_map(struct drm_gpuva_manager *mgr,
+		  struct drm_gpuva_prealloc *pa,
+		  struct drm_gpuva *va);
+int drm_gpuva_remap(drm_gpuva_state_t state,
+		    struct drm_gpuva *prev,
+		    struct drm_gpuva *next);
+void drm_gpuva_unmap(drm_gpuva_state_t state);
+
+#endif /* __DRM_GPUVA_MGR_H__ */
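
For reference, a minimal sketch of how a driver might consume the split/merge
API declared above. driver_bind() and its PTE handling comments are purely
illustrative, and this assumes the *_ops_create() helpers return an ERR_PTR()
on failure and that this header plus <linux/err.h> are included:

	static int driver_bind(struct drm_gpuva_manager *mgr,
			       struct drm_gem_object *obj,
			       u64 addr, u64 range, u64 offset)
	{
		struct drm_gpuva_ops *ops;
		struct drm_gpuva_op *op;

		/* Compute the map/remap/unmap steps for this request. */
		ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
		if (IS_ERR(ops))
			return PTR_ERR(ops);

		drm_gpuva_for_each_op(op, ops) {
			switch (op->op) {
			case DRM_GPUVA_OP_MAP:
				/* program PTEs for op->map.va.addr / .range */
				break;
			case DRM_GPUVA_OP_REMAP:
				/* tear down op->remap.unmap->va, re-create prev/next */
				break;
			case DRM_GPUVA_OP_UNMAP:
				/* tear down op->unmap.va */
				break;
			default:
				break;
			}
		}

		drm_gpuva_ops_free(mgr, ops);
		return 0;
	}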
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (14 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 13:52   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation Matthew Brost
                   ` (16 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Rather than open coding VM binds and VMA tracking, use the GPUVA
library. GPUVA provides a common infrastructure for VM binds to use mmap
/ munmap semantics and to support VK sparse bindings.

The concepts are:

1) xe_vm inherits from drm_gpuva_manager
2) xe_vma inherits from drm_gpuva
3) xe_vma_op inherits from drm_gpuva_op
4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the
GPUVA code to generate a VMA operations list which is parsed, committed,
and executed (see the sketch below).
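
The embedding behind these concepts looks roughly like the following (only the
gpuva/mgr/base members and the xe_vm/xe_vma/xe_vma_op type names come from
this patch; the container_of() helpers are shown as the obvious
implementation):

	struct xe_vm {
		struct drm_gpuva_manager mgr;	/* base "class" */
		/* ... Xe specific state ... */
	};

	struct xe_vma {
		struct drm_gpuva gpuva;		/* base "class" */
		/* ... Xe specific state ... */
	};

	struct xe_vma_op {
		struct drm_gpuva_op base;	/* base "class" */
		/* ... Xe specific state ... */
	};

	static inline struct xe_vma *gpuva_to_vma(struct drm_gpuva *gpuva)
	{
		return container_of(gpuva, struct xe_vma, gpuva);
	}

	static inline struct xe_vma_op *gpuva_op_to_vma_op(struct drm_gpuva_op *op)
	{
		return container_of(op, struct xe_vma_op, base);
	}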

v2 (CI): Add break after default in case statement.
v3: Rebase
v4: Fix some error handling

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c                  |   10 +-
 drivers/gpu/drm/xe/xe_device.c              |    2 +-
 drivers/gpu/drm/xe/xe_exec.c                |    2 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |   23 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |   14 +-
 drivers/gpu/drm/xe/xe_migrate.c             |    8 +-
 drivers/gpu/drm/xe/xe_pt.c                  |  106 +-
 drivers/gpu/drm/xe/xe_trace.h               |   10 +-
 drivers/gpu/drm/xe/xe_vm.c                  | 1849 +++++++++----------
 drivers/gpu/drm/xe/xe_vm.h                  |   66 +-
 drivers/gpu/drm/xe/xe_vm_madvise.c          |   88 +-
 drivers/gpu/drm/xe/xe_vm_types.h            |  167 +-
 12 files changed, 1171 insertions(+), 1174 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index da99ee53e7d7..a475d0584916 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -391,7 +391,8 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
 {
 	struct dma_resv_iter cursor;
 	struct dma_fence *fence;
-	struct xe_vma *vma;
+	struct drm_gpuva *gpuva;
+	struct drm_gem_object *obj = &bo->ttm.base;
 	int ret = 0;
 
 	dma_resv_assert_held(bo->ttm.base.resv);
@@ -404,8 +405,9 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
 		dma_resv_iter_end(&cursor);
 	}
 
-	list_for_each_entry(vma, &bo->vmas, bo_link) {
-		struct xe_vm *vm = vma->vm;
+	drm_gem_for_each_gpuva(gpuva, obj) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+		struct xe_vm *vm = xe_vma_vm(vma);
 
 		trace_xe_vma_evict(vma);
 
@@ -430,10 +432,8 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
 			} else {
 				ret = timeout;
 			}
-
 		} else {
 			bool vm_resv_locked = false;
-			struct xe_vm *vm = vma->vm;
 
 			/*
 			 * We need to put the vma on the vm's rebind_list,
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 45d6e5ff47fd..ab2ecd208f97 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -131,7 +131,7 @@ static struct drm_driver driver = {
 	.driver_features =
 	    DRIVER_GEM |
 	    DRIVER_RENDER | DRIVER_SYNCOBJ |
-	    DRIVER_SYNCOBJ_TIMELINE,
+	    DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA,
 	.open = xe_file_open,
 	.postclose = xe_file_close,
 
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 21a9c2fddf86..90c46d092737 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -119,7 +119,7 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
 		if (xe_vma_is_userptr(vma))
 			continue;
 
-		err = xe_bo_validate(vma->bo, vm, false);
+		err = xe_bo_validate(xe_vma_bo(vma), vm, false);
 		if (err) {
 			xe_vm_unlock_dma_resv(vm, tv_onstack, *tv, ww, objs);
 			*tv = NULL;
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 1677640e1075..f7a066090a13 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -75,9 +75,10 @@ static bool vma_is_valid(struct xe_gt *gt, struct xe_vma *vma)
 		!(BIT(gt->info.id) & vma->usm.gt_invalidated);
 }
 
-static bool vma_matches(struct xe_vma *vma, struct xe_vma *lookup)
+static bool vma_matches(struct xe_vma *vma, u64 page_addr)
 {
-	if (lookup->start > vma->end || lookup->end < vma->start)
+	if (page_addr > xe_vma_end(vma) - 1 ||
+	    page_addr + SZ_4K < xe_vma_start(vma))
 		return false;
 
 	return true;
@@ -90,16 +91,14 @@ static bool only_needs_bo_lock(struct xe_bo *bo)
 
 static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
 {
-	struct xe_vma *vma = NULL, lookup;
+	struct xe_vma *vma = NULL;
 
-	lookup.start = page_addr;
-	lookup.end = lookup.start + SZ_4K - 1;
 	if (vm->usm.last_fault_vma) {   /* Fast lookup */
-		if (vma_matches(vm->usm.last_fault_vma, &lookup))
+		if (vma_matches(vm->usm.last_fault_vma, page_addr))
 			vma = vm->usm.last_fault_vma;
 	}
 	if (!vma)
-		vma = xe_vm_find_overlapping_vma(vm, &lookup);
+		vma = xe_vm_find_overlapping_vma(vm, page_addr, SZ_4K);
 
 	return vma;
 }
@@ -170,7 +169,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	}
 
 	/* Lock VM and BOs dma-resv */
-	bo = vma->bo;
+	bo = xe_vma_bo(vma);
 	if (only_needs_bo_lock(bo)) {
 		/* This path ensures the BO's LRU is updated */
 		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
@@ -487,12 +486,8 @@ static struct xe_vma *get_acc_vma(struct xe_vm *vm, struct acc *acc)
 {
 	u64 page_va = acc->va_range_base + (ffs(acc->sub_granularity) - 1) *
 		sub_granularity_in_byte(acc->granularity);
-	struct xe_vma lookup;
-
-	lookup.start = page_va;
-	lookup.end = lookup.start + SZ_4K - 1;
 
-	return xe_vm_find_overlapping_vma(vm, &lookup);
+	return xe_vm_find_overlapping_vma(vm, page_va, SZ_4K);
 }
 
 static int handle_acc(struct xe_gt *gt, struct acc *acc)
@@ -536,7 +531,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 		goto unlock_vm;
 
 	/* Lock VM and BOs dma-resv */
-	bo = vma->bo;
+	bo = xe_vma_bo(vma);
 	if (only_needs_bo_lock(bo)) {
 		/* This path ensures the BO's LRU is updated */
 		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 604f189dbd70..66fd67ffa8a9 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -201,8 +201,8 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 	if (!xe->info.has_range_tlb_invalidation) {
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
 	} else {
-		u64 start = vma->start;
-		u64 length = vma->end - vma->start + 1;
+		u64 start = xe_vma_start(vma);
+		u64 length = xe_vma_size(vma);
 		u64 align, end;
 
 		if (length < SZ_4K)
@@ -215,12 +215,12 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 		 * address mask covering the required range.
 		 */
 		align = roundup_pow_of_two(length);
-		start = ALIGN_DOWN(vma->start, align);
-		end = ALIGN(vma->start + length, align);
+		start = ALIGN_DOWN(xe_vma_start(vma), align);
+		end = ALIGN(xe_vma_start(vma) + length, align);
 		length = align;
 		while (start + length < end) {
 			length <<= 1;
-			start = ALIGN_DOWN(vma->start, length);
+			start = ALIGN_DOWN(xe_vma_start(vma), length);
 		}
 
 		/*
@@ -229,7 +229,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 		 */
 		if (length >= SZ_2M) {
 			length = max_t(u64, SZ_16M, length);
-			start = ALIGN_DOWN(vma->start, length);
+			start = ALIGN_DOWN(xe_vma_start(vma), length);
 		}
 
 		XE_BUG_ON(length < SZ_4K);
@@ -238,7 +238,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 		XE_BUG_ON(!IS_ALIGNED(start, length));
 
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
-		action[len++] = vma->vm->usm.asid;
+		action[len++] = xe_vma_vm(vma)->usm.asid;
 		action[len++] = lower_32_bits(start);
 		action[len++] = upper_32_bits(start);
 		action[len++] = ilog2(length) - ilog2(SZ_4K);
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index f40f47ccb76f..b44aa094a466 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1050,8 +1050,10 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 		return ERR_PTR(-ETIME);
 
 	if (wait_vm && !dma_resv_test_signaled(&vm->resv,
-					       DMA_RESV_USAGE_BOOKKEEP))
+					       DMA_RESV_USAGE_BOOKKEEP)) {
+		vm_dbg(&vm->xe->drm, "wait on VM for munmap");
 		return ERR_PTR(-ETIME);
+	}
 
 	if (ops->pre_commit) {
 		err = ops->pre_commit(pt_update);
@@ -1139,7 +1141,8 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	u64 addr;
 	int err = 0;
 	bool usm = !eng && xe->info.supports_usm;
-	bool first_munmap_rebind = vma && vma->first_munmap_rebind;
+	bool first_munmap_rebind = vma &&
+		vma->gpuva.flags & XE_VMA_FIRST_REBIND;
 
 	/* Use the CPU if no in syncs and engine is idle */
 	if (no_in_syncs(syncs, num_syncs) && (!eng || xe_engine_is_idle(eng))) {
@@ -1260,6 +1263,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	 * trigger preempts before moving forward
 	 */
 	if (first_munmap_rebind) {
+		vm_dbg(&vm->xe->drm, "wait on first_munmap_rebind");
 		err = job_add_deps(job, &vm->resv,
 				   DMA_RESV_USAGE_BOOKKEEP);
 		if (err)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 4ee5ea2cabc9..2b5b05a8a084 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -94,7 +94,7 @@ static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
 				&cur);
 		return xe_res_dma(&cur) + offset;
 	} else {
-		return xe_bo_addr(vma->bo, offset, page_size, is_vram);
+		return xe_bo_addr(xe_vma_bo(vma), offset, page_size, is_vram);
 	}
 }
 
@@ -159,7 +159,7 @@ u64 gen8_pte_encode(struct xe_vma *vma, struct xe_bo *bo,
 
 	if (is_vram) {
 		pte |= XE_PPGTT_PTE_LM;
-		if (vma && vma->use_atomic_access_pte_bit)
+		if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
 			pte |= XE_USM_PPGTT_PTE_AE;
 	}
 
@@ -738,7 +738,7 @@ static int
 xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 		 struct xe_vm_pgtable_update *entries, u32 *num_entries)
 {
-	struct xe_bo *bo = vma->bo;
+	struct xe_bo *bo = xe_vma_bo(vma);
 	bool is_vram = !xe_vma_is_userptr(vma) && bo && xe_bo_is_vram(bo);
 	struct xe_res_cursor curs;
 	struct xe_pt_stage_bind_walk xe_walk = {
@@ -747,22 +747,23 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 			.shifts = xe_normal_pt_shifts,
 			.max_level = XE_PT_HIGHEST_LEVEL,
 		},
-		.vm = vma->vm,
+		.vm = xe_vma_vm(vma),
 		.gt = gt,
 		.curs = &curs,
-		.va_curs_start = vma->start,
-		.pte_flags = vma->pte_flags,
+		.va_curs_start = xe_vma_start(vma),
+		.pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0,
 		.wupd.entries = entries,
-		.needs_64K = (vma->vm->flags & XE_VM_FLAGS_64K) && is_vram,
+		.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAGS_64K) &&
+			is_vram,
 	};
-	struct xe_pt *pt = vma->vm->pt_root[gt->info.id];
+	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[gt->info.id];
 	int ret;
 
 	if (is_vram) {
 		struct xe_gt *bo_gt = xe_bo_to_gt(bo);
 
 		xe_walk.default_pte = XE_PPGTT_PTE_LM;
-		if (vma && vma->use_atomic_access_pte_bit)
+		if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
 			xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
 		xe_walk.dma_offset = bo_gt->mem.vram.io_start -
 			gt_to_xe(gt)->mem.vram.io_start;
@@ -778,17 +779,16 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 
 	xe_bo_assert_held(bo);
 	if (xe_vma_is_userptr(vma))
-		xe_res_first_sg(vma->userptr.sg, 0, vma->end - vma->start + 1,
-				&curs);
+		xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma), &curs);
 	else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
-		xe_res_first(bo->ttm.resource, vma->bo_offset,
-			     vma->end - vma->start + 1, &curs);
+		xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
+			     xe_vma_size(vma), &curs);
 	else
-		xe_res_first_sg(xe_bo_get_sg(bo), vma->bo_offset,
-				vma->end - vma->start + 1, &curs);
+		xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
+				xe_vma_size(vma), &curs);
 
-	ret = drm_pt_walk_range(&pt->drm, pt->level, vma->start, vma->end + 1,
-				&xe_walk.drm);
+	ret = drm_pt_walk_range(&pt->drm, pt->level, xe_vma_start(vma),
+				xe_vma_end(vma), &xe_walk.drm);
 
 	*num_entries = xe_walk.wupd.num_used_entries;
 	return ret;
@@ -923,13 +923,13 @@ bool xe_pt_zap_ptes(struct xe_gt *gt, struct xe_vma *vma)
 		},
 		.gt = gt,
 	};
-	struct xe_pt *pt = vma->vm->pt_root[gt->info.id];
+	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[gt->info.id];
 
 	if (!(vma->gt_present & BIT(gt->info.id)))
 		return false;
 
-	(void)drm_pt_walk_shared(&pt->drm, pt->level, vma->start, vma->end + 1,
-				 &xe_walk.drm);
+	(void)drm_pt_walk_shared(&pt->drm, pt->level, xe_vma_start(vma),
+				 xe_vma_end(vma), &xe_walk.drm);
 
 	return xe_walk.needs_invalidate;
 }
@@ -966,21 +966,21 @@ static void xe_pt_abort_bind(struct xe_vma *vma,
 			continue;
 
 		for (j = 0; j < entries[i].qwords; j++)
-			xe_pt_destroy(entries[i].pt_entries[j].pt, vma->vm->flags, NULL);
+			xe_pt_destroy(entries[i].pt_entries[j].pt, xe_vma_vm(vma)->flags, NULL);
 		kfree(entries[i].pt_entries);
 	}
 }
 
 static void xe_pt_commit_locks_assert(struct xe_vma *vma)
 {
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 
 	lockdep_assert_held(&vm->lock);
 
 	if (xe_vma_is_userptr(vma))
 		lockdep_assert_held_read(&vm->userptr.notifier_lock);
 	else
-		dma_resv_assert_held(vma->bo->ttm.base.resv);
+		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
 
 	dma_resv_assert_held(&vm->resv);
 }
@@ -1013,7 +1013,7 @@ static void xe_pt_commit_bind(struct xe_vma *vma,
 
 			if (xe_pt_entry(pt_dir, j_))
 				xe_pt_destroy(xe_pt_entry(pt_dir, j_),
-					      vma->vm->flags, deferred);
+					      xe_vma_vm(vma)->flags, deferred);
 
 			pt_dir->dir.entries[j_] = &newpte->drm;
 		}
@@ -1074,7 +1074,7 @@ static int xe_pt_userptr_inject_eagain(struct xe_vma *vma)
 	static u32 count;
 
 	if (count++ % divisor == divisor - 1) {
-		struct xe_vm *vm = vma->vm;
+		struct xe_vm *vm = xe_vma_vm(vma);
 
 		vma->userptr.divisor = divisor << 1;
 		spin_lock(&vm->userptr.invalidated_lock);
@@ -1117,7 +1117,7 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
 		container_of(pt_update, typeof(*userptr_update), base);
 	struct xe_vma *vma = pt_update->vma;
 	unsigned long notifier_seq = vma->userptr.notifier_seq;
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 
 	userptr_update->locked = false;
 
@@ -1288,20 +1288,20 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 		},
 		.bind = true,
 	};
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	u32 num_entries;
 	struct dma_fence *fence;
 	struct invalidation_fence *ifence = NULL;
 	int err;
 
 	bind_pt_update.locked = false;
-	xe_bo_assert_held(vma->bo);
+	xe_bo_assert_held(xe_vma_bo(vma));
 	xe_vm_assert_held(vm);
 	XE_BUG_ON(xe_gt_is_media_type(gt));
 
-	vm_dbg(&vma->vm->xe->drm,
+	vm_dbg(&xe_vma_vm(vma)->xe->drm,
 	       "Preparing bind, with range [%llx...%llx) engine %p.\n",
-	       vma->start, vma->end, e);
+	       xe_vma_start(vma), xe_vma_end(vma) - 1, e);
 
 	err = xe_pt_prepare_bind(gt, vma, entries, &num_entries, rebind);
 	if (err)
@@ -1310,23 +1310,28 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 
 	xe_vm_dbg_print_entries(gt_to_xe(gt), entries, num_entries);
 
-	if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
+	if (rebind && !xe_vm_no_dma_fences(xe_vma_vm(vma))) {
 		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
 		if (!ifence)
 			return ERR_PTR(-ENOMEM);
 	}
 
 	fence = xe_migrate_update_pgtables(gt->migrate,
-					   vm, vma->bo,
+					   vm, xe_vma_bo(vma),
 					   e ? e : vm->eng[gt->info.id],
 					   entries, num_entries,
 					   syncs, num_syncs,
 					   &bind_pt_update.base);
 	if (!IS_ERR(fence)) {
+		bool last_munmap_rebind = vma->gpuva.flags & XE_VMA_LAST_REBIND;
 		LLIST_HEAD(deferred);
 
+
+		if (last_munmap_rebind)
+			vm_dbg(&vm->xe->drm, "last_munmap_rebind");
+
 		/* TLB invalidation must be done before signaling rebind */
-		if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
+		if (rebind && !xe_vm_no_dma_fences(xe_vma_vm(vma))) {
 			int err = invalidation_fence_init(gt, ifence, fence,
 							  vma);
 			if (err) {
@@ -1339,12 +1344,12 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 
 		/* add shared fence now for pagetable delayed destroy */
 		dma_resv_add_fence(&vm->resv, fence, !rebind &&
-				   vma->last_munmap_rebind ?
+				   last_munmap_rebind ?
 				   DMA_RESV_USAGE_KERNEL :
 				   DMA_RESV_USAGE_BOOKKEEP);
 
-		if (!xe_vma_is_userptr(vma) && !vma->bo->vm)
-			dma_resv_add_fence(vma->bo->ttm.base.resv, fence,
+		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
+			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
 					   DMA_RESV_USAGE_BOOKKEEP);
 		xe_pt_commit_bind(vma, entries, num_entries, rebind,
 				  bind_pt_update.locked ? &deferred : NULL);
@@ -1357,8 +1362,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 			up_read(&vm->userptr.notifier_lock);
 			xe_bo_put_commit(&deferred);
 		}
-		if (!rebind && vma->last_munmap_rebind &&
-		    xe_vm_in_compute_mode(vm))
+		if (!rebind && last_munmap_rebind && xe_vm_in_compute_mode(vm))
 			queue_work(vm->xe->ordered_wq,
 				   &vm->preempt.rebind_work);
 	} else {
@@ -1506,14 +1510,14 @@ static unsigned int xe_pt_stage_unbind(struct xe_gt *gt, struct xe_vma *vma,
 			.max_level = XE_PT_HIGHEST_LEVEL,
 		},
 		.gt = gt,
-		.modified_start = vma->start,
-		.modified_end = vma->end + 1,
+		.modified_start = xe_vma_start(vma),
+		.modified_end = xe_vma_end(vma),
 		.wupd.entries = entries,
 	};
-	struct xe_pt *pt = vma->vm->pt_root[gt->info.id];
+	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[gt->info.id];
 
-	(void)drm_pt_walk_shared(&pt->drm, pt->level, vma->start, vma->end + 1,
-				 &xe_walk.drm);
+	(void)drm_pt_walk_shared(&pt->drm, pt->level, xe_vma_start(vma),
+				 xe_vma_end(vma), &xe_walk.drm);
 
 	return xe_walk.wupd.num_used_entries;
 }
@@ -1525,7 +1529,7 @@ xe_migrate_clear_pgtable_callback(struct xe_migrate_pt_update *pt_update,
 				  const struct xe_vm_pgtable_update *update)
 {
 	struct xe_vma *vma = pt_update->vma;
-	u64 empty = __xe_pt_empty_pte(gt, vma->vm, update->pt->level);
+	u64 empty = __xe_pt_empty_pte(gt, xe_vma_vm(vma), update->pt->level);
 	int i;
 
 	XE_BUG_ON(xe_gt_is_media_type(gt));
@@ -1563,7 +1567,7 @@ xe_pt_commit_unbind(struct xe_vma *vma,
 			     i++) {
 				if (xe_pt_entry(pt_dir, i))
 					xe_pt_destroy(xe_pt_entry(pt_dir, i),
-						      vma->vm->flags, deferred);
+						      xe_vma_vm(vma)->flags, deferred);
 
 				pt_dir->dir.entries[i] = NULL;
 			}
@@ -1612,19 +1616,19 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 			.vma = vma,
 		},
 	};
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	u32 num_entries;
 	struct dma_fence *fence = NULL;
 	struct invalidation_fence *ifence;
 	LLIST_HEAD(deferred);
 
-	xe_bo_assert_held(vma->bo);
+	xe_bo_assert_held(xe_vma_bo(vma));
 	xe_vm_assert_held(vm);
 	XE_BUG_ON(xe_gt_is_media_type(gt));
 
-	vm_dbg(&vma->vm->xe->drm,
+	vm_dbg(&xe_vma_vm(vma)->xe->drm,
 	       "Preparing unbind, with range [%llx...%llx) engine %p.\n",
-	       vma->start, vma->end, e);
+	       xe_vma_start(vma), xe_vma_end(vma) - 1, e);
 
 	num_entries = xe_pt_stage_unbind(gt, vma, entries);
 	XE_BUG_ON(num_entries > ARRAY_SIZE(entries));
@@ -1663,8 +1667,8 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 				   DMA_RESV_USAGE_BOOKKEEP);
 
 		/* This fence will be installed by caller when doing eviction */
-		if (!xe_vma_is_userptr(vma) && !vma->bo->vm)
-			dma_resv_add_fence(vma->bo->ttm.base.resv, fence,
+		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
+			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
 					   DMA_RESV_USAGE_BOOKKEEP);
 		xe_pt_commit_unbind(vma, entries, num_entries,
 				    unbind_pt_update.locked ? &deferred : NULL);
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 38e9d7c6197b..5c7515296345 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -18,7 +18,7 @@
 #include "xe_gt_types.h"
 #include "xe_guc_engine_types.h"
 #include "xe_sched_job.h"
-#include "xe_vm_types.h"
+#include "xe_vm.h"
 
 DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
 		    TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
@@ -378,10 +378,10 @@ DECLARE_EVENT_CLASS(xe_vma,
 
 		    TP_fast_assign(
 			   __entry->vma = (unsigned long)vma;
-			   __entry->asid = vma->vm->usm.asid;
-			   __entry->start = vma->start;
-			   __entry->end = vma->end;
-			   __entry->ptr = (u64)vma->userptr.ptr;
+			   __entry->asid = xe_vma_vm(vma)->usm.asid;
+			   __entry->start = xe_vma_start(vma);
+			   __entry->end = xe_vma_end(vma) - 1;
+			   __entry->ptr = xe_vma_userptr(vma);
 			   ),
 
 		    TP_printk("vma=0x%016llx, asid=0x%05x, start=0x%012llx, end=0x%012llx, ptr=0x%012llx,",
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 6c427ff92c44..f3608865e259 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -25,10 +25,8 @@
 #include "xe_preempt_fence.h"
 #include "xe_pt.h"
 #include "xe_res_cursor.h"
-#include "xe_sync.h"
 #include "xe_trace.h"
-
-#define TEST_VM_ASYNC_OPS_ERROR
+#include "xe_sync.h"
 
 /**
  * xe_vma_userptr_check_repin() - Advisory check for repin needed
@@ -51,20 +49,19 @@ int xe_vma_userptr_check_repin(struct xe_vma *vma)
 
 int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 {
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_device *xe = vm->xe;
-	const unsigned long num_pages =
-		(vma->end - vma->start + 1) >> PAGE_SHIFT;
+	const unsigned long num_pages = xe_vma_size(vma) >> PAGE_SHIFT;
 	struct page **pages;
 	bool in_kthread = !current->mm;
 	unsigned long notifier_seq;
 	int pinned, ret, i;
-	bool read_only = vma->pte_flags & XE_PTE_READ_ONLY;
+	bool read_only = xe_vma_read_only(vma);
 
 	lockdep_assert_held(&vm->lock);
 	XE_BUG_ON(!xe_vma_is_userptr(vma));
 retry:
-	if (vma->destroyed)
+	if (vma->gpuva.flags & XE_VMA_DESTROYED)
 		return 0;
 
 	notifier_seq = mmu_interval_read_begin(&vma->userptr.notifier);
@@ -94,7 +91,8 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 	}
 
 	while (pinned < num_pages) {
-		ret = get_user_pages_fast(vma->userptr.ptr + pinned * PAGE_SIZE,
+		ret = get_user_pages_fast(xe_vma_userptr(vma) +
+					  pinned * PAGE_SIZE,
 					  num_pages - pinned,
 					  read_only ? 0 : FOLL_WRITE,
 					  &pages[pinned]);
@@ -295,7 +293,7 @@ void xe_vm_fence_all_extobjs(struct xe_vm *vm, struct dma_fence *fence,
 	struct xe_vma *vma;
 
 	list_for_each_entry(vma, &vm->extobj.list, extobj.link)
-		dma_resv_add_fence(vma->bo->ttm.base.resv, fence, usage);
+		dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence, usage);
 }
 
 static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
@@ -444,7 +442,7 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
 	INIT_LIST_HEAD(objs);
 	list_for_each_entry(vma, &vm->extobj.list, extobj.link) {
 		tv_bo->num_shared = num_shared;
-		tv_bo->bo = &vma->bo->ttm;
+		tv_bo->bo = &xe_vma_bo(vma)->ttm;
 
 		list_add_tail(&tv_bo->head, objs);
 		tv_bo++;
@@ -459,10 +457,10 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
 	spin_lock(&vm->notifier.list_lock);
 	list_for_each_entry_safe(vma, next, &vm->notifier.rebind_list,
 				 notifier.rebind_link) {
-		xe_bo_assert_held(vma->bo);
+		xe_bo_assert_held(xe_vma_bo(vma));
 
 		list_del_init(&vma->notifier.rebind_link);
-		if (vma->gt_present && !vma->destroyed)
+		if (vma->gt_present && !(vma->gpuva.flags & XE_VMA_DESTROYED))
 			list_move_tail(&vma->rebind_link, &vm->rebind_list);
 	}
 	spin_unlock(&vm->notifier.list_lock);
@@ -583,10 +581,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
 		goto out_unlock;
 
 	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
-		if (xe_vma_is_userptr(vma) || vma->destroyed)
+		if (xe_vma_is_userptr(vma) ||
+		    vma->gpuva.flags & XE_VMA_DESTROYED)
 			continue;
 
-		err = xe_bo_validate(vma->bo, vm, false);
+		err = xe_bo_validate(xe_vma_bo(vma), vm, false);
 		if (err)
 			goto out_unlock;
 	}
@@ -645,17 +644,12 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	trace_xe_vm_rebind_worker_exit(vm);
 }
 
-struct async_op_fence;
-static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
-			struct xe_engine *e, struct xe_sync_entry *syncs,
-			u32 num_syncs, struct async_op_fence *afence);
-
 static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
 				   const struct mmu_notifier_range *range,
 				   unsigned long cur_seq)
 {
 	struct xe_vma *vma = container_of(mni, struct xe_vma, userptr.notifier);
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	struct dma_resv_iter cursor;
 	struct dma_fence *fence;
 	long err;
@@ -679,7 +673,8 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
 	 * Tell exec and rebind worker they need to repin and rebind this
 	 * userptr.
 	 */
-	if (!xe_vm_in_fault_mode(vm) && !vma->destroyed && vma->gt_present) {
+	if (!xe_vm_in_fault_mode(vm) &&
+	    !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->gt_present) {
 		spin_lock(&vm->userptr.invalidated_lock);
 		list_move_tail(&vma->userptr.invalidate_link,
 			       &vm->userptr.invalidated);
@@ -784,7 +779,8 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
 
 static struct dma_fence *
 xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
-	       struct xe_sync_entry *syncs, u32 num_syncs);
+	       struct xe_sync_entry *syncs, u32 num_syncs,
+	       bool first_op, bool last_op);
 
 struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
@@ -805,7 +801,7 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 			trace_xe_vma_rebind_worker(vma);
 		else
 			trace_xe_vma_rebind_exec(vma);
-		fence = xe_vm_bind_vma(vma, NULL, NULL, 0);
+		fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
 		if (IS_ERR(fence))
 			return fence;
 	}
@@ -833,6 +829,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		return vma;
 	}
 
+	/* FIXME: Way to many lists, should be able to reduce this */
 	INIT_LIST_HEAD(&vma->rebind_link);
 	INIT_LIST_HEAD(&vma->unbind_link);
 	INIT_LIST_HEAD(&vma->userptr_link);
@@ -840,11 +837,12 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	INIT_LIST_HEAD(&vma->notifier.rebind_link);
 	INIT_LIST_HEAD(&vma->extobj.link);
 
-	vma->vm = vm;
-	vma->start = start;
-	vma->end = end;
+	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
+	vma->gpuva.mgr = &vm->mgr;
+	vma->gpuva.va.addr = start;
+	vma->gpuva.va.range = end - start + 1;
 	if (read_only)
-		vma->pte_flags = XE_PTE_READ_ONLY;
+		vma->gpuva.flags |= XE_VMA_READ_ONLY;
 
 	if (gt_mask) {
 		vma->gt_mask = gt_mask;
@@ -855,22 +853,24 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	}
 
 	if (vm->xe->info.platform == XE_PVC)
-		vma->use_atomic_access_pte_bit = true;
+		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
 
 	if (bo) {
 		xe_bo_assert_held(bo);
-		vma->bo_offset = bo_offset_or_userptr;
-		vma->bo = xe_bo_get(bo);
-		list_add_tail(&vma->bo_link, &bo->vmas);
+
+		drm_gem_object_get(&bo->ttm.base);
+		vma->gpuva.gem.obj = &bo->ttm.base;
+		vma->gpuva.gem.offset = bo_offset_or_userptr;
+		drm_gpuva_link(&vma->gpuva);
 	} else /* userptr */ {
 		u64 size = end - start + 1;
 		int err;
 
-		vma->userptr.ptr = bo_offset_or_userptr;
+		vma->gpuva.gem.offset = bo_offset_or_userptr;
 
 		err = mmu_interval_notifier_insert(&vma->userptr.notifier,
 						   current->mm,
-						   vma->userptr.ptr, size,
+						   xe_vma_userptr(vma), size,
 						   &vma_userptr_notifier_ops);
 		if (err) {
 			kfree(vma);
@@ -888,16 +888,16 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 static void vm_remove_extobj(struct xe_vma *vma)
 {
 	if (!list_empty(&vma->extobj.link)) {
-		vma->vm->extobj.entries--;
+		xe_vma_vm(vma)->extobj.entries--;
 		list_del_init(&vma->extobj.link);
 	}
 }
 
 static void xe_vma_destroy_late(struct xe_vma *vma)
 {
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_device *xe = vm->xe;
-	bool read_only = vma->pte_flags & XE_PTE_READ_ONLY;
+	bool read_only = xe_vma_read_only(vma);
 
 	if (xe_vma_is_userptr(vma)) {
 		if (vma->userptr.sg) {
@@ -917,7 +917,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 		mmu_interval_notifier_remove(&vma->userptr.notifier);
 		xe_vm_put(vm);
 	} else {
-		xe_bo_put(vma->bo);
+		xe_bo_put(xe_vma_bo(vma));
 	}
 
 	kfree(vma);
@@ -942,21 +942,22 @@ static void vma_destroy_cb(struct dma_fence *fence,
 
 static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 {
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 
 	lockdep_assert_held_write(&vm->lock);
 	XE_BUG_ON(!list_empty(&vma->unbind_link));
 
 	if (xe_vma_is_userptr(vma)) {
-		XE_WARN_ON(!vma->destroyed);
+		XE_WARN_ON(!(vma->gpuva.flags & XE_VMA_DESTROYED));
+
 		spin_lock(&vm->userptr.invalidated_lock);
 		list_del_init(&vma->userptr.invalidate_link);
 		spin_unlock(&vm->userptr.invalidated_lock);
 		list_del(&vma->userptr_link);
 	} else {
-		xe_bo_assert_held(vma->bo);
-		list_del(&vma->bo_link);
-		if (!vma->bo->vm)
+		xe_bo_assert_held(xe_vma_bo(vma));
+		drm_gpuva_unlink(&vma->gpuva);
+		if (!xe_vma_bo(vma)->vm)
 			vm_remove_extobj(vma);
 	}
 
@@ -981,13 +982,13 @@ static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 {
 	struct ttm_validate_buffer tv[2];
 	struct ww_acquire_ctx ww;
-	struct xe_bo *bo = vma->bo;
+	struct xe_bo *bo = xe_vma_bo(vma);
 	LIST_HEAD(objs);
 	LIST_HEAD(dups);
 	int err;
 
 	memset(tv, 0, sizeof(tv));
-	tv[0].bo = xe_vm_ttm_bo(vma->vm);
+	tv[0].bo = xe_vm_ttm_bo(xe_vma_vm(vma));
 	list_add(&tv[0].head, &objs);
 
 	if (bo) {
@@ -1004,77 +1005,63 @@ static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 		xe_bo_put(bo);
 }
 
-static struct xe_vma *to_xe_vma(const struct rb_node *node)
-{
-	BUILD_BUG_ON(offsetof(struct xe_vma, vm_node) != 0);
-	return (struct xe_vma *)node;
-}
-
-static int xe_vma_cmp(const struct xe_vma *a, const struct xe_vma *b)
-{
-	if (a->end < b->start) {
-		return -1;
-	} else if (b->end < a->start) {
-		return 1;
-	} else {
-		return 0;
-	}
-}
-
-static bool xe_vma_less_cb(struct rb_node *a, const struct rb_node *b)
-{
-	return xe_vma_cmp(to_xe_vma(a), to_xe_vma(b)) < 0;
-}
-
-int xe_vma_cmp_vma_cb(const void *key, const struct rb_node *node)
-{
-	struct xe_vma *cmp = to_xe_vma(node);
-	const struct xe_vma *own = key;
-
-	if (own->start > cmp->end)
-		return 1;
-
-	if (own->end < cmp->start)
-		return -1;
-
-	return 0;
-}
-
 struct xe_vma *
-xe_vm_find_overlapping_vma(struct xe_vm *vm, const struct xe_vma *vma)
+xe_vm_find_overlapping_vma(struct xe_vm *vm, u64 start, u64 range)
 {
-	struct rb_node *node;
+	struct drm_gpuva *gpuva;
 
 	if (xe_vm_is_closed(vm))
 		return NULL;
 
-	XE_BUG_ON(vma->end >= vm->size);
+	XE_BUG_ON(start + range > vm->size);
 	lockdep_assert_held(&vm->lock);
 
-	node = rb_find(vma, &vm->vmas, xe_vma_cmp_vma_cb);
+	gpuva = drm_gpuva_find_first(&vm->mgr, start, range);
 
-	return node ? to_xe_vma(node) : NULL;
+	return gpuva ? gpuva_to_vma(gpuva) : NULL;
 }
 
-static void xe_vm_insert_vma(struct xe_vm *vm, struct xe_vma *vma)
+static int xe_vm_insert_vma(struct xe_vm *vm, struct xe_vma *vma)
 {
-	XE_BUG_ON(vma->vm != vm);
+	int err;
+
+	XE_BUG_ON(xe_vma_vm(vma) != vm);
 	lockdep_assert_held(&vm->lock);
 
-	rb_add(&vma->vm_node, &vm->vmas, xe_vma_less_cb);
+	err = drm_gpuva_insert(&vm->mgr, &vma->gpuva);
+	XE_WARN_ON(err);	/* Shouldn't be possible */
+
+	return err;
 }
 
-static void xe_vm_remove_vma(struct xe_vm *vm, struct xe_vma *vma)
+static void xe_vm_remove_vma(struct xe_vm *vm, struct xe_vma *vma, bool remove)
 {
-	XE_BUG_ON(vma->vm != vm);
+	XE_BUG_ON(xe_vma_vm(vma) != vm);
 	lockdep_assert_held(&vm->lock);
 
-	rb_erase(&vma->vm_node, &vm->vmas);
+	if (remove)
+		drm_gpuva_remove(&vma->gpuva);
 	if (vm->usm.last_fault_vma == vma)
 		vm->usm.last_fault_vma = NULL;
 }
 
-static void async_op_work_func(struct work_struct *w);
+static struct drm_gpuva_op *xe_vm_op_alloc(void)
+{
+	struct xe_vma_op *op;
+
+	op = kzalloc(sizeof(*op), GFP_KERNEL);
+
+	if (unlikely(!op))
+		return NULL;
+
+	return &op->base;
+}
+
+static struct drm_gpuva_fn_ops gpuva_ops = {
+	.op_alloc = xe_vm_op_alloc,
+};
+
+static void xe_vma_op_work_func(struct work_struct *w);
 static void vm_destroy_work_func(struct work_struct *w);
 
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
@@ -1094,7 +1081,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 	vm->size = 1ull << xe_pt_shift(xe->info.vm_max_level + 1);
 
-	vm->vmas = RB_ROOT;
 	vm->flags = flags;
 
 	init_rwsem(&vm->lock);
@@ -1110,7 +1096,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	spin_lock_init(&vm->notifier.list_lock);
 
 	INIT_LIST_HEAD(&vm->async_ops.pending);
-	INIT_WORK(&vm->async_ops.work, async_op_work_func);
+	INIT_WORK(&vm->async_ops.work, xe_vma_op_work_func);
 	spin_lock_init(&vm->async_ops.lock);
 
 	INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
@@ -1130,6 +1116,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	if (err)
 		goto err_put;
 
+	drm_gpuva_manager_init(&vm->mgr, "Xe VM", 0, vm->size, 0, 0,
+			       &gpuva_ops);
 	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
 		vm->flags |= XE_VM_FLAGS_64K;
 
@@ -1235,6 +1223,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
 	}
 	dma_resv_unlock(&vm->resv);
+	drm_gpuva_manager_destroy(&vm->mgr);
 err_put:
 	dma_resv_fini(&vm->resv);
 	kfree(vm);
@@ -1284,14 +1273,19 @@ static void vm_error_capture(struct xe_vm *vm, int err,
 
 void xe_vm_close_and_put(struct xe_vm *vm)
 {
-	struct rb_root contested = RB_ROOT;
+	struct list_head contested;
 	struct ww_acquire_ctx ww;
 	struct xe_device *xe = vm->xe;
 	struct xe_gt *gt;
+	struct xe_vma *vma, *next_vma;
+	struct drm_gpuva *gpuva;
+	DRM_GPUVA_ITER(it, &vm->mgr, 0);
 	u8 id;
 
 	XE_BUG_ON(vm->preempt.num_engines);
 
+	INIT_LIST_HEAD(&contested);
+
 	vm->size = 0;
 	smp_mb();
 	flush_async_ops(vm);
@@ -1308,24 +1302,25 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 
 	down_write(&vm->lock);
 	xe_vm_lock(vm, &ww, 0, false);
-	while (vm->vmas.rb_node) {
-		struct xe_vma *vma = to_xe_vma(vm->vmas.rb_node);
+	drm_gpuva_iter_for_each(gpuva, it) {
+		vma = gpuva_to_vma(gpuva);
 
 		if (xe_vma_is_userptr(vma)) {
 			down_read(&vm->userptr.notifier_lock);
-			vma->destroyed = true;
+			vma->gpuva.flags |= XE_VMA_DESTROYED;
 			up_read(&vm->userptr.notifier_lock);
 		}
 
-		rb_erase(&vma->vm_node, &vm->vmas);
+		xe_vm_remove_vma(vm, vma, false);
+		drm_gpuva_iter_remove(&it);
 
 		/* easy case, remove from VMA? */
-		if (xe_vma_is_userptr(vma) || vma->bo->vm) {
+		if (xe_vma_is_userptr(vma) || xe_vma_bo(vma)->vm) {
 			xe_vma_destroy(vma, NULL);
 			continue;
 		}
 
-		rb_add(&vma->vm_node, &contested, xe_vma_less_cb);
+		list_add_tail(&vma->unbind_link, &contested);
 	}
 
 	/*
@@ -1348,19 +1343,14 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	}
 	xe_vm_unlock(vm, &ww);
 
-	if (contested.rb_node) {
-
-		/*
-		 * VM is now dead, cannot re-add nodes to vm->vmas if it's NULL
-		 * Since we hold a refcount to the bo, we can remove and free
-		 * the members safely without locking.
-		 */
-		while (contested.rb_node) {
-			struct xe_vma *vma = to_xe_vma(contested.rb_node);
-
-			rb_erase(&vma->vm_node, &contested);
-			xe_vma_destroy_unlocked(vma);
-		}
+	/*
+	 * VM is now dead, cannot re-add nodes to vm->vmas if it's NULL
+	 * Since we hold a refcount to the bo, we can remove and free
+	 * the members safely without locking.
+	 */
+	list_for_each_entry_safe(vma, next_vma, &contested, unbind_link) {
+		list_del_init(&vma->unbind_link);
+		xe_vma_destroy_unlocked(vma);
 	}
 
 	if (vm->async_ops.error_capture.addr)
@@ -1369,6 +1359,8 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	XE_WARN_ON(!list_empty(&vm->extobj.list));
 	up_write(&vm->lock);
 
+	drm_gpuva_manager_destroy(&vm->mgr);
+
 	mutex_lock(&xe->usm.lock);
 	if (vm->flags & XE_VM_FLAG_FAULT_MODE)
 		xe->usm.num_vm_in_fault_mode--;
@@ -1456,13 +1448,14 @@ u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_gt *full_gt)
 
 static struct dma_fence *
 xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
-		 struct xe_sync_entry *syncs, u32 num_syncs)
+		 struct xe_sync_entry *syncs, u32 num_syncs,
+		 bool first_op, bool last_op)
 {
 	struct xe_gt *gt;
 	struct dma_fence *fence = NULL;
 	struct dma_fence **fences = NULL;
 	struct dma_fence_array *cf = NULL;
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	int cur_fence = 0, i;
 	int number_gts = hweight_long(vma->gt_present);
 	int err;
@@ -1483,7 +1476,8 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
 
 		XE_BUG_ON(xe_gt_is_media_type(gt));
 
-		fence = __xe_pt_unbind_vma(gt, vma, e, syncs, num_syncs);
+		fence = __xe_pt_unbind_vma(gt, vma, e, first_op ? syncs : NULL,
+					   first_op ? num_syncs : 0);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
 			goto err_fences;
@@ -1509,7 +1503,7 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
 		}
 	}
 
-	for (i = 0; i < num_syncs; i++)
+	for (i = 0; last_op && i < num_syncs; i++)
 		xe_sync_entry_signal(&syncs[i], NULL, cf ? &cf->base : fence);
 
 	return cf ? &cf->base : !fence ? dma_fence_get_stub() : fence;
@@ -1528,13 +1522,14 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
 
 static struct dma_fence *
 xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
-	       struct xe_sync_entry *syncs, u32 num_syncs)
+	       struct xe_sync_entry *syncs, u32 num_syncs,
+	       bool first_op, bool last_op)
 {
 	struct xe_gt *gt;
 	struct dma_fence *fence;
 	struct dma_fence **fences = NULL;
 	struct dma_fence_array *cf = NULL;
-	struct xe_vm *vm = vma->vm;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	int cur_fence = 0, i;
 	int number_gts = hweight_long(vma->gt_mask);
 	int err;
@@ -1554,7 +1549,8 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
 			goto next;
 
 		XE_BUG_ON(xe_gt_is_media_type(gt));
-		fence = __xe_pt_bind_vma(gt, vma, e, syncs, num_syncs,
+		fence = __xe_pt_bind_vma(gt, vma, e, first_op ? syncs : NULL,
+					 first_op ? num_syncs : 0,
 					 vma->gt_present & BIT(id));
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
@@ -1581,7 +1577,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
 		}
 	}
 
-	for (i = 0; i < num_syncs; i++)
+	for (i = 0; last_op && i < num_syncs; i++)
 		xe_sync_entry_signal(&syncs[i], NULL, cf ? &cf->base : fence);
 
 	return cf ? &cf->base : fence;
@@ -1680,15 +1676,27 @@ int xe_vm_async_fence_wait_start(struct dma_fence *fence)
 
 static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
 			struct xe_engine *e, struct xe_sync_entry *syncs,
-			u32 num_syncs, struct async_op_fence *afence)
+			u32 num_syncs, struct async_op_fence *afence,
+			bool immediate, bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
 
 	xe_vm_assert_held(vm);
 
-	fence = xe_vm_bind_vma(vma, e, syncs, num_syncs);
-	if (IS_ERR(fence))
-		return PTR_ERR(fence);
+	if (immediate) {
+		fence = xe_vm_bind_vma(vma, e, syncs, num_syncs, first_op,
+				       last_op);
+		if (IS_ERR(fence))
+			return PTR_ERR(fence);
+	} else {
+		int i;
+
+		XE_BUG_ON(!xe_vm_in_fault_mode(vm));
+
+		fence = dma_fence_get_stub();
+		for (i = 0; last_op && i < num_syncs; i++)
+			xe_sync_entry_signal(&syncs[i], NULL, fence);
+	}
 	if (afence)
 		add_async_op_fence_cb(vm, fence, afence);
 
@@ -1698,32 +1706,35 @@ static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
 
 static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_engine *e,
 		      struct xe_bo *bo, struct xe_sync_entry *syncs,
-		      u32 num_syncs, struct async_op_fence *afence)
+		      u32 num_syncs, struct async_op_fence *afence,
+		      bool immediate, bool first_op, bool last_op)
 {
 	int err;
 
 	xe_vm_assert_held(vm);
 	xe_bo_assert_held(bo);
 
-	if (bo) {
+	if (bo && immediate) {
 		err = xe_bo_validate(bo, vm, true);
 		if (err)
 			return err;
 	}
 
-	return __xe_vm_bind(vm, vma, e, syncs, num_syncs, afence);
+	return __xe_vm_bind(vm, vma, e, syncs, num_syncs, afence, immediate,
+			    first_op, last_op);
 }
 
 static int xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 			struct xe_engine *e, struct xe_sync_entry *syncs,
-			u32 num_syncs, struct async_op_fence *afence)
+			u32 num_syncs, struct async_op_fence *afence,
+			bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
 
 	xe_vm_assert_held(vm);
-	xe_bo_assert_held(vma->bo);
+	xe_bo_assert_held(xe_vma_bo(vma));
 
-	fence = xe_vm_unbind_vma(vma, e, syncs, num_syncs);
+	fence = xe_vm_unbind_vma(vma, e, syncs, num_syncs, first_op, last_op);
 	if (IS_ERR(fence))
 		return PTR_ERR(fence);
 	if (afence)
@@ -1946,26 +1957,27 @@ static const u32 region_to_mem_type[] = {
 static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 			  struct xe_engine *e, u32 region,
 			  struct xe_sync_entry *syncs, u32 num_syncs,
-			  struct async_op_fence *afence)
+			  struct async_op_fence *afence, bool first_op,
+			  bool last_op)
 {
 	int err;
 
 	XE_BUG_ON(region > ARRAY_SIZE(region_to_mem_type));
 
 	if (!xe_vma_is_userptr(vma)) {
-		err = xe_bo_migrate(vma->bo, region_to_mem_type[region]);
+		err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
 		if (err)
 			return err;
 	}
 
 	if (vma->gt_mask != (vma->gt_present & ~vma->usm.gt_invalidated)) {
-		return xe_vm_bind(vm, vma, e, vma->bo, syncs, num_syncs,
-				  afence);
+		return xe_vm_bind(vm, vma, e, xe_vma_bo(vma), syncs, num_syncs,
+				  afence, true, first_op, last_op);
 	} else {
 		int i;
 
 		/* Nothing to do, signal fences now */
-		for (i = 0; i < num_syncs; i++)
+		for (i = 0; last_op && i < num_syncs; i++)
 			xe_sync_entry_signal(&syncs[i], NULL,
 					     dma_fence_get_stub());
 		if (afence)
@@ -1976,29 +1988,6 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 
 #define VM_BIND_OP(op)	(op & 0xffff)
 
-static int __vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
-			   struct xe_engine *e, struct xe_bo *bo, u32 op,
-			   u32 region, struct xe_sync_entry *syncs,
-			   u32 num_syncs, struct async_op_fence *afence)
-{
-	switch (VM_BIND_OP(op)) {
-	case XE_VM_BIND_OP_MAP:
-		return xe_vm_bind(vm, vma, e, bo, syncs, num_syncs, afence);
-	case XE_VM_BIND_OP_UNMAP:
-	case XE_VM_BIND_OP_UNMAP_ALL:
-		return xe_vm_unbind(vm, vma, e, syncs, num_syncs, afence);
-	case XE_VM_BIND_OP_MAP_USERPTR:
-		return xe_vm_bind(vm, vma, e, NULL, syncs, num_syncs, afence);
-	case XE_VM_BIND_OP_PREFETCH:
-		return xe_vm_prefetch(vm, vma, e, region, syncs, num_syncs,
-				      afence);
-		break;
-	default:
-		XE_BUG_ON("NOT POSSIBLE");
-		return -EINVAL;
-	}
-}
-
 struct ttm_buffer_object *xe_vm_ttm_bo(struct xe_vm *vm)
 {
 	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
@@ -2014,836 +2003,851 @@ static void xe_vm_tv_populate(struct xe_vm *vm, struct ttm_validate_buffer *tv)
 	tv->bo = xe_vm_ttm_bo(vm);
 }
 
-static bool is_map_op(u32 op)
+static void vm_set_async_error(struct xe_vm *vm, int err)
 {
-	return VM_BIND_OP(op) == XE_VM_BIND_OP_MAP ||
-		VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR;
+	lockdep_assert_held(&vm->lock);
+	vm->async_ops.error = err;
 }
 
-static bool is_unmap_op(u32 op)
+static bool bo_has_vm_references(struct xe_bo *bo, struct xe_vm *vm,
+				 struct xe_vma *ignore)
 {
-	return VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP ||
-		VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL;
+	struct ww_acquire_ctx ww;
+	struct drm_gpuva *gpuva;
+	struct drm_gem_object *obj = &bo->ttm.base;
+	bool ret = false;
+
+	xe_bo_lock(bo, &ww, 0, false);
+	drm_gem_for_each_gpuva(gpuva, obj) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
+
+		if (vma != ignore && xe_vma_vm(vma) == vm &&
+		    !(vma->gpuva.flags & XE_VMA_DESTROYED)) {
+			ret = true;
+			break;
+		}
+	}
+	xe_bo_unlock(bo, &ww);
+
+	return ret;
 }
 
-static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
-			 struct xe_engine *e, struct xe_bo *bo,
-			 struct drm_xe_vm_bind_op *bind_op,
-			 struct xe_sync_entry *syncs, u32 num_syncs,
-			 struct async_op_fence *afence)
+static int vm_insert_extobj(struct xe_vm *vm, struct xe_vma *vma)
 {
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
-	struct ttm_validate_buffer tv_bo, tv_vm;
-	struct ww_acquire_ctx ww;
-	struct xe_bo *vbo;
-	int err, i;
+	struct xe_bo *bo = xe_vma_bo(vma);
 
-	lockdep_assert_held(&vm->lock);
-	XE_BUG_ON(!list_empty(&vma->unbind_link));
+	lockdep_assert_held_write(&vm->lock);
 
-	/* Binds deferred to faults, signal fences now */
-	if (xe_vm_in_fault_mode(vm) && is_map_op(bind_op->op) &&
-	    !(bind_op->op & XE_VM_BIND_FLAG_IMMEDIATE)) {
-		for (i = 0; i < num_syncs; i++)
-			xe_sync_entry_signal(&syncs[i], NULL,
-					     dma_fence_get_stub());
-		if (afence)
-			dma_fence_signal(&afence->fence);
+	if (bo_has_vm_references(bo, vm, vma))
 		return 0;
-	}
 
-	xe_vm_tv_populate(vm, &tv_vm);
-	list_add_tail(&tv_vm.head, &objs);
-	vbo = vma->bo;
-	if (vbo) {
-		/*
-		 * An unbind can drop the last reference to the BO and
-		 * the BO is needed for ttm_eu_backoff_reservation so
-		 * take a reference here.
-		 */
-		xe_bo_get(vbo);
+	list_add(&vma->extobj.link, &vm->extobj.list);
+	vm->extobj.entries++;
 
-		if (!vbo->vm) {
-			tv_bo.bo = &vbo->ttm;
-			tv_bo.num_shared = 1;
-			list_add(&tv_bo.head, &objs);
-		}
-	}
+	return 0;
+}
 
-again:
-	err = ttm_eu_reserve_buffers(&ww, &objs, true, &dups);
-	if (!err) {
-		err = __vm_bind_ioctl(vm, vma, e, bo,
-				      bind_op->op, bind_op->region, syncs,
-				      num_syncs, afence);
-		ttm_eu_backoff_reservation(&ww, &objs);
-		if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
-			lockdep_assert_held_write(&vm->lock);
-			err = xe_vma_userptr_pin_pages(vma);
-			if (!err)
-				goto again;
-		}
+static int vm_bind_ioctl_lookup_vma(struct xe_vm *vm, struct xe_bo *bo,
+				    u64 addr, u64 range, u32 op)
+{
+	struct xe_device *xe = vm->xe;
+	struct xe_vma *vma;
+	bool async = !!(op & XE_VM_BIND_FLAG_ASYNC);
+
+	lockdep_assert_held(&vm->lock);
+
+	switch (VM_BIND_OP(op)) {
+	case XE_VM_BIND_OP_MAP:
+	case XE_VM_BIND_OP_MAP_USERPTR:
+		vma = xe_vm_find_overlapping_vma(vm, addr, range);
+		if (XE_IOCTL_ERR(xe, vma && !async))
+			return -EBUSY;
+		break;
+	case XE_VM_BIND_OP_UNMAP:
+	case XE_VM_BIND_OP_PREFETCH:
+		vma = xe_vm_find_overlapping_vma(vm, addr, range);
+		if (XE_IOCTL_ERR(xe, !vma) ||
+		    XE_IOCTL_ERR(xe, (xe_vma_start(vma) != addr ||
+				 xe_vma_end(vma) != addr + range) && !async))
+			return -EINVAL;
+		break;
+	case XE_VM_BIND_OP_UNMAP_ALL:
+		if (XE_IOCTL_ERR(xe, list_empty(&bo->ttm.base.gpuva.list)))
+			return -ENODATA;
+		break;
+	default:
+		XE_BUG_ON("NOT POSSIBLE");
+		return -EINVAL;
 	}
-	xe_bo_put(vbo);
 
-	return err;
+	return 0;
 }
 
-struct async_op {
-	struct xe_vma *vma;
-	struct xe_engine *engine;
-	struct xe_bo *bo;
-	struct drm_xe_vm_bind_op bind_op;
-	struct xe_sync_entry *syncs;
-	u32 num_syncs;
-	struct list_head link;
-	struct async_op_fence *fence;
-};
-
-static void async_op_cleanup(struct xe_vm *vm, struct async_op *op)
+static void prep_vma_destroy(struct xe_vm *vm, struct xe_vma *vma,
+			     bool post_commit)
 {
-	while (op->num_syncs--)
-		xe_sync_entry_cleanup(&op->syncs[op->num_syncs]);
-	kfree(op->syncs);
-	xe_bo_put(op->bo);
-	if (op->engine)
-		xe_engine_put(op->engine);
-	xe_vm_put(vm);
-	if (op->fence)
-		dma_fence_put(&op->fence->fence);
-	kfree(op);
+	down_read(&vm->userptr.notifier_lock);
+	vma->gpuva.flags |= XE_VMA_DESTROYED;
+	up_read(&vm->userptr.notifier_lock);
+	if (post_commit)
+		xe_vm_remove_vma(vm, vma, true);
 }
 
-static struct async_op *next_async_op(struct xe_vm *vm)
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
+static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
 {
-	return list_first_entry_or_null(&vm->async_ops.pending,
-					struct async_op, link);
-}
+	struct xe_vma *vma;
 
-static void vm_set_async_error(struct xe_vm *vm, int err)
+	switch (op->op) {
+	case DRM_GPUVA_OP_MAP:
+		vm_dbg(&xe->drm, "MAP: addr=0x%016llx, range=0x%016llx",
+		       op->map.va.addr, op->map.va.range);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		vma = gpuva_to_vma(op->remap.unmap->va);
+		vm_dbg(&xe->drm, "REMAP:UNMAP: addr=0x%016llx, range=0x%016llx, keep=%d",
+		       xe_vma_start(vma), xe_vma_size(vma),
+		       op->unmap.keep ? 1 : 0);
+		if (op->remap.prev)
+			vm_dbg(&xe->drm,
+			       "REMAP:PREV: addr=0x%016llx, range=0x%016llx",
+			       op->remap.prev->va.addr,
+			       op->remap.prev->va.range);
+		if (op->remap.next)
+			vm_dbg(&xe->drm,
+			       "REMAP:NEXT: addr=0x%016llx, range=0x%016llx",
+			       op->remap.next->va.addr,
+			       op->remap.next->va.range);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		vma = gpuva_to_vma(op->unmap.va);
+		vm_dbg(&xe->drm, "UNMAP: addr=0x%016llx, range=0x%016llx, keep=%d",
+		       xe_vma_start(vma), xe_vma_size(vma),
+		       op->unmap.keep ? 1 : 0);
+		break;
+	default:
+		XE_BUG_ON("NOT_POSSIBLE");
+	}
+}
+#else
+static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
 {
-	lockdep_assert_held(&vm->lock);
-	vm->async_ops.error = err;
 }
+#endif
 
-static void async_op_work_func(struct work_struct *w)
+/*
+ * Create operations list from IOCTL arguments, setup operations fields so parse
+ * and commit steps are decoupled from IOCTL arguments. This step can fail.
+ */
+static struct drm_gpuva_ops *
+vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
+			 u64 bo_offset_or_userptr, u64 addr, u64 range,
+			 u32 operation, u64 gt_mask, u32 region)
 {
-	struct xe_vm *vm = container_of(w, struct xe_vm, async_ops.work);
-
-	for (;;) {
-		struct async_op *op;
-		int err;
-
-		if (vm->async_ops.error && !xe_vm_is_closed(vm))
-			break;
+	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
+	struct ww_acquire_ctx ww;
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva_op *__op;
+	struct xe_vma_op *op;
+	int err;
 
-		spin_lock_irq(&vm->async_ops.lock);
-		op = next_async_op(vm);
-		if (op)
-			list_del_init(&op->link);
-		spin_unlock_irq(&vm->async_ops.lock);
+	lockdep_assert_held_write(&vm->lock);
 
-		if (!op)
-			break;
+	vm_dbg(&vm->xe->drm,
+	       "op=%d, addr=0x%016llx, range=0x%016llx, bo_offset_or_userptr=0x%016llx",
+	       VM_BIND_OP(operation), addr, range, bo_offset_or_userptr);
 
-		if (!xe_vm_is_closed(vm)) {
-			bool first, last;
+	switch (VM_BIND_OP(operation)) {
+	case XE_VM_BIND_OP_MAP:
+	case XE_VM_BIND_OP_MAP_USERPTR:
+		ops = drm_gpuva_sm_map_ops_create(&vm->mgr, addr, range,
+						  obj, bo_offset_or_userptr);
+		drm_gpuva_for_each_op(__op, ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
-			down_write(&vm->lock);
-again:
-			first = op->vma->first_munmap_rebind;
-			last = op->vma->last_munmap_rebind;
-#ifdef TEST_VM_ASYNC_OPS_ERROR
-#define FORCE_ASYNC_OP_ERROR	BIT(31)
-			if (!(op->bind_op.op & FORCE_ASYNC_OP_ERROR)) {
-				err = vm_bind_ioctl(vm, op->vma, op->engine,
-						    op->bo, &op->bind_op,
-						    op->syncs, op->num_syncs,
-						    op->fence);
-			} else {
-				err = -ENOMEM;
-				op->bind_op.op &= ~FORCE_ASYNC_OP_ERROR;
-			}
-#else
-			err = vm_bind_ioctl(vm, op->vma, op->engine, op->bo,
-					    &op->bind_op, op->syncs,
-					    op->num_syncs, op->fence);
-#endif
-			/*
-			 * In order for the fencing to work (stall behind
-			 * existing jobs / prevent new jobs from running) all
-			 * the dma-resv slots need to be programmed in a batch
-			 * relative to execs / the rebind worker. The vm->lock
-			 * ensure this.
-			 */
-			if (!err && ((first && VM_BIND_OP(op->bind_op.op) ==
-				      XE_VM_BIND_OP_UNMAP) ||
-				     vm->async_ops.munmap_rebind_inflight)) {
-				if (last) {
-					op->vma->last_munmap_rebind = false;
-					vm->async_ops.munmap_rebind_inflight =
-						false;
-				} else {
-					vm->async_ops.munmap_rebind_inflight =
-						true;
-
-					async_op_cleanup(vm, op);
-
-					spin_lock_irq(&vm->async_ops.lock);
-					op = next_async_op(vm);
-					XE_BUG_ON(!op);
-					list_del_init(&op->link);
-					spin_unlock_irq(&vm->async_ops.lock);
-
-					goto again;
-				}
-			}
-			if (err) {
-				trace_xe_vma_fail(op->vma);
-				drm_warn(&vm->xe->drm, "Async VM op(%d) failed with %d",
-					 VM_BIND_OP(op->bind_op.op),
-					 err);
+			op->gt_mask = gt_mask;
+			op->map.immediate =
+				operation & XE_VM_BIND_FLAG_IMMEDIATE;
+			op->map.read_only =
+				operation & XE_VM_BIND_FLAG_READONLY;
+		}
+		break;
+	case XE_VM_BIND_OP_UNMAP:
+		ops = drm_gpuva_sm_unmap_ops_create(&vm->mgr, addr, range);
+		drm_gpuva_for_each_op(__op, ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
-				spin_lock_irq(&vm->async_ops.lock);
-				list_add(&op->link, &vm->async_ops.pending);
-				spin_unlock_irq(&vm->async_ops.lock);
+			op->gt_mask = gt_mask;
+		}
+		break;
+	case XE_VM_BIND_OP_PREFETCH:
+		ops = drm_gpuva_prefetch_ops_create(&vm->mgr, addr, range);
+		drm_gpuva_for_each_op(__op, ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
-				vm_set_async_error(vm, err);
-				up_write(&vm->lock);
+			op->gt_mask = gt_mask;
+			op->prefetch.region = region;
+		}
+		break;
+	case XE_VM_BIND_OP_UNMAP_ALL:
+		XE_BUG_ON(!bo);
 
-				if (vm->async_ops.error_capture.addr)
-					vm_error_capture(vm, err,
-							 op->bind_op.op,
-							 op->bind_op.addr,
-							 op->bind_op.range);
-				break;
-			}
-			up_write(&vm->lock);
-		} else {
-			trace_xe_vma_flush(op->vma);
+		err = xe_bo_lock(bo, &ww, 0, true);
+		if (err)
+			return ERR_PTR(err);
+		ops = drm_gpuva_gem_unmap_ops_create(&vm->mgr, obj);
+		xe_bo_unlock(bo, &ww);
 
-			if (is_unmap_op(op->bind_op.op)) {
-				down_write(&vm->lock);
-				xe_vma_destroy_unlocked(op->vma);
-				up_write(&vm->lock);
-			}
+		drm_gpuva_for_each_op(__op, ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
-			if (op->fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-						   &op->fence->fence.flags)) {
-				if (!xe_vm_no_dma_fences(vm)) {
-					op->fence->started = true;
-					smp_wmb();
-					wake_up_all(&op->fence->wq);
-				}
-				dma_fence_signal(&op->fence->fence);
-			}
+			op->gt_mask = gt_mask;
 		}
+		break;
+	default:
+		XE_BUG_ON("NOT POSSIBLE");
+		ops = ERR_PTR(-EINVAL);
+	}
 
-		async_op_cleanup(vm, op);
+#ifdef TEST_VM_ASYNC_OPS_ERROR
+	if (operation & FORCE_ASYNC_OP_ERROR) {
+		op = list_first_entry_or_null(&ops->list, struct xe_vma_op,
+					      base.entry);
+		if (op)
+			op->inject_error = true;
 	}
+#endif
+
+	if (!IS_ERR(ops))
+		drm_gpuva_for_each_op(__op, ops)
+			print_op(vm->xe, __op);
+
+	return ops;
 }
 
-static int __vm_bind_ioctl_async(struct xe_vm *vm, struct xe_vma *vma,
-				 struct xe_engine *e, struct xe_bo *bo,
-				 struct drm_xe_vm_bind_op *bind_op,
-				 struct xe_sync_entry *syncs, u32 num_syncs)
+static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
+			      u64 gt_mask, bool read_only)
 {
-	struct async_op *op;
-	bool installed = false;
-	u64 seqno;
-	int i;
+	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
+	struct xe_vma *vma;
+	struct ww_acquire_ctx ww;
+	int err;
 
-	lockdep_assert_held(&vm->lock);
+	lockdep_assert_held_write(&vm->lock);
 
-	op = kmalloc(sizeof(*op), GFP_KERNEL);
-	if (!op) {
-		return -ENOMEM;
+	if (bo) {
+		err = xe_bo_lock(bo, &ww, 0, true);
+		if (err)
+			return ERR_PTR(err);
 	}
-
-	if (num_syncs) {
-		op->fence = kmalloc(sizeof(*op->fence), GFP_KERNEL);
-		if (!op->fence) {
-			kfree(op);
-			return -ENOMEM;
+	vma = xe_vma_create(vm, bo, op->gem.offset,
+			    op->va.addr, op->va.addr +
+			    op->va.range - 1, read_only,
+			    gt_mask);
+	if (bo)
+		xe_bo_unlock(bo, &ww);
+
+	if (xe_vma_is_userptr(vma)) {
+		err = xe_vma_userptr_pin_pages(vma);
+		if (err) {
+			prep_vma_destroy(vm, vma, false);
+			xe_vma_destroy(vma, NULL);
+			return ERR_PTR(err);
+		}
+	} else if (!bo->vm) {
+		vm_insert_extobj(vm, vma);
+		err = add_preempt_fences(vm, bo);
+		if (err) {
+			prep_vma_destroy(vm, vma, false);
+			xe_vma_destroy(vma, NULL);
+			return ERR_PTR(err);
 		}
+	}
+
+	return vma;
+}
+
+/*
+ * Parse the operations list and create any resources needed for the
+ * operations prior to fully committing to them. This step can fail.
+ */
+static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
+				   struct drm_gpuva_ops **ops, int num_ops_list,
+				   struct xe_sync_entry *syncs, u32 num_syncs,
+				   struct list_head *ops_list, bool async)
+{
+	struct xe_vma_op *last_op = NULL;
+	struct list_head *async_list = NULL;
+	struct async_op_fence *fence = NULL;
+	int err, i;
+
+	lockdep_assert_held_write(&vm->lock);
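+	/* More than one ops list per IOCTL is only supported in async mode */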
+	XE_BUG_ON(num_ops_list > 1 && !async);
+
+	if (num_syncs && async) {
+		u64 seqno;
+
+		fence = kmalloc(sizeof(*fence), GFP_KERNEL);
+		if (!fence)
+			return -ENOMEM;
 
 		seqno = e ? ++e->bind.fence_seqno : ++vm->async_ops.fence.seqno;
-		dma_fence_init(&op->fence->fence, &async_op_fence_ops,
+		dma_fence_init(&fence->fence, &async_op_fence_ops,
 			       &vm->async_ops.lock, e ? e->bind.fence_ctx :
 			       vm->async_ops.fence.context, seqno);
 
 		if (!xe_vm_no_dma_fences(vm)) {
-			op->fence->vm = vm;
-			op->fence->started = false;
-			init_waitqueue_head(&op->fence->wq);
+			fence->vm = vm;
+			fence->started = false;
+			init_waitqueue_head(&fence->wq);
 		}
-	} else {
-		op->fence = NULL;
 	}
-	op->vma = vma;
-	op->engine = e;
-	op->bo = bo;
-	op->bind_op = *bind_op;
-	op->syncs = syncs;
-	op->num_syncs = num_syncs;
-	INIT_LIST_HEAD(&op->link);
 
-	for (i = 0; i < num_syncs; i++)
-		installed |= xe_sync_entry_signal(&syncs[i], NULL,
-						  &op->fence->fence);
+	for (i = 0; i < num_ops_list; ++i) {
+		struct drm_gpuva_ops *__ops = ops[i];
+		struct drm_gpuva_op *__op;
 
-	if (!installed && op->fence)
-		dma_fence_signal(&op->fence->fence);
+		drm_gpuva_for_each_op(__op, __ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+			bool first = !async_list;
 
-	spin_lock_irq(&vm->async_ops.lock);
-	list_add_tail(&op->link, &vm->async_ops.pending);
-	spin_unlock_irq(&vm->async_ops.lock);
+			XE_BUG_ON(!first && !async);
 
-	if (!vm->async_ops.error)
-		queue_work(system_unbound_wq, &vm->async_ops.work);
+			INIT_LIST_HEAD(&op->link);
+			if (first)
+				async_list = ops_list;
+			list_add_tail(&op->link, async_list);
 
-	return 0;
-}
-
-static int vm_bind_ioctl_async(struct xe_vm *vm, struct xe_vma *vma,
-			       struct xe_engine *e, struct xe_bo *bo,
-			       struct drm_xe_vm_bind_op *bind_op,
-			       struct xe_sync_entry *syncs, u32 num_syncs)
-{
-	struct xe_vma *__vma, *next;
-	struct list_head rebind_list;
-	struct xe_sync_entry *in_syncs = NULL, *out_syncs = NULL;
-	u32 num_in_syncs = 0, num_out_syncs = 0;
-	bool first = true, last;
-	int err;
-	int i;
+			if (first) {
+				op->flags |= XE_VMA_OP_FIRST;
+				op->num_syncs = num_syncs;
+				op->syncs = syncs;
+			}
 
-	lockdep_assert_held(&vm->lock);
+			op->engine = e;
 
-	/* Not a linked list of unbinds + rebinds, easy */
-	if (list_empty(&vma->unbind_link))
-		return __vm_bind_ioctl_async(vm, vma, e, bo, bind_op,
-					     syncs, num_syncs);
+			switch (op->base.op) {
+			case DRM_GPUVA_OP_MAP:
+			{
+				struct xe_vma *vma;
 
-	/*
-	 * Linked list of unbinds + rebinds, decompose syncs into 'in / out'
-	 * passing the 'in' to the first operation and 'out' to the last. Also
-	 * the reference counting is a little tricky, increment the VM / bind
-	 * engine ref count on all but the last operation and increment the BOs
-	 * ref count on each rebind.
-	 */
+				vma = new_vma(vm, &op->base.map,
+					      op->gt_mask, op->map.read_only);
+				if (IS_ERR(vma)) {
+					err = PTR_ERR(vma);
+					goto free_fence;
+				}
 
-	XE_BUG_ON(VM_BIND_OP(bind_op->op) != XE_VM_BIND_OP_UNMAP &&
-		  VM_BIND_OP(bind_op->op) != XE_VM_BIND_OP_UNMAP_ALL &&
-		  VM_BIND_OP(bind_op->op) != XE_VM_BIND_OP_PREFETCH);
+				op->map.vma = vma;
+				break;
+			}
+			case DRM_GPUVA_OP_REMAP:
+				if (op->base.remap.prev) {
+					struct xe_vma *vma;
+					bool read_only =
+						op->base.remap.unmap->va->flags &
+						XE_VMA_READ_ONLY;
+
+					vma = new_vma(vm, op->base.remap.prev,
+						      op->gt_mask, read_only);
+					if (IS_ERR(vma)) {
+						err = PTR_ERR(vma);
+						goto free_fence;
+					}
+
+					op->remap.prev = vma;
+				}
 
-	/* Decompose syncs */
-	if (num_syncs) {
-		in_syncs = kmalloc(sizeof(*in_syncs) * num_syncs, GFP_KERNEL);
-		out_syncs = kmalloc(sizeof(*out_syncs) * num_syncs, GFP_KERNEL);
-		if (!in_syncs || !out_syncs) {
-			err = -ENOMEM;
-			goto out_error;
-		}
+				if (op->base.remap.next) {
+					struct xe_vma *vma;
+					bool read_only =
+						op->base.remap.unmap->va->flags &
+						XE_VMA_READ_ONLY;
 
-		for (i = 0; i < num_syncs; ++i) {
-			bool signal = syncs[i].flags & DRM_XE_SYNC_SIGNAL;
+					vma = new_vma(vm, op->base.remap.next,
+						      op->gt_mask, read_only);
+					if (IS_ERR(vma)) {
+						err = PTR_ERR(vma);
+						goto free_fence;
+					}
 
-			if (signal)
-				out_syncs[num_out_syncs++] = syncs[i];
-			else
-				in_syncs[num_in_syncs++] = syncs[i];
-		}
-	}
+					op->remap.next = vma;
+				}
 
-	/* Do unbinds + move rebinds to new list */
-	INIT_LIST_HEAD(&rebind_list);
-	list_for_each_entry_safe(__vma, next, &vma->unbind_link, unbind_link) {
-		if (__vma->destroyed ||
-		    VM_BIND_OP(bind_op->op) == XE_VM_BIND_OP_PREFETCH) {
-			list_del_init(&__vma->unbind_link);
-			xe_bo_get(bo);
-			err = __vm_bind_ioctl_async(xe_vm_get(vm), __vma,
-						    e ? xe_engine_get(e) : NULL,
-						    bo, bind_op, first ?
-						    in_syncs : NULL,
-						    first ? num_in_syncs : 0);
-			if (err) {
-				xe_bo_put(bo);
-				xe_vm_put(vm);
-				if (e)
-					xe_engine_put(e);
-				goto out_error;
+				/* XXX: Support not doing remaps */
+				op->remap.start =
+					xe_vma_start(gpuva_to_vma(op->base.remap.unmap->va));
+				op->remap.range =
+					xe_vma_size(gpuva_to_vma(op->base.remap.unmap->va));
+				break;
+			case DRM_GPUVA_OP_UNMAP:
+				op->unmap.start =
+					xe_vma_start(gpuva_to_vma(op->base.unmap.va));
+				op->unmap.range =
+					xe_vma_size(gpuva_to_vma(op->base.unmap.va));
+				break;
+			case DRM_GPUVA_OP_PREFETCH:
+				/* Nothing to do */
+				break;
+			default:
+				XE_BUG_ON("NOT POSSIBLE");
 			}
-			in_syncs = NULL;
-			first = false;
-		} else {
-			list_move_tail(&__vma->unbind_link, &rebind_list);
-		}
-	}
-	last = list_empty(&rebind_list);
-	if (!last) {
-		xe_vm_get(vm);
-		if (e)
-			xe_engine_get(e);
-	}
-	err = __vm_bind_ioctl_async(vm, vma, e,
-				    bo, bind_op,
-				    first ? in_syncs :
-				    last ? out_syncs : NULL,
-				    first ? num_in_syncs :
-				    last ? num_out_syncs : 0);
-	if (err) {
-		if (!last) {
-			xe_vm_put(vm);
-			if (e)
-				xe_engine_put(e);
-		}
-		goto out_error;
-	}
-	in_syncs = NULL;
 
-	/* Do rebinds */
-	list_for_each_entry_safe(__vma, next, &rebind_list, unbind_link) {
-		list_del_init(&__vma->unbind_link);
-		last = list_empty(&rebind_list);
-
-		if (xe_vma_is_userptr(__vma)) {
-			bind_op->op = XE_VM_BIND_FLAG_ASYNC |
-				XE_VM_BIND_OP_MAP_USERPTR;
-		} else {
-			bind_op->op = XE_VM_BIND_FLAG_ASYNC |
-				XE_VM_BIND_OP_MAP;
-			xe_bo_get(__vma->bo);
-		}
-
-		if (!last) {
-			xe_vm_get(vm);
-			if (e)
-				xe_engine_get(e);
+			last_op = op;
 		}
 
-		err = __vm_bind_ioctl_async(vm, __vma, e,
-					    __vma->bo, bind_op, last ?
-					    out_syncs : NULL,
-					    last ? num_out_syncs : 0);
-		if (err) {
-			if (!last) {
-				xe_vm_put(vm);
-				if (e)
-					xe_engine_put(e);
-			}
-			goto out_error;
-		}
+		last_op->ops = __ops;
 	}
 
-	kfree(syncs);
-	return 0;
+	XE_BUG_ON(!last_op);	/* FIXME: This is not an error, handle */
 
-out_error:
-	kfree(in_syncs);
-	kfree(out_syncs);
-	kfree(syncs);
+	last_op->flags |= XE_VMA_OP_LAST;
+	last_op->num_syncs = num_syncs;
+	last_op->syncs = syncs;
+	last_op->fence = fence;
 
+	return 0;
+
+free_fence:
+	kfree(fence);
 	return err;
 }
 
-static bool bo_has_vm_references(struct xe_bo *bo, struct xe_vm *vm,
-				 struct xe_vma *ignore)
+static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
 {
-	struct ww_acquire_ctx ww;
-	struct xe_vma *vma;
-	bool ret = false;
+	int err = 0;
 
-	xe_bo_lock(bo, &ww, 0, false);
-	list_for_each_entry(vma, &bo->vmas, bo_link) {
-		if (vma != ignore && vma->vm == vm && !vma->destroyed) {
-			ret = true;
-			break;
-		}
+	lockdep_assert_held_write(&vm->lock);
+
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		err |= xe_vm_insert_vma(vm, op->map.vma);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		prep_vma_destroy(vm, gpuva_to_vma(op->base.remap.unmap->va),
+				 true);
+		if (op->remap.prev)
+			err |= xe_vm_insert_vma(vm, op->remap.prev);
+		if (op->remap.next)
+			err |= xe_vm_insert_vma(vm, op->remap.next);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		prep_vma_destroy(vm, gpuva_to_vma(op->base.unmap.va), true);
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		/* Nothing to do */
+		break;
+	default:
+		XE_BUG_ON("NOT POSSIBLE");
 	}
-	xe_bo_unlock(bo, &ww);
 
-	return ret;
+	op->flags |= XE_VMA_OP_COMMITTED;
+	return err;
 }
 
-static int vm_insert_extobj(struct xe_vm *vm, struct xe_vma *vma)
+static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
+			       struct xe_vma_op *op)
 {
-	struct xe_bo *bo = vma->bo;
+	LIST_HEAD(objs);
+	LIST_HEAD(dups);
+	struct ttm_validate_buffer tv_bo, tv_vm;
+	struct ww_acquire_ctx ww;
+	struct xe_bo *vbo;
+	int err;
 
 	lockdep_assert_held_write(&vm->lock);
 
-	if (bo_has_vm_references(bo, vm, vma))
-		return 0;
+	xe_vm_tv_populate(vm, &tv_vm);
+	list_add_tail(&tv_vm.head, &objs);
+	vbo = xe_vma_bo(vma);
+	if (vbo) {
+		/*
+		 * An unbind can drop the last reference to the BO and
+		 * the BO is needed for ttm_eu_backoff_reservation so
+		 * take a reference here.
+		 */
+		xe_bo_get(vbo);
 
-	list_add(&vma->extobj.link, &vm->extobj.list);
-	vm->extobj.entries++;
+		if (!vbo->vm) {
+			tv_bo.bo = &vbo->ttm;
+			tv_bo.num_shared = 1;
+			list_add(&tv_bo.head, &objs);
+		}
+	}
 
-	return 0;
-}
+again:
+	err = ttm_eu_reserve_buffers(&ww, &objs, true, &dups);
+	if (err) {
+		xe_bo_put(vbo);
+		return err;
+	}
 
-static int __vm_bind_ioctl_lookup_vma(struct xe_vm *vm, struct xe_bo *bo,
-				      u64 addr, u64 range, u32 op)
-{
-	struct xe_device *xe = vm->xe;
-	struct xe_vma *vma, lookup;
-	bool async = !!(op & XE_VM_BIND_FLAG_ASYNC);
+	xe_vm_assert_held(vm);
+	xe_bo_assert_held(xe_vma_bo(vma));
+
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		err = xe_vm_bind(vm, vma, op->engine, xe_vma_bo(vma),
+				 op->syncs, op->num_syncs, op->fence,
+				 op->map.immediate || !xe_vm_in_fault_mode(vm),
+				 op->flags & XE_VMA_OP_FIRST,
+				 op->flags & XE_VMA_OP_LAST);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+	{
+		bool prev = !!op->remap.prev;
+		bool next = !!op->remap.next;
+
+		if (!op->remap.unmap_done) {
+			vm->async_ops.munmap_rebind_inflight = true;
+			if (prev || next)
+				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
+			err = xe_vm_unbind(vm, vma, op->engine, op->syncs,
+					   op->num_syncs,
+					   !prev && !next ? op->fence : NULL,
+					   op->flags & XE_VMA_OP_FIRST,
+					   op->flags & XE_VMA_OP_LAST && !prev &&
+					   !next);
+			if (err)
+				break;
+			op->remap.unmap_done = true;
+		}
 
-	lockdep_assert_held(&vm->lock);
+		if (prev) {
+			op->remap.prev->gpuva.flags |= XE_VMA_LAST_REBIND;
+			err = xe_vm_bind(vm, op->remap.prev, op->engine,
+					 xe_vma_bo(op->remap.prev), op->syncs,
+					 op->num_syncs,
+					 !next ? op->fence : NULL, true, false,
+					 op->flags & XE_VMA_OP_LAST && !next);
+			op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND;
+			if (err)
+				break;
+			op->remap.prev = NULL;
+		}
 
-	lookup.start = addr;
-	lookup.end = addr + range - 1;
+		if (next) {
+			op->remap.next->gpuva.flags |= XE_VMA_LAST_REBIND;
+			err = xe_vm_bind(vm, op->remap.next, op->engine,
+					 xe_vma_bo(op->remap.next),
+					 op->syncs, op->num_syncs,
+					 op->fence, true, false,
+					 op->flags & XE_VMA_OP_LAST);
+			op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND;
+			if (err)
+				break;
+			op->remap.next = NULL;
+		}
+		vm->async_ops.munmap_rebind_inflight = false;
 
-	switch (VM_BIND_OP(op)) {
-	case XE_VM_BIND_OP_MAP:
-	case XE_VM_BIND_OP_MAP_USERPTR:
-		vma = xe_vm_find_overlapping_vma(vm, &lookup);
-		if (XE_IOCTL_ERR(xe, vma))
-			return -EBUSY;
 		break;
-	case XE_VM_BIND_OP_UNMAP:
-	case XE_VM_BIND_OP_PREFETCH:
-		vma = xe_vm_find_overlapping_vma(vm, &lookup);
-		if (XE_IOCTL_ERR(xe, !vma) ||
-		    XE_IOCTL_ERR(xe, (vma->start != addr ||
-				 vma->end != addr + range - 1) && !async))
-			return -EINVAL;
+	}
+	case DRM_GPUVA_OP_UNMAP:
+		err = xe_vm_unbind(vm, vma, op->engine, op->syncs,
+				   op->num_syncs, op->fence,
+				   op->flags & XE_VMA_OP_FIRST,
+				   op->flags & XE_VMA_OP_LAST);
 		break;
-	case XE_VM_BIND_OP_UNMAP_ALL:
+	case DRM_GPUVA_OP_PREFETCH:
+		err = xe_vm_prefetch(vm, vma, op->engine, op->prefetch.region,
+				     op->syncs, op->num_syncs, op->fence,
+				     op->flags & XE_VMA_OP_FIRST,
+				     op->flags & XE_VMA_OP_LAST);
 		break;
 	default:
 		XE_BUG_ON("NOT POSSIBLE");
-		return -EINVAL;
 	}
 
-	return 0;
-}
+	ttm_eu_backoff_reservation(&ww, &objs);
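+	/* If the userptr pages were invalidated, repin them and retry the op */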
+	if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
+		lockdep_assert_held_write(&vm->lock);
+		err = xe_vma_userptr_pin_pages(vma);
+		if (!err)
+			goto again;
+	}
+	xe_bo_put(vbo);
 
-static void prep_vma_destroy(struct xe_vm *vm, struct xe_vma *vma)
-{
-	down_read(&vm->userptr.notifier_lock);
-	vma->destroyed = true;
-	up_read(&vm->userptr.notifier_lock);
-	xe_vm_remove_vma(vm, vma);
+	if (err)
+		trace_xe_vma_fail(vma);
+
+	return err;
 }
 
-static int prep_replacement_vma(struct xe_vm *vm, struct xe_vma *vma)
+static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
 {
-	int err;
+	int ret = 0;
 
-	if (vma->bo && !vma->bo->vm) {
-		vm_insert_extobj(vm, vma);
-		err = add_preempt_fences(vm, vma->bo);
-		if (err)
-			return err;
+	lockdep_assert_held_write(&vm->lock);
+
+#ifdef TEST_VM_ASYNC_OPS_ERROR
+	if (op->inject_error) {
+		op->inject_error = false;
+		return -ENOMEM;
 	}
+#endif
 
-	return 0;
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		ret = __xe_vma_op_execute(vm, op->map.vma, op);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+	{
+		struct xe_vma *vma;
+
+		if (!op->remap.unmap_done)
+			vma = gpuva_to_vma(op->base.remap.unmap->va);
+		else if (op->remap.prev)
+			vma = op->remap.prev;
+		else
+			vma = op->remap.next;
+
+		ret = __xe_vma_op_execute(vm, vma, op);
+		break;
+	}
+	case DRM_GPUVA_OP_UNMAP:
+		ret = __xe_vma_op_execute(vm, gpuva_to_vma(op->base.unmap.va),
+					  op);
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		ret = __xe_vma_op_execute(vm,
+					  gpuva_to_vma(op->base.prefetch.va),
+					  op);
+		break;
+	default:
+		XE_BUG_ON("NOT POSSIBLE");
+	}
+
+	return ret;
 }
 
-/*
- * Find all overlapping VMAs in lookup range and add to a list in the returned
- * VMA, all of VMAs found will be unbound. Also possibly add 2 new VMAs that
- * need to be bound if first / last VMAs are not fully unbound. This is akin to
- * how munmap works.
- */
-static struct xe_vma *vm_unbind_lookup_vmas(struct xe_vm *vm,
-					    struct xe_vma *lookup)
+static void xe_vma_op_cleanup(struct xe_vm *vm, struct xe_vma_op *op)
 {
-	struct xe_vma *vma = xe_vm_find_overlapping_vma(vm, lookup);
-	struct rb_node *node;
-	struct xe_vma *first = vma, *last = vma, *new_first = NULL,
-		      *new_last = NULL, *__vma, *next;
-	int err = 0;
-	bool first_munmap_rebind = false;
+	bool last = op->flags & XE_VMA_OP_LAST;
 
-	lockdep_assert_held(&vm->lock);
-	XE_BUG_ON(!vma);
-
-	node = &vma->vm_node;
-	while ((node = rb_next(node))) {
-		if (!xe_vma_cmp_vma_cb(lookup, node)) {
-			__vma = to_xe_vma(node);
-			list_add_tail(&__vma->unbind_link, &vma->unbind_link);
-			last = __vma;
-		} else {
-			break;
-		}
+	if (last) {
+		while (op->num_syncs--)
+			xe_sync_entry_cleanup(&op->syncs[op->num_syncs]);
+		kfree(op->syncs);
+		if (op->engine)
+			xe_engine_put(op->engine);
+		if (op->fence)
+			dma_fence_put(&op->fence->fence);
 	}
-
-	node = &vma->vm_node;
-	while ((node = rb_prev(node))) {
-		if (!xe_vma_cmp_vma_cb(lookup, node)) {
-			__vma = to_xe_vma(node);
-			list_add(&__vma->unbind_link, &vma->unbind_link);
-			first = __vma;
-		} else {
-			break;
-		}
+	if (!list_empty(&op->link)) {
+		spin_lock_irq(&vm->async_ops.lock);
+		list_del(&op->link);
+		spin_unlock_irq(&vm->async_ops.lock);
 	}
+	if (op->ops)
+		drm_gpuva_ops_free(&vm->mgr, op->ops);
+	if (last)
+		xe_vm_put(vm);
+}
 
-	if (first->start != lookup->start) {
-		struct ww_acquire_ctx ww;
+static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
+			     bool post_commit)
+{
+	lockdep_assert_held_write(&vm->lock);
 
-		if (first->bo)
-			err = xe_bo_lock(first->bo, &ww, 0, true);
-		if (err)
-			goto unwind;
-		new_first = xe_vma_create(first->vm, first->bo,
-					  first->bo ? first->bo_offset :
-					  first->userptr.ptr,
-					  first->start,
-					  lookup->start - 1,
-					  (first->pte_flags & XE_PTE_READ_ONLY),
-					  first->gt_mask);
-		if (first->bo)
-			xe_bo_unlock(first->bo, &ww);
-		if (!new_first) {
-			err = -ENOMEM;
-			goto unwind;
-		}
-		if (!first->bo) {
-			err = xe_vma_userptr_pin_pages(new_first);
-			if (err)
-				goto unwind;
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		if (op->map.vma) {
+			prep_vma_destroy(vm, op->map.vma, post_commit);
+			xe_vma_destroy(op->map.vma, NULL);
 		}
-		err = prep_replacement_vma(vm, new_first);
-		if (err)
-			goto unwind;
-	}
-
-	if (last->end != lookup->end) {
-		struct ww_acquire_ctx ww;
-		u64 chunk = lookup->end + 1 - last->start;
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+	{
+		struct xe_vma *vma = gpuva_to_vma(op->base.unmap.va);
 
-		if (last->bo)
-			err = xe_bo_lock(last->bo, &ww, 0, true);
-		if (err)
-			goto unwind;
-		new_last = xe_vma_create(last->vm, last->bo,
-					 last->bo ? last->bo_offset + chunk :
-					 last->userptr.ptr + chunk,
-					 last->start + chunk,
-					 last->end,
-					 (last->pte_flags & XE_PTE_READ_ONLY),
-					 last->gt_mask);
-		if (last->bo)
-			xe_bo_unlock(last->bo, &ww);
-		if (!new_last) {
-			err = -ENOMEM;
-			goto unwind;
-		}
-		if (!last->bo) {
-			err = xe_vma_userptr_pin_pages(new_last);
-			if (err)
-				goto unwind;
-		}
-		err = prep_replacement_vma(vm, new_last);
-		if (err)
-			goto unwind;
+		down_read(&vm->userptr.notifier_lock);
+		vma->gpuva.flags &= ~XE_VMA_DESTROYED;
+		up_read(&vm->userptr.notifier_lock);
+		if (post_commit)
+			xe_vm_insert_vma(vm, vma);
+		break;
 	}
+	case DRM_GPUVA_OP_REMAP:
+	{
+		struct xe_vma *vma = gpuva_to_vma(op->base.remap.unmap->va);
 
-	prep_vma_destroy(vm, vma);
-	if (list_empty(&vma->unbind_link) && (new_first || new_last))
-		vma->first_munmap_rebind = true;
-	list_for_each_entry(__vma, &vma->unbind_link, unbind_link) {
-		if ((new_first || new_last) && !first_munmap_rebind) {
-			__vma->first_munmap_rebind = true;
-			first_munmap_rebind = true;
+		if (op->remap.prev) {
+			prep_vma_destroy(vm, op->remap.prev, post_commit);
+			xe_vma_destroy(op->remap.prev, NULL);
 		}
-		prep_vma_destroy(vm, __vma);
-	}
-	if (new_first) {
-		xe_vm_insert_vma(vm, new_first);
-		list_add_tail(&new_first->unbind_link, &vma->unbind_link);
-		if (!new_last)
-			new_first->last_munmap_rebind = true;
-	}
-	if (new_last) {
-		xe_vm_insert_vma(vm, new_last);
-		list_add_tail(&new_last->unbind_link, &vma->unbind_link);
-		new_last->last_munmap_rebind = true;
-	}
-
-	return vma;
-
-unwind:
-	list_for_each_entry_safe(__vma, next, &vma->unbind_link, unbind_link)
-		list_del_init(&__vma->unbind_link);
-	if (new_last) {
-		prep_vma_destroy(vm, new_last);
-		xe_vma_destroy_unlocked(new_last);
-	}
-	if (new_first) {
-		prep_vma_destroy(vm, new_first);
-		xe_vma_destroy_unlocked(new_first);
+		if (op->remap.next) {
+			prep_vma_destroy(vm, op->remap.next, post_commit);
+			xe_vma_destroy(op->remap.next, NULL);
+		}
+		down_read(&vm->userptr.notifier_lock);
+		vma->gpuva.flags &= ~XE_VMA_DESTROYED;
+		up_read(&vm->userptr.notifier_lock);
+		if (post_commit)
+			xe_vm_insert_vma(vm, vma);
+		break;
+	}
+	case DRM_GPUVA_OP_PREFETCH:
+		/* Nothing to do */
+		break;
+	default:
+		XE_BUG_ON("NOT POSSIBLE");
 	}
+}
 
-	return ERR_PTR(err);
+static struct xe_vma_op *next_vma_op(struct xe_vm *vm)
+{
+	return list_first_entry_or_null(&vm->async_ops.pending,
+					struct xe_vma_op, link);
 }
 
-/*
- * Similar to vm_unbind_lookup_vmas, find all VMAs in lookup range to prefetch
- */
-static struct xe_vma *vm_prefetch_lookup_vmas(struct xe_vm *vm,
-					      struct xe_vma *lookup,
-					      u32 region)
+static void xe_vma_op_work_func(struct work_struct *w)
 {
-	struct xe_vma *vma = xe_vm_find_overlapping_vma(vm, lookup), *__vma,
-		      *next;
-	struct rb_node *node;
+	struct xe_vm *vm = container_of(w, struct xe_vm, async_ops.work);
 
-	if (!xe_vma_is_userptr(vma)) {
-		if (!xe_bo_can_migrate(vma->bo, region_to_mem_type[region]))
-			return ERR_PTR(-EINVAL);
-	}
+	for (;;) {
+		struct xe_vma_op *op;
+		int err;
 
-	node = &vma->vm_node;
-	while ((node = rb_next(node))) {
-		if (!xe_vma_cmp_vma_cb(lookup, node)) {
-			__vma = to_xe_vma(node);
-			if (!xe_vma_is_userptr(__vma)) {
-				if (!xe_bo_can_migrate(__vma->bo, region_to_mem_type[region]))
-					goto flush_list;
-			}
-			list_add_tail(&__vma->unbind_link, &vma->unbind_link);
-		} else {
+		if (vm->async_ops.error && !xe_vm_is_closed(vm))
 			break;
-		}
-	}
 
-	node = &vma->vm_node;
-	while ((node = rb_prev(node))) {
-		if (!xe_vma_cmp_vma_cb(lookup, node)) {
-			__vma = to_xe_vma(node);
-			if (!xe_vma_is_userptr(__vma)) {
-				if (!xe_bo_can_migrate(__vma->bo, region_to_mem_type[region]))
-					goto flush_list;
-			}
-			list_add(&__vma->unbind_link, &vma->unbind_link);
-		} else {
+		spin_lock_irq(&vm->async_ops.lock);
+		op = next_vma_op(vm);
+		spin_unlock_irq(&vm->async_ops.lock);
+
+		if (!op)
 			break;
-		}
-	}
 
-	return vma;
+		if (!xe_vm_is_closed(vm)) {
+			down_write(&vm->lock);
+			err = xe_vma_op_execute(vm, op);
+			if (err) {
+				drm_warn(&vm->xe->drm,
+					 "Async VM op(%d) failed with %d",
+					 op->base.op, err);
+				vm_set_async_error(vm, err);
+				up_write(&vm->lock);
 
-flush_list:
-	list_for_each_entry_safe(__vma, next, &vma->unbind_link,
-				 unbind_link)
-		list_del_init(&__vma->unbind_link);
+				if (vm->async_ops.error_capture.addr)
+					vm_error_capture(vm, err, 0, 0, 0);
+				break;
+			}
+			up_write(&vm->lock);
+		} else {
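+			/*
+			 * VM is closed: no more binds are issued, just destroy
+			 * the VMAs being unmapped and signal any pending fence.
+			 */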
+			struct xe_vma *vma;
 
-	return ERR_PTR(-EINVAL);
-}
+			switch (op->base.op) {
+			case DRM_GPUVA_OP_REMAP:
+				vma = gpuva_to_vma(op->base.remap.unmap->va);
+				trace_xe_vma_flush(vma);
 
-static struct xe_vma *vm_unbind_all_lookup_vmas(struct xe_vm *vm,
-						struct xe_bo *bo)
-{
-	struct xe_vma *first = NULL, *vma;
+				down_write(&vm->lock);
+				xe_vma_destroy_unlocked(vma);
+				up_write(&vm->lock);
+				break;
+			case DRM_GPUVA_OP_UNMAP:
+				vma = gpuva_to_vma(op->base.unmap.va);
+				trace_xe_vma_flush(vma);
 
-	lockdep_assert_held(&vm->lock);
-	xe_bo_assert_held(bo);
+				down_write(&vm->lock);
+				xe_vma_destroy_unlocked(vma);
+				up_write(&vm->lock);
+				break;
+			default:
+				/* Nothing to do */
+				break;
+			}
 
-	list_for_each_entry(vma, &bo->vmas, bo_link) {
-		if (vma->vm != vm)
-			continue;
+			if (op->fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+						   &op->fence->fence.flags)) {
+				if (!xe_vm_no_dma_fences(vm)) {
+					op->fence->started = true;
+					smp_wmb();
+					wake_up_all(&op->fence->wq);
+				}
+				dma_fence_signal(&op->fence->fence);
+			}
+		}
 
-		prep_vma_destroy(vm, vma);
-		if (!first)
-			first = vma;
-		else
-			list_add_tail(&vma->unbind_link, &first->unbind_link);
+		xe_vma_op_cleanup(vm, op);
 	}
-
-	return first;
 }
 
-static struct xe_vma *vm_bind_ioctl_lookup_vma(struct xe_vm *vm,
-					       struct xe_bo *bo,
-					       u64 bo_offset_or_userptr,
-					       u64 addr, u64 range, u32 op,
-					       u64 gt_mask, u32 region)
+/*
+ * Commit the operations list. This step cannot fail in async mode but can
+ * fail in sync mode if the bind operation fails.
+ */
+static int vm_bind_ioctl_ops_commit(struct xe_vm *vm,
+				    struct list_head *ops_list, bool async)
 {
-	struct ww_acquire_ctx ww;
-	struct xe_vma *vma, lookup;
+	struct xe_vma_op *op, *last_op, *next;
 	int err;
 
-	lockdep_assert_held(&vm->lock);
-
-	lookup.start = addr;
-	lookup.end = addr + range - 1;
+	lockdep_assert_held_write(&vm->lock);
 
-	switch (VM_BIND_OP(op)) {
-	case XE_VM_BIND_OP_MAP:
-		XE_BUG_ON(!bo);
+	list_for_each_entry(op, ops_list, link) {
+		last_op = op;
+		err = xe_vma_op_commit(vm, op);
+		if (err)
+			goto unwind;
+	}
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+	if (!async) {
+		err = xe_vma_op_execute(vm, last_op);
 		if (err)
-			return ERR_PTR(err);
-		vma = xe_vma_create(vm, bo, bo_offset_or_userptr, addr,
-				    addr + range - 1,
-				    op & XE_VM_BIND_FLAG_READONLY,
-				    gt_mask);
-		xe_bo_unlock(bo, &ww);
-		if (!vma)
-			return ERR_PTR(-ENOMEM);
+			goto unwind;
+		xe_vma_op_cleanup(vm, last_op);
+	} else {
+		int i;
+		bool installed = false;
 
-		xe_vm_insert_vma(vm, vma);
-		if (!bo->vm) {
-			vm_insert_extobj(vm, vma);
-			err = add_preempt_fences(vm, bo);
-			if (err) {
-				prep_vma_destroy(vm, vma);
-				xe_vma_destroy_unlocked(vma);
+		for (i = 0; i < last_op->num_syncs; i++)
+			installed |= xe_sync_entry_signal(&last_op->syncs[i],
+							  NULL,
+							  &last_op->fence->fence);
+		if (!installed && last_op->fence)
+			dma_fence_signal(&last_op->fence->fence);
 
-				return ERR_PTR(err);
-			}
-		}
-		break;
-	case XE_VM_BIND_OP_UNMAP:
-		vma = vm_unbind_lookup_vmas(vm, &lookup);
-		break;
-	case XE_VM_BIND_OP_PREFETCH:
-		vma = vm_prefetch_lookup_vmas(vm, &lookup, region);
-		break;
-	case XE_VM_BIND_OP_UNMAP_ALL:
-		XE_BUG_ON(!bo);
+		spin_lock_irq(&vm->async_ops.lock);
+		list_splice_tail(ops_list, &vm->async_ops.pending);
+		spin_unlock_irq(&vm->async_ops.lock);
 
-		err = xe_bo_lock(bo, &ww, 0, true);
-		if (err)
-			return ERR_PTR(err);
-		vma = vm_unbind_all_lookup_vmas(vm, bo);
-		if (!vma)
-			vma = ERR_PTR(-EINVAL);
-		xe_bo_unlock(bo, &ww);
-		break;
-	case XE_VM_BIND_OP_MAP_USERPTR:
-		XE_BUG_ON(bo);
+		if (!vm->async_ops.error)
+			queue_work(system_unbound_wq, &vm->async_ops.work);
+	}
 
-		vma = xe_vma_create(vm, NULL, bo_offset_or_userptr, addr,
-				    addr + range - 1,
-				    op & XE_VM_BIND_FLAG_READONLY,
-				    gt_mask);
-		if (!vma)
-			return ERR_PTR(-ENOMEM);
+	return 0;
 
-		err = xe_vma_userptr_pin_pages(vma);
-		if (err) {
-			prep_vma_destroy(vm, vma);
-			xe_vma_destroy_unlocked(vma);
+unwind:
+	list_for_each_entry_reverse(op, ops_list, link)
+		xe_vma_op_unwind(vm, op, op->flags & XE_VMA_OP_COMMITTED);
+	list_for_each_entry_safe(op, next, ops_list, link)
+		xe_vma_op_cleanup(vm, op);
 
-			return ERR_PTR(err);
-		} else {
-			xe_vm_insert_vma(vm, vma);
+	return err;
+}
+
+/*
+ * Unwind the operations list; called after a failure of
+ * vm_bind_ioctl_ops_create or vm_bind_ioctl_ops_parse.
+ */
+static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
+				     struct drm_gpuva_ops **ops,
+				     int num_ops_list)
+{
+	int i;
+
+	for (i = 0; i < num_ops_list; ++i) {
+		struct drm_gpuva_ops *__ops = ops[i];
+		struct drm_gpuva_op *__op;
+
+		if (!__ops)
+			continue;
+
+		drm_gpuva_for_each_op(__op, __ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+
+			xe_vma_op_unwind(vm, op, false);
 		}
-		break;
-	default:
-		XE_BUG_ON("NOT POSSIBLE");
-		vma = ERR_PTR(-EINVAL);
 	}
-
-	return vma;
 }
 
 #ifdef TEST_VM_ASYNC_OPS_ERROR
@@ -2973,15 +2977,16 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_xe_vm_bind *args = data;
 	struct drm_xe_sync __user *syncs_user;
 	struct xe_bo **bos = NULL;
-	struct xe_vma **vmas = NULL;
+	struct drm_gpuva_ops **ops = NULL;
 	struct xe_vm *vm;
 	struct xe_engine *e = NULL;
 	u32 num_syncs;
 	struct xe_sync_entry *syncs = NULL;
 	struct drm_xe_vm_bind_op *bind_ops;
+	LIST_HEAD(ops_list);
 	bool async;
 	int err;
-	int i, j = 0;
+	int i;
 
 	err = vm_bind_ioctl_check_args(xe, args, &bind_ops, &async);
 	if (err)
@@ -3069,8 +3074,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto put_engine;
 	}
 
-	vmas = kzalloc(sizeof(*vmas) * args->num_binds, GFP_KERNEL);
-	if (!vmas) {
+	ops = kzalloc(sizeof(*ops) * args->num_binds, GFP_KERNEL);
+	if (!ops) {
 		err = -ENOMEM;
 		goto put_engine;
 	}
@@ -3137,7 +3142,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 addr = bind_ops[i].addr;
 		u32 op = bind_ops[i].op;
 
-		err = __vm_bind_ioctl_lookup_vma(vm, bos[i], addr, range, op);
+		err = vm_bind_ioctl_lookup_vma(vm, bos[i], addr, range, op);
 		if (err)
 			goto release_vm_lock;
 	}
@@ -3150,128 +3155,45 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 gt_mask = bind_ops[i].gt_mask;
 		u32 region = bind_ops[i].region;
 
-		vmas[i] = vm_bind_ioctl_lookup_vma(vm, bos[i], obj_offset,
-						   addr, range, op, gt_mask,
-						   region);
-		if (IS_ERR(vmas[i])) {
-			err = PTR_ERR(vmas[i]);
-			vmas[i] = NULL;
-			goto destroy_vmas;
+		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
+						  addr, range, op, gt_mask,
+						  region);
+		if (IS_ERR(ops[i])) {
+			err = PTR_ERR(ops[i]);
+			ops[i] = NULL;
+			goto unwind_ops;
 		}
 	}
 
-	for (j = 0; j < args->num_binds; ++j) {
-		struct xe_sync_entry *__syncs;
-		u32 __num_syncs = 0;
-		bool first_or_last = j == 0 || j == args->num_binds - 1;
-
-		if (args->num_binds == 1) {
-			__num_syncs = num_syncs;
-			__syncs = syncs;
-		} else if (first_or_last && num_syncs) {
-			bool first = j == 0;
-
-			__syncs = kmalloc(sizeof(*__syncs) * num_syncs,
-					  GFP_KERNEL);
-			if (!__syncs) {
-				err = ENOMEM;
-				break;
-			}
-
-			/* in-syncs on first bind, out-syncs on last bind */
-			for (i = 0; i < num_syncs; ++i) {
-				bool signal = syncs[i].flags &
-					DRM_XE_SYNC_SIGNAL;
-
-				if ((first && !signal) || (!first && signal))
-					__syncs[__num_syncs++] = syncs[i];
-			}
-		} else {
-			__num_syncs = 0;
-			__syncs = NULL;
-		}
-
-		if (async) {
-			bool last = j == args->num_binds - 1;
-
-			/*
-			 * Each pass of async worker drops the ref, take a ref
-			 * here, 1 set of refs taken above
-			 */
-			if (!last) {
-				if (e)
-					xe_engine_get(e);
-				xe_vm_get(vm);
-			}
-
-			err = vm_bind_ioctl_async(vm, vmas[j], e, bos[j],
-						  bind_ops + j, __syncs,
-						  __num_syncs);
-			if (err && !last) {
-				if (e)
-					xe_engine_put(e);
-				xe_vm_put(vm);
-			}
-			if (err)
-				break;
-		} else {
-			XE_BUG_ON(j != 0);	/* Not supported */
-			err = vm_bind_ioctl(vm, vmas[j], e, bos[j],
-					    bind_ops + j, __syncs,
-					    __num_syncs, NULL);
-			break;	/* Needed so cleanup loops work */
-		}
-	}
+	err = vm_bind_ioctl_ops_parse(vm, e, ops, args->num_binds,
+				      syncs, num_syncs, &ops_list, async);
+	if (err)
+		goto unwind_ops;
 
-	/* Most of cleanup owned by the async bind worker */
-	if (async && !err) {
-		up_write(&vm->lock);
-		if (args->num_binds > 1)
-			kfree(syncs);
-		goto free_objs;
-	}
+	err = vm_bind_ioctl_ops_commit(vm, &ops_list, async);
+	up_write(&vm->lock);
 
-destroy_vmas:
-	for (i = j; err && i < args->num_binds; ++i) {
-		u32 op = bind_ops[i].op;
-		struct xe_vma *vma, *next;
+	for (i = 0; i < args->num_binds; ++i)
+		xe_bo_put(bos[i]);
 
-		if (!vmas[i])
-			break;
+	kfree(bos);
+	kfree(ops);
+	if (args->num_binds > 1)
+		kfree(bind_ops);
 
-		list_for_each_entry_safe(vma, next, &vma->unbind_link,
-					 unbind_link) {
-			list_del_init(&vma->unbind_link);
-			if (!vma->destroyed) {
-				prep_vma_destroy(vm, vma);
-				xe_vma_destroy_unlocked(vma);
-			}
-		}
+	return err;
 
-		switch (VM_BIND_OP(op)) {
-		case XE_VM_BIND_OP_MAP:
-			prep_vma_destroy(vm, vmas[i]);
-			xe_vma_destroy_unlocked(vmas[i]);
-			break;
-		case XE_VM_BIND_OP_MAP_USERPTR:
-			prep_vma_destroy(vm, vmas[i]);
-			xe_vma_destroy_unlocked(vmas[i]);
-			break;
-		}
-	}
+unwind_ops:
+	vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
 release_vm_lock:
 	up_write(&vm->lock);
 free_syncs:
-	while (num_syncs--) {
-		if (async && j &&
-		    !(syncs[num_syncs].flags & DRM_XE_SYNC_SIGNAL))
-			continue;	/* Still in async worker */
+	while (num_syncs--)
 		xe_sync_entry_cleanup(&syncs[num_syncs]);
-	}
 
 	kfree(syncs);
 put_obj:
-	for (i = j; i < args->num_binds; ++i)
+	for (i = 0; i < args->num_binds; ++i)
 		xe_bo_put(bos[i]);
 put_engine:
 	if (e)
@@ -3280,10 +3202,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	xe_vm_put(vm);
 free_objs:
 	kfree(bos);
-	kfree(vmas);
+	kfree(ops);
 	if (args->num_binds > 1)
 		kfree(bind_ops);
-	return err;
+	return err == -ENODATA ? 0 : err;
 }
 
 /*
@@ -3324,14 +3246,14 @@ void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
  */
 int xe_vm_invalidate_vma(struct xe_vma *vma)
 {
-	struct xe_device *xe = vma->vm->xe;
+	struct xe_device *xe = xe_vma_vm(vma)->xe;
 	struct xe_gt *gt;
 	u32 gt_needs_invalidate = 0;
 	int seqno[XE_MAX_GT];
 	u8 id;
 	int ret;
 
-	XE_BUG_ON(!xe_vm_in_fault_mode(vma->vm));
+	XE_BUG_ON(!xe_vm_in_fault_mode(xe_vma_vm(vma)));
 	trace_xe_vma_usm_invalidate(vma);
 
 	/* Check that we don't race with page-table updates */
@@ -3340,11 +3262,11 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 			WARN_ON_ONCE(!mmu_interval_check_retry
 				     (&vma->userptr.notifier,
 				      vma->userptr.notifier_seq));
-			WARN_ON_ONCE(!dma_resv_test_signaled(&vma->vm->resv,
+			WARN_ON_ONCE(!dma_resv_test_signaled(&xe_vma_vm(vma)->resv,
 							     DMA_RESV_USAGE_BOOKKEEP));
 
 		} else {
-			xe_bo_assert_held(vma->bo);
+			xe_bo_assert_held(xe_vma_bo(vma));
 		}
 	}
 
@@ -3374,7 +3296,8 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 #if IS_ENABLED(CONFIG_DRM_XE_SIMPLE_ERROR_CAPTURE)
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 {
-	struct rb_node *node;
+	DRM_GPUVA_ITER(it, &vm->mgr, 0);
+	struct drm_gpuva *gpuva;
 	bool is_vram;
 	uint64_t addr;
 
@@ -3383,26 +3306,24 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 		return 0;
 	}
 	if (vm->pt_root[gt_id]) {
-		addr = xe_bo_addr(vm->pt_root[gt_id]->bo, 0, XE_PAGE_SIZE,
-				  &is_vram);
+		addr = xe_bo_addr(vm->pt_root[gt_id]->bo, 0, XE_PAGE_SIZE, &is_vram);
 		drm_printf(p, " VM root: A:0x%llx %s\n", addr, is_vram ? "VRAM" : "SYS");
 	}
 
-	for (node = rb_first(&vm->vmas); node; node = rb_next(node)) {
-		struct xe_vma *vma = to_xe_vma(node);
+	drm_gpuva_iter_for_each(gpuva, it) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
 		bool is_userptr = xe_vma_is_userptr(vma);
 
 		if (is_userptr) {
 			struct xe_res_cursor cur;
 
-			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE,
-					&cur);
+			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, &cur);
 			addr = xe_res_dma(&cur);
 		} else {
-			addr = xe_bo_addr(vma->bo, 0, XE_PAGE_SIZE, &is_vram);
+			addr = xe_bo_addr(xe_vma_bo(vma), 0, XE_PAGE_SIZE, &is_vram);
 		}
 		drm_printf(p, " [%016llx-%016llx] S:0x%016llx A:%016llx %s\n",
-			   vma->start, vma->end, vma->end - vma->start + 1ull,
+			   xe_vma_start(vma), xe_vma_end(vma), xe_vma_size(vma),
 			   addr, is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
 	}
 	up_read(&vm->lock);
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 748dc16ebed9..21b1054949c4 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -6,6 +6,7 @@
 #ifndef _XE_VM_H_
 #define _XE_VM_H_
 
+#include "xe_bo_types.h"
 #include "xe_macros.h"
 #include "xe_map.h"
 #include "xe_vm_types.h"
@@ -25,7 +26,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags);
 void xe_vm_free(struct kref *ref);
 
 struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id);
-int xe_vma_cmp_vma_cb(const void *key, const struct rb_node *node);
 
 static inline struct xe_vm *xe_vm_get(struct xe_vm *vm)
 {
@@ -50,7 +50,67 @@ static inline bool xe_vm_is_closed(struct xe_vm *vm)
 }
 
 struct xe_vma *
-xe_vm_find_overlapping_vma(struct xe_vm *vm, const struct xe_vma *vma);
+xe_vm_find_overlapping_vma(struct xe_vm *vm, u64 start, u64 range);
+
+static inline struct xe_vm *gpuva_to_vm(struct drm_gpuva *gpuva)
+{
+	return container_of(gpuva->mgr, struct xe_vm, mgr);
+}
+
+static inline struct xe_vma *gpuva_to_vma(struct drm_gpuva *gpuva)
+{
+	return container_of(gpuva, struct xe_vma, gpuva);
+}
+
+static inline struct xe_vma_op *gpuva_op_to_vma_op(struct drm_gpuva_op *op)
+{
+	return container_of(op, struct xe_vma_op, base);
+}
+
+/*
+ * Let's abstract start, size, end, bo_offset, vm, and bo as the underlying
+ * implementation may change
+ */
+static inline u64 xe_vma_start(struct xe_vma *vma)
+{
+	return vma->gpuva.va.addr;
+}
+
+static inline u64 xe_vma_size(struct xe_vma *vma)
+{
+	return vma->gpuva.va.range;
+}
+
+static inline u64 xe_vma_end(struct xe_vma *vma)
+{
+	return xe_vma_start(vma) + xe_vma_size(vma);
+}
+
+static inline u64 xe_vma_bo_offset(struct xe_vma *vma)
+{
+	return vma->gpuva.gem.offset;
+}
+
+static inline struct xe_bo *xe_vma_bo(struct xe_vma *vma)
+{
+	return !vma->gpuva.gem.obj ? NULL :
+		container_of(vma->gpuva.gem.obj, struct xe_bo, ttm.base);
+}
+
+static inline struct xe_vm *xe_vma_vm(struct xe_vma *vma)
+{
+	return container_of(vma->gpuva.mgr, struct xe_vm, mgr);
+}
+
+static inline bool xe_vma_read_only(struct xe_vma *vma)
+{
+	return vma->gpuva.flags & XE_VMA_READ_ONLY;
+}
+
+static inline u64 xe_vma_userptr(struct xe_vma *vma)
+{
+	return vma->gpuva.gem.offset;
+}
 
 #define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->resv)
 
@@ -117,7 +177,7 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
 
 static inline bool xe_vma_is_userptr(struct xe_vma *vma)
 {
-	return !vma->bo;
+	return !xe_vma_bo(vma);
 }
 
 int xe_vma_userptr_pin_pages(struct xe_vma *vma);
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 29815852985a..02d27a354b36 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -30,7 +30,7 @@ static int madvise_preferred_mem_class(struct xe_device *xe, struct xe_vm *vm,
 		struct xe_bo *bo;
 		struct ww_acquire_ctx ww;
 
-		bo = vmas[i]->bo;
+		bo = xe_vma_bo(vmas[i]);
 
 		err = xe_bo_lock(bo, &ww, 0, true);
 		if (err)
@@ -55,7 +55,7 @@ static int madvise_preferred_gt(struct xe_device *xe, struct xe_vm *vm,
 		struct xe_bo *bo;
 		struct ww_acquire_ctx ww;
 
-		bo = vmas[i]->bo;
+		bo = xe_vma_bo(vmas[i]);
 
 		err = xe_bo_lock(bo, &ww, 0, true);
 		if (err)
@@ -91,7 +91,7 @@ static int madvise_preferred_mem_class_gt(struct xe_device *xe,
 		struct xe_bo *bo;
 		struct ww_acquire_ctx ww;
 
-		bo = vmas[i]->bo;
+		bo = xe_vma_bo(vmas[i]);
 
 		err = xe_bo_lock(bo, &ww, 0, true);
 		if (err)
@@ -114,7 +114,7 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
 		struct xe_bo *bo;
 		struct ww_acquire_ctx ww;
 
-		bo = vmas[i]->bo;
+		bo = xe_vma_bo(vmas[i]);
 		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_SYSTEM_BIT)))
 			return -EINVAL;
 
@@ -145,7 +145,7 @@ static int madvise_device_atomic(struct xe_device *xe, struct xe_vm *vm,
 		struct xe_bo *bo;
 		struct ww_acquire_ctx ww;
 
-		bo = vmas[i]->bo;
+		bo = xe_vma_bo(vmas[i]);
 		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_VRAM0_BIT) &&
 				 !(bo->flags & XE_BO_CREATE_VRAM1_BIT)))
 			return -EINVAL;
@@ -176,7 +176,7 @@ static int madvise_priority(struct xe_device *xe, struct xe_vm *vm,
 		struct xe_bo *bo;
 		struct ww_acquire_ctx ww;
 
-		bo = vmas[i]->bo;
+		bo = xe_vma_bo(vmas[i]);
 
 		err = xe_bo_lock(bo, &ww, 0, true);
 		if (err)
@@ -210,19 +210,13 @@ static const madvise_func madvise_funcs[] = {
 	[DRM_XE_VM_MADVISE_PIN] = madvise_pin,
 };
 
-static struct xe_vma *node_to_vma(const struct rb_node *node)
-{
-	BUILD_BUG_ON(offsetof(struct xe_vma, vm_node) != 0);
-	return (struct xe_vma *)node;
-}
-
 static struct xe_vma **
 get_vmas(struct xe_vm *vm, int *num_vmas, u64 addr, u64 range)
 {
-	struct xe_vma **vmas;
-	struct xe_vma *vma, *__vma, lookup;
+	struct xe_vma **vmas, **__vmas;
+	struct drm_gpuva *gpuva;
 	int max_vmas = 8;
-	struct rb_node *node;
+	DRM_GPUVA_ITER(it, &vm->mgr, addr);
 
 	lockdep_assert_held(&vm->lock);
 
@@ -230,64 +224,24 @@ get_vmas(struct xe_vm *vm, int *num_vmas, u64 addr, u64 range)
 	if (!vmas)
 		return NULL;
 
-	lookup.start = addr;
-	lookup.end = addr + range - 1;
+	drm_gpuva_iter_for_each_range(gpuva, it, addr + range) {
+		struct xe_vma *vma = gpuva_to_vma(gpuva);
 
-	vma = xe_vm_find_overlapping_vma(vm, &lookup);
-	if (!vma)
-		return vmas;
+		if (xe_vma_is_userptr(vma))
+			continue;
 
-	if (!xe_vma_is_userptr(vma)) {
+		if (*num_vmas == max_vmas) {
+			max_vmas <<= 1;
+			__vmas = krealloc(vmas, max_vmas * sizeof(*vmas),
+					  GFP_KERNEL);
+			if (!__vmas)
+				return NULL;
+			vmas = __vmas;
+		}
 		vmas[*num_vmas] = vma;
 		*num_vmas += 1;
 	}
 
-	node = &vma->vm_node;
-	while ((node = rb_next(node))) {
-		if (!xe_vma_cmp_vma_cb(&lookup, node)) {
-			__vma = node_to_vma(node);
-			if (xe_vma_is_userptr(__vma))
-				continue;
-
-			if (*num_vmas == max_vmas) {
-				struct xe_vma **__vmas =
-					krealloc(vmas, max_vmas * sizeof(*vmas),
-						 GFP_KERNEL);
-
-				if (!__vmas)
-					return NULL;
-				vmas = __vmas;
-			}
-			vmas[*num_vmas] = __vma;
-			*num_vmas += 1;
-		} else {
-			break;
-		}
-	}
-
-	node = &vma->vm_node;
-	while ((node = rb_prev(node))) {
-		if (!xe_vma_cmp_vma_cb(&lookup, node)) {
-			__vma = node_to_vma(node);
-			if (xe_vma_is_userptr(__vma))
-				continue;
-
-			if (*num_vmas == max_vmas) {
-				struct xe_vma **__vmas =
-					krealloc(vmas, max_vmas * sizeof(*vmas),
-						 GFP_KERNEL);
-
-				if (!__vmas)
-					return NULL;
-				vmas = __vmas;
-			}
-			vmas[*num_vmas] = __vma;
-			*num_vmas += 1;
-		} else {
-			break;
-		}
-	}
-
 	return vmas;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index d3e99f22510d..243dc91a61b0 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -6,6 +6,8 @@
 #ifndef _XE_VM_TYPES_H_
 #define _XE_VM_TYPES_H_
 
+#include <drm/drm_gpuva_mgr.h>
+
 #include <linux/dma-resv.h>
 #include <linux/kref.h>
 #include <linux/mmu_notifier.h>
@@ -14,28 +16,23 @@
 #include "xe_device_types.h"
 #include "xe_pt_types.h"
 
+struct async_op_fence;
 struct xe_bo;
+struct xe_sync_entry;
 struct xe_vm;
 
-struct xe_vma {
-	struct rb_node vm_node;
-	/** @vm: VM which this VMA belongs to */
-	struct xe_vm *vm;
+#define TEST_VM_ASYNC_OPS_ERROR
+#define FORCE_ASYNC_OP_ERROR	BIT(31)
 
-	/**
-	 * @start: start address of this VMA within its address domain, end -
-	 * start + 1 == VMA size
-	 */
-	u64 start;
-	/** @end: end address of this VMA within its address domain */
-	u64 end;
-	/** @pte_flags: pte flags for this VMA */
-	u32 pte_flags;
+#define XE_VMA_READ_ONLY	DRM_GPUVA_USERBITS
+#define XE_VMA_DESTROYED	(DRM_GPUVA_USERBITS << 1)
+#define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
+#define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
+#define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
 
-	/** @bo: BO if not a userptr, must be NULL is userptr */
-	struct xe_bo *bo;
-	/** @bo_offset: offset into BO if not a userptr, unused for userptr */
-	u64 bo_offset;
+struct xe_vma {
+	/** @gpuva: Base GPUVA object */
+	struct drm_gpuva gpuva;
 
 	/** @gt_mask: GT mask of where to create binding for this VMA */
 	u64 gt_mask;
@@ -49,40 +46,8 @@ struct xe_vma {
 	 */
 	u64 gt_present;
 
-	/**
-	 * @destroyed: VMA is destroyed, in the sense that it shouldn't be
-	 * subject to rebind anymore. This field must be written under
-	 * the vm lock in write mode and the userptr.notifier_lock in
-	 * either mode. Read under the vm lock or the userptr.notifier_lock in
-	 * write mode.
-	 */
-	bool destroyed;
-
-	/**
-	 * @first_munmap_rebind: VMA is first in a sequence of ops that triggers
-	 * a rebind (munmap style VM unbinds). This indicates the operation
-	 * using this VMA must wait on all dma-resv slots (wait for pending jobs
-	 * / trigger preempt fences).
-	 */
-	bool first_munmap_rebind;
-
-	/**
-	 * @last_munmap_rebind: VMA is first in a sequence of ops that triggers
-	 * a rebind (munmap style VM unbinds). This indicates the operation
-	 * using this VMA must install itself into kernel dma-resv slot (blocks
-	 * future jobs) and kick the rebind work in compute mode.
-	 */
-	bool last_munmap_rebind;
-
-	/** @use_atomic_access_pte_bit: Set atomic access bit in PTE */
-	bool use_atomic_access_pte_bit;
-
-	union {
-		/** @bo_link: link into BO if not a userptr */
-		struct list_head bo_link;
-		/** @userptr_link: link into VM repin list if userptr */
-		struct list_head userptr_link;
-	};
+	/** @userptr_link: link into VM repin list if userptr */
+	struct list_head userptr_link;
 
 	/**
 	 * @rebind_link: link into VM if this VMA needs rebinding, and
@@ -105,8 +70,6 @@ struct xe_vma {
 
 	/** @userptr: user pointer state */
 	struct {
-		/** @ptr: user pointer */
-		uintptr_t ptr;
 		/** @invalidate_link: Link for the vm::userptr.invalidated list */
 		struct list_head invalidate_link;
 		/**
@@ -154,6 +117,9 @@ struct xe_device;
 #define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->resv)
 
 struct xe_vm {
+	/** @mgr: base GPUVA manager used to track VMAs */
+	struct drm_gpuva_manager mgr;
+
 	struct xe_device *xe;
 
 	struct kref refcount;
@@ -168,7 +134,6 @@ struct xe_vm {
 	struct ttm_lru_bulk_move lru_bulk_move;
 
 	u64 size;
-	struct rb_root vmas;
 
 	struct xe_pt *pt_root[XE_MAX_GT];
 	struct xe_bo *scratch_bo[XE_MAX_GT];
@@ -342,4 +307,98 @@ struct xe_vm {
 	} error_capture;
 };
 
+/** struct xe_vma_op_map - VMA map operation */
+struct xe_vma_op_map {
+	/** @vma: VMA to map */
+	struct xe_vma *vma;
+	/** @immediate: Immediate bind */
+	bool immediate;
+	/** @read_only: Read only */
+	bool read_only;
+};
+
+/** struct xe_vma_op_unmap - VMA unmap operation */
+struct xe_vma_op_unmap {
+	/** @start: start of the VMA unmap */
+	u64 start;
+	/** @range: range of the VMA unmap */
+	u64 range;
+};
+
+/** struct xe_vma_op_remap - VMA remap operation */
+struct xe_vma_op_remap {
+	/** @prev: VMA preceding part of a split mapping */
+	struct xe_vma *prev;
+	/** @next: VMA subsequent part of a split mapping */
+	struct xe_vma *next;
+	/** @start: start of the VMA unmap */
+	u64 start;
+	/** @range: range of the VMA unmap */
+	u64 range;
+	/** @unmap_done: unmap operation is done */
+	bool unmap_done;
+};
+
+/** struct xe_vma_op_prefetch - VMA prefetch operation */
+struct xe_vma_op_prefetch {
+	/** @region: memory region to prefetch to */
+	u32 region;
+};
+
+/** enum xe_vma_op_flags - flags for VMA operation */
+enum xe_vma_op_flags {
+	/** @XE_VMA_OP_FIRST: first VMA operation for a set of syncs */
+	XE_VMA_OP_FIRST		= (0x1 << 0),
+	/** @XE_VMA_OP_LAST: last VMA operation for a set of syncs */
+	XE_VMA_OP_LAST		= (0x1 << 1),
+	/** @XE_VMA_OP_COMMITTED: VMA operation committed */
+	XE_VMA_OP_COMMITTED	= (0x1 << 2),
+};
+
+/** struct xe_vma_op - VMA operation */
+struct xe_vma_op {
+	/** @base: GPUVA base operation */
+	struct drm_gpuva_op base;
+	/**
+	 * @ops: GPUVA ops; when set, call drm_gpuva_ops_free() after this
+	 * operation is processed
+	 */
+	struct drm_gpuva_ops *ops;
+	/** @engine: engine for this operation */
+	struct xe_engine *engine;
+	/**
+	 * @syncs: syncs for this operation, only used on first and last
+	 * operation
+	 */
+	struct xe_sync_entry *syncs;
+	/** @num_syncs: number of syncs */
+	u32 num_syncs;
+	/** @link: async operation link */
+	struct list_head link;
+	/**
+	 * @fence: async operation fence, signaled on last operation complete
+	 */
+	struct async_op_fence *fence;
+	/** @gt_mask: gt mask for this operation */
+	u64 gt_mask;
+	/** @flags: operation flags */
+	enum xe_vma_op_flags flags;
+
+#ifdef TEST_VM_ASYNC_OPS_ERROR
+	/** @inject_error: inject error to test async op error handling */
+	bool inject_error;
+#endif
+
+	union {
+		/** @map: VMA map operation specific data */
+		struct xe_vma_op_map map;
+		/** @unmap: VMA unmap operation specific data */
+		struct xe_vma_op_unmap unmap;
+		/** @remap: VMA remap operation specific data */
+		struct xe_vma_op_remap remap;
+		/** @prefetch: VMA prefetch operation specific data */
+		struct xe_vma_op_prefetch prefetch;
+	};
+};
+
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (15 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 14:34   ` Rodrigo Vivi
  2023-05-09 15:17   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 18/31] drm/xe: Avoid doing rebinds Matthew Brost
                   ` (15 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Add uAPI and implementation for NULL bindings. A NULL binding is defined
as a binding whose writes are dropped and whose reads return zero. A
single bit has been added to the uAPI which results in a single bit
being set in the PTEs.

NULL bindings are intended to be used to implement VK sparse bindings.
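
For illustration, a rough userspace sketch of requesting such a binding
through the VM bind IOCTL. The .addr/.range/.op fields match the bind_ops
fields used by the driver code in this series; the BO handle field and the
flag name (XE_VM_BIND_FLAG_NULL here) are assumptions, with the real names
defined by the xe_drm.h change below:

	/* Sketch only: map a 2 MiB NULL (write-drop / read-zero) range */
	struct drm_xe_vm_bind_op op = {
		.obj = 0,		/* assumed handle field; no BO backs a NULL binding */
		.addr = sparse_va,	/* placeholder GPU VA chosen by the application */
		.range = 2ull << 20,
		.op = XE_VM_BIND_OP_MAP | XE_VM_BIND_FLAG_NULL, /* flag name assumed */
	};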

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.h           |  1 +
 drivers/gpu/drm/xe/xe_exec.c         |  2 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  4 +-
 drivers/gpu/drm/xe/xe_pt.c           | 77 ++++++++++++++++-------
 drivers/gpu/drm/xe/xe_vm.c           | 92 ++++++++++++++++++----------
 drivers/gpu/drm/xe/xe_vm.h           | 10 +++
 drivers/gpu/drm/xe/xe_vm_madvise.c   |  2 +-
 drivers/gpu/drm/xe/xe_vm_types.h     |  3 +
 include/uapi/drm/xe_drm.h            |  8 +++
 9 files changed, 144 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 25457b3c757b..81051f456874 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -56,6 +56,7 @@
 #define XE_PDE_IPS_64K			BIT_ULL(11)
 
 #define XE_GGTT_PTE_LM			BIT_ULL(1)
+#define XE_PTE_NULL			BIT_ULL(9)
 #define XE_USM_PPGTT_PTE_AE		BIT_ULL(10)
 #define XE_PPGTT_PTE_LM			BIT_ULL(11)
 #define XE_PDE_64K			BIT_ULL(6)
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 90c46d092737..68f876afd13c 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -116,6 +116,8 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
 	 * to a location where the GPU can access it).
 	 */
 	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
+		XE_BUG_ON(xe_vma_is_null(vma));
+
 		if (xe_vma_is_userptr(vma))
 			continue;
 
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index f7a066090a13..cfffe3398fe4 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -526,8 +526,8 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 
 	trace_xe_vma_acc(vma);
 
-	/* Userptr can't be migrated, nothing to do */
-	if (xe_vma_is_userptr(vma))
+	/* Userptr or null can't be migrated, nothing to do */
+	if (xe_vma_has_no_bo(vma))
 		goto unlock_vm;
 
 	/* Lock VM and BOs dma-resv */
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 2b5b05a8a084..b4edb751bfbb 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -82,7 +82,9 @@ u64 gen8_pde_encode(struct xe_bo *bo, u64 bo_offset,
 static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
 			   size_t page_size, bool *is_vram)
 {
-	if (xe_vma_is_userptr(vma)) {
+	if (xe_vma_is_null(vma)) {
+		return 0;
+	} else if (xe_vma_is_userptr(vma)) {
 		struct xe_res_cursor cur;
 		u64 page;
 
@@ -563,6 +565,10 @@ static bool xe_pt_hugepte_possible(u64 addr, u64 next, unsigned int level,
 	if (next - xe_walk->va_curs_start > xe_walk->curs->size)
 		return false;
 
+	/* NULL VMAs do not have DMA addresses */
+	if (xe_walk->pte_flags & XE_PTE_NULL)
+		return true;
+
 	/* Is the DMA address huge PTE size aligned? */
 	size = next - addr;
 	dma = addr - xe_walk->va_curs_start + xe_res_dma(xe_walk->curs);
@@ -585,6 +591,10 @@ xe_pt_scan_64K(u64 addr, u64 next, struct xe_pt_stage_bind_walk *xe_walk)
 	if (next > xe_walk->l0_end_addr)
 		return false;
 
+	/* NULL VMAs do not have DMA addresses */
+	if (xe_walk->pte_flags & XE_PTE_NULL)
+		return true;
+
 	xe_res_next(&curs, addr - xe_walk->va_curs_start);
 	for (; addr < next; addr += SZ_64K) {
 		if (!IS_ALIGNED(xe_res_dma(&curs), SZ_64K) || curs.size < SZ_64K)
@@ -630,17 +640,34 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
 	struct xe_pt *xe_child;
 	bool covers;
 	int ret = 0;
-	u64 pte;
+	u64 pte = 0;
 
 	/* Is this a leaf entry ?*/
 	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
 		struct xe_res_cursor *curs = xe_walk->curs;
+		bool null = xe_walk->pte_flags & XE_PTE_NULL;
 
 		XE_WARN_ON(xe_walk->va_curs_start != addr);
 
-		pte = __gen8_pte_encode(xe_res_dma(curs) + xe_walk->dma_offset,
-					xe_walk->cache, xe_walk->pte_flags,
-					level);
+		if (null) {
+			pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
+
+			if (unlikely(xe_walk->pte_flags & XE_PTE_READ_ONLY))
+				pte &= ~XE_PAGE_RW;
+
+			if (level == 1)
+				pte |= XE_PDE_PS_2M;
+			else if (level == 2)
+				pte |= XE_PDPE_PS_1G;
+
+			pte |= XE_PTE_NULL;
+		} else {
+			pte = __gen8_pte_encode(xe_res_dma(curs) +
+						xe_walk->dma_offset,
+						xe_walk->cache,
+						xe_walk->pte_flags,
+						level);
+		}
 		pte |= xe_walk->default_pte;
 
 		/*
@@ -658,7 +685,8 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
 		if (unlikely(ret))
 			return ret;
 
-		xe_res_next(curs, next - addr);
+		if (!null)
+			xe_res_next(curs, next - addr);
 		xe_walk->va_curs_start = next;
 		*action = ACTION_CONTINUE;
 
@@ -751,7 +779,8 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 		.gt = gt,
 		.curs = &curs,
 		.va_curs_start = xe_vma_start(vma),
-		.pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0,
+		.pte_flags = (xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0) |
+			(xe_vma_is_null(vma) ? XE_PTE_NULL : 0),
 		.wupd.entries = entries,
 		.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAGS_64K) &&
 			is_vram,
@@ -769,23 +798,28 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 			gt_to_xe(gt)->mem.vram.io_start;
 		xe_walk.cache = XE_CACHE_WB;
 	} else {
-		if (!xe_vma_is_userptr(vma) && bo->flags & XE_BO_SCANOUT_BIT)
+		if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
 			xe_walk.cache = XE_CACHE_WT;
 		else
 			xe_walk.cache = XE_CACHE_WB;
 	}
-	if (!xe_vma_is_userptr(vma) && xe_bo_is_stolen(bo))
+	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
 		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
 
 	xe_bo_assert_held(bo);
-	if (xe_vma_is_userptr(vma))
-		xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma), &curs);
-	else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
-		xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
-			     xe_vma_size(vma), &curs);
-	else
-		xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
-				xe_vma_size(vma), &curs);
+	if (!xe_vma_is_null(vma)) {
+		if (xe_vma_is_userptr(vma))
+			xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
+					&curs);
+		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
+			xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
+				     xe_vma_size(vma), &curs);
+		else
+			xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
+					xe_vma_size(vma), &curs);
+	} else {
+		curs.size = xe_vma_size(vma);
+	}
 
 	ret = drm_pt_walk_range(&pt->drm, pt->level, xe_vma_start(vma),
 				xe_vma_end(vma), &xe_walk.drm);
@@ -979,7 +1013,7 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
 
 	if (xe_vma_is_userptr(vma))
 		lockdep_assert_held_read(&vm->userptr.notifier_lock);
-	else
+	else if (!xe_vma_is_null(vma))
 		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
 
 	dma_resv_assert_held(&vm->resv);
@@ -1283,7 +1317,8 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
 	struct xe_pt_migrate_pt_update bind_pt_update = {
 		.base = {
-			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops,
+			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops :
+				&bind_ops,
 			.vma = vma,
 		},
 		.bind = true,
@@ -1348,7 +1383,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 				   DMA_RESV_USAGE_KERNEL :
 				   DMA_RESV_USAGE_BOOKKEEP);
 
-		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
+		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
 			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
 					   DMA_RESV_USAGE_BOOKKEEP);
 		xe_pt_commit_bind(vma, entries, num_entries, rebind,
@@ -1667,7 +1702,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 				   DMA_RESV_USAGE_BOOKKEEP);
 
 		/* This fence will be installed by caller when doing eviction */
-		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
+		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
 			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
 					   DMA_RESV_USAGE_BOOKKEEP);
 		xe_pt_commit_unbind(vma, entries, num_entries,
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index f3608865e259..a46f44ab2546 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -60,6 +60,7 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 
 	lockdep_assert_held(&vm->lock);
 	XE_BUG_ON(!xe_vma_is_userptr(vma));
+	XE_BUG_ON(xe_vma_is_null(vma));
 retry:
 	if (vma->gpuva.flags & XE_VMA_DESTROYED)
 		return 0;
@@ -581,7 +582,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 		goto out_unlock;
 
 	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
-		if (xe_vma_is_userptr(vma) ||
+		if (xe_vma_has_no_bo(vma) ||
 		    vma->gpuva.flags & XE_VMA_DESTROYED)
 			continue;
 
@@ -813,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 				    struct xe_bo *bo,
 				    u64 bo_offset_or_userptr,
 				    u64 start, u64 end,
-				    bool read_only,
+				    bool read_only, bool null,
 				    u64 gt_mask)
 {
 	struct xe_vma *vma;
@@ -843,6 +844,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	vma->gpuva.va.range = end - start + 1;
 	if (read_only)
 		vma->gpuva.flags |= XE_VMA_READ_ONLY;
+	if (null)
+		vma->gpuva.flags |= XE_VMA_NULL;
 
 	if (gt_mask) {
 		vma->gt_mask = gt_mask;
@@ -862,23 +865,26 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		vma->gpuva.gem.obj = &bo->ttm.base;
 		vma->gpuva.gem.offset = bo_offset_or_userptr;
 		drm_gpuva_link(&vma->gpuva);
-	} else /* userptr */ {
-		u64 size = end - start + 1;
-		int err;
-
-		vma->gpuva.gem.offset = bo_offset_or_userptr;
+	} else /* userptr or null */ {
+		if (!null) {
+			u64 size = end - start + 1;
+			int err;
+
+			vma->gpuva.gem.offset = bo_offset_or_userptr;
+			err = mmu_interval_notifier_insert(&vma->userptr.notifier,
+							   current->mm,
+							   xe_vma_userptr(vma),
+							   size,
+							   &vma_userptr_notifier_ops);
+			if (err) {
+				kfree(vma);
+				vma = ERR_PTR(err);
+				return vma;
+			}
 
-		err = mmu_interval_notifier_insert(&vma->userptr.notifier,
-						   current->mm,
-						   xe_vma_userptr(vma), size,
-						   &vma_userptr_notifier_ops);
-		if (err) {
-			kfree(vma);
-			vma = ERR_PTR(err);
-			return vma;
+			vma->userptr.notifier_seq = LONG_MAX;
 		}
 
-		vma->userptr.notifier_seq = LONG_MAX;
 		xe_vm_get(vm);
 	}
 
@@ -916,6 +922,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 		 */
 		mmu_interval_notifier_remove(&vma->userptr.notifier);
 		xe_vm_put(vm);
+	} else if (xe_vma_is_null(vma)) {
+		xe_vm_put(vm);
 	} else {
 		xe_bo_put(xe_vma_bo(vma));
 	}
@@ -954,7 +962,7 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 		list_del_init(&vma->userptr.invalidate_link);
 		spin_unlock(&vm->userptr.invalidated_lock);
 		list_del(&vma->userptr_link);
-	} else {
+	} else if (!xe_vma_is_null(vma)) {
 		xe_bo_assert_held(xe_vma_bo(vma));
 		drm_gpuva_unlink(&vma->gpuva);
 		if (!xe_vma_bo(vma)->vm)
@@ -1305,7 +1313,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	drm_gpuva_iter_for_each(gpuva, it) {
 		vma = gpuva_to_vma(gpuva);
 
-		if (xe_vma_is_userptr(vma)) {
+		if (xe_vma_has_no_bo(vma)) {
 			down_read(&vm->userptr.notifier_lock);
 			vma->gpuva.flags |= XE_VMA_DESTROYED;
 			up_read(&vm->userptr.notifier_lock);
@@ -1315,7 +1323,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 		drm_gpuva_iter_remove(&it);
 
 		/* easy case, remove from VMA? */
-		if (xe_vma_is_userptr(vma) || xe_vma_bo(vma)->vm) {
+		if (xe_vma_has_no_bo(vma) || xe_vma_bo(vma)->vm) {
 			xe_vma_destroy(vma, NULL);
 			continue;
 		}
@@ -1964,7 +1972,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 
 	XE_BUG_ON(region > ARRAY_SIZE(region_to_mem_type));
 
-	if (!xe_vma_is_userptr(vma)) {
+	if (!xe_vma_has_no_bo(vma)) {
 		err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
 		if (err)
 			return err;
@@ -2170,6 +2178,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 				operation & XE_VM_BIND_FLAG_IMMEDIATE;
 			op->map.read_only =
 				operation & XE_VM_BIND_FLAG_READONLY;
+			op->map.null = operation & XE_VM_BIND_FLAG_NULL;
 		}
 		break;
 	case XE_VM_BIND_OP_UNMAP:
@@ -2226,7 +2235,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 }
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
-			      u64 gt_mask, bool read_only)
+			      u64 gt_mask, bool read_only, bool null)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct xe_vma *vma;
@@ -2242,7 +2251,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	}
 	vma = xe_vma_create(vm, bo, op->gem.offset,
 			    op->va.addr, op->va.addr +
-			    op->va.range - 1, read_only,
+			    op->va.range - 1, read_only, null,
 			    gt_mask);
 	if (bo)
 		xe_bo_unlock(bo, &ww);
@@ -2254,7 +2263,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 			xe_vma_destroy(vma, NULL);
 			return ERR_PTR(err);
 		}
-	} else if(!bo->vm) {
+	} else if (!xe_vma_has_no_bo(vma) && !bo->vm) {
 		vm_insert_extobj(vm, vma);
 		err = add_preempt_fences(vm, bo);
 		if (err) {
@@ -2332,7 +2341,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 				struct xe_vma *vma;
 
 				vma = new_vma(vm, &op->base.map,
-					      op->gt_mask, op->map.read_only);
+					      op->gt_mask, op->map.read_only,
+					      op->map.null);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -2347,9 +2357,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 					bool read_only =
 						op->base.remap.unmap->va->flags &
 						XE_VMA_READ_ONLY;
+					bool null =
+						op->base.remap.unmap->va->flags &
+						XE_VMA_NULL;
 
 					vma = new_vma(vm, op->base.remap.prev,
-						      op->gt_mask, read_only);
+						      op->gt_mask, read_only,
+						      null);
 					if (IS_ERR(vma)) {
 						err = PTR_ERR(vma);
 						goto free_fence;
@@ -2364,8 +2378,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 						op->base.remap.unmap->va->flags &
 						XE_VMA_READ_ONLY;
 
+					bool null =
+						op->base.remap.unmap->va->flags &
+						XE_VMA_NULL;
+
 					vma = new_vma(vm, op->base.remap.next,
-						      op->gt_mask, read_only);
+						      op->gt_mask, read_only,
+						      null);
 					if (IS_ERR(vma)) {
 						err = PTR_ERR(vma);
 						goto free_fence;
@@ -2853,11 +2872,12 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 #ifdef TEST_VM_ASYNC_OPS_ERROR
 #define SUPPORTED_FLAGS	\
 	(FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \
-	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
+	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \
+	 XE_VM_BIND_FLAG_NULL | 0xffff)
 #else
 #define SUPPORTED_FLAGS	\
 	(XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \
-	 XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
+	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff)
 #endif
 #define XE_64K_PAGE_MASK 0xffffull
 
@@ -2903,6 +2923,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 		u32 obj = (*bind_ops)[i].obj;
 		u64 obj_offset = (*bind_ops)[i].obj_offset;
 		u32 region = (*bind_ops)[i].region;
+		bool null = op & XE_VM_BIND_FLAG_NULL;
 
 		if (i == 0) {
 			*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
@@ -2929,8 +2950,12 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 		if (XE_IOCTL_ERR(xe, VM_BIND_OP(op) >
 				 XE_VM_BIND_OP_PREFETCH) ||
 		    XE_IOCTL_ERR(xe, op & ~SUPPORTED_FLAGS) ||
+		    XE_IOCTL_ERR(xe, obj && null) ||
+		    XE_IOCTL_ERR(xe, obj_offset && null) ||
+		    XE_IOCTL_ERR(xe, VM_BIND_OP(op) != XE_VM_BIND_OP_MAP &&
+				 null) ||
 		    XE_IOCTL_ERR(xe, !obj &&
-				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP) ||
+				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP && !null) ||
 		    XE_IOCTL_ERR(xe, !obj &&
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
 		    XE_IOCTL_ERR(xe, addr &&
@@ -3254,6 +3279,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 	int ret;
 
 	XE_BUG_ON(!xe_vm_in_fault_mode(xe_vma_vm(vma)));
+	XE_BUG_ON(xe_vma_is_null(vma));
 	trace_xe_vma_usm_invalidate(vma);
 
 	/* Check that we don't race with page-table updates */
@@ -3313,8 +3339,11 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 	drm_gpuva_iter_for_each(gpuva, it) {
 		struct xe_vma* vma = gpuva_to_vma(gpuva);
 		bool is_userptr = xe_vma_is_userptr(vma);
+		bool null = xe_vma_is_null(vma);
 
-		if (is_userptr) {
+		if (null) {
+			addr = 0;
+		} else if (is_userptr) {
 			struct xe_res_cursor cur;
 
 			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, &cur);
@@ -3324,7 +3353,8 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 		}
 		drm_printf(p, " [%016llx-%016llx] S:0x%016llx A:%016llx %s\n",
 			   xe_vma_start(vma), xe_vma_end(vma), xe_vma_size(vma),
-			   addr, is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
+			   addr, null ? "NULL" :
+			   is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
 	}
 	up_read(&vm->lock);
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 21b1054949c4..96e2c6b07bf8 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -175,7 +175,17 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
 	}
 }
 
+static inline bool xe_vma_is_null(struct xe_vma *vma)
+{
+	return vma->gpuva.flags & XE_VMA_NULL;
+}
+
 static inline bool xe_vma_is_userptr(struct xe_vma *vma)
+{
+	return !xe_vma_bo(vma) && !xe_vma_is_null(vma);
+}
+
+static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
 {
 	return !xe_vma_bo(vma);
 }
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 02d27a354b36..03508645fa08 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -227,7 +227,7 @@ get_vmas(struct xe_vm *vm, int *num_vmas, u64 addr, u64 range)
 	drm_gpuva_iter_for_each_range(gpuva, it, addr + range) {
 		struct xe_vma *vma = gpuva_to_vma(gpuva);
 
-		if (xe_vma_is_userptr(vma))
+		if (xe_vma_has_no_bo(vma))
 			continue;
 
 		if (*num_vmas == max_vmas) {
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 243dc91a61b0..b61007b70502 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -29,6 +29,7 @@ struct xe_vm;
 #define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
 #define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
 #define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
+#define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
 
 struct xe_vma {
 	/** @gpuva: Base GPUVA object */
@@ -315,6 +316,8 @@ struct xe_vma_op_map {
 	bool immediate;
 	/** @read_only: Read only */
 	bool read_only;
+	/** @null: NULL (writes dropped, read zero) */
+	bool null;
 };
 
 /** struct xe_vma_op_unmap - VMA unmap operation */
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index b0b80aae3ee8..27c51946fadd 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -447,6 +447,14 @@ struct drm_xe_vm_bind_op {
 	 * than differing the MAP to the page fault handler.
 	 */
 #define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 18)
+	/*
+	 * When the NULL flag is set, the page tables are set up with a special
+	 * bit which indicates writes are dropped and all reads return zero. The
+	 * NULL flag is only valid for XE_VM_BIND_OP_MAP operations, the BO
+	 * handle MBZ, and the BO offset MBZ. This flag is intended to implement
+	 * VK sparse bindings.
+	 */
+#define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
 
 	/** @reserved: Reserved */
 	__u64 reserved[2];
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 18/31] drm/xe: Avoid doing rebinds
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (16 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 14:48   ` Rodrigo Vivi
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma Matthew Brost
                   ` (14 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

If we don't change page sizes we can avoid doing rebinds and instead just
do a partial unbind. The algorithm to determine the page size is greedy:
we assume all pages in the removed VMA use the largest page size used in
the VMA.
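
For illustration only, the greedy page-size check added below boils down
to the following sketch (names are taken from this patch; the standalone
helpers are assumptions, not the actual code layout):

	static u64 sketch_max_pte_size(struct xe_vma *vma)
	{
		/* Greedy: assume every page uses the largest PTE size seen. */
		if (vma->gpuva.flags & XE_VMA_PTE_1G)
			return SZ_1G;
		else if (vma->gpuva.flags & XE_VMA_PTE_2M)
			return SZ_2M;
		return SZ_4K;
	}

	static bool sketch_can_skip_rebind(struct xe_vma *old, u64 boundary)
	{
		/* Userptr VMAs are always rebound; see the XXX note below. */
		return !xe_vma_is_userptr(old) &&
			IS_ALIGNED(boundary, sketch_max_pte_size(old));
	}

If the boundary of the new mapping is aligned to the old VMA's assumed
largest page size, the untouched remainder keeps its page tables and only
a partial unbind is issued.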

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       |  4 ++
 drivers/gpu/drm/xe/xe_vm.c       | 71 +++++++++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_vm_types.h | 17 ++++----
 3 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index b4edb751bfbb..010f44260cda 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -412,6 +412,8 @@ struct xe_pt_stage_bind_walk {
 	/* Input parameters for the walk */
 	/** @vm: The vm we're building for. */
 	struct xe_vm *vm;
+	/** @vma: The vma we are binding for. */
+	struct xe_vma *vma;
 	/** @gt: The gt we're building for. */
 	struct xe_gt *gt;
 	/** @cache: Desired cache level for the ptes */
@@ -688,6 +690,7 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
 		if (!null)
 			xe_res_next(curs, next - addr);
 		xe_walk->va_curs_start = next;
+		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
 		*action = ACTION_CONTINUE;
 
 		return ret;
@@ -776,6 +779,7 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 			.max_level = XE_PT_HIGHEST_LEVEL,
 		},
 		.vm = xe_vma_vm(vma),
+		.vma = vma,
 		.gt = gt,
 		.curs = &curs,
 		.va_curs_start = xe_vma_start(vma),
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a46f44ab2546..e0ed7201aeb0 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2276,6 +2276,16 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	return vma;
 }
 
+static u64 xe_vma_max_pte_size(struct xe_vma *vma)
+{
+	if (vma->gpuva.flags & XE_VMA_PTE_1G)
+		return SZ_1G;
+	else if (vma->gpuva.flags & XE_VMA_PTE_2M)
+		return SZ_2M;
+
+	return SZ_4K;
+}
+
 /*
  * Parse operations list and create any resources needed for the operations
  * prior to fully committing to the operations. This step can fail.
@@ -2352,6 +2362,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 				break;
 			}
 			case DRM_GPUVA_OP_REMAP:
+			{
+				struct xe_vma *old =
+					gpuva_to_vma(op->base.remap.unmap->va);
+
+				op->remap.start = xe_vma_start(old);
+				op->remap.range = xe_vma_size(old);
+
 				if (op->base.remap.prev) {
 					struct xe_vma *vma;
 					bool read_only =
@@ -2370,6 +2387,20 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 					}
 
 					op->remap.prev = vma;
+
+					/*
+					 * XXX: Not sure why userptr doesn't
+					 * work but really shouldn't be a use
+					 * case.
+					 */
+					op->remap.skip_prev = !xe_vma_is_userptr(old) &&
+						IS_ALIGNED(xe_vma_end(vma), xe_vma_max_pte_size(old));
+					if (op->remap.skip_prev) {
+						op->remap.range -=
+							xe_vma_end(vma) -
+							xe_vma_start(old);
+						op->remap.start = xe_vma_end(vma);
+					}
 				}
 
 				if (op->base.remap.next) {
@@ -2391,20 +2422,16 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 					}
 
 					op->remap.next = vma;
+					op->remap.skip_next = !xe_vma_is_userptr(old) &&
+						IS_ALIGNED(xe_vma_start(vma), xe_vma_max_pte_size(old));
+					if (op->remap.skip_next)
+						op->remap.range -=
+							xe_vma_end(old) -
+							xe_vma_start(vma);
 				}
-
-				/* XXX: Support no doing remaps */
-				op->remap.start =
-					xe_vma_start(gpuva_to_vma(op->base.remap.unmap->va));
-				op->remap.range =
-					xe_vma_size(gpuva_to_vma(op->base.remap.unmap->va));
 				break;
+			}
 			case DRM_GPUVA_OP_UNMAP:
-				op->unmap.start =
-					xe_vma_start(gpuva_to_vma(op->base.unmap.va));
-				op->unmap.range =
-					xe_vma_size(gpuva_to_vma(op->base.unmap.va));
-				break;
 			case DRM_GPUVA_OP_PREFETCH:
 				/* Nothing to do */
 				break;
@@ -2445,10 +2472,23 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
 	case DRM_GPUVA_OP_REMAP:
 		prep_vma_destroy(vm, gpuva_to_vma(op->base.remap.unmap->va),
 				 true);
-		if (op->remap.prev)
+
+		if (op->remap.prev) {
 			err |= xe_vm_insert_vma(vm, op->remap.prev);
-		if (op->remap.next)
+			if (!err && op->remap.skip_prev)
+				op->remap.prev = NULL;
+		}
+		if (op->remap.next) {
 			err |= xe_vm_insert_vma(vm, op->remap.next);
+			if (!err && op->remap.skip_next)
+				op->remap.next = NULL;
+		}
+
+		/* Adjust for partial unbind after removing the VMA from the VM */
+		if (!err) {
+			op->base.remap.unmap->va->va.addr = op->remap.start;
+			op->base.remap.unmap->va->va.range = op->remap.range;
+		}
 		break;
 	case DRM_GPUVA_OP_UNMAP:
 		prep_vma_destroy(vm, gpuva_to_vma(op->base.unmap.va), true);
@@ -2518,9 +2558,10 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 		bool next = !!op->remap.next;
 
 		if (!op->remap.unmap_done) {
-			vm->async_ops.munmap_rebind_inflight = true;
-			if (prev || next)
+			if (prev || next) {
+				vm->async_ops.munmap_rebind_inflight = true;
 				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
+			}
 			err = xe_vm_unbind(vm, vma, op->engine, op->syncs,
 					   op->num_syncs,
 					   !prev && !next ? op->fence : NULL,
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index b61007b70502..d55ec8156caa 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -30,6 +30,9 @@ struct xe_vm;
 #define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
 #define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
 #define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
+#define XE_VMA_PTE_4K		(DRM_GPUVA_USERBITS << 6)
+#define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
+#define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
 
 struct xe_vma {
 	/** @gpuva: Base GPUVA object */
@@ -320,14 +323,6 @@ struct xe_vma_op_map {
 	bool null;
 };
 
-/** struct xe_vma_op_unmap - VMA unmap operation */
-struct xe_vma_op_unmap {
-	/** @start: start of the VMA unmap */
-	u64 start;
-	/** @range: range of the VMA unmap */
-	u64 range;
-};
-
 /** struct xe_vma_op_remap - VMA remap operation */
 struct xe_vma_op_remap {
 	/** @prev: VMA preceding part of a split mapping */
@@ -338,6 +333,10 @@ struct xe_vma_op_remap {
 	u64 start;
 	/** @range: range of the VMA unmap */
 	u64 range;
+	/** @skip_prev: skip prev rebind */
+	bool skip_prev;
+	/** @skip_next: skip next rebind */
+	bool skip_next;
 	/** @unmap_done: unmap operation is done */
 	bool unmap_done;
 };
@@ -395,8 +394,6 @@ struct xe_vma_op {
 	union {
 		/** @map: VMA map operation specific data */
 		struct xe_vma_op_map map;
-		/** @unmap: VMA unmap operation specific data */
-		struct xe_vma_op_unmap unmap;
 		/** @remap: VMA remap operation specific data */
 		struct xe_vma_op_remap remap;
 		/** @prefetch: VMA prefetch operation specific data */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (17 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 18/31] drm/xe: Avoid doing rebinds Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-08 21:43   ` Rodrigo Vivi
  2023-05-11  8:38   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation Matthew Brost
                   ` (13 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Five list links can be squashed into a union in xe_vma, as membership in
the various lists is mutually exclusive.
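
A minimal sketch of the idea (a stripped-down stand-in, not the actual
xe_vma layout changed below): since a VMA sits on at most one of these
lists at any time, the links can share storage.

	#include <linux/list.h>

	struct sketch_vma {
		union {
			struct list_head userptr_link;    /* VM repin list */
			struct list_head rebind_link;     /* VM rebind list */
			struct list_head destroy_link;    /* contested VMAs on VM close */
			struct list_head invalidate_link; /* userptr.invalidated list */
			struct {
				struct list_head rebind_link; /* notifier rebind list */
			} notifier;
		};
	};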

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  2 +-
 drivers/gpu/drm/xe/xe_pt.c           |  5 +-
 drivers/gpu/drm/xe/xe_vm.c           | 29 ++++++------
 drivers/gpu/drm/xe/xe_vm_types.h     | 71 +++++++++++++++-------------
 4 files changed, 55 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index cfffe3398fe4..d7bf6b0a0697 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -157,7 +157,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	if (xe_vma_is_userptr(vma) && write_locked) {
 		spin_lock(&vm->userptr.invalidated_lock);
-		list_del_init(&vma->userptr.invalidate_link);
+		list_del_init(&vma->invalidate_link);
 		spin_unlock(&vm->userptr.invalidated_lock);
 
 		ret = xe_vma_userptr_pin_pages(vma);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 010f44260cda..8eab8e1bbaf0 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1116,8 +1116,7 @@ static int xe_pt_userptr_inject_eagain(struct xe_vma *vma)
 
 		vma->userptr.divisor = divisor << 1;
 		spin_lock(&vm->userptr.invalidated_lock);
-		list_move_tail(&vma->userptr.invalidate_link,
-			       &vm->userptr.invalidated);
+		list_move_tail(&vma->invalidate_link, &vm->userptr.invalidated);
 		spin_unlock(&vm->userptr.invalidated_lock);
 		return true;
 	}
@@ -1724,7 +1723,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 
 		if (!vma->gt_present) {
 			spin_lock(&vm->userptr.invalidated_lock);
-			list_del_init(&vma->userptr.invalidate_link);
+			list_del_init(&vma->invalidate_link);
 			spin_unlock(&vm->userptr.invalidated_lock);
 		}
 		up_read(&vm->userptr.notifier_lock);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index e0ed7201aeb0..e5f2fffb2aec 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -677,8 +677,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
 	if (!xe_vm_in_fault_mode(vm) &&
 	    !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->gt_present) {
 		spin_lock(&vm->userptr.invalidated_lock);
-		list_move_tail(&vma->userptr.invalidate_link,
-			       &vm->userptr.invalidated);
+		list_move_tail(&vma->invalidate_link, &vm->userptr.invalidated);
 		spin_unlock(&vm->userptr.invalidated_lock);
 	}
 
@@ -726,8 +725,8 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 	/* Collect invalidated userptrs */
 	spin_lock(&vm->userptr.invalidated_lock);
 	list_for_each_entry_safe(vma, next, &vm->userptr.invalidated,
-				 userptr.invalidate_link) {
-		list_del_init(&vma->userptr.invalidate_link);
+				 invalidate_link) {
+		list_del_init(&vma->invalidate_link);
 		list_move_tail(&vma->userptr_link, &vm->userptr.repin_list);
 	}
 	spin_unlock(&vm->userptr.invalidated_lock);
@@ -830,12 +829,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		return vma;
 	}
 
-	/* FIXME: Way to many lists, should be able to reduce this */
+	/*
+	 * userptr_link, destroy_link, notifier.rebind_link,
+	 * invalidate_link
+	 */
 	INIT_LIST_HEAD(&vma->rebind_link);
-	INIT_LIST_HEAD(&vma->unbind_link);
-	INIT_LIST_HEAD(&vma->userptr_link);
-	INIT_LIST_HEAD(&vma->userptr.invalidate_link);
-	INIT_LIST_HEAD(&vma->notifier.rebind_link);
 	INIT_LIST_HEAD(&vma->extobj.link);
 
 	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
@@ -953,15 +951,14 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 	struct xe_vm *vm = xe_vma_vm(vma);
 
 	lockdep_assert_held_write(&vm->lock);
-	XE_BUG_ON(!list_empty(&vma->unbind_link));
 
 	if (xe_vma_is_userptr(vma)) {
 		XE_WARN_ON(!(vma->gpuva.flags & XE_VMA_DESTROYED));
 
 		spin_lock(&vm->userptr.invalidated_lock);
-		list_del_init(&vma->userptr.invalidate_link);
+		if (!list_empty(&vma->invalidate_link))
+			list_del_init(&vma->invalidate_link);
 		spin_unlock(&vm->userptr.invalidated_lock);
-		list_del(&vma->userptr_link);
 	} else if (!xe_vma_is_null(vma)) {
 		xe_bo_assert_held(xe_vma_bo(vma));
 		drm_gpuva_unlink(&vma->gpuva);
@@ -1328,7 +1325,9 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 			continue;
 		}
 
-		list_add_tail(&vma->unbind_link, &contested);
+		if (!list_empty(&vma->destroy_link))
+			list_del_init(&vma->destroy_link);
+		list_add_tail(&vma->destroy_link, &contested);
 	}
 
 	/*
@@ -1356,8 +1355,8 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	 * Since we hold a refcount to the bo, we can remove and free
 	 * the members safely without locking.
 	 */
-	list_for_each_entry_safe(vma, next_vma, &contested, unbind_link) {
-		list_del_init(&vma->unbind_link);
+	list_for_each_entry_safe(vma, next_vma, &contested, destroy_link) {
+		list_del_init(&vma->destroy_link);
 		xe_vma_destroy_unlocked(vma);
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index d55ec8156caa..22def5483c12 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -50,21 +50,32 @@ struct xe_vma {
 	 */
 	u64 gt_present;
 
-	/** @userptr_link: link into VM repin list if userptr */
-	struct list_head userptr_link;
+	union {
+		/** @userptr_link: link into VM repin list if userptr */
+		struct list_head userptr_link;
 
-	/**
-	 * @rebind_link: link into VM if this VMA needs rebinding, and
-	 * if it's a bo (not userptr) needs validation after a possible
-	 * eviction. Protected by the vm's resv lock.
-	 */
-	struct list_head rebind_link;
+		/**
+		 * @rebind_link: link into VM if this VMA needs rebinding, and
+		 * if it's a bo (not userptr) needs validation after a possible
+		 * eviction. Protected by the vm's resv lock.
+		 */
+		struct list_head rebind_link;
 
-	/**
-	 * @unbind_link: link or list head if an unbind of multiple VMAs, in
-	 * single unbind op, is being done.
-	 */
-	struct list_head unbind_link;
+		/** @destroy_link: link for contested VMAs on VM close */
+		struct list_head destroy_link;
+
+		/** @invalidate_link: Link for the vm::userptr.invalidated list */
+		struct list_head invalidate_link;
+
+		struct {
+			 /*
+			  * @notifier.rebind_link: link for
+			  * vm->notifier.rebind_list, protected by
+			  * vm->notifier.list_lock
+			  */
+			struct list_head rebind_link;
+		} notifier;
+	};
 
 	/** @destroy_cb: callback to destroy VMA when unbind job is done */
 	struct dma_fence_cb destroy_cb;
@@ -72,10 +83,22 @@ struct xe_vma {
 	/** @destroy_work: worker to destroy this BO */
 	struct work_struct destroy_work;
 
+	/** @usm: unified shared memory state */
+	struct {
+		/** @gt_invalidated: VMA has been invalidated */
+		u64 gt_invalidated;
+	} usm;
+
+	struct {
+		/**
+		 * @extobj.link: Link into vm's external object list.
+		 * protected by the vm lock.
+		 */
+		struct list_head link;
+	} extobj;
+
 	/** @userptr: user pointer state */
 	struct {
-		/** @invalidate_link: Link for the vm::userptr.invalidated list */
-		struct list_head invalidate_link;
 		/**
 		 * @notifier: MMU notifier for user pointer (invalidation call back)
 		 */
@@ -96,24 +119,6 @@ struct xe_vma {
 		u32 divisor;
 #endif
 	} userptr;
-
-	/** @usm: unified shared memory state */
-	struct {
-		/** @gt_invalidated: VMA has been invalidated */
-		u64 gt_invalidated;
-	} usm;
-
-	struct {
-		struct list_head rebind_link;
-	} notifier;
-
-	struct {
-		/**
-		 * @extobj.link: Link into vm's external object list.
-		 * protected by the vm lock.
-		 */
-		struct list_head link;
-	} extobj;
 };
 
 struct xe_device;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (18 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 19:37   ` Rodrigo Vivi
  2023-05-11  9:05   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager Matthew Brost
                   ` (12 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Reduce gt_mask from a u64 to a u8, only allocate userptr state if the VMA
is a userptr, and put the destroy callback and worker in a union.
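
For illustration, the allocation saving relies on keeping the userptr
state as the final member so non-userptr VMAs can be allocated without
it; a minimal sketch under that assumption (helper name is hypothetical):

	static struct xe_vma *sketch_vma_alloc(bool is_userptr)
	{
		size_t size = sizeof(struct xe_vma);

		/*
		 * struct xe_userptr is the last member of struct xe_vma, so
		 * VMAs that are not userptrs can omit it entirely. Nothing
		 * may touch vma->userptr unless the full size was allocated.
		 */
		if (!is_userptr)
			size -= sizeof(struct xe_userptr);

		return kzalloc(size, GFP_KERNEL);
	}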

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 14 +++--
 drivers/gpu/drm/xe/xe_vm_types.h | 88 +++++++++++++++++---------------
 2 files changed, 57 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index e5f2fffb2aec..e8d9939ee535 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -814,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 				    u64 bo_offset_or_userptr,
 				    u64 start, u64 end,
 				    bool read_only, bool null,
-				    u64 gt_mask)
+				    u8 gt_mask)
 {
 	struct xe_vma *vma;
 	struct xe_gt *gt;
@@ -823,7 +823,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	XE_BUG_ON(start >= end);
 	XE_BUG_ON(end >= vm->size);
 
-	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
+	if (!bo && !null)	/* userptr */
+		vma = kzalloc(sizeof(*vma), GFP_KERNEL);
+	else
+		vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
+			      GFP_KERNEL);
 	if (!vma) {
 		vma = ERR_PTR(-ENOMEM);
 		return vma;
@@ -2149,7 +2153,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
 static struct drm_gpuva_ops *
 vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			 u64 bo_offset_or_userptr, u64 addr, u64 range,
-			 u32 operation, u64 gt_mask, u32 region)
+			 u32 operation, u8 gt_mask, u32 region)
 {
 	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
 	struct ww_acquire_ctx ww;
@@ -2234,7 +2238,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 }
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
-			      u64 gt_mask, bool read_only, bool null)
+			      u8 gt_mask, bool read_only, bool null)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct xe_vma *vma;
@@ -3217,8 +3221,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 addr = bind_ops[i].addr;
 		u32 op = bind_ops[i].op;
 		u64 obj_offset = bind_ops[i].obj_offset;
-		u64 gt_mask = bind_ops[i].gt_mask;
 		u32 region = bind_ops[i].region;
+		u8 gt_mask = bind_ops[i].gt_mask;
 
 		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
 						  addr, range, op, gt_mask,
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 22def5483c12..df4797ec4d7f 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -34,22 +34,34 @@ struct xe_vm;
 #define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
 #define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
 
+/** struct xe_userptr - User pointer */
+struct xe_userptr {
+	/**
+	 * @notifier: MMU notifier for user pointer (invalidation call back)
+	 */
+	struct mmu_interval_notifier notifier;
+	/** @sgt: storage for a scatter gather table */
+	struct sg_table sgt;
+	/** @sg: allocated scatter gather table */
+	struct sg_table *sg;
+	/** @notifier_seq: notifier sequence number */
+	unsigned long notifier_seq;
+	/**
+	 * @initial_bind: user pointer has been bound at least once.
+	 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
+	 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
+	 */
+	bool initial_bind;
+#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
+	u32 divisor;
+#endif
+};
+
+/** struct xe_vma - Virtual memory address */
 struct xe_vma {
 	/** @gpuva: Base GPUVA object */
 	struct drm_gpuva gpuva;
 
-	/** @gt_mask: GT mask of where to create binding for this VMA */
-	u64 gt_mask;
-
-	/**
-	 * @gt_present: GT mask of binding are present for this VMA.
-	 * protected by vm->lock, vm->resv and for userptrs,
-	 * vm->userptr.notifier_lock for writing. Needs either for reading,
-	 * but if reading is done under the vm->lock only, it needs to be held
-	 * in write mode.
-	 */
-	u64 gt_present;
-
 	union {
 		/** @userptr_link: link into VM repin list if userptr */
 		struct list_head userptr_link;
@@ -77,16 +89,29 @@ struct xe_vma {
 		} notifier;
 	};
 
-	/** @destroy_cb: callback to destroy VMA when unbind job is done */
-	struct dma_fence_cb destroy_cb;
+	union {
+		/** @destroy_cb: callback to destroy VMA when unbind job is done */
+		struct dma_fence_cb destroy_cb;
+		/** @destroy_work: worker to destroy this VMA */
+		struct work_struct destroy_work;
+	};
 
-	/** @destroy_work: worker to destroy this BO */
-	struct work_struct destroy_work;
+	/** @gt_mask: GT mask of where to create binding for this VMA */
+	u8 gt_mask;
+
+	/**
+	 * @gt_present: GT mask of bindings present for this VMA.
+	 * protected by vm->lock, vm->resv and for userptrs,
+	 * vm->userptr.notifier_lock for writing. Needs either for reading,
+	 * but if reading is done under the vm->lock only, it needs to be held
+	 * in write mode.
+	 */
+	u8 gt_present;
 
 	/** @usm: unified shared memory state */
 	struct {
 		/** @gt_invalidated: VMA has been invalidated */
-		u64 gt_invalidated;
+		u8 gt_invalidated;
 	} usm;
 
 	struct {
@@ -97,28 +122,11 @@ struct xe_vma {
 		struct list_head link;
 	} extobj;
 
-	/** @userptr: user pointer state */
-	struct {
-		/**
-		 * @notifier: MMU notifier for user pointer (invalidation call back)
-		 */
-		struct mmu_interval_notifier notifier;
-		/** @sgt: storage for a scatter gather table */
-		struct sg_table sgt;
-		/** @sg: allocated scatter gather table */
-		struct sg_table *sg;
-		/** @notifier_seq: notifier sequence number */
-		unsigned long notifier_seq;
-		/**
-		 * @initial_bind: user pointer has been bound at least once.
-		 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
-		 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
-		 */
-		bool initial_bind;
-#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
-		u32 divisor;
-#endif
-	} userptr;
+	/**
+	 * @userptr: user pointer state, only allocated for VMAs that are
+	 * user pointers
+	 */
+	struct xe_userptr userptr;
 };
 
 struct xe_device;
@@ -387,7 +395,7 @@ struct xe_vma_op {
 	 */
 	struct async_op_fence *fence;
 	/** @gt_mask: gt mask for this operation */
-	u64 gt_mask;
+	u8 gt_mask;
 	/** @flags: operation flags */
 	enum xe_vma_op_flags flags;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (19 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 19:39   ` Rodrigo Vivi
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 22/31] drm/gpuva: Move dma-resv " Matthew Brost
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

This is the logical place for it and will help with upcoming changes too.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/drm_gpuva_mgr.c  |  3 +++
 drivers/gpu/drm/xe/xe_migrate.c  | 10 +++++-----
 drivers/gpu/drm/xe/xe_pt.c       | 18 +++++++++---------
 drivers/gpu/drm/xe/xe_vm.c       | 31 +++++++++++++++----------------
 drivers/gpu/drm/xe/xe_vm.h       | 10 ++++++++++
 drivers/gpu/drm/xe/xe_vm_types.h |  2 --
 include/drm/drm_gpuva_mgr.h      |  4 ++++
 7 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index bd7d27ee44bb..137322945e91 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -413,6 +413,7 @@ static void __drm_gpuva_remove(struct drm_gpuva *va);
 /**
  * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
  * @mgr: pointer to the &drm_gpuva_manager to initialize
+ * @drm: drm device
  * @name: the name of the GPU VA space
  * @start_offset: the start offset of the GPU VA space
  * @range: the size of the GPU VA space
@@ -427,6 +428,7 @@ static void __drm_gpuva_remove(struct drm_gpuva *va);
  */
 void
 drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+		       struct drm_device *drm,
 		       const char *name,
 		       u64 start_offset, u64 range,
 		       u64 reserve_offset, u64 reserve_range,
@@ -437,6 +439,7 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
 	mgr->mm_start = start_offset;
 	mgr->mm_range = range;
 
+	mgr->drm = drm;
 	mgr->name = name ? name : "unknown";
 	mgr->ops = ops;
 
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index b44aa094a466..0a393c5772e5 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -129,7 +129,7 @@ static u64 xe_migrate_vram_ofs(u64 addr)
 static int xe_migrate_create_cleared_bo(struct xe_migrate *m, struct xe_vm *vm)
 {
 	struct xe_gt *gt = m->gt;
-	struct xe_device *xe = vm->xe;
+	struct xe_device *xe = xe_vm_device(vm);
 	size_t cleared_size;
 	u64 vram_addr;
 	bool is_vram;
@@ -175,7 +175,7 @@ static int xe_migrate_prepare_vm(struct xe_gt *gt, struct xe_migrate *m,
 	/* Need to be sure everything fits in the first PT, or create more */
 	XE_BUG_ON(m->batch_base_ofs + batch->size >= SZ_2M);
 
-	bo = xe_bo_create_pin_map(vm->xe, m->gt, vm,
+	bo = xe_bo_create_pin_map(xe_vm_device(vm), m->gt, vm,
 				  num_entries * XE_PAGE_SIZE,
 				  ttm_bo_type_kernel,
 				  XE_BO_CREATE_VRAM_IF_DGFX(m->gt) |
@@ -1051,7 +1051,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 
 	if (wait_vm && !dma_resv_test_signaled(&vm->resv,
 					       DMA_RESV_USAGE_BOOKKEEP)) {
-		vm_dbg(&vm->xe->drm, "wait on VM for munmap");
+		vm_dbg(&xe_vm_device(vm)->drm, "wait on VM for munmap");
 		return ERR_PTR(-ETIME);
 	}
 
@@ -1069,7 +1069,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 
 	if (vm) {
 		trace_xe_vm_cpu_bind(vm);
-		xe_device_wmb(vm->xe);
+		xe_device_wmb(xe_vm_device(vm));
 	}
 
 	fence = dma_fence_get_stub();
@@ -1263,7 +1263,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	 * trigger preempts before moving forward
 	 */
 	if (first_munmap_rebind) {
-		vm_dbg(&vm->xe->drm, "wait on first_munmap_rebind");
+		vm_dbg(&xe_vm_device(vm)->drm, "wait on first_munmap_rebind");
 		err = job_add_deps(job, &vm->resv,
 				   DMA_RESV_USAGE_BOOKKEEP);
 		if (err)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8eab8e1bbaf0..4167f666d98d 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -218,7 +218,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_gt *gt,
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
-	bo = xe_bo_create_pin_map(vm->xe, gt, vm, SZ_4K,
+	bo = xe_bo_create_pin_map(xe_vm_device(vm), gt, vm, SZ_4K,
 				  ttm_bo_type_kernel,
 				  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
 				  XE_BO_CREATE_IGNORE_MIN_PAGE_SIZE_BIT |
@@ -264,11 +264,11 @@ void xe_pt_populate_empty(struct xe_gt *gt, struct xe_vm *vm,
 		 * FIXME: Some memory is allocated already allocated to zero?
 		 * Find out which memory that is and avoid this memset...
 		 */
-		xe_map_memset(vm->xe, map, 0, 0, SZ_4K);
+		xe_map_memset(xe_vm_device(vm), map, 0, 0, SZ_4K);
 	} else {
 		empty = __xe_pt_empty_pte(gt, vm, pt->level);
 		for (i = 0; i < XE_PDES; i++)
-			xe_pt_write(vm->xe, map, i, empty);
+			xe_pt_write(xe_vm_device(vm), map, i, empty);
 	}
 }
 
@@ -355,7 +355,7 @@ int xe_pt_create_scratch(struct xe_device *xe, struct xe_gt *gt,
 	if (IS_ERR(vm->scratch_bo[id]))
 		return PTR_ERR(vm->scratch_bo[id]);
 
-	xe_map_memset(vm->xe, &vm->scratch_bo[id]->vmap, 0, 0,
+	xe_map_memset(xe_vm_device(vm), &vm->scratch_bo[id]->vmap, 0, 0,
 		      vm->scratch_bo[id]->size);
 
 	for (i = 0; i < vm->pt_root[id]->level; i++) {
@@ -538,7 +538,7 @@ xe_pt_insert_entry(struct xe_pt_stage_bind_walk *xe_walk, struct xe_pt *parent,
 		if (unlikely(xe_child))
 			parent->drm.dir->entries[offset] = &xe_child->drm;
 
-		xe_pt_write(xe_walk->vm->xe, map, offset, pte);
+		xe_pt_write(xe_vm_device(xe_walk->vm), map, offset, pte);
 		parent->num_live++;
 	} else {
 		/* Shared pt. Stage update. */
@@ -1337,7 +1337,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 	xe_vm_assert_held(vm);
 	XE_BUG_ON(xe_gt_is_media_type(gt));
 
-	vm_dbg(&xe_vma_vm(vma)->xe->drm,
+	vm_dbg(&xe_vma_device(vma)->drm,
 	       "Preparing bind, with range [%llx...%llx) engine %p.\n",
 	       xe_vma_start(vma), xe_vma_end(vma) - 1, e);
 
@@ -1366,7 +1366,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 
 
 		if (last_munmap_rebind)
-			vm_dbg(&vm->xe->drm, "last_munmap_rebind");
+			vm_dbg(&xe_vm_device(vm)->drm, "last_munmap_rebind");
 
 		/* TLB invalidation must be done before signaling rebind */
 		if (rebind && !xe_vm_no_dma_fences(xe_vma_vm(vma))) {
@@ -1401,7 +1401,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 			xe_bo_put_commit(&deferred);
 		}
 		if (!rebind && last_munmap_rebind && xe_vm_in_compute_mode(vm))
-			queue_work(vm->xe->ordered_wq,
+			queue_work(xe_vm_device(vm)->ordered_wq,
 				   &vm->preempt.rebind_work);
 	} else {
 		kfree(ifence);
@@ -1664,7 +1664,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 	xe_vm_assert_held(vm);
 	XE_BUG_ON(xe_gt_is_media_type(gt));
 
-	vm_dbg(&xe_vma_vm(vma)->xe->drm,
+	vm_dbg(&xe_vma_device(vma)->drm,
 	       "Preparing unbind, with range [%llx...%llx) engine %p.\n",
 	       xe_vma_start(vma), xe_vma_end(vma) - 1, e);
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index e8d9939ee535..688130c509a4 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -50,7 +50,7 @@ int xe_vma_userptr_check_repin(struct xe_vma *vma)
 int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
-	struct xe_device *xe = vm->xe;
+	struct xe_device *xe = xe_vm_device(vm);
 	const unsigned long num_pages = xe_vma_size(vma) >> PAGE_SHIFT;
 	struct page **pages;
 	bool in_kthread = !current->mm;
@@ -852,12 +852,12 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	if (gt_mask) {
 		vma->gt_mask = gt_mask;
 	} else {
-		for_each_gt(gt, vm->xe, id)
+		for_each_gt(gt, xe_vm_device(vm), id)
 			if (!xe_gt_is_media_type(gt))
 				vma->gt_mask |= 0x1 << id;
 	}
 
-	if (vm->xe->info.platform == XE_PVC)
+	if (xe_vm_device(vm)->info.platform == XE_PVC)
 		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
 
 	if (bo) {
@@ -904,7 +904,7 @@ static void vm_remove_extobj(struct xe_vma *vma)
 static void xe_vma_destroy_late(struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
-	struct xe_device *xe = vm->xe;
+	struct xe_device *xe = xe_vm_device(vm);
 	bool read_only = xe_vma_read_only(vma);
 
 	if (xe_vma_is_userptr(vma)) {
@@ -1084,7 +1084,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	if (!vm)
 		return ERR_PTR(-ENOMEM);
 
-	vm->xe = xe;
 	kref_init(&vm->refcount);
 	dma_resv_init(&vm->resv);
 
@@ -1125,7 +1124,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	if (err)
 		goto err_put;
 
-	drm_gpuva_manager_init(&vm->mgr, "Xe VM", 0, vm->size, 0, 0,
+	drm_gpuva_manager_init(&vm->mgr, &xe->drm, "Xe VM", 0, vm->size, 0, 0,
 			       &gpuva_ops);
 	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
 		vm->flags |= XE_VM_FLAGS_64K;
@@ -1284,7 +1283,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 {
 	struct list_head contested;
 	struct ww_acquire_ctx ww;
-	struct xe_device *xe = vm->xe;
+	struct xe_device *xe = xe_vm_device(vm);
 	struct xe_gt *gt;
 	struct xe_vma *vma, *next_vma;
 	struct drm_gpuva *gpuva;
@@ -1387,7 +1386,7 @@ static void vm_destroy_work_func(struct work_struct *w)
 	struct xe_vm *vm =
 		container_of(w, struct xe_vm, destroy_work);
 	struct ww_acquire_ctx ww;
-	struct xe_device *xe = vm->xe;
+	struct xe_device *xe = xe_vm_device(vm);
 	struct xe_gt *gt;
 	u8 id;
 	void *lookup;
@@ -1481,7 +1480,7 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
 			return ERR_PTR(-ENOMEM);
 	}
 
-	for_each_gt(gt, vm->xe, id) {
+	for_each_gt(gt, xe_vm_device(vm), id) {
 		if (!(vma->gt_present & BIT(id)))
 			goto next;
 
@@ -1555,7 +1554,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
 			return ERR_PTR(-ENOMEM);
 	}
 
-	for_each_gt(gt, vm->xe, id) {
+	for_each_gt(gt, xe_vm_device(vm), id) {
 		if (!(vma->gt_mask & BIT(id)))
 			goto next;
 
@@ -2061,7 +2060,7 @@ static int vm_insert_extobj(struct xe_vm *vm, struct xe_vma *vma)
 static int vm_bind_ioctl_lookup_vma(struct xe_vm *vm, struct xe_bo *bo,
 				    u64 addr, u64 range, u32 op)
 {
-	struct xe_device *xe = vm->xe;
+	struct xe_device *xe = xe_vm_device(vm);
 	struct xe_vma *vma;
 	bool async = !!(op & XE_VM_BIND_FLAG_ASYNC);
 
@@ -2164,7 +2163,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 
 	lockdep_assert_held_write(&vm->lock);
 
-	vm_dbg(&vm->xe->drm,
+	vm_dbg(&xe_vm_device(vm)->drm,
 	       "op=%d, addr=0x%016llx, range=0x%016llx, bo_offset_or_userptr=0x%016llx",
 	       VM_BIND_OP(operation), addr, range, bo_offset_or_userptr);
 
@@ -2232,7 +2231,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 
 	if (!IS_ERR(ops))
 		drm_gpuva_for_each_op(__op, ops)
-			print_op(vm->xe, __op);
+			print_op(xe_vm_device(vm), __op);
 
 	return ops;
 }
@@ -2783,7 +2782,7 @@ static void xe_vma_op_work_func(struct work_struct *w)
 			down_write(&vm->lock);
 			err = xe_vma_op_execute(vm, op);
 			if (err) {
-				drm_warn(&vm->xe->drm,
+				drm_warn(&xe_vm_device(vm)->drm,
 					 "Async VM op(%d) failed with %d",
 					 op->base.op, err);
 				vm_set_async_error(vm, err);
@@ -3103,7 +3102,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 			/* Rebinds may have been blocked, give worker a kick */
 			if (xe_vm_in_compute_mode(vm))
-				queue_work(vm->xe->ordered_wq,
+				queue_work(xe_vm_device(vm)->ordered_wq,
 					   &vm->preempt.rebind_work);
 		}
 
@@ -3315,7 +3314,7 @@ void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
  */
 int xe_vm_invalidate_vma(struct xe_vma *vma)
 {
-	struct xe_device *xe = xe_vma_vm(vma)->xe;
+	struct xe_device *xe = xe_vm_device(xe_vma_vm(vma));
 	struct xe_gt *gt;
 	u32 gt_needs_invalidate = 0;
 	int seqno[XE_MAX_GT];
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 96e2c6b07bf8..cbbe95d6291f 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -52,6 +52,11 @@ static inline bool xe_vm_is_closed(struct xe_vm *vm)
 struct xe_vma *
 xe_vm_find_overlapping_vma(struct xe_vm *vm, u64 start, u64 range);
 
+static inline struct xe_device *xe_vm_device(struct xe_vm *vm)
+{
+	return container_of(vm->mgr.drm, struct xe_device, drm);
+}
+
 static inline struct xe_vm *gpuva_to_vm(struct drm_gpuva *gpuva)
 {
 	return container_of(gpuva->mgr, struct xe_vm, mgr);
@@ -102,6 +107,11 @@ static inline struct xe_vm *xe_vma_vm(struct xe_vma *vma)
 	return container_of(vma->gpuva.mgr, struct xe_vm, mgr);
 }
 
+static inline struct xe_device *xe_vma_device(struct xe_vma *vma)
+{
+	return xe_vm_device(xe_vma_vm(vma));
+}
+
 static inline bool xe_vma_read_only(struct xe_vma *vma)
 {
 	return vma->gpuva.flags & XE_VMA_READ_ONLY;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index df4797ec4d7f..fca42910dcae 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -137,8 +137,6 @@ struct xe_vm {
 	/** @mgr: base GPUVA used to track VMAs */
 	struct drm_gpuva_manager mgr;
 
-	struct xe_device *xe;
-
 	struct kref refcount;
 
 	/* engine used for (un)binding vma's */
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index 62169d850098..55b0acfdcc44 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -169,6 +169,9 @@ static inline bool drm_gpuva_evicted(struct drm_gpuva *va)
  * There should be one manager instance per GPU virtual address space.
  */
 struct drm_gpuva_manager {
+	/** @drm: drm device */
+	struct drm_device *drm;
+
 	/**
 	 * @name: the name of the DRM GPU VA space
 	 */
@@ -204,6 +207,7 @@ struct drm_gpuva_manager {
 };
 
 void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+			    struct drm_device *drm,
 			    const char *name,
 			    u64 start_offset, u64 range,
 			    u64 reserve_offset, u64 reserve_range,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 22/31] drm/gpuva: Move dma-resv to GPUVA manager
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (20 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-11  9:10   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 23/31] drm/gpuva: Add support for extobj Matthew Brost
                   ` (10 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

This is the logical place for it and will help with upcoming patches.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/drm_gpuva_mgr.c  |  2 ++
 drivers/gpu/drm/xe/xe_bo.c       | 10 +++++-----
 drivers/gpu/drm/xe/xe_bo.h       |  2 +-
 drivers/gpu/drm/xe/xe_exec.c     |  4 ++--
 drivers/gpu/drm/xe/xe_migrate.c  |  4 ++--
 drivers/gpu/drm/xe/xe_pt.c       |  6 +++---
 drivers/gpu/drm/xe/xe_vm.c       | 34 ++++++++++++++++----------------
 drivers/gpu/drm/xe/xe_vm.h       | 12 ++++++++++-
 drivers/gpu/drm/xe/xe_vm_types.h |  6 +-----
 include/drm/drm_gpuva_mgr.h      |  6 ++++++
 10 files changed, 50 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index 137322945e91..6d2d0f4d5018 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -443,6 +443,8 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
 	mgr->name = name ? name : "unknown";
 	mgr->ops = ops;
 
+	dma_resv_init(&mgr->resv);
+
 	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
 
 	if (reserve_range) {
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index a475d0584916..e0422ffb6327 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -441,9 +441,9 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
 			 * that we indeed have it locked, put the vma an the
 			 * vm's notifier.rebind_list instead and scoop later.
 			 */
-			if (dma_resv_trylock(&vm->resv))
+			if (dma_resv_trylock(xe_vm_resv(vm)))
 				vm_resv_locked = true;
-			else if (ctx->resv != &vm->resv) {
+			else if (ctx->resv != xe_vm_resv(vm)) {
 				spin_lock(&vm->notifier.list_lock);
 				list_move_tail(&vma->notifier.rebind_link,
 					       &vm->notifier.rebind_list);
@@ -456,7 +456,7 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
 				list_add_tail(&vma->rebind_link, &vm->rebind_list);
 
 			if (vm_resv_locked)
-				dma_resv_unlock(&vm->resv);
+				dma_resv_unlock(xe_vm_resv(vm));
 		}
 	}
 
@@ -1240,7 +1240,7 @@ xe_bo_create_locked_range(struct xe_device *xe,
 		}
 	}
 
-	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL,
+	bo = __xe_bo_create_locked(xe, bo, gt, vm ? xe_vm_resv(vm) : NULL,
 				   vm && !xe_vm_no_dma_fences(vm) &&
 				   flags & XE_BO_CREATE_USER_BIT ?
 				   &vm->lru_bulk_move : NULL, size,
@@ -1555,7 +1555,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 		xe_vm_assert_held(vm);
 
 		ctx.allow_res_evict = allow_res_evict;
-		ctx.resv = &vm->resv;
+		ctx.resv = xe_vm_resv(vm);
 	}
 
 	return ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 81051f456874..9b401d30a130 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -150,7 +150,7 @@ void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww);
 static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
 {
 	if (bo) {
-		XE_BUG_ON(bo->vm && bo->ttm.base.resv != &bo->vm->resv);
+		XE_BUG_ON(bo->vm && bo->ttm.base.resv != &bo->vm->mgr.resv);
 		if (bo->vm)
 			xe_vm_assert_held(bo->vm);
 		else
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 68f876afd13c..b352fd6e1f4d 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -327,7 +327,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	/* Wait behind munmap style rebinds */
 	if (!xe_vm_no_dma_fences(vm)) {
 		err = drm_sched_job_add_resv_dependencies(&job->drm,
-							  &vm->resv,
+							  xe_vm_resv(vm),
 							  DMA_RESV_USAGE_KERNEL);
 		if (err)
 			goto err_put_job;
@@ -355,7 +355,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	xe_sched_job_arm(job);
 	if (!xe_vm_no_dma_fences(vm)) {
 		/* Block userptr invalidations / BO eviction */
-		dma_resv_add_fence(&vm->resv,
+		dma_resv_add_fence(xe_vm_resv(vm),
 				   &job->drm.s_fence->finished,
 				   DMA_RESV_USAGE_BOOKKEEP);
 
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 0a393c5772e5..91a06c925a1e 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1049,7 +1049,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 					  DMA_RESV_USAGE_KERNEL))
 		return ERR_PTR(-ETIME);
 
-	if (wait_vm && !dma_resv_test_signaled(&vm->resv,
+	if (wait_vm && !dma_resv_test_signaled(xe_vm_resv(vm),
 					       DMA_RESV_USAGE_BOOKKEEP)) {
 		vm_dbg(&xe_vm_device(vm)->drm, "wait on VM for munmap");
 		return ERR_PTR(-ETIME);
@@ -1264,7 +1264,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	 */
 	if (first_munmap_rebind) {
 		vm_dbg(&xe_vm_device(vm)->drm, "wait on first_munmap_rebind");
-		err = job_add_deps(job, &vm->resv,
+		err = job_add_deps(job, xe_vm_resv(vm),
 				   DMA_RESV_USAGE_BOOKKEEP);
 		if (err)
 			goto err_job;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 4167f666d98d..0f40f1950686 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1020,7 +1020,7 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
 	else if (!xe_vma_is_null(vma))
 		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
 
-	dma_resv_assert_held(&vm->resv);
+	dma_resv_assert_held(xe_vm_resv(vm));
 }
 
 static void xe_pt_commit_bind(struct xe_vma *vma,
@@ -1381,7 +1381,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 		}
 
 		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(&vm->resv, fence, !rebind &&
+		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
 				   last_munmap_rebind ?
 				   DMA_RESV_USAGE_KERNEL :
 				   DMA_RESV_USAGE_BOOKKEEP);
@@ -1701,7 +1701,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 		fence = &ifence->base.base;
 
 		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(&vm->resv, fence,
+		dma_resv_add_fence(xe_vm_resv(vm), fence,
 				   DMA_RESV_USAGE_BOOKKEEP);
 
 		/* This fence will be installed by caller when doing eviction */
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 688130c509a4..8f7140501ff2 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -307,7 +307,7 @@ static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
 	list_for_each_entry(e, &vm->preempt.engines, compute.link) {
 		e->ops->resume(e);
 
-		dma_resv_add_fence(&vm->resv, e->compute.pfence,
+		dma_resv_add_fence(xe_vm_resv(vm), e->compute.pfence,
 				   DMA_RESV_USAGE_BOOKKEEP);
 		xe_vm_fence_all_extobjs(vm, e->compute.pfence,
 					DMA_RESV_USAGE_BOOKKEEP);
@@ -345,7 +345,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
 
 	down_read(&vm->userptr.notifier_lock);
 
-	dma_resv_add_fence(&vm->resv, pfence,
+	dma_resv_add_fence(xe_vm_resv(vm), pfence,
 			   DMA_RESV_USAGE_BOOKKEEP);
 
 	xe_vm_fence_all_extobjs(vm, pfence, DMA_RESV_USAGE_BOOKKEEP);
@@ -603,7 +603,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	}
 
 	/* Wait on munmap style VM unbinds */
-	wait = dma_resv_wait_timeout(&vm->resv,
+	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
 				     DMA_RESV_USAGE_KERNEL,
 				     false, MAX_SCHEDULE_TIMEOUT);
 	if (wait <= 0) {
@@ -689,13 +689,13 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
 	 * unbinds to complete, and those are attached as BOOKMARK fences
 	 * to the vm.
 	 */
-	dma_resv_iter_begin(&cursor, &vm->resv,
+	dma_resv_iter_begin(&cursor, xe_vm_resv(vm),
 			    DMA_RESV_USAGE_BOOKKEEP);
 	dma_resv_for_each_fence_unlocked(&cursor, fence)
 		dma_fence_enable_sw_signaling(fence);
 	dma_resv_iter_end(&cursor);
 
-	err = dma_resv_wait_timeout(&vm->resv,
+	err = dma_resv_wait_timeout(xe_vm_resv(vm),
 				    DMA_RESV_USAGE_BOOKKEEP,
 				    false, MAX_SCHEDULE_TIMEOUT);
 	XE_WARN_ON(err <= 0);
@@ -742,12 +742,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 	}
 
 	/* Take lock and move to rebind_list for rebinding. */
-	err = dma_resv_lock_interruptible(&vm->resv, NULL);
+	err = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
 	if (err)
 		goto out_err;
 
 	list_splice_tail(&tmp_evict, &vm->rebind_list);
-	dma_resv_unlock(&vm->resv);
+	dma_resv_unlock(xe_vm_resv(vm));
 
 	return 0;
 
@@ -1085,7 +1085,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&vm->refcount);
-	dma_resv_init(&vm->resv);
 
 	vm->size = 1ull << xe_pt_shift(xe->info.vm_max_level + 1);
 
@@ -1120,12 +1119,13 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		xe_device_mem_access_get(xe);
 	}
 
-	err = dma_resv_lock_interruptible(&vm->resv, NULL);
+	drm_gpuva_manager_init(&vm->mgr, &xe->drm, "Xe VM", 0, vm->size, 0, 0,
+			       &gpuva_ops);
+
+	err = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
 	if (err)
 		goto err_put;
 
-	drm_gpuva_manager_init(&vm->mgr, &xe->drm, "Xe VM", 0, vm->size, 0, 0,
-			       &gpuva_ops);
 	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
 		vm->flags |= XE_VM_FLAGS_64K;
 
@@ -1173,7 +1173,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 		xe_pt_populate_empty(gt, vm, vm->pt_root[id]);
 	}
-	dma_resv_unlock(&vm->resv);
+	dma_resv_unlock(xe_vm_resv(vm));
 
 	/* Kernel migration VM shouldn't have a circular loop.. */
 	if (!(flags & XE_VM_FLAG_MIGRATION)) {
@@ -1230,10 +1230,10 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		if (vm->pt_root[id])
 			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
 	}
-	dma_resv_unlock(&vm->resv);
+	dma_resv_unlock(xe_vm_resv(vm));
 	drm_gpuva_manager_destroy(&vm->mgr);
 err_put:
-	dma_resv_fini(&vm->resv);
+	dma_resv_fini(xe_vm_resv(vm));
 	kfree(vm);
 	if (!(flags & XE_VM_FLAG_MIGRATION)) {
 		xe_device_mem_access_put(xe);
@@ -1422,7 +1422,7 @@ static void vm_destroy_work_func(struct work_struct *w)
 
 	trace_xe_vm_free(vm);
 	dma_fence_put(vm->rebind_fence);
-	dma_resv_fini(&vm->resv);
+	dma_resv_fini(xe_vm_resv(vm));
 	kfree(vm);
 }
 
@@ -3298,7 +3298,7 @@ int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
 
 void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
 {
-	dma_resv_unlock(&vm->resv);
+	dma_resv_unlock(xe_vm_resv(vm));
 	ww_acquire_fini(ww);
 }
 
@@ -3331,7 +3331,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 			WARN_ON_ONCE(!mmu_interval_check_retry
 				     (&vma->userptr.notifier,
 				      vma->userptr.notifier_seq));
-			WARN_ON_ONCE(!dma_resv_test_signaled(&xe_vma_vm(vma)->resv,
+			WARN_ON_ONCE(!dma_resv_test_signaled(xe_vma_resv(vma),
 							     DMA_RESV_USAGE_BOOKKEEP));
 
 		} else {
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index cbbe95d6291f..81a9271be728 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -57,6 +57,11 @@ static inline struct xe_device *xe_vm_device(struct xe_vm *vm)
 	return container_of(vm->mgr.drm, struct xe_device, drm);
 }
 
+static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
+{
+	return &vm->mgr.resv;
+}
+
 static inline struct xe_vm *gpuva_to_vm(struct drm_gpuva *gpuva)
 {
 	return container_of(gpuva->mgr, struct xe_vm, mgr);
@@ -112,6 +117,11 @@ static inline struct xe_device *xe_vma_device(struct xe_vma *vma)
 	return xe_vm_device(xe_vma_vm(vma));
 }
 
+static inline struct dma_resv *xe_vma_resv(struct xe_vma *vma)
+{
+	return xe_vm_resv(xe_vma_vm(vma));
+}
+
 static inline bool xe_vma_read_only(struct xe_vma *vma)
 {
 	return vma->gpuva.flags & XE_VMA_READ_ONLY;
@@ -122,7 +132,7 @@ static inline u64 xe_vma_userptr(struct xe_vma *vma)
 	return vma->gpuva.gem.offset;
 }
 
-#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->resv)
+#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->mgr.resv)
 
 u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_gt *full_gt);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index fca42910dcae..26571d171a43 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -8,7 +8,6 @@
 
 #include <drm/drm_gpuva_mgr.h>
 
-#include <linux/dma-resv.h>
 #include <linux/kref.h>
 #include <linux/mmu_notifier.h>
 #include <linux/scatterlist.h>
@@ -131,7 +130,7 @@ struct xe_vma {
 
 struct xe_device;
 
-#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->resv)
+#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->mgr.resv)
 
 struct xe_vm {
 	/** @mgr: base GPUVA used to track VMAs */
@@ -142,9 +141,6 @@ struct xe_vm {
 	/* engine used for (un)binding vma's */
 	struct xe_engine *eng[XE_MAX_GT];
 
-	/** Protects @rebind_list and the page-table structures */
-	struct dma_resv resv;
-
 	/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
 	struct ttm_lru_bulk_move lru_bulk_move;
 
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index 55b0acfdcc44..010b649e363f 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -25,6 +25,7 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include <linux/dma-resv.h>
 #include <linux/maple_tree.h>
 #include <linux/mm.h>
 #include <linux/rbtree.h>
@@ -177,6 +178,11 @@ struct drm_gpuva_manager {
 	 */
 	const char *name;
 
+	/**
+	 * @resv: dma-resv for all private GEMs mapped in this address space
+	 */
+	struct dma_resv resv;
+
 	/**
 	 * @mm_start: start of the VA space
 	 */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 23/31] drm/gpuva: Add support for extobj
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (21 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 22/31] drm/gpuva: Move dma-resv " Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-11  9:35   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor Matthew Brost
                   ` (9 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

The GPUVA manager now maintains a list of the GPUVAs that map external
objects (extobjs).
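
For illustration, a minimal sketch of walking the new list (vm->mgr is Xe's
embedded manager, fence a placeholder; all relevant dma-resvs are assumed to
be locked):

	struct drm_gpuva *gpuva;

	drm_gpuva_for_each_extobj(gpuva, &vm->mgr) {
		struct drm_gem_object *obj = gpuva->gem.obj;

		/* e.g. attach a job fence to every external GEM */
		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_WRITE);
	}

drm_gpuva_add_fence() below wraps this loop together with the manager's
private dma-resv.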

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/drm_gpuva_mgr.c  | 45 +++++++++++++--
 drivers/gpu/drm/xe/xe_exec.c     | 24 ++++----
 drivers/gpu/drm/xe/xe_vm.c       | 99 +++++---------------------------
 drivers/gpu/drm/xe/xe_vm.h       |  3 -
 drivers/gpu/drm/xe/xe_vm_types.h | 16 ------
 include/drm/drm_gpuva_mgr.h      | 39 ++++++++++++-
 6 files changed, 105 insertions(+), 121 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index 6d2d0f4d5018..e8cd6e154336 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -447,6 +447,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
 
 	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
 
+	mgr->extobj.entries = 0;
+	INIT_LIST_HEAD(&mgr->extobj.list);
+
 	if (reserve_range) {
 		mgr->kernel_alloc_node.va.addr = reserve_offset;
 		mgr->kernel_alloc_node.va.range = reserve_range;
@@ -706,7 +709,8 @@ EXPORT_SYMBOL(drm_gpuva_remove);
  * @va: the &drm_gpuva to link
  *
  * This adds the given &va to the GPU VA list of the &drm_gem_object it is
- * associated with.
+ * associated with, and to the &drm_gpuva_manager.extobj.list if the GPUVA
+ * maps an extobj.
  *
  * This function expects the caller to protect the GEM's GPUVA list against
  * concurrent access.
@@ -714,8 +718,14 @@ EXPORT_SYMBOL(drm_gpuva_remove);
 void
 drm_gpuva_link(struct drm_gpuva *va)
 {
-	if (likely(va->gem.obj))
+	if (likely(va->gem.obj)) {
 		list_add_tail(&va->gem.entry, &va->gem.obj->gpuva.list);
+		if (va->flags & DRM_GPUVA_EXTOBJ) {
+			list_add_tail(&va->gem.extobj_link,
+				      &va->mgr->extobj.list);
+			++va->mgr->extobj.entries;
+		}
+	}
 }
 EXPORT_SYMBOL(drm_gpuva_link);
 
@@ -724,7 +734,8 @@ EXPORT_SYMBOL(drm_gpuva_link);
  * @va: the &drm_gpuva to unlink
  *
  * This removes the given &va from the GPU VA list of the &drm_gem_object it is
- * associated with.
+ * associated with, and from the &drm_gpuva_manager.extobj.list if the GPUVA
+ * maps an extobj.
  *
  * This function expects the caller to protect the GEM's GPUVA list against
  * concurrent access.
@@ -732,8 +743,13 @@ EXPORT_SYMBOL(drm_gpuva_link);
 void
 drm_gpuva_unlink(struct drm_gpuva *va)
 {
-	if (likely(va->gem.obj))
+	if (likely(va->gem.obj)) {
 		list_del_init(&va->gem.entry);
+		if (va->flags & DRM_GPUVA_EXTOBJ) {
+			list_del(&va->gem.extobj_link);
+			--va->mgr->extobj.entries;
+		}
+	}
 }
 EXPORT_SYMBOL(drm_gpuva_unlink);
 
@@ -871,6 +887,27 @@ drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
 }
 EXPORT_SYMBOL(drm_gpuva_interval_empty);
 
+/**
+ * drm_gpuva_add_fence - add fence to private and all extobj dma-resv
+ * @mgr: the &drm_gpuva_manager to add a fence to
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
+ *
+ * Adds @fence to the manager's private dma-resv and to all extobj dma-resvs.
+ */
+void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
+			 enum dma_resv_usage private_usage,
+			 enum dma_resv_usage extobj_usage)
+{
+	struct drm_gpuva *gpuva;
+
+	dma_resv_add_fence(&mgr->resv, fence, private_usage);
+	drm_gpuva_for_each_extobj(gpuva, mgr)
+		dma_resv_add_fence(gpuva->gem.obj->resv, fence, extobj_usage);
+}
+EXPORT_SYMBOL(drm_gpuva_add_fence);
+
 /**
  * drm_gpuva_map - helper to insert a &drm_gpuva from &drm_gpuva_fn_ops
  * callbacks
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index b352fd6e1f4d..2ae02f1500d5 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -353,19 +353,17 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	 * the job and let the DRM scheduler / backend clean up the job.
 	 */
 	xe_sched_job_arm(job);
-	if (!xe_vm_no_dma_fences(vm)) {
-		/* Block userptr invalidations / BO eviction */
-		dma_resv_add_fence(xe_vm_resv(vm),
-				   &job->drm.s_fence->finished,
-				   DMA_RESV_USAGE_BOOKKEEP);
-
-		/*
-		 * Make implicit sync work across drivers, assuming all external
-		 * BOs are written as we don't pass in a read / write list.
-		 */
-		xe_vm_fence_all_extobjs(vm, &job->drm.s_fence->finished,
-					DMA_RESV_USAGE_WRITE);
-	}
+
+	/*
+	 * Block userptr invalidations / BO eviction
+	 *
+	 * Make implicit sync work across drivers, assuming all external BOs
+	 * are written as we don't pass in a read / write list.
+	 */
+	if (!xe_vm_no_dma_fences(vm))
+		drm_gpuva_add_fence(&vm->mgr, &job->drm.s_fence->finished,
+				    DMA_RESV_USAGE_BOOKKEEP,
+				    DMA_RESV_USAGE_WRITE);
 
 	for (i = 0; i < num_syncs; i++)
 		xe_sync_entry_signal(&syncs[i], job,
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 8f7140501ff2..336e21c710a5 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -278,25 +278,6 @@ static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
 	return 0;
 }
 
-/**
- * xe_vm_fence_all_extobjs() - Add a fence to vm's external objects' resv
- * @vm: The vm.
- * @fence: The fence to add.
- * @usage: The resv usage for the fence.
- *
- * Loops over all of the vm's external object bindings and adds a @fence
- * with the given @usage to all of the external object's reservation
- * objects.
- */
-void xe_vm_fence_all_extobjs(struct xe_vm *vm, struct dma_fence *fence,
-			     enum dma_resv_usage usage)
-{
-	struct xe_vma *vma;
-
-	list_for_each_entry(vma, &vm->extobj.list, extobj.link)
-		dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence, usage);
-}
-
 static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
 {
 	struct xe_engine *e;
@@ -307,10 +288,9 @@ static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
 	list_for_each_entry(e, &vm->preempt.engines, compute.link) {
 		e->ops->resume(e);
 
-		dma_resv_add_fence(xe_vm_resv(vm), e->compute.pfence,
-				   DMA_RESV_USAGE_BOOKKEEP);
-		xe_vm_fence_all_extobjs(vm, e->compute.pfence,
-					DMA_RESV_USAGE_BOOKKEEP);
+		drm_gpuva_add_fence(&vm->mgr, e->compute.pfence,
+				    DMA_RESV_USAGE_BOOKKEEP,
+				    DMA_RESV_USAGE_BOOKKEEP);
 	}
 }
 
@@ -345,10 +325,9 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
 
 	down_read(&vm->userptr.notifier_lock);
 
-	dma_resv_add_fence(xe_vm_resv(vm), pfence,
-			   DMA_RESV_USAGE_BOOKKEEP);
-
-	xe_vm_fence_all_extobjs(vm, pfence, DMA_RESV_USAGE_BOOKKEEP);
+	drm_gpuva_add_fence(&vm->mgr, pfence,
+			    DMA_RESV_USAGE_BOOKKEEP,
+			    DMA_RESV_USAGE_BOOKKEEP);
 
 	/*
 	 * Check to see if a preemption on VM is in flight or userptr
@@ -425,15 +404,17 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
 {
 	struct ttm_validate_buffer *tv_vm, *tv_bo;
 	struct xe_vma *vma, *next;
+	struct drm_gpuva *gpuva;
 	LIST_HEAD(dups);
 	int err;
 
 	lockdep_assert_held(&vm->lock);
 
-	if (vm->extobj.entries < XE_ONSTACK_TV) {
+	if (vm->mgr.extobj.entries < XE_ONSTACK_TV) {
 		tv_vm = tv_onstack;
 	} else {
-		tv_vm = kvmalloc_array(vm->extobj.entries + 1, sizeof(*tv_vm),
+		tv_vm = kvmalloc_array(vm->mgr.extobj.entries + 1,
+				       sizeof(*tv_vm),
 				       GFP_KERNEL);
 		if (!tv_vm)
 			return -ENOMEM;
@@ -441,9 +422,9 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
 	tv_bo = tv_vm + 1;
 
 	INIT_LIST_HEAD(objs);
-	list_for_each_entry(vma, &vm->extobj.list, extobj.link) {
+	drm_gpuva_for_each_extobj(gpuva, &vm->mgr) {
 		tv_bo->num_shared = num_shared;
-		tv_bo->bo = &xe_vma_bo(vma)->ttm;
+		tv_bo->bo = &gem_to_xe_bo(gpuva->gem.obj)->ttm;
 
 		list_add_tail(&tv_bo->head, objs);
 		tv_bo++;
@@ -838,9 +819,9 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	 * invalidate_link
 	 */
 	INIT_LIST_HEAD(&vma->rebind_link);
-	INIT_LIST_HEAD(&vma->extobj.link);
 
 	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
+	INIT_LIST_HEAD(&vma->gpuva.gem.extobj_link);
 	vma->gpuva.mgr = &vm->mgr;
 	vma->gpuva.va.addr = start;
 	vma->gpuva.va.range = end - start + 1;
@@ -866,6 +847,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		drm_gem_object_get(&bo->ttm.base);
 		vma->gpuva.gem.obj = &bo->ttm.base;
 		vma->gpuva.gem.offset = bo_offset_or_userptr;
+		if (!bo->vm)
+			vma->gpuva.flags |= DRM_GPUVA_EXTOBJ;
 		drm_gpuva_link(&vma->gpuva);
 	} else /* userptr or null */ {
 		if (!null) {
@@ -893,14 +876,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	return vma;
 }
 
-static void vm_remove_extobj(struct xe_vma *vma)
-{
-	if (!list_empty(&vma->extobj.link)) {
-		xe_vma_vm(vma)->extobj.entries--;
-		list_del_init(&vma->extobj.link);
-	}
-}
-
 static void xe_vma_destroy_late(struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
@@ -966,8 +941,6 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 	} else if (!xe_vma_is_null(vma)) {
 		xe_bo_assert_held(xe_vma_bo(vma));
 		drm_gpuva_unlink(&vma->gpuva);
-		if (!xe_vma_bo(vma)->vm)
-			vm_remove_extobj(vma);
 	}
 
 	xe_vm_assert_held(vm);
@@ -1111,8 +1084,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	INIT_LIST_HEAD(&vm->preempt.engines);
 	vm->preempt.min_run_period_ms = 10;	/* FIXME: Wire up to uAPI */
 
-	INIT_LIST_HEAD(&vm->extobj.list);
-
 	if (!(flags & XE_VM_FLAG_MIGRATION)) {
 		/* We need to immeditatelly exit from any D3 state */
 		xe_pm_runtime_get(xe);
@@ -1366,7 +1337,6 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	if (vm->async_ops.error_capture.addr)
 		wake_up_all(&vm->async_ops.error_capture.wq);
 
-	XE_WARN_ON(!list_empty(&vm->extobj.list));
 	up_write(&vm->lock);
 
 	drm_gpuva_manager_destroy(&vm->mgr);
@@ -2019,44 +1989,6 @@ static void vm_set_async_error(struct xe_vm *vm, int err)
 	vm->async_ops.error = err;
 }
 
-static bool bo_has_vm_references(struct xe_bo *bo, struct xe_vm *vm,
-				 struct xe_vma *ignore)
-{
-	struct ww_acquire_ctx ww;
-	struct drm_gpuva *gpuva;
-	struct drm_gem_object *obj = &bo->ttm.base;
-	bool ret = false;
-
-	xe_bo_lock(bo, &ww, 0, false);
-	drm_gem_for_each_gpuva(gpuva, obj) {
-		struct xe_vma *vma = gpuva_to_vma(gpuva);
-
-		if (vma != ignore && xe_vma_vm(vma) == vm &&
-		    !(vma->gpuva.flags & XE_VMA_DESTROYED)) {
-			ret = true;
-			break;
-		}
-	}
-	xe_bo_unlock(bo, &ww);
-
-	return ret;
-}
-
-static int vm_insert_extobj(struct xe_vm *vm, struct xe_vma *vma)
-{
-	struct xe_bo *bo = xe_vma_bo(vma);
-
-	lockdep_assert_held_write(&vm->lock);
-
-	if (bo_has_vm_references(bo, vm, vma))
-		return 0;
-
-	list_add(&vma->extobj.link, &vm->extobj.list);
-	vm->extobj.entries++;
-
-	return 0;
-}
-
 static int vm_bind_ioctl_lookup_vma(struct xe_vm *vm, struct xe_bo *bo,
 				    u64 addr, u64 range, u32 op)
 {
@@ -2266,7 +2198,6 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 			return ERR_PTR(err);
 		}
 	} else if(!xe_vma_has_no_bo(vma) && !bo->vm) {
-		vm_insert_extobj(vm, vma);
 		err = add_preempt_fences(vm, bo);
 		if (err) {
 			prep_vma_destroy(vm, vma, false);
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 81a9271be728..12de652d8d1c 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -232,9 +232,6 @@ void xe_vm_unlock_dma_resv(struct xe_vm *vm,
 			   struct ww_acquire_ctx *ww,
 			   struct list_head *objs);
 
-void xe_vm_fence_all_extobjs(struct xe_vm *vm, struct dma_fence *fence,
-			     enum dma_resv_usage usage);
-
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
 
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 26571d171a43..0b59bde3bc4e 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -113,14 +113,6 @@ struct xe_vma {
 		u8 gt_invalidated;
 	} usm;
 
-	struct {
-		/**
-		 * @extobj.link: Link into vm's external object list.
-		 * protected by the vm lock.
-		 */
-		struct list_head link;
-	} extobj;
-
 	/**
 	 * @userptr: user pointer state, only allocated for VMAs that are
 	 * user pointers
@@ -189,14 +181,6 @@ struct xe_vm {
 	 */
 	struct work_struct destroy_work;
 
-	/** @extobj: bookkeeping for external objects. Protected by the vm lock */
-	struct {
-		/** @enties: number of external BOs attached this VM */
-		u32 entries;
-		/** @list: list of vmas with external bos attached */
-		struct list_head list;
-	} extobj;
-
 	/** @async_ops: async VM operations (bind / unbinds) */
 	struct {
 		/** @list: list of pending async VM ops */
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index 010b649e363f..57861a7ed504 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -54,10 +54,18 @@ enum drm_gpuva_flags {
 	 */
 	DRM_GPUVA_SPARSE = (1 << 1),
 
+	/**
+	 * @DRM_GPUVA_EXTOBJ:
+	 *
+	 * Flag indicating that the &drm_gpuva is a mapping of an extobj (a GEM
+	 * object not tied to a single address space).
+	 */
+	DRM_GPUVA_EXTOBJ = (1 << 2),
+
 	/**
 	 * @DRM_GPUVA_USERBITS: user defined bits
 	 */
-	DRM_GPUVA_USERBITS = (1 << 2),
+	DRM_GPUVA_USERBITS = (1 << 3),
 };
 
 /**
@@ -112,6 +120,12 @@ struct drm_gpuva {
 		 * @entry: the &list_head to attach this object to a &drm_gem_object
 		 */
 		struct list_head entry;
+
+		/**
+		 * @extobj_link: the &list_head to attach this object to a
+		 * &drm_gpuva_manager.extobj.list
+		 */
+		struct list_head extobj_link;
 	} gem;
 };
 
@@ -134,6 +148,10 @@ struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
 
 bool drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range);
 
+void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
+			 enum dma_resv_usage private_usage,
+			 enum dma_resv_usage extobj_usage);
+
 /**
  * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted
  * @va: the &drm_gpuva to set the evict flag for
@@ -206,6 +224,17 @@ struct drm_gpuva_manager {
 	 */
 	struct drm_gpuva kernel_alloc_node;
 
+	/** @extobj: bookkeeping for external GEMs */
+	struct {
+		/**
+		 * @entries: number of external GEMs attached to this address
+		 * space
+		 */
+		u32 entries;
+		/** @list: list of GPUVAs with external GEMs attached */
+		struct list_head list;
+	} extobj;
+
 	/**
 	 * @ops: &drm_gpuva_fn_ops providing the split/merge steps to drivers
 	 */
@@ -509,6 +538,14 @@ struct drm_gpuva_ops {
 	struct list_head list;
 };
 
+/**
+ * drm_gpuva_for_each_extobj - iterator to walk over &drm_gpuvas of extobjs
+ * @va: &drm_gpuva to assign in each iteration step
+ * @mgr: &drm_gpuva_manager whose extobj list to walk
+ */
+#define drm_gpuva_for_each_extobj(va, mgr) \
+	list_for_each_entry(va, &(mgr)->extobj.list, gem.extobj_link)
+
 /**
  * drm_gpuva_for_each_op - iterator to walk over &drm_gpuva_ops
  * @op: &drm_gpuva_op to assign in each iteration step
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (22 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 23/31] drm/gpuva: Add support for extobj Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 19:41   ` Rodrigo Vivi
  2023-05-11  9:46   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 25/31] drm: execution context for GEM buffers v3 Matthew Brost
                   ` (8 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Add a GPUVA userptr flag, add a GPUVA userptr sub-struct, and drop the sg
pointer. A larger follow-on cleanup may push more of the userptr
implementation into GPUVA.
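
A rough sketch of the resulting checks (addr and is_vram are placeholders;
xe_bo_addr() and the helpers below already exist in Xe):

	if (xe_vma_is_userptr(vma))		/* DRM_GPUVA_USERPTR set */
		addr = vma->gpuva.userptr.address;
	else if (xe_vma_is_null(vma))		/* DRM_GPUVA_SPARSE set */
		addr = 0;
	else					/* GEM-backed mapping */
		addr = xe_bo_addr(xe_vma_bo(vma), 0, XE_PAGE_SIZE, &is_vram);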

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       |  6 +--
 drivers/gpu/drm/xe/xe_vm.c       | 41 +++++++++++----------
 drivers/gpu/drm/xe/xe_vm.h       | 23 +++++++-----
 drivers/gpu/drm/xe/xe_vm_types.h | 20 +++++-----
 include/drm/drm_gpuva_mgr.h      | 63 +++++++++++++++++++++-----------
 5 files changed, 89 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0f40f1950686..964baa24eba3 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -92,8 +92,8 @@ static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
 		page = offset >> PAGE_SHIFT;
 		offset &= (PAGE_SIZE - 1);
 
-		xe_res_first_sg(vma->userptr.sg, page << PAGE_SHIFT, page_size,
-				&cur);
+		xe_res_first_sg(&vma->userptr.sgt, page << PAGE_SHIFT,
+				page_size, &cur);
 		return xe_res_dma(&cur) + offset;
 	} else {
 		return xe_bo_addr(xe_vma_bo(vma), offset, page_size, is_vram);
@@ -813,7 +813,7 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
 	xe_bo_assert_held(bo);
 	if (!xe_vma_is_null(vma)) {
 		if (xe_vma_is_userptr(vma))
-			xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
+			xe_res_first_sg(&vma->userptr.sgt, 0, xe_vma_size(vma),
 					&curs);
 		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
 			xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 336e21c710a5..4d734ec4d6ab 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -73,13 +73,13 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 	if (!pages)
 		return -ENOMEM;
 
-	if (vma->userptr.sg) {
+	if (xe_vma_userptr_sg_mapped(vma)) {
 		dma_unmap_sgtable(xe->drm.dev,
-				  vma->userptr.sg,
+				  &vma->userptr.sgt,
 				  read_only ? DMA_TO_DEVICE :
 				  DMA_BIDIRECTIONAL, 0);
-		sg_free_table(vma->userptr.sg);
-		vma->userptr.sg = NULL;
+		sg_free_table(&vma->userptr.sgt);
+		vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
 	}
 
 	pinned = ret = 0;
@@ -119,19 +119,19 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 					0, (u64)pinned << PAGE_SHIFT,
 					GFP_KERNEL);
 	if (ret) {
-		vma->userptr.sg = NULL;
+		vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
 		goto out;
 	}
-	vma->userptr.sg = &vma->userptr.sgt;
+	vma->gpuva.flags |= XE_VMA_USERPTR_SG_MAPPED;
 
-	ret = dma_map_sgtable(xe->drm.dev, vma->userptr.sg,
+	ret = dma_map_sgtable(xe->drm.dev, &vma->userptr.sgt,
 			      read_only ? DMA_TO_DEVICE :
 			      DMA_BIDIRECTIONAL,
 			      DMA_ATTR_SKIP_CPU_SYNC |
 			      DMA_ATTR_NO_KERNEL_MAPPING);
 	if (ret) {
-		sg_free_table(vma->userptr.sg);
-		vma->userptr.sg = NULL;
+		sg_free_table(&vma->userptr.sgt);
+		vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
 		goto out;
 	}
 
@@ -820,15 +820,13 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 	 */
 	INIT_LIST_HEAD(&vma->rebind_link);
 
-	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
-	INIT_LIST_HEAD(&vma->gpuva.gem.extobj_link);
 	vma->gpuva.mgr = &vm->mgr;
 	vma->gpuva.va.addr = start;
 	vma->gpuva.va.range = end - start + 1;
 	if (read_only)
 		vma->gpuva.flags |= XE_VMA_READ_ONLY;
 	if (null)
-		vma->gpuva.flags |= XE_VMA_NULL;
+		vma->gpuva.flags |= DRM_GPUVA_SPARSE;
 
 	if (gt_mask) {
 		vma->gt_mask = gt_mask;
@@ -845,6 +843,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		xe_bo_assert_held(bo);
 
 		drm_gem_object_get(&bo->ttm.base);
+		INIT_LIST_HEAD(&vma->gpuva.gem.entry);
+		INIT_LIST_HEAD(&vma->gpuva.gem.extobj_link);
 		vma->gpuva.gem.obj = &bo->ttm.base;
 		vma->gpuva.gem.offset = bo_offset_or_userptr;
 		if (!bo->vm)
@@ -855,7 +855,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			u64 size = end - start + 1;
 			int err;
 
-			vma->gpuva.gem.offset = bo_offset_or_userptr;
+			vma->gpuva.flags |= DRM_GPUVA_USERPTR;
+			vma->gpuva.userptr.address = bo_offset_or_userptr;
 			err = mmu_interval_notifier_insert(&vma->userptr.notifier,
 							   current->mm,
 							   xe_vma_userptr(vma),
@@ -883,13 +884,13 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 	bool read_only = xe_vma_read_only(vma);
 
 	if (xe_vma_is_userptr(vma)) {
-		if (vma->userptr.sg) {
+		if (xe_vma_userptr_sg_mapped(vma)) {
 			dma_unmap_sgtable(xe->drm.dev,
-					  vma->userptr.sg,
+					  &vma->userptr.sgt,
 					  read_only ? DMA_TO_DEVICE :
 					  DMA_BIDIRECTIONAL, 0);
-			sg_free_table(vma->userptr.sg);
-			vma->userptr.sg = NULL;
+			sg_free_table(&vma->userptr.sgt);
+			vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
 		}
 
 		/*
@@ -2309,7 +2310,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 						XE_VMA_READ_ONLY;
 					bool null =
 						op->base.remap.unmap->va->flags &
-						XE_VMA_NULL;
+						DRM_GPUVA_SPARSE;
 
 					vma = new_vma(vm, op->base.remap.prev,
 						      op->gt_mask, read_only,
@@ -2344,7 +2345,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 
 					bool null =
 						op->base.remap.unmap->va->flags &
-						XE_VMA_NULL;
+						DRM_GPUVA_SPARSE;
 
 					vma = new_vma(vm, op->base.remap.next,
 						      op->gt_mask, read_only,
@@ -3320,7 +3321,7 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 		} else if (is_userptr) {
 			struct xe_res_cursor cur;
 
-			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, &cur);
+			xe_res_first_sg(&vma->userptr.sgt, 0, XE_PAGE_SIZE, &cur);
 			addr = xe_res_dma(&cur);
 		} else {
 			addr = xe_bo_addr(xe_vma_bo(vma), 0, XE_PAGE_SIZE, &is_vram);
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 12de652d8d1c..f279fa622260 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -101,12 +101,6 @@ static inline u64 xe_vma_bo_offset(struct xe_vma *vma)
 	return vma->gpuva.gem.offset;
 }
 
-static inline struct xe_bo *xe_vma_bo(struct xe_vma *vma)
-{
-	return !vma->gpuva.gem.obj ? NULL :
-		container_of(vma->gpuva.gem.obj, struct xe_bo, ttm.base);
-}
-
 static inline struct xe_vm *xe_vma_vm(struct xe_vma *vma)
 {
 	return container_of(vma->gpuva.mgr, struct xe_vm, mgr);
@@ -129,7 +123,7 @@ static inline bool xe_vma_read_only(struct xe_vma *vma)
 
 static inline u64 xe_vma_userptr(struct xe_vma *vma)
 {
-	return vma->gpuva.gem.offset;
+	return vma->gpuva.userptr.address;
 }
 
 #define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->mgr.resv)
@@ -197,12 +191,18 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
 
 static inline bool xe_vma_is_null(struct xe_vma *vma)
 {
-	return vma->gpuva.flags & XE_VMA_NULL;
+	return vma->gpuva.flags & DRM_GPUVA_SPARSE;
 }
 
 static inline bool xe_vma_is_userptr(struct xe_vma *vma)
 {
-	return !xe_vma_bo(vma) && !xe_vma_is_null(vma);
+	return vma->gpuva.flags & DRM_GPUVA_USERPTR;
+}
+
+static inline struct xe_bo *xe_vma_bo(struct xe_vma *vma)
+{
+	return xe_vma_is_null(vma) || xe_vma_is_userptr(vma) ? NULL :
+		container_of(vma->gpuva.gem.obj, struct xe_bo, ttm.base);
 }
 
 static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
@@ -210,6 +210,11 @@ static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
 	return !xe_vma_bo(vma);
 }
 
+static inline bool xe_vma_userptr_sg_mapped(struct xe_vma *vma)
+{
+	return vma->gpuva.flags & XE_VMA_USERPTR_SG_MAPPED;
+}
+
 int xe_vma_userptr_pin_pages(struct xe_vma *vma);
 
 int xe_vma_userptr_check_repin(struct xe_vma *vma);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 0b59bde3bc4e..ce1260b8d3ef 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -23,15 +23,15 @@ struct xe_vm;
 #define TEST_VM_ASYNC_OPS_ERROR
 #define FORCE_ASYNC_OP_ERROR	BIT(31)
 
-#define XE_VMA_READ_ONLY	DRM_GPUVA_USERBITS
-#define XE_VMA_DESTROYED	(DRM_GPUVA_USERBITS << 1)
-#define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
-#define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
-#define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
-#define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
-#define XE_VMA_PTE_4K		(DRM_GPUVA_USERBITS << 6)
-#define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
-#define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
+#define XE_VMA_READ_ONLY		DRM_GPUVA_USERBITS
+#define XE_VMA_DESTROYED		(DRM_GPUVA_USERBITS << 1)
+#define XE_VMA_ATOMIC_PTE_BIT		(DRM_GPUVA_USERBITS << 2)
+#define XE_VMA_FIRST_REBIND		(DRM_GPUVA_USERBITS << 3)
+#define XE_VMA_LAST_REBIND		(DRM_GPUVA_USERBITS << 4)
+#define XE_VMA_USERPTR_SG_MAPPED	(DRM_GPUVA_USERBITS << 5)
+#define XE_VMA_PTE_4K			(DRM_GPUVA_USERBITS << 6)
+#define XE_VMA_PTE_2M			(DRM_GPUVA_USERBITS << 7)
+#define XE_VMA_PTE_1G			(DRM_GPUVA_USERBITS << 8)
 
 /** struct xe_userptr - User pointer */
 struct xe_userptr {
@@ -41,8 +41,6 @@ struct xe_userptr {
 	struct mmu_interval_notifier notifier;
 	/** @sgt: storage for a scatter gather table */
 	struct sg_table sgt;
-	/** @sg: allocated scatter gather table */
-	struct sg_table *sg;
 	/** @notifier_seq: notifier sequence number */
 	unsigned long notifier_seq;
 	/**
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index 57861a7ed504..943c8fcda533 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -62,10 +62,17 @@ enum drm_gpuva_flags {
 	 */
 	DRM_GPUVA_EXTOBJ = (1 << 2),
 
+	/**
+	 * @DRM_GPUVA_USERPTR:
+	 *
+	 * Flag indicating that the &drm_gpuva is a user pointer mapping.
+	 */
+	DRM_GPUVA_USERPTR = (1 << 3),
+
 	/**
 	 * @DRM_GPUVA_USERBITS: user defined bits
 	 */
-	DRM_GPUVA_USERBITS = (1 << 3),
+	DRM_GPUVA_USERBITS = (1 << 4),
 };
 
 /**
@@ -102,31 +109,45 @@ struct drm_gpuva {
 		u64 range;
 	} va;
 
-	/**
-	 * @gem: structure containing the &drm_gem_object and it's offset
-	 */
-	struct {
-		/**
-		 * @offset: the offset within the &drm_gem_object
-		 */
-		u64 offset;
-
-		/**
-		 * @obj: the mapped &drm_gem_object
-		 */
-		struct drm_gem_object *obj;
-
+	union {
 		/**
-		 * @entry: the &list_head to attach this object to a &drm_gem_object
+		 * @gem: structure containing the &drm_gem_object and its
+		 * offset
 		 */
-		struct list_head entry;
+		struct {
+			/**
+			 * @offset: the offset within the &drm_gem_object
+			 */
+			u64 offset;
+
+			/**
+			 * @obj: the mapped &drm_gem_object
+			 */
+			struct drm_gem_object *obj;
+
+			/**
+			 * @entry: the &list_head to attach this object to a
+			 * &drm_gem_object
+			 */
+			struct list_head entry;
+
+			/**
+			 * @extobj_link: the &list_head to attach this object to
+			 * a @drm_gpuva_manager.extobj.list
+			 */
+			 * a &drm_gpuva_manager.extobj.list
+		} gem;
 
 		/**
-		 * @extobj_link: the &list_head to attach this object to a
-		 * @drm_gpuva_manager.extobj.list
+		 * @userptr: structure containing user pointer state
 		 */
-		struct list_head extobj_link;
-	} gem;
+		struct {
+			/**
+			 * @address: user pointer address
+			 */
+			u64 address;
+		} userptr;
+	};
 };
 
 void drm_gpuva_link(struct drm_gpuva *va);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 25/31] drm: execution context for GEM buffers v3
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (23 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec Matthew Brost
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Christian König, Christian König

From: Christian König <ckoenig.leichtzumerken@gmail.com>

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTM execbuf util and intended to replace
it in the long term.

The basic functionality is that we abstract the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.

v2: drop the xarray and use a dynamically resized array instead; the locking
    overhead is unnecessary and measurable.
v3: drop duplicate tracking, radeon is really the only one needing that.
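
The full retry loop is documented in the kernel-doc overview below; as a
compact sketch of the payoff once all objects are locked (fence is a
placeholder dma_fence produced by the driver):

	struct drm_gem_object *obj;
	unsigned long index;

	drm_exec_for_each_locked_object(&exec, index, obj)
		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_WRITE);

	drm_exec_fini(&exec);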

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   6 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 248 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 115 ++++++++++++++++
 5 files changed, 383 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index fe40ee686f6e..c9f120cfe730 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -524,6 +524,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e928284b4357..39c9d079d52a 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -201,6 +201,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ad6267273503..ab728632d8a2 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -80,6 +80,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..f645d22a0863
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,248 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates, etc.).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked, fence slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ *	struct drm_gem_object *obj;
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_prepare_obj(&exec, boB, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Dummy value used to initially enter the retry loop */
+#define DRM_EXEC_DUMMY (void*)~0
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+
+	if (exec->prelocked) {
+		drm_gem_object_put(exec->prelocked);
+		exec->prelocked = NULL;
+	}
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptible
+ *
+ * Initialize the object and make sure that we can track locked objects. If
+ * the initial backing array allocation fails, it is retried on first use.
+ */
+void drm_exec_init(struct drm_exec *exec, bool interruptible)
+{
+	exec->interruptible = interruptible;
+	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
+	exec->num_objects = 0;
+	exec->contended = DRM_EXEC_DUMMY;
+	exec->prelocked = NULL;
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	kvfree(exec->objects);
+	if (exec->contended != DRM_EXEC_DUMMY) {
+		drm_gem_object_put(exec->contended);
+		ww_acquire_fini(&exec->ticket);
+	}
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_cleanup - cleanup when contention is detected
+ * @exec: the drm_exec object to cleanup
+ *
+ * Cleanup the current state and return true if we should stay inside the retry
+ * loop, false if there wasn't any contention detected and we can keep the
+ * objects locked.
+ */
+bool drm_exec_cleanup(struct drm_exec *exec)
+{
+	if (likely(!exec->contended)) {
+		ww_acquire_done(&exec->ticket);
+		return false;
+	}
+
+	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
+		exec->contended = NULL;
+		ww_acquire_init(&exec->ticket, &reservation_ww_class);
+		return true;
+	}
+
+	drm_exec_unlock_all(exec);
+	exec->num_objects = 0;
+	return true;
+}
+EXPORT_SYMBOL(drm_exec_cleanup);
+
+/* Track the locked object in the objects array, growing it if necessary */
+static int drm_exec_obj_locked(struct drm_exec *exec,
+			       struct drm_gem_object *obj)
+{
+	if (unlikely(exec->num_objects == exec->max_objects)) {
+		size_t size = exec->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		exec->objects = tmp;
+		exec->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	exec->objects[exec->num_objects++] = obj;
+
+	return 0;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->interruptible) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (unlikely(ret)) {
+		dma_resv_unlock(obj->resv);
+		goto error_dropref;
+	}
+
+	swap(exec->prelocked, obj);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container; duplicate
+ * tracking was dropped in v3 (relocking an object returns -EALREADY).
+ *
+ * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->prelocked == obj) {
+		drm_gem_object_put(exec->prelocked);
+		exec->prelocked = NULL;
+
+		return dma_resv_reserve_fences(obj->resv, num_fences);
+	}
+
+	if (exec->interruptible)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret))
+		return ret;
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (ret)
+		goto error_unlock;
+
+	/* Keep locked when reserving fences fails */
+	return dma_resv_reserve_fences(obj->resv, num_fences);
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..65e518c01db3
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @interruptible: If locks should be taken interruptible
+	 */
+	bool			interruptible;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @num_objects: number of objects locked
+	 */
+	unsigned int		num_objects;
+
+	/**
+	 * @max_objects: maximum objects in array
+	 */
+	unsigned int		max_objects;
+
+	/**
+	 * @objects: array of the locked objects
+	 */
+	struct drm_gem_object	**objects;
+
+	/**
+	 * @contended: contended GEM object we backed off for
+	 */
+	struct drm_gem_object	*contended;
+
+	/**
+	 * @prelocked: already locked GEM object because of contention
+	 */
+	struct drm_gem_object *prelocked;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	for (index = 0, obj = (exec)->objects[0];		\
+	     index < (exec)->num_objects;			\
+	     ++index, obj = (exec)->objects[index])
+
+/**
+ * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
+ * @exec: drm_exec object
+ *
+ * Core functionality of the drm_exec object. Loops until all GEM objects are
+ * prepared and no more contention exists.
+ *
+ * At the beginning of the loop it is guaranteed that no GEM object is locked.
+ */
+#define drm_exec_while_not_all_locked(exec)	\
+	while (drm_exec_cleanup(exec))
+
+/**
+ * drm_exec_continue_on_contention - continue the loop when we need to cleanup
+ * @exec: drm_exec object
+ *
+ * Control flow helper to continue when a contention was detected and we need to
+ * clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_continue_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		continue
+
+/**
+ * drm_exec_break_on_contention - break a subordinate loop on contention
+ * @exec: drm_exec object
+ *
+ * Control flow helper to break a subordinate loop when a contention was detected
+ * and we need to clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_break_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		break
+
+/**
+ * drm_exec_is_contended - check for contention
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has run into some contention while
+ * locking a GEM object and needs to clean up.
+ */
+static inline bool drm_exec_is_contended(struct drm_exec *exec)
+{
+	return !!exec->contended;
+}
+
+void drm_exec_init(struct drm_exec *exec, bool interruptible);
+void drm_exec_fini(struct drm_exec *exec);
+bool drm_exec_cleanup(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (24 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 25/31] drm: execution context for GEM buffers v3 Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 14:45   ` Rodrigo Vivi
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers Matthew Brost
                   ` (6 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Danilo Krummrich

We want some drm_exec helpers in the GPUVA manager, so always compile
drm_exec.

Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ab728632d8a2..40067970af04 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -23,6 +23,7 @@ drm-y := \
 	drm_dumb_buffers.o \
 	drm_edid.o \
 	drm_encoder.o \
+	drm_exec.o \
 	drm_file.o \
 	drm_fourcc.o \
 	drm_framebuffer.o \
@@ -81,8 +82,6 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 # Memory-management helpers
 #
 #
-obj-$(CONFIG_DRM_EXEC) += drm_exec.o
-
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
 drm_dma_helper-y := drm_gem_dma_helper.o
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (25 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 19:42   ` Rodrigo Vivi
  2023-05-11 10:01   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM Matthew Brost
                   ` (5 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Danilo Krummrich

drm_exec is intended to replace the TTM execbuf helpers, so switch Xe over
to drm_exec. Also combine parts of drm_exec with the GPUVA manager where it
makes sense (locking, fence installation).
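
A rough sketch of the combined flow this enables (vm_resv_obj stands in for
whichever GEM object carries the manager's dma-resv, fence for the fence
being published; the kernel-doc for the new helpers is still TODO below):

	struct drm_exec exec;
	int err;

	/* Lock the VM-private object plus every extobj, with ww retry. */
	err = drm_gpuva_manager_lock(&vm->mgr, &exec, vm_resv_obj, true, 1);
	if (err)
		return err;

	/* ... validate / bind while everything is locked ... */

	drm_gpuva_manager_add_fence(&vm->mgr, &exec, fence,
				    DMA_RESV_USAGE_BOOKKEEP,
				    DMA_RESV_USAGE_BOOKKEEP);
	drm_gpuva_manager_unlock(&vm->mgr, &exec);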

Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
---
 drivers/gpu/drm/drm_gpuva_mgr.c              |  67 ++++-
 drivers/gpu/drm/i915/display/intel_display.c |   6 +-
 drivers/gpu/drm/xe/Kconfig                   |   1 +
 drivers/gpu/drm/xe/tests/xe_bo.c             |  26 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c        |   6 +-
 drivers/gpu/drm/xe/xe_bo.c                   |  56 ++--
 drivers/gpu/drm/xe/xe_bo.h                   |   6 +-
 drivers/gpu/drm/xe/xe_bo_evict.c             |  24 +-
 drivers/gpu/drm/xe/xe_bo_types.h             |   1 -
 drivers/gpu/drm/xe/xe_engine.c               |   7 +-
 drivers/gpu/drm/xe/xe_exec.c                 |  37 +--
 drivers/gpu/drm/xe/xe_gt_pagefault.c         |  55 +---
 drivers/gpu/drm/xe/xe_lrc.c                  |   8 +-
 drivers/gpu/drm/xe/xe_migrate.c              |  13 +-
 drivers/gpu/drm/xe/xe_vm.c                   | 283 ++++++++-----------
 drivers/gpu/drm/xe/xe_vm.h                   |  27 +-
 drivers/gpu/drm/xe/xe_vm_madvise.c           |  37 +--
 include/drm/drm_gpuva_mgr.h                  |  16 +-
 18 files changed, 315 insertions(+), 361 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index e8cd6e154336..93c912c34211 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -483,6 +483,50 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
 }
 EXPORT_SYMBOL(drm_gpuva_manager_destroy);
 
+/**
+ * TODO
+ */
+int drm_gpuva_manager_lock(struct drm_gpuva_manager *mgr, struct drm_exec *exec,
+			   struct drm_gem_object *mgr_obj, bool intr,
+			   unsigned int num_fences)
+{
+	struct drm_gpuva *gpuva;
+	int ret;
+
+	drm_exec_init(exec, intr);
+	drm_exec_while_not_all_locked(exec) {
+		ret = drm_exec_prepare_obj(exec, mgr_obj, num_fences);
+		drm_exec_continue_on_contention(exec);
+		if (ret && ret != -EALREADY)
+			goto err_exec;
+
+		drm_gpuva_for_each_extobj(gpuva, mgr) {
+			ret = drm_exec_prepare_obj(exec, gpuva->gem.obj,
+						   num_fences);
+			drm_exec_break_on_contention(exec);
+			if (ret && ret != -EALREADY)
+				goto err_exec;
+		}
+	}
+
+	return 0;
+
+err_exec:
+	drm_exec_fini(exec);
+	return ret;
+}
+EXPORT_SYMBOL(drm_gpuva_manager_lock);
+
+/**
+ * TODO
+ */
+void drm_gpuva_manager_unlock(struct drm_gpuva_manager *mgr,
+			      struct drm_exec *exec)
+{
+	drm_exec_fini(exec);
+}
+EXPORT_SYMBOL(drm_gpuva_manager_unlock);
+
 static inline bool
 drm_gpuva_in_mm_range(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
 {
@@ -888,7 +932,7 @@ drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
 EXPORT_SYMBOL(drm_gpuva_interval_empty);
 
 /**
- * drm_gpuva_add_fence - add fence to private and all extobj dma-resv
+ * drm_gpuva_manager_add_fence - add fence to private and all extobj dma-resv
  * @mgr: the &drm_gpuva_manager to add a fence to
  * @fence: fence to add
  * @private_usage: private dma-resv usage
@@ -896,17 +940,24 @@ EXPORT_SYMBOL(drm_gpuva_interval_empty);
  *
  * Returns: true if the interval is empty, false otherwise
  */
-void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
-			 enum dma_resv_usage private_usage,
-			 enum dma_resv_usage extobj_usage)
+void drm_gpuva_manager_add_fence(struct drm_gpuva_manager *mgr,
+				 struct drm_exec *exec,
+				 struct dma_fence *fence,
+				 enum dma_resv_usage private_usage,
+				 enum dma_resv_usage extobj_usage)
 {
-	struct drm_gpuva *gpuva;
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	dma_resv_assert_held(&mgr->resv);
 
 	dma_resv_add_fence(&mgr->resv, fence, private_usage);
-	drm_gpuva_for_each_extobj(gpuva, mgr)
-		dma_resv_add_fence(gpuva->gem.obj->resv, fence, extobj_usage);
+	drm_exec_for_each_locked_object(exec, index, obj)
+		if (likely(&mgr->resv != obj->resv))
+			dma_resv_add_fence(obj->resv, fence, extobj_usage);
 }
-EXPORT_SYMBOL(drm_gpuva_add_fence);
+EXPORT_SYMBOL(drm_gpuva_manager_add_fence);
+
 
 /**
  * drm_gpuva_map - helper to insert a &drm_gpuva from &drm_gpuva_fn_ops
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 28a227450329..aab1a3a0f06d 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -7340,11 +7340,11 @@ static int i915_gem_object_read_from_page(struct xe_bo *bo,
 	void *virtual;
 	bool is_iomem;
 	int ret;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 
 	XE_BUG_ON(size != 8);
 
-	ret = xe_bo_lock(bo, &ww, 0, true);
+	ret = xe_bo_lock(bo, &exec, 0, true);
 	if (ret)
 		return ret;
 
@@ -7361,7 +7361,7 @@ static int i915_gem_object_read_from_page(struct xe_bo *bo,
 
 	ttm_bo_kunmap(&map);
 out_unlock:
-	xe_bo_unlock(bo, &ww);
+	xe_bo_unlock(bo, &exec);
 	return ret;
 }
 #endif
diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index f6f3b491d162..bbcc9b64b776 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -8,6 +8,7 @@ config DRM_XE
 	select SHMEM
 	select TMPFS
 	select DRM_BUDDY
+	select DRM_EXEC
 	select DRM_KMS_HELPER
 	select DRM_PANEL
 	select DRM_SUBALLOC_HELPER
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 9bd381e5b7a6..316c6cf2bb86 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -175,17 +175,17 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
 	unsigned int bo_flags = XE_BO_CREATE_USER_BIT |
 		XE_BO_CREATE_VRAM_IF_DGFX(gt);
 	struct xe_vm *vm = xe_migrate_get_vm(xe->gt[0].migrate);
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	int err, i;
 
 	kunit_info(test, "Testing device %s gt id %u vram id %u\n",
 		   dev_name(xe->drm.dev), gt->info.id, gt->info.vram_id);
 
 	for (i = 0; i < 2; ++i) {
-		xe_vm_lock(vm, &ww, 0, false);
+		xe_vm_lock(vm, &exec, 0, false);
 		bo = xe_bo_create(xe, NULL, vm, 0x10000, ttm_bo_type_device,
 				  bo_flags);
-		xe_vm_unlock(vm, &ww);
+		xe_vm_unlock(vm, &exec);
 		if (IS_ERR(bo)) {
 			KUNIT_FAIL(test, "bo create err=%pe\n", bo);
 			break;
@@ -198,9 +198,9 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
 			goto cleanup_bo;
 		}
 
-		xe_bo_lock(external, &ww, 0, false);
+		xe_bo_lock(external, &exec, 0, false);
 		err = xe_bo_pin_external(external);
-		xe_bo_unlock(external, &ww);
+		xe_bo_unlock(external, &exec);
 		if (err) {
 			KUNIT_FAIL(test, "external bo pin err=%pe\n",
 				   ERR_PTR(err));
@@ -240,18 +240,18 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
 
 		if (i) {
 			down_read(&vm->lock);
-			xe_vm_lock(vm, &ww, 0, false);
+			xe_vm_lock(vm, &exec, 0, false);
 			err = xe_bo_validate(bo, bo->vm, false);
-			xe_vm_unlock(vm, &ww);
+			xe_vm_unlock(vm, &exec);
 			up_read(&vm->lock);
 			if (err) {
 				KUNIT_FAIL(test, "bo valid err=%pe\n",
 					   ERR_PTR(err));
 				goto cleanup_all;
 			}
-			xe_bo_lock(external, &ww, 0, false);
+			xe_bo_lock(external, &exec, 0, false);
 			err = xe_bo_validate(external, NULL, false);
-			xe_bo_unlock(external, &ww);
+			xe_bo_unlock(external, &exec);
 			if (err) {
 				KUNIT_FAIL(test, "external bo valid err=%pe\n",
 					   ERR_PTR(err));
@@ -259,18 +259,18 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
 			}
 		}
 
-		xe_bo_lock(external, &ww, 0, false);
+		xe_bo_lock(external, &exec, 0, false);
 		xe_bo_unpin_external(external);
-		xe_bo_unlock(external, &ww);
+		xe_bo_unlock(external, &exec);
 
 		xe_bo_put(external);
 		xe_bo_put(bo);
 		continue;
 
 cleanup_all:
-		xe_bo_lock(external, &ww, 0, false);
+		xe_bo_lock(external, &exec, 0, false);
 		xe_bo_unpin_external(external);
-		xe_bo_unlock(external, &ww);
+		xe_bo_unlock(external, &exec);
 cleanup_external:
 		xe_bo_put(external);
 cleanup_bo:
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 0f4371ad1fd9..e1482b4491b1 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -394,14 +394,14 @@ static int migrate_test_run_device(struct xe_device *xe)
 
 	for_each_gt(gt, xe, id) {
 		struct xe_migrate *m = gt->migrate;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		kunit_info(test, "Testing gt id %d.\n", id);
-		xe_vm_lock(m->eng->vm, &ww, 0, true);
+		xe_vm_lock(m->eng->vm, &exec, 0, true);
 		xe_device_mem_access_get(xe);
 		xe_migrate_sanity_test(m, test);
 		xe_device_mem_access_put(xe);
-		xe_vm_unlock(m->eng->vm, &ww);
+		xe_vm_unlock(m->eng->vm, &exec);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index e0422ffb6327..a427edbf486b 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -8,6 +8,7 @@
 #include <linux/dma-buf.h>
 
 #include <drm/drm_drv.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_gem_ttm_helper.h>
 #include <drm/ttm/ttm_device.h>
 #include <drm/ttm/ttm_placement.h>
@@ -991,13 +992,13 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 
 	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		XE_BUG_ON(!xe_bo_is_user(bo));
 
-		xe_bo_lock(bo, &ww, 0, false);
+		xe_bo_lock(bo, &exec, 0, false);
 		ttm_bo_set_bulk_move(&bo->ttm, NULL);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 }
 
@@ -1402,11 +1403,6 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	}
 
 	ttm_bo_pin(&bo->ttm);
-
-	/*
-	 * FIXME: If we always use the reserve / unreserve functions for locking
-	 * we do not need this.
-	 */
 	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
 
 	return 0;
@@ -1461,11 +1457,6 @@ int xe_bo_pin(struct xe_bo *bo)
 	}
 
 	ttm_bo_pin(&bo->ttm);
-
-	/*
-	 * FIXME: If we always use the reserve / unreserve functions for locking
-	 * we do not need this.
-	 */
 	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
 
 	return 0;
@@ -1496,11 +1487,6 @@ void xe_bo_unpin_external(struct xe_bo *bo)
 	}
 
 	ttm_bo_unpin(&bo->ttm);
-
-	/*
-	 * FIXME: If we always use the reserve / unreserve functions for locking
-	 * we do not need this.
-	 */
 	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
 }
 
@@ -1650,7 +1636,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct xe_device *xe = to_xe_device(dev);
 	struct xe_file *xef = to_xe_file(file);
 	struct drm_xe_gem_create *args = data;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_vm *vm = NULL;
 	struct xe_bo *bo;
 	unsigned bo_flags = XE_BO_CREATE_USER_BIT;
@@ -1686,7 +1672,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 		vm = xe_vm_lookup(xef, args->vm_id);
 		if (XE_IOCTL_ERR(xe, !vm))
 			return -ENOENT;
-		err = xe_vm_lock(vm, &ww, 0, true);
+		err = xe_vm_lock(vm, &exec, 0, true);
 		if (err) {
 			xe_vm_put(vm);
 			return err;
@@ -1703,7 +1689,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
 			  bo_flags);
 	if (vm) {
-		xe_vm_unlock(vm, &ww);
+		xe_vm_unlock(vm, &exec);
 		xe_vm_put(vm);
 	}
 
@@ -1744,26 +1730,30 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
-int xe_bo_lock(struct xe_bo *bo, struct ww_acquire_ctx *ww,
+int xe_bo_lock(struct xe_bo *bo, struct drm_exec *exec,
 	       int num_resv, bool intr)
 {
-	struct ttm_validate_buffer tv_bo;
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
+	int err;
 
-	XE_BUG_ON(!ww);
+	drm_exec_init(exec, intr);
+	drm_exec_while_not_all_locked(exec) {
+		err = drm_exec_prepare_obj(exec, &bo->ttm.base,
+					   num_resv);
+		drm_exec_continue_on_contention(exec);
+		if (err && err != -EALREADY)
+			goto out_err;
+	}
 
-	tv_bo.num_shared = num_resv;
-	tv_bo.bo = &bo->ttm;;
-	list_add_tail(&tv_bo.head, &objs);
+	return 0;
 
-	return ttm_eu_reserve_buffers(ww, &objs, intr, &dups);
+out_err:
+	drm_exec_fini(exec);
+	return err;
 }
 
-void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww)
+void xe_bo_unlock(struct xe_bo *bo, struct drm_exec *exec)
 {
-	dma_resv_unlock(bo->ttm.base.resv);
-	ww_acquire_fini(ww);
+	drm_exec_fini(exec);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 9b401d30a130..5a80ebf72d10 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -75,6 +75,7 @@
 
 #define XE_BO_PROPS_INVALID	(-1)
 
+struct drm_exec;
 struct sg_table;
 
 struct xe_bo *xe_bo_alloc(void);
@@ -142,10 +143,9 @@ static inline void xe_bo_assert_held(struct xe_bo *bo)
 		dma_resv_assert_held((bo)->ttm.base.resv);
 }
 
-int xe_bo_lock(struct xe_bo *bo, struct ww_acquire_ctx *ww,
+int xe_bo_lock(struct xe_bo *bo, struct drm_exec *exec,
 	       int num_resv, bool intr);
-
-void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww);
+void xe_bo_unlock(struct xe_bo *bo, struct drm_exec *exec);
 
 static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
 {
diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
index 6642c5f52009..46d9d9eb110c 100644
--- a/drivers/gpu/drm/xe/xe_bo_evict.c
+++ b/drivers/gpu/drm/xe/xe_bo_evict.c
@@ -3,6 +3,8 @@
  * Copyright © 2022 Intel Corporation
  */
 
+#include <drm/drm_exec.h>
+
 #include "xe_bo_evict.h"
 
 #include "xe_bo.h"
@@ -27,7 +29,7 @@
 int xe_bo_evict_all(struct xe_device *xe)
 {
 	struct ttm_device *bdev = &xe->ttm;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_bo *bo;
 	struct xe_gt *gt;
 	struct list_head still_in_list;
@@ -62,9 +64,9 @@ int xe_bo_evict_all(struct xe_device *xe)
 		list_move_tail(&bo->pinned_link, &still_in_list);
 		spin_unlock(&xe->pinned.lock);
 
-		xe_bo_lock(bo, &ww, 0, false);
+		xe_bo_lock(bo, &exec, 0, false);
 		ret = xe_bo_evict_pinned(bo);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 		xe_bo_put(bo);
 		if (ret) {
 			spin_lock(&xe->pinned.lock);
@@ -96,9 +98,9 @@ int xe_bo_evict_all(struct xe_device *xe)
 		list_move_tail(&bo->pinned_link, &xe->pinned.evicted);
 		spin_unlock(&xe->pinned.lock);
 
-		xe_bo_lock(bo, &ww, 0, false);
+		xe_bo_lock(bo, &exec, 0, false);
 		ret = xe_bo_evict_pinned(bo);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 		xe_bo_put(bo);
 		if (ret)
 			return ret;
@@ -123,7 +125,7 @@ int xe_bo_evict_all(struct xe_device *xe)
  */
 int xe_bo_restore_kernel(struct xe_device *xe)
 {
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_bo *bo;
 	int ret;
 
@@ -140,9 +142,9 @@ int xe_bo_restore_kernel(struct xe_device *xe)
 		list_move_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
 		spin_unlock(&xe->pinned.lock);
 
-		xe_bo_lock(bo, &ww, 0, false);
+		xe_bo_lock(bo, &exec, 0, false);
 		ret = xe_bo_restore_pinned(bo);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 		if (ret) {
 			xe_bo_put(bo);
 			return ret;
@@ -182,7 +184,7 @@ int xe_bo_restore_kernel(struct xe_device *xe)
  */
 int xe_bo_restore_user(struct xe_device *xe)
 {
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_bo *bo;
 	struct xe_gt *gt;
 	struct list_head still_in_list;
@@ -204,9 +206,9 @@ int xe_bo_restore_user(struct xe_device *xe)
 		xe_bo_get(bo);
 		spin_unlock(&xe->pinned.lock);
 
-		xe_bo_lock(bo, &ww, 0, false);
+		xe_bo_lock(bo, &exec, 0, false);
 		ret = xe_bo_restore_pinned(bo);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 		xe_bo_put(bo);
 		if (ret) {
 			spin_lock(&xe->pinned.lock);
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 06de3330211d..2ba34a8c9b66 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -11,7 +11,6 @@
 #include <drm/drm_mm.h>
 #include <drm/ttm/ttm_bo.h>
 #include <drm/ttm/ttm_device.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 #include <drm/ttm/ttm_placement.h>
 
 struct xe_device;
diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
index 91600b1e8249..8b425b777259 100644
--- a/drivers/gpu/drm/xe/xe_engine.c
+++ b/drivers/gpu/drm/xe/xe_engine.c
@@ -8,6 +8,7 @@
 #include <linux/nospec.h>
 
 #include <drm/drm_device.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_file.h>
 #include <drm/xe_drm.h>
 
@@ -89,18 +90,18 @@ struct xe_engine *xe_engine_create(struct xe_device *xe, struct xe_vm *vm,
 				   u32 logical_mask, u16 width,
 				   struct xe_hw_engine *hwe, u32 flags)
 {
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_engine *e;
 	int err;
 
 	if (vm) {
-		err = xe_vm_lock(vm, &ww, 0, true);
+		err = xe_vm_lock(vm, &exec, 0, true);
 		if (err)
 			return ERR_PTR(err);
 	}
 	e = __xe_engine_create(xe, vm, logical_mask, width, hwe, flags);
 	if (vm)
-		xe_vm_unlock(vm, &ww);
+		xe_vm_unlock(vm, &exec);
 
 	return e;
 }
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 2ae02f1500d5..9f7f1088c403 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -6,6 +6,7 @@
 #include "xe_exec.h"
 
 #include <drm/drm_device.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_file.h>
 #include <drm/xe_drm.h>
 
@@ -92,21 +93,16 @@
  *	Unlock all
  */
 
-static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
-			 struct ttm_validate_buffer tv_onstack[],
-			 struct ttm_validate_buffer **tv,
-			 struct list_head *objs)
+static int xe_exec_begin(struct xe_engine *e, struct drm_exec *exec)
 {
 	struct xe_vm *vm = e->vm;
 	struct xe_vma *vma;
-	LIST_HEAD(dups);
 	int err;
 
-	*tv = NULL;
 	if (xe_vm_no_dma_fences(e->vm))
 		return 0;
 
-	err = xe_vm_lock_dma_resv(vm, ww, tv_onstack, tv, objs, true, 1);
+	err = xe_vm_lock_dma_resv(vm, exec, true, 1);
 	if (err)
 		return err;
 
@@ -123,8 +119,7 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
 
 		err = xe_bo_validate(xe_vma_bo(vma), vm, false);
 		if (err) {
-			xe_vm_unlock_dma_resv(vm, tv_onstack, *tv, ww, objs);
-			*tv = NULL;
+			xe_vm_unlock_dma_resv(vm, exec);
 			return err;
 		}
 	}
@@ -132,14 +127,10 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
 	return 0;
 }
 
-static void xe_exec_end(struct xe_engine *e,
-			struct ttm_validate_buffer *tv_onstack,
-			struct ttm_validate_buffer *tv,
-			struct ww_acquire_ctx *ww,
-			struct list_head *objs)
+static void xe_exec_end(struct xe_engine *e, struct drm_exec *exec)
 {
 	if (!xe_vm_no_dma_fences(e->vm))
-		xe_vm_unlock_dma_resv(e->vm, tv_onstack, tv, ww, objs);
+		xe_vm_unlock_dma_resv(e->vm, exec);
 }
 
 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
@@ -149,17 +140,14 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_xe_exec *args = data;
 	struct drm_xe_sync __user *syncs_user = u64_to_user_ptr(args->syncs);
 	u64 __user *addresses_user = u64_to_user_ptr(args->address);
+	struct drm_exec exec;
 	struct xe_engine *engine;
 	struct xe_sync_entry *syncs = NULL;
 	u64 addresses[XE_HW_ENGINE_MAX_INSTANCE];
-	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
-	struct ttm_validate_buffer *tv = NULL;
 	u32 i, num_syncs = 0;
 	struct xe_sched_job *job;
 	struct dma_fence *rebind_fence;
 	struct xe_vm *vm;
-	struct ww_acquire_ctx ww;
-	struct list_head objs;
 	bool write_locked;
 	int err = 0;
 
@@ -270,7 +258,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			goto err_unlock_list;
 	}
 
-	err = xe_exec_begin(engine, &ww, tv_onstack, &tv, &objs);
+	err = xe_exec_begin(engine, &exec);
 	if (err)
 		goto err_unlock_list;
 
@@ -361,9 +349,10 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	 * are written as we don't pass in a read / write list.
 	 */
 	if (!xe_vm_no_dma_fences(vm))
-		drm_gpuva_add_fence(&vm->mgr, &job->drm.s_fence->finished,
-				    DMA_RESV_USAGE_BOOKKEEP,
-				    DMA_RESV_USAGE_WRITE);
+		drm_gpuva_manager_add_fence(&vm->mgr, &exec,
+					    &job->drm.s_fence->finished,
+					    DMA_RESV_USAGE_BOOKKEEP,
+					    DMA_RESV_USAGE_WRITE);
 
 	for (i = 0; i < num_syncs; i++)
 		xe_sync_entry_signal(&syncs[i], job,
@@ -387,7 +376,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		xe_sched_job_put(job);
 err_engine_end:
-	xe_exec_end(engine, tv_onstack, tv, &ww, &objs);
+	xe_exec_end(engine, &exec);
 err_unlock_list:
 	if (write_locked)
 		up_write(&vm->lock);
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index d7bf6b0a0697..1145c6eaa17d 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -9,7 +9,7 @@
 #include <linux/circ_buf.h>
 
 #include <drm/drm_managed.h>
-#include <drm/ttm/ttm_execbuf_util.h>
+#include <drm/drm_exec.h>
 
 #include "xe_bo.h"
 #include "xe_gt.h"
@@ -84,11 +84,6 @@ static bool vma_matches(struct xe_vma *vma, u64 page_addr)
 	return true;
 }
 
-static bool only_needs_bo_lock(struct xe_bo *bo)
-{
-	return bo && bo->vm;
-}
-
 static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
 {
 	struct xe_vma *vma = NULL;
@@ -109,10 +104,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	struct xe_vm *vm;
 	struct xe_vma *vma = NULL;
 	struct xe_bo *bo;
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
-	struct ttm_validate_buffer tv_bo, tv_vm;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct dma_fence *fence;
 	bool write_locked;
 	int ret = 0;
@@ -170,20 +162,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	/* Lock VM and BOs dma-resv */
 	bo = xe_vma_bo(vma);
-	if (only_needs_bo_lock(bo)) {
-		/* This path ensures the BO's LRU is updated */
-		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
-	} else {
-		tv_vm.num_shared = xe->info.tile_count;
-		tv_vm.bo = xe_vm_ttm_bo(vm);
-		list_add(&tv_vm.head, &objs);
-		if (bo) {
-			tv_bo.bo = &bo->ttm;
-			tv_bo.num_shared = xe->info.tile_count;
-			list_add(&tv_bo.head, &objs);
-		}
-		ret = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
-	}
+	ret = xe_vm_bo_lock(vm, bo, &exec, xe->info.tile_count, false);
 	if (ret)
 		goto unlock_vm;
 
@@ -226,10 +205,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	vma->usm.gt_invalidated &= ~BIT(gt->info.id);
 
 unlock_dma_resv:
-	if (only_needs_bo_lock(bo))
-		xe_bo_unlock(bo, &ww);
-	else
-		ttm_eu_backoff_reservation(&ww, &objs);
+	xe_vm_bo_unlock(vm, bo, &exec, true);
 unlock_vm:
 	if (!ret)
 		vm->usm.last_fault_vma = vma;
@@ -496,10 +472,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 	struct xe_vm *vm;
 	struct xe_vma *vma;
 	struct xe_bo *bo;
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
-	struct ttm_validate_buffer tv_bo, tv_vm;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	int ret = 0;
 
 	/* We only support ACC_TRIGGER at the moment */
@@ -532,28 +505,14 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 
 	/* Lock VM and BOs dma-resv */
 	bo = xe_vma_bo(vma);
-	if (only_needs_bo_lock(bo)) {
-		/* This path ensures the BO's LRU is updated */
-		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
-	} else {
-		tv_vm.num_shared = xe->info.tile_count;
-		tv_vm.bo = xe_vm_ttm_bo(vm);
-		list_add(&tv_vm.head, &objs);
-		tv_bo.bo = &bo->ttm;
-		tv_bo.num_shared = xe->info.tile_count;
-		list_add(&tv_bo.head, &objs);
-		ret = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
-	}
+	ret = xe_vm_bo_lock(vm, bo, &exec, xe->info.tile_count, false);
 	if (ret)
 		goto unlock_vm;
 
 	/* Migrate to VRAM, move should invalidate the VMA first */
 	ret = xe_bo_migrate(bo, XE_PL_VRAM0 + gt->info.vram_id);
 
-	if (only_needs_bo_lock(bo))
-		xe_bo_unlock(bo, &ww);
-	else
-		ttm_eu_backoff_reservation(&ww, &objs);
+	xe_vm_bo_unlock(vm, bo, &exec, true);
 unlock_vm:
 	up_read(&vm->lock);
 	xe_vm_put(vm);
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index ae605e7805de..3cc34efe8dd8 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -3,6 +3,8 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include <drm/drm_exec.h>
+
 #include "xe_lrc.h"
 
 #include "regs/xe_engine_regs.h"
@@ -712,16 +714,16 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 
 void xe_lrc_finish(struct xe_lrc *lrc)
 {
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 
 	xe_hw_fence_ctx_finish(&lrc->fence_ctx);
 	if (lrc->bo->vm)
-		xe_vm_lock(lrc->bo->vm, &ww, 0, false);
+		xe_vm_lock(lrc->bo->vm, &exec, 0, false);
 	else
 		xe_bo_lock_no_vm(lrc->bo, NULL);
 	xe_bo_unpin(lrc->bo);
 	if (lrc->bo->vm)
-		xe_vm_unlock(lrc->bo->vm, &ww);
+		xe_vm_unlock(lrc->bo->vm, &exec);
 	else
 		xe_bo_unlock_no_vm(lrc->bo);
 	xe_bo_put(lrc->bo);
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 91a06c925a1e..1dd497252640 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -9,6 +9,7 @@
 #include <linux/sizes.h>
 
 #include <drm/drm_managed.h>
+#include <drm/drm_exec.h>
 #include <drm/ttm/ttm_tt.h>
 #include <drm/xe_drm.h>
 
@@ -86,13 +87,13 @@ struct xe_engine *xe_gt_migrate_engine(struct xe_gt *gt)
 static void xe_migrate_fini(struct drm_device *dev, void *arg)
 {
 	struct xe_migrate *m = arg;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 
-	xe_vm_lock(m->eng->vm, &ww, 0, false);
+	xe_vm_lock(m->eng->vm, &exec, 0, false);
 	xe_bo_unpin(m->pt_bo);
 	if (m->cleared_bo)
 		xe_bo_unpin(m->cleared_bo);
-	xe_vm_unlock(m->eng->vm, &ww);
+	xe_vm_unlock(m->eng->vm, &exec);
 
 	dma_fence_put(m->fence);
 	if (m->cleared_bo)
@@ -315,7 +316,7 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_migrate *m;
 	struct xe_vm *vm;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	int err;
 
 	XE_BUG_ON(xe_gt_is_media_type(gt));
@@ -332,9 +333,9 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
 	if (IS_ERR(vm))
 		return ERR_CAST(vm);
 
-	xe_vm_lock(vm, &ww, 0, false);
+	xe_vm_lock(vm, &exec, 0, false);
 	err = xe_migrate_prepare_vm(gt, m, vm);
-	xe_vm_unlock(vm, &ww);
+	xe_vm_unlock(vm, &exec);
 	if (err) {
 		xe_vm_close_and_put(vm);
 		return ERR_PTR(err);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 4d734ec4d6ab..55cced8870e6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -7,7 +7,7 @@
 
 #include <linux/dma-fence-array.h>
 
-#include <drm/ttm/ttm_execbuf_util.h>
+#include <drm/drm_exec.h>
 #include <drm/ttm/ttm_tt.h>
 #include <drm/xe_drm.h>
 #include <linux/kthread.h>
@@ -260,10 +260,10 @@ static void arm_preempt_fences(struct xe_vm *vm, struct list_head *list)
 static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
 {
 	struct xe_engine *e;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	int err;
 
-	err = xe_bo_lock(bo, &ww, vm->preempt.num_engines, true);
+	err = xe_bo_lock(bo, &exec, vm->preempt.num_engines, true);
 	if (err)
 		return err;
 
@@ -274,11 +274,12 @@ static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
 					   DMA_RESV_USAGE_BOOKKEEP);
 		}
 
-	xe_bo_unlock(bo, &ww);
+	xe_bo_unlock(bo, &exec);
 	return 0;
 }
 
-static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
+static void resume_and_reinstall_preempt_fences(struct xe_vm *vm,
+						struct drm_exec *exec)
 {
 	struct xe_engine *e;
 
@@ -288,18 +289,15 @@ static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
 	list_for_each_entry(e, &vm->preempt.engines, compute.link) {
 		e->ops->resume(e);
 
-		drm_gpuva_add_fence(&vm->mgr, e->compute.pfence,
-				    DMA_RESV_USAGE_BOOKKEEP,
-				    DMA_RESV_USAGE_BOOKKEEP);
+		drm_gpuva_manager_add_fence(&vm->mgr, exec, e->compute.pfence,
+					    DMA_RESV_USAGE_BOOKKEEP,
+					    DMA_RESV_USAGE_BOOKKEEP);
 	}
 }
 
 int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
 {
-	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
-	struct ttm_validate_buffer *tv;
-	struct ww_acquire_ctx ww;
-	struct list_head objs;
+	struct drm_exec exec;
 	struct dma_fence *pfence;
 	int err;
 	bool wait;
@@ -308,7 +306,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
 
 	down_write(&vm->lock);
 
-	err = xe_vm_lock_dma_resv(vm, &ww, tv_onstack, &tv, &objs, true, 1);
+	err = xe_vm_lock_dma_resv(vm, &exec, true, 1);
 	if (err)
 		goto out_unlock_outer;
 
@@ -325,9 +323,9 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
 
 	down_read(&vm->userptr.notifier_lock);
 
-	drm_gpuva_add_fence(&vm->mgr, pfence,
-			    DMA_RESV_USAGE_BOOKKEEP,
-			    DMA_RESV_USAGE_BOOKKEEP);
+	drm_gpuva_manager_add_fence(&vm->mgr, &exec, pfence,
+				    DMA_RESV_USAGE_BOOKKEEP,
+				    DMA_RESV_USAGE_BOOKKEEP);
 
 	/*
 	 * Check to see if a preemption on VM is in flight or userptr
@@ -341,7 +339,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
 	up_read(&vm->userptr.notifier_lock);
 
 out_unlock:
-	xe_vm_unlock_dma_resv(vm, tv_onstack, tv, &ww, &objs);
+	xe_vm_unlock_dma_resv(vm, &exec);
 out_unlock_outer:
 	up_write(&vm->lock);
 
@@ -367,25 +365,24 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
 		list_empty(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }
 
+static struct drm_gem_object *xe_vm_gem(struct xe_vm *vm)
+{
+	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
+		XE_VM_FLAG_GT_ID(vm->flags) : 0;
+
+	/* Safe to use index 0 as all BO in the VM share a single dma-resv lock */
+	return &vm->pt_root[idx]->bo->ttm.base;
+}
+
 /**
  * xe_vm_lock_dma_resv() - Lock the vm dma_resv object and the dma_resv
  * objects of the vm's external buffer objects.
- * @vm: The vm.
- * @ww: Pointer to a struct ww_acquire_ctx locking context.
- * @tv_onstack: Array size XE_ONSTACK_TV of storage for the struct
- * ttm_validate_buffers used for locking.
- * @tv: Pointer to a pointer that on output contains the actual storage used.
- * @objs: List head for the buffer objects locked.
+ * @vm: The vm
  * @intr: Whether to lock interruptible.
  * @num_shared: Number of dma-fence slots to reserve in the locked objects.
  *
  * Locks the vm dma-resv objects and all the dma-resv objects of the
- * buffer objects on the vm external object list. The TTM utilities require
- * a list of struct ttm_validate_buffers pointing to the actual buffer
- * objects to lock. Storage for those struct ttm_validate_buffers should
- * be provided in @tv_onstack, and is typically reserved on the stack
- * of the caller. If the size of @tv_onstack isn't sufficient, then
- * storage will be allocated internally using kvmalloc().
+ * buffer objects on the vm external object list.
  *
  * The function performs deadlock handling internally, and after a
  * successful return the ww locking transaction should be considered
@@ -395,46 +392,18 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
  * @intr is set to true, -EINTR or -ERESTARTSYS may be returned. In case
  * of error, any locking performed has been reverted.
  */
-int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
-			struct ttm_validate_buffer *tv_onstack,
-			struct ttm_validate_buffer **tv,
-			struct list_head *objs,
-			bool intr,
+int xe_vm_lock_dma_resv(struct xe_vm *vm, struct drm_exec *exec, bool intr,
 			unsigned int num_shared)
 {
-	struct ttm_validate_buffer *tv_vm, *tv_bo;
 	struct xe_vma *vma, *next;
-	struct drm_gpuva *gpuva;
-	LIST_HEAD(dups);
 	int err;
 
 	lockdep_assert_held(&vm->lock);
 
-	if (vm->mgr.extobj.entries < XE_ONSTACK_TV) {
-		tv_vm = tv_onstack;
-	} else {
-		tv_vm = kvmalloc_array(vm->mgr.extobj.entries + 1,
-				       sizeof(*tv_vm),
-				       GFP_KERNEL);
-		if (!tv_vm)
-			return -ENOMEM;
-	}
-	tv_bo = tv_vm + 1;
-
-	INIT_LIST_HEAD(objs);
-	drm_gpuva_for_each_extobj(gpuva, &vm->mgr) {
-		tv_bo->num_shared = num_shared;
-		tv_bo->bo = &gem_to_xe_bo(gpuva->gem.obj)->ttm;
-
-		list_add_tail(&tv_bo->head, objs);
-		tv_bo++;
-	}
-	tv_vm->num_shared = num_shared;
-	tv_vm->bo = xe_vm_ttm_bo(vm);
-	list_add_tail(&tv_vm->head, objs);
-	err = ttm_eu_reserve_buffers(ww, objs, intr, &dups);
+	err = drm_gpuva_manager_lock(&vm->mgr, exec, xe_vm_gem(vm), intr,
+				     num_shared);
 	if (err)
-		goto out_err;
+		return err;
 
 	spin_lock(&vm->notifier.list_lock);
 	list_for_each_entry_safe(vma, next, &vm->notifier.rebind_list,
@@ -447,34 +416,22 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
 	}
 	spin_unlock(&vm->notifier.list_lock);
 
-	*tv = tv_vm;
 	return 0;
-
-out_err:
-	if (tv_vm != tv_onstack)
-		kvfree(tv_vm);
-
-	return err;
 }
 
 /**
  * xe_vm_unlock_dma_resv() - Unlock reservation objects locked by
  * xe_vm_lock_dma_resv()
  * @vm: The vm.
- * @tv_onstack: The @tv_onstack array given to xe_vm_lock_dma_resv().
- * @tv: The value of *@tv given by xe_vm_lock_dma_resv().
- * @ww: The ww_acquire_context used for locking.
- * @objs: The list returned from xe_vm_lock_dma_resv().
  *
  * Unlocks the reservation objects and frees any memory allocated by
  * xe_vm_lock_dma_resv().
  */
-void xe_vm_unlock_dma_resv(struct xe_vm *vm,
-			   struct ttm_validate_buffer *tv_onstack,
-			   struct ttm_validate_buffer *tv,
-			   struct ww_acquire_ctx *ww,
-			   struct list_head *objs)
+void xe_vm_unlock_dma_resv(struct xe_vm *vm, struct drm_exec *exec)
 {
+	struct drm_gem_object *obj, *skip = xe_vm_gem(vm);
+	unsigned long index;
+
 	/*
 	 * Nothing should've been able to enter the list while we were locked,
 	 * since we've held the dma-resvs of all the vm's external objects,
@@ -483,19 +440,20 @@ void xe_vm_unlock_dma_resv(struct xe_vm *vm,
 	 */
 	XE_WARN_ON(!list_empty(&vm->notifier.rebind_list));
 
-	ttm_eu_backoff_reservation(ww, objs);
-	if (tv && tv != tv_onstack)
-		kvfree(tv);
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		struct xe_bo *bo = gem_to_xe_bo(obj);
+
+		if (obj != skip)
+			ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
+	}
+	drm_gpuva_manager_unlock(&vm->mgr, exec);
 }
 
 static void preempt_rebind_work_func(struct work_struct *w)
 {
 	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
+	struct drm_exec exec;
 	struct xe_vma *vma;
-	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
-	struct ttm_validate_buffer *tv;
-	struct ww_acquire_ctx ww;
-	struct list_head objs;
 	struct dma_fence *rebind_fence;
 	unsigned int fence_count = 0;
 	LIST_HEAD(preempt_fences);
@@ -536,8 +494,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 			goto out_unlock_outer;
 	}
 
-	err = xe_vm_lock_dma_resv(vm, &ww, tv_onstack, &tv, &objs,
-				  false, vm->preempt.num_engines);
+	err = xe_vm_lock_dma_resv(vm, &exec, false, vm->preempt.num_engines);
 	if (err)
 		goto out_unlock_outer;
 
@@ -608,11 +565,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
 
 	/* Point of no return. */
 	arm_preempt_fences(vm, &preempt_fences);
-	resume_and_reinstall_preempt_fences(vm);
+	resume_and_reinstall_preempt_fences(vm, &exec);
 	up_read(&vm->userptr.notifier_lock);
 
 out_unlock:
-	xe_vm_unlock_dma_resv(vm, tv_onstack, tv, &ww, &objs);
+	xe_vm_unlock_dma_resv(vm, &exec);
 out_unlock_outer:
 	if (err == -EAGAIN) {
 		trace_xe_vm_rebind_worker_retry(vm);
@@ -963,27 +920,16 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 
 static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 {
-	struct ttm_validate_buffer tv[2];
-	struct ww_acquire_ctx ww;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_bo *bo = xe_vma_bo(vma);
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
+	struct drm_exec exec;
 	int err;
 
-	memset(tv, 0, sizeof(tv));
-	tv[0].bo = xe_vm_ttm_bo(xe_vma_vm(vma));
-	list_add(&tv[0].head, &objs);
-
-	if (bo) {
-		tv[1].bo = &xe_bo_get(bo)->ttm;
-		list_add(&tv[1].head, &objs);
-	}
-	err = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
+	err = xe_vm_bo_lock(vm, xe_bo_get(bo), &exec, 0, false);
 	XE_WARN_ON(err);
-
 	xe_vma_destroy(vma, NULL);
+	xe_vm_bo_unlock(vm, bo, &exec, false);
 
-	ttm_eu_backoff_reservation(&ww, &objs);
 	if (bo)
 		xe_bo_put(bo);
 }
@@ -1254,7 +1200,7 @@ static void vm_error_capture(struct xe_vm *vm, int err,
 void xe_vm_close_and_put(struct xe_vm *vm)
 {
 	struct list_head contested;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_device *xe = xe_vm_device(vm);
 	struct xe_gt *gt;
 	struct xe_vma *vma, *next_vma;
@@ -1281,7 +1227,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	}
 
 	down_write(&vm->lock);
-	xe_vm_lock(vm, &ww, 0, false);
+	xe_vm_lock(vm, &exec, 0, false);
 	drm_gpuva_iter_for_each(gpuva, it) {
 		vma = gpuva_to_vma(gpuva);
 
@@ -1323,7 +1269,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 					      NULL);
 		}
 	}
-	xe_vm_unlock(vm, &ww);
+	xe_vm_unlock(vm, &exec);
 
 	/*
 	 * VM is now dead, cannot re-add nodes to vm->vmas if it's NULL
@@ -1356,7 +1302,7 @@ static void vm_destroy_work_func(struct work_struct *w)
 {
 	struct xe_vm *vm =
 		container_of(w, struct xe_vm, destroy_work);
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct xe_device *xe = xe_vm_device(vm);
 	struct xe_gt *gt;
 	u8 id;
@@ -1382,14 +1328,14 @@ static void vm_destroy_work_func(struct work_struct *w)
 	 * is needed for xe_vm_lock to work. If we remove that dependency this
 	 * can be moved to xe_vm_close_and_put.
 	 */
-	xe_vm_lock(vm, &ww, 0, false);
+	xe_vm_lock(vm, &exec, 0, false);
 	for_each_gt(gt, xe, id) {
 		if (vm->pt_root[id]) {
 			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
 			vm->pt_root[id] = NULL;
 		}
 	}
-	xe_vm_unlock(vm, &ww);
+	xe_vm_unlock(vm, &exec);
 
 	trace_xe_vm_free(vm);
 	dma_fence_put(vm->rebind_fence);
@@ -1969,21 +1915,6 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 
 #define VM_BIND_OP(op)	(op & 0xffff)
 
-struct ttm_buffer_object *xe_vm_ttm_bo(struct xe_vm *vm)
-{
-	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
-		XE_VM_FLAG_GT_ID(vm->flags) : 0;
-
-	/* Safe to use index 0 as all BO in the VM share a single dma-resv lock */
-	return &vm->pt_root[idx]->bo->ttm;
-}
-
-static void xe_vm_tv_populate(struct xe_vm *vm, struct ttm_validate_buffer *tv)
-{
-	tv->num_shared = 1;
-	tv->bo = xe_vm_ttm_bo(vm);
-}
-
 static void vm_set_async_error(struct xe_vm *vm, int err)
 {
 	lockdep_assert_held(&vm->lock);
@@ -2088,7 +2019,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			 u32 operation, u8 gt_mask, u32 region)
 {
 	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	struct drm_gpuva_ops *ops;
 	struct drm_gpuva_op *__op;
 	struct xe_vma_op *op;
@@ -2136,11 +2067,11 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 	case XE_VM_BIND_OP_UNMAP_ALL:
 		XE_BUG_ON(!bo);
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return ERR_PTR(err);
 		ops = drm_gpuva_gem_unmap_ops_create(&vm->mgr, obj);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 
 		drm_gpuva_for_each_op(__op, ops) {
 			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
@@ -2174,13 +2105,13 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct xe_vma *vma;
-	struct ww_acquire_ctx ww;
+	struct drm_exec exec;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
 
 	if (bo) {
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return ERR_PTR(err);
 	}
@@ -2189,7 +2120,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 			    op->va.range - 1, read_only, null,
 			    gt_mask);
 	if (bo)
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 
 	if (xe_vma_is_userptr(vma)) {
 		err = xe_vma_userptr_pin_pages(vma);
@@ -2441,19 +2372,15 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
 static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			       struct xe_vma_op *op)
 {
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
-	struct ttm_validate_buffer tv_bo, tv_vm;
-	struct ww_acquire_ctx ww;
 	struct xe_bo *vbo;
+	struct drm_exec exec;
 	int err;
+	bool lru_update = op->base.op != DRM_GPUVA_OP_UNMAP;
 
 	lockdep_assert_held_write(&vm->lock);
 
-	xe_vm_tv_populate(vm, &tv_vm);
-	list_add_tail(&tv_vm.head, &objs);
 	vbo = xe_vma_bo(vma);
-	if (vbo) {
+	if (vbo)
 		/*
 		 * An unbind can drop the last reference to the BO and
 		 * the BO is needed for ttm_eu_backoff_reservation so
@@ -2461,22 +2388,15 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 		 */
 		xe_bo_get(vbo);
 
-		if (!vbo->vm) {
-			tv_bo.bo = &vbo->ttm;
-			tv_bo.num_shared = 1;
-			list_add(&tv_bo.head, &objs);
-		}
-	}
-
 again:
-	err = ttm_eu_reserve_buffers(&ww, &objs, true, &dups);
+	err = xe_vm_bo_lock(vm, vbo, &exec, 1, false);
 	if (err) {
 		xe_bo_put(vbo);
 		return err;
 	}
 
 	xe_vm_assert_held(vm);
-	xe_bo_assert_held(xe_vma_bo(vma));
+	xe_bo_assert_held(vbo);
 
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
@@ -2552,7 +2472,7 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 		XE_BUG_ON("NOT POSSIBLE");
 	}
 
-	ttm_eu_backoff_reservation(&ww, &objs);
+	xe_vm_bo_unlock(vm, vbo, &exec, lru_update);
 	if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
 		lockdep_assert_held_write(&vm->lock);
 		err = xe_vma_userptr_pin_pages(vma);
@@ -3208,30 +3128,67 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	return err == -ENODATA ? 0 : err;
 }
 
-/*
- * XXX: Using the TTM wrappers for now, likely can call into dma-resv code
- * directly to optimize. Also this likely should be an inline function.
- */
-int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
+int xe_vm_lock(struct xe_vm *vm, struct drm_exec *exec,
 	       int num_resv, bool intr)
 {
-	struct ttm_validate_buffer tv_vm;
-	LIST_HEAD(objs);
-	LIST_HEAD(dups);
+	int err;
 
-	XE_BUG_ON(!ww);
+	drm_exec_init(exec, intr);
+	drm_exec_while_not_all_locked(exec) {
+		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm),
+					   num_resv);
+		drm_exec_continue_on_contention(exec);
+		if (err && err != -EALREADY)
+			goto out_err;
+	}
 
-	tv_vm.num_shared = num_resv;
-	tv_vm.bo = xe_vm_ttm_bo(vm);;
-	list_add_tail(&tv_vm.head, &objs);
+	return 0;
 
-	return ttm_eu_reserve_buffers(ww, &objs, intr, &dups);
+out_err:
+	drm_exec_fini(exec);
+	return err;
 }
 
-void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
+void xe_vm_unlock(struct xe_vm *vm, struct drm_exec *exec)
 {
-	dma_resv_unlock(xe_vm_resv(vm));
-	ww_acquire_fini(ww);
+	drm_exec_fini(exec);
+}
+
+int xe_vm_bo_lock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
+		  int num_resv, bool intr)
+{
+	int err;
+
+	drm_exec_init(exec, intr);
+	drm_exec_while_not_all_locked(exec) {
+		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm),
+					   num_resv);
+		drm_exec_continue_on_contention(exec);
+		if (err && err != -EALREADY)
+			goto out_err;
+
+		if (bo && !bo->vm) {
+			err = drm_exec_prepare_obj(exec, &bo->ttm.base,
+						   num_resv);
+			drm_exec_continue_on_contention(exec);
+			if (err && err != -EALREADY)
+				goto out_err;
+		}
+	}
+
+	return 0;
+
+out_err:
+	drm_exec_fini(exec);
+	return err;
+}
+
+void xe_vm_bo_unlock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
+		     bool lru_update)
+{
+	if (lru_update && bo && (!bo->vm || xe_vm_no_dma_fences(vm)))
+		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
+	drm_exec_fini(exec);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index f279fa622260..47b981d9fc04 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -12,6 +12,7 @@
 #include "xe_vm_types.h"
 
 struct drm_device;
+struct drm_exec;
 struct drm_printer;
 struct drm_file;
 
@@ -38,10 +39,14 @@ static inline void xe_vm_put(struct xe_vm *vm)
 	kref_put(&vm->refcount, xe_vm_free);
 }
 
-int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
+int xe_vm_lock(struct xe_vm *vm, struct drm_exec *exec,
 	       int num_resv, bool intr);
+void xe_vm_unlock(struct xe_vm *vm, struct drm_exec *exec);
 
-void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww);
+int xe_vm_bo_lock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
+		  int num_resv, bool intr);
+void xe_vm_bo_unlock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
+		     bool lru_update);
 
 static inline bool xe_vm_is_closed(struct xe_vm *vm)
 {
@@ -219,23 +224,9 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma);
 
 int xe_vma_userptr_check_repin(struct xe_vma *vma);
 
-/*
- * XE_ONSTACK_TV is used to size the tv_onstack array that is input
- * to xe_vm_lock_dma_resv() and xe_vm_unlock_dma_resv().
- */
-#define XE_ONSTACK_TV 20
-int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
-			struct ttm_validate_buffer *tv_onstack,
-			struct ttm_validate_buffer **tv,
-			struct list_head *objs,
-			bool intr,
+int xe_vm_lock_dma_resv(struct xe_vm *vm, struct drm_exec *exec, bool intr,
 			unsigned int num_shared);
-
-void xe_vm_unlock_dma_resv(struct xe_vm *vm,
-			   struct ttm_validate_buffer *tv_onstack,
-			   struct ttm_validate_buffer *tv,
-			   struct ww_acquire_ctx *ww,
-			   struct list_head *objs);
+void xe_vm_unlock_dma_resv(struct xe_vm *vm, struct drm_exec *exec);
 
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
index 03508645fa08..a68bc6fec1de 100644
--- a/drivers/gpu/drm/xe/xe_vm_madvise.c
+++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
@@ -7,6 +7,7 @@
 
 #include <linux/nospec.h>
 
+#include <drm/drm_exec.h>
 #include <drm/ttm/ttm_tt.h>
 #include <drm/xe_drm.h>
 
@@ -28,16 +29,16 @@ static int madvise_preferred_mem_class(struct xe_device *xe, struct xe_vm *vm,
 
 	for (i = 0; i < num_vmas; ++i) {
 		struct xe_bo *bo;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		bo = xe_vma_bo(vmas[i]);
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return err;
 		bo->props.preferred_mem_class = value;
 		xe_bo_placement_for_flags(xe, bo, bo->flags);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 
 	return 0;
@@ -53,16 +54,16 @@ static int madvise_preferred_gt(struct xe_device *xe, struct xe_vm *vm,
 
 	for (i = 0; i < num_vmas; ++i) {
 		struct xe_bo *bo;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		bo = xe_vma_bo(vmas[i]);
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return err;
 		bo->props.preferred_gt = value;
 		xe_bo_placement_for_flags(xe, bo, bo->flags);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 
 	return 0;
@@ -89,17 +90,17 @@ static int madvise_preferred_mem_class_gt(struct xe_device *xe,
 
 	for (i = 0; i < num_vmas; ++i) {
 		struct xe_bo *bo;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		bo = xe_vma_bo(vmas[i]);
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return err;
 		bo->props.preferred_mem_class = mem_class;
 		bo->props.preferred_gt = gt_id;
 		xe_bo_placement_for_flags(xe, bo, bo->flags);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 
 	return 0;
@@ -112,13 +113,13 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
 
 	for (i = 0; i < num_vmas; ++i) {
 		struct xe_bo *bo;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		bo = xe_vma_bo(vmas[i]);
 		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_SYSTEM_BIT)))
 			return -EINVAL;
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return err;
 		bo->props.cpu_atomic = !!value;
@@ -130,7 +131,7 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
 		 */
 		if (bo->props.cpu_atomic)
 			ttm_bo_unmap_virtual(&bo->ttm);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 
 	return 0;
@@ -143,18 +144,18 @@ static int madvise_device_atomic(struct xe_device *xe, struct xe_vm *vm,
 
 	for (i = 0; i < num_vmas; ++i) {
 		struct xe_bo *bo;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		bo = xe_vma_bo(vmas[i]);
 		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_VRAM0_BIT) &&
 				 !(bo->flags & XE_BO_CREATE_VRAM1_BIT)))
 			return -EINVAL;
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return err;
 		bo->props.device_atomic = !!value;
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 
 	return 0;
@@ -174,16 +175,16 @@ static int madvise_priority(struct xe_device *xe, struct xe_vm *vm,
 
 	for (i = 0; i < num_vmas; ++i) {
 		struct xe_bo *bo;
-		struct ww_acquire_ctx ww;
+		struct drm_exec exec;
 
 		bo = xe_vma_bo(vmas[i]);
 
-		err = xe_bo_lock(bo, &ww, 0, true);
+		err = xe_bo_lock(bo, &exec, 0, true);
 		if (err)
 			return err;
 		bo->ttm.priority = value;
 		ttm_bo_move_to_lru_tail(&bo->ttm);
-		xe_bo_unlock(bo, &ww);
+		xe_bo_unlock(bo, &exec);
 	}
 
 	return 0;
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index 943c8fcda533..a2f6d90ac899 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -32,6 +32,8 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+#include <drm/drm_exec.h>
+
 struct drm_gpuva_manager;
 struct drm_gpuva_fn_ops;
 struct drm_gpuva_prealloc;
@@ -169,9 +171,17 @@ struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
 
 bool drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range);
 
-void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
-			 enum dma_resv_usage private_usage,
-			 enum dma_resv_usage extobj_usage);
+int drm_gpuva_manager_lock(struct drm_gpuva_manager *mgr, struct drm_exec *exec,
+			   struct drm_gem_object *mgr_obj, bool intr,
+			   unsigned int num_fences);
+void drm_gpuva_manager_unlock(struct drm_gpuva_manager *mgr,
+			      struct drm_exec *exec);
+
+void drm_gpuva_manager_add_fence(struct drm_gpuva_manager *mgr,
+				 struct drm_exec *exec,
+				 struct dma_fence *fence,
+				 enum dma_resv_usage private_usage,
+				 enum dma_resv_usage extobj_usage);
 
 /**
  * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (26 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 19:43   ` Rodrigo Vivi
  2023-05-11 10:03   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds Matthew Brost
                   ` (4 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

This is allowed per the dma-fencing rules: waiting on a dma-fence
before starting work is fine, the rules only forbid exporting a
dma-fence that would depend on a long-running (compute / faulting)
workload completing.
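
Condensed from the xe_sync_entry_parse() change below, for clarity
(illustrative fragment only):

/*
 *   dma-fence syncobj as in-sync  (wait)   -> allowed on any VM
 *   dma-fence syncobj as out-sync (signal) -> rejected on a compute /
 *                                             faulting (no-dma-fence) VM
 *   user fence as out-sync                 -> allowed, as before
 */
bool signal = sync_in.flags & DRM_XE_SYNC_SIGNAL;

if (no_dma_fences && signal)
	return -ENOTSUPP;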

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_sync.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
index 99f1ed87196d..1e4e4acb2c4a 100644
--- a/drivers/gpu/drm/xe/xe_sync.c
+++ b/drivers/gpu/drm/xe/xe_sync.c
@@ -105,6 +105,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 {
 	struct drm_xe_sync sync_in;
 	int err;
+	bool signal;
 
 	if (copy_from_user(&sync_in, sync_user, sizeof(*sync_user)))
 		return -EFAULT;
@@ -113,9 +114,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 			 ~(SYNC_FLAGS_TYPE_MASK | DRM_XE_SYNC_SIGNAL)))
 		return -EINVAL;
 
+	signal = sync_in.flags & DRM_XE_SYNC_SIGNAL;
 	switch (sync_in.flags & SYNC_FLAGS_TYPE_MASK) {
 	case DRM_XE_SYNC_SYNCOBJ:
-		if (XE_IOCTL_ERR(xe, no_dma_fences))
+		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
 			return -ENOTSUPP;
 
 		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
@@ -125,7 +127,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 		if (XE_IOCTL_ERR(xe, !sync->syncobj))
 			return -ENOENT;
 
-		if (!(sync_in.flags & DRM_XE_SYNC_SIGNAL)) {
+		if (!signal) {
 			sync->fence = drm_syncobj_fence_get(sync->syncobj);
 			if (XE_IOCTL_ERR(xe, !sync->fence))
 				return -EINVAL;
@@ -133,7 +135,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 		break;
 
 	case DRM_XE_SYNC_TIMELINE_SYNCOBJ:
-		if (XE_IOCTL_ERR(xe, no_dma_fences))
+		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
 			return -ENOTSUPP;
 
 		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
@@ -146,7 +148,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 		if (XE_IOCTL_ERR(xe, !sync->syncobj))
 			return -ENOENT;
 
-		if (sync_in.flags & DRM_XE_SYNC_SIGNAL) {
+		if (signal) {
 			sync->chain_fence = dma_fence_chain_alloc();
 			if (!sync->chain_fence)
 				return -ENOMEM;
@@ -168,7 +170,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 		break;
 
 	case DRM_XE_SYNC_USER_FENCE:
-		if (XE_IOCTL_ERR(xe, !(sync_in.flags & DRM_XE_SYNC_SIGNAL)))
+		if (XE_IOCTL_ERR(xe, !signal))
 			return -ENOTSUPP;
 
 		if (XE_IOCTL_ERR(xe, sync_in.addr & 0x7))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (27 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-09 14:50   ` Rodrigo Vivi
  2023-05-11 10:04   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 30/31] drm/xe: remove async worker, sync binds, new error handling Matthew Brost
                   ` (3 subsequent siblings)
  32 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Binds are not long-running jobs, so we can export dma-fences even if a
VM is in compute mode.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 55cced8870e6..07023506ce6b 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3047,7 +3047,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	for (num_syncs = 0; num_syncs < args->num_syncs; num_syncs++) {
 		err = xe_sync_entry_parse(xe, xef, &syncs[num_syncs],
 					  &syncs_user[num_syncs], false,
-					  xe_vm_no_dma_fences(vm));
+					  xe_vm_in_fault_mode(vm));
 		if (err)
 			goto free_syncs;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 30/31] drm/xe: remove async worker, sync binds, new error handling
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (28 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-17 16:53   ` Thomas Hellström
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc Matthew Brost
                   ` (2 subsequent siblings)
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost

The async worker is gone; all jobs and memory allocations are now done
in the IOCTL.

Async vs. sync now refers to when bind operations complete relative to
the IOCTL. Async binds complete when their out-syncs signal, while sync
binds complete when the IOCTL returns. In-syncs and out-syncs are only
allowed in async mode.

The error handling is similar to before: on memory allocation errors
binds are paused, the VM is put in an error state, and the bind IOCTL
returns -ENOSPC. The user is allowed to issue sync unbinds, with the
reclaim bit set, while in the error state. Bind operations without the
reclaim bit set are rejected with -EALREADY until the VM exits the
error state. To exit the error state, issue a restart bind operation,
which will pick up where the original failure left off.

TODO: Update kernel doc
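
As a rough sketch of the resulting userspace flow (illustrative only;
bind_with_error_handling(), issue_reclaim_unbinds() and
issue_restart_bind() are made-up helpers, the reclaim bit and restart
operation are referred to by description rather than by their exact
uAPI names, and the existing DRM_IOCTL_XE_VM_BIND / struct
drm_xe_vm_bind uAPI is assumed):

#include <errno.h>
#include <sys/ioctl.h>

#include <drm/xe_drm.h>

/* Placeholder helpers for the recovery steps described above. */
int issue_reclaim_unbinds(int fd);
int issue_restart_bind(int fd);

static int bind_with_error_handling(int fd, struct drm_xe_vm_bind *bind)
{
	int ret = ioctl(fd, DRM_IOCTL_XE_VM_BIND, bind);

	if (ret && errno == ENOSPC) {
		/*
		 * The VM is now in the error state: only sync unbinds
		 * with the reclaim bit set are accepted, everything
		 * else fails with -EALREADY.
		 */
		issue_reclaim_unbinds(fd);

		/*
		 * A restart bind operation picks up where the failed
		 * bind left off and takes the VM out of the error
		 * state.
		 */
		ret = issue_restart_bind(fd);
	}

	return ret;
}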

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_engine.c          |   7 +-
 drivers/gpu/drm/xe/xe_engine_types.h    |   1 +
 drivers/gpu/drm/xe/xe_exec.c            |  42 --
 drivers/gpu/drm/xe/xe_sync.c            |  14 +-
 drivers/gpu/drm/xe/xe_sync.h            |   2 +-
 drivers/gpu/drm/xe/xe_vm.c              | 712 ++++++------------------
 drivers/gpu/drm/xe/xe_vm.h              |   2 -
 drivers/gpu/drm/xe/xe_vm_types.h        |  37 +-
 drivers/gpu/drm/xe/xe_wait_user_fence.c |  43 +-
 include/uapi/drm/xe_drm.h               |  79 +--
 10 files changed, 213 insertions(+), 726 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
index 8b425b777259..d26aadb5e727 100644
--- a/drivers/gpu/drm/xe/xe_engine.c
+++ b/drivers/gpu/drm/xe/xe_engine.c
@@ -541,7 +541,10 @@ int xe_engine_create_ioctl(struct drm_device *dev, void *data,
 	if (XE_IOCTL_ERR(xe, eci[0].gt_id >= xe->info.tile_count))
 	       return -EINVAL;
 
-	if (eci[0].engine_class == DRM_XE_ENGINE_CLASS_VM_BIND) {
+	if (eci[0].engine_class >= DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC) {
+		bool sync = eci[0].engine_class ==
+			DRM_XE_ENGINE_CLASS_VM_BIND_SYNC;
+	
 		for_each_gt(gt, xe, id) {
 			struct xe_engine *new;
 
@@ -564,6 +567,8 @@ int xe_engine_create_ioctl(struct drm_device *dev, void *data,
 					       args->width, hwe,
 					       ENGINE_FLAG_PERSISTENT |
 					       ENGINE_FLAG_VM |
+					       (sync ? 0 :
+						ENGINE_FLAG_VM_ASYNC) |
 					       (id ?
 					       ENGINE_FLAG_BIND_ENGINE_CHILD :
 					       0));
diff --git a/drivers/gpu/drm/xe/xe_engine_types.h b/drivers/gpu/drm/xe/xe_engine_types.h
index 36bfaeec23f4..4949edfa0980 100644
--- a/drivers/gpu/drm/xe/xe_engine_types.h
+++ b/drivers/gpu/drm/xe/xe_engine_types.h
@@ -59,6 +59,7 @@ struct xe_engine {
 #define ENGINE_FLAG_VM			BIT(4)
 #define ENGINE_FLAG_BIND_ENGINE_CHILD	BIT(5)
 #define ENGINE_FLAG_WA			BIT(6)
+#define ENGINE_FLAG_VM_ASYNC		BIT(7)
 
 	/**
 	 * @flags: flags for this engine, should statically setup aside from ban
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 9f7f1088c403..c6f6a2bbd87b 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -196,26 +196,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		}
 	}
 
-	/*
-	 * We can't install a job into the VM dma-resv shared slot before an
-	 * async VM bind passed in as a fence without the risk of deadlocking as
-	 * the bind can trigger an eviction which in turn depends on anything in
-	 * the VM dma-resv shared slots. Not an ideal solution, but we wait for
-	 * all dependent async VM binds to start (install correct fences into
-	 * dma-resv slots) before moving forward.
-	 */
-	if (!xe_vm_no_dma_fences(vm) &&
-	    vm->flags & XE_VM_FLAG_ASYNC_BIND_OPS) {
-		for (i = 0; i < args->num_syncs; i++) {
-			struct dma_fence *fence = syncs[i].fence;
-			if (fence) {
-				err = xe_vm_async_fence_wait_start(fence);
-				if (err)
-					goto err_syncs;
-			}
-		}
-	}
-
 retry:
 	if (!xe_vm_no_dma_fences(vm) && xe_vm_userptr_check_repin(vm)) {
 		err = down_write_killable(&vm->lock);
@@ -228,28 +208,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		goto err_syncs;
 
-	/* We don't allow execs while the VM is in error state */
-	if (vm->async_ops.error) {
-		err = vm->async_ops.error;
-		goto err_unlock_list;
-	}
-
-	/*
-	 * Extreme corner where we exit a VM error state with a munmap style VM
-	 * unbind inflight which requires a rebind. In this case the rebind
-	 * needs to install some fences into the dma-resv slots. The worker to
-	 * do this queued, let that worker make progress by dropping vm->lock,
-	 * flushing the worker and retrying the exec.
-	 */
-	if (vm->async_ops.munmap_rebind_inflight) {
-		if (write_locked)
-			up_write(&vm->lock);
-		else
-			up_read(&vm->lock);
-		flush_work(&vm->async_ops.work);
-		goto retry;
-	}
-
 	if (write_locked) {
 		err = xe_vm_userptr_pin(vm);
 		downgrade_write(&vm->lock);
diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
index 1e4e4acb2c4a..c05142f9780a 100644
--- a/drivers/gpu/drm/xe/xe_sync.c
+++ b/drivers/gpu/drm/xe/xe_sync.c
@@ -18,7 +18,6 @@
 #include "xe_sched_job_types.h"
 
 #define SYNC_FLAGS_TYPE_MASK 0x3
-#define SYNC_FLAGS_FENCE_INSTALLED	0x10000
 
 struct user_fence {
 	struct xe_device *xe;
@@ -221,12 +220,11 @@ int xe_sync_entry_add_deps(struct xe_sync_entry *sync, struct xe_sched_job *job)
 	return 0;
 }
 
-bool xe_sync_entry_signal(struct xe_sync_entry *sync, struct xe_sched_job *job,
+void xe_sync_entry_signal(struct xe_sync_entry *sync, struct xe_sched_job *job,
 			  struct dma_fence *fence)
 {
-	if (!(sync->flags & DRM_XE_SYNC_SIGNAL) ||
-	    sync->flags & SYNC_FLAGS_FENCE_INSTALLED)
-		return false;
+	if (!(sync->flags & DRM_XE_SYNC_SIGNAL))
+		return;
 
 	if (sync->chain_fence) {
 		drm_syncobj_add_point(sync->syncobj, sync->chain_fence,
@@ -258,12 +256,6 @@ bool xe_sync_entry_signal(struct xe_sync_entry *sync, struct xe_sched_job *job,
 		job->user_fence.addr = sync->addr;
 		job->user_fence.value = sync->timeline_value;
 	}
-
-	/* TODO: external BO? */
-
-	sync->flags |= SYNC_FLAGS_FENCE_INSTALLED;
-
-	return true;
 }
 
 void xe_sync_entry_cleanup(struct xe_sync_entry *sync)
diff --git a/drivers/gpu/drm/xe/xe_sync.h b/drivers/gpu/drm/xe/xe_sync.h
index 4cbcf7a19911..30958ddc4cdc 100644
--- a/drivers/gpu/drm/xe/xe_sync.h
+++ b/drivers/gpu/drm/xe/xe_sync.h
@@ -19,7 +19,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 int xe_sync_entry_wait(struct xe_sync_entry *sync);
 int xe_sync_entry_add_deps(struct xe_sync_entry *sync,
 			   struct xe_sched_job *job);
-bool xe_sync_entry_signal(struct xe_sync_entry *sync,
+void xe_sync_entry_signal(struct xe_sync_entry *sync,
 			  struct xe_sched_job *job,
 			  struct dma_fence *fence);
 void xe_sync_entry_cleanup(struct xe_sync_entry *sync);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 07023506ce6b..126b2d1b4e84 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -457,7 +457,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	struct dma_fence *rebind_fence;
 	unsigned int fence_count = 0;
 	LIST_HEAD(preempt_fences);
-	int err;
+	int err = 0;
 	long wait;
 	int __maybe_unused tries = 0;
 
@@ -472,22 +472,9 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	down_write(&vm->lock);
 
 retry:
-	if (vm->async_ops.error)
+	if (vm->ops_state.error || vm->ops_state.munmap_rebind_inflight)
 		goto out_unlock_outer;
 
-	/*
-	 * Extreme corner where we exit a VM error state with a munmap style VM
-	 * unbind inflight which requires a rebind. In this case the rebind
-	 * needs to install some fences into the dma-resv slots. The worker to
-	 * do this queued, let that worker make progress by dropping vm->lock
-	 * and trying this again.
-	 */
-	if (vm->async_ops.munmap_rebind_inflight) {
-		up_write(&vm->lock);
-		flush_work(&vm->async_ops.work);
-		goto retry;
-	}
-
 	if (xe_vm_userptr_check_repin(vm)) {
 		err = xe_vm_userptr_pin(vm);
 		if (err)
@@ -990,7 +977,6 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
 	.op_alloc = xe_vm_op_alloc,
 };
 
-static void xe_vma_op_work_func(struct work_struct *w);
 static void vm_destroy_work_func(struct work_struct *w);
 
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
@@ -1022,9 +1008,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	INIT_LIST_HEAD(&vm->notifier.rebind_list);
 	spin_lock_init(&vm->notifier.list_lock);
 
-	INIT_LIST_HEAD(&vm->async_ops.pending);
-	INIT_WORK(&vm->async_ops.work, xe_vma_op_work_func);
-	spin_lock_init(&vm->async_ops.lock);
+	INIT_LIST_HEAD(&vm->ops_state.pending);
 
 	INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
 
@@ -1079,11 +1063,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		vm->flags |= XE_VM_FLAG_COMPUTE_MODE;
 	}
 
-	if (flags & DRM_XE_VM_CREATE_ASYNC_BIND_OPS) {
-		vm->async_ops.fence.context = dma_fence_context_alloc(1);
-		vm->flags |= XE_VM_FLAG_ASYNC_BIND_OPS;
-	}
-
 	/* Fill pt_root after allocating scratch tables */
 	for_each_gt(gt, xe, id) {
 		if (!vm->pt_root[id])
@@ -1105,7 +1084,9 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 			migrate_vm = xe_migrate_get_vm(gt->migrate);
 			eng = xe_engine_create_class(xe, gt, migrate_vm,
 						     XE_ENGINE_CLASS_COPY,
-						     ENGINE_FLAG_VM);
+						     ENGINE_FLAG_VM |
+						     ((flags & XE_VM_FLAG_ASYNC_DEFAULT) ?
+						     ENGINE_FLAG_VM_ASYNC : 0));
 			xe_vm_put(migrate_vm);
 			if (IS_ERR(eng)) {
 				xe_vm_close_and_put(vm);
@@ -1160,42 +1141,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	return ERR_PTR(err);
 }
 
-static void flush_async_ops(struct xe_vm *vm)
-{
-	queue_work(system_unbound_wq, &vm->async_ops.work);
-	flush_work(&vm->async_ops.work);
-}
-
-static void vm_error_capture(struct xe_vm *vm, int err,
-			     u32 op, u64 addr, u64 size)
-{
-	struct drm_xe_vm_bind_op_error_capture capture;
-	u64 __user *address =
-		u64_to_user_ptr(vm->async_ops.error_capture.addr);
-	bool in_kthread = !current->mm;
-
-	capture.error = err;
-	capture.op = op;
-	capture.addr = addr;
-	capture.size = size;
-
-	if (in_kthread) {
-		if (!mmget_not_zero(vm->async_ops.error_capture.mm))
-			goto mm_closed;
-		kthread_use_mm(vm->async_ops.error_capture.mm);
-	}
-
-	if (copy_to_user(address, &capture, sizeof(capture)))
-		XE_WARN_ON("Copy to user failed");
-
-	if (in_kthread) {
-		kthread_unuse_mm(vm->async_ops.error_capture.mm);
-		mmput(vm->async_ops.error_capture.mm);
-	}
-
-mm_closed:
-	wake_up_all(&vm->async_ops.error_capture.wq);
-}
+static void vm_bind_ioctl_ops_cleanup(struct xe_vm *vm);
 
 void xe_vm_close_and_put(struct xe_vm *vm)
 {
@@ -1214,7 +1160,6 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 
 	vm->size = 0;
 	smp_mb();
-	flush_async_ops(vm);
 	if (xe_vm_in_compute_mode(vm))
 		flush_work(&vm->preempt.rebind_work);
 
@@ -1227,6 +1172,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	}
 
 	down_write(&vm->lock);
+	vm_bind_ioctl_ops_cleanup(vm);
 	xe_vm_lock(vm, &exec, 0, false);
 	drm_gpuva_iter_for_each(gpuva, it) {
 		vma = gpuva_to_vma(gpuva);
@@ -1281,9 +1227,6 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 		xe_vma_destroy_unlocked(vma);
 	}
 
-	if (vm->async_ops.error_capture.addr)
-		wake_up_all(&vm->async_ops.error_capture.wq);
-
 	up_write(&vm->lock);
 
 	drm_gpuva_manager_destroy(&vm->mgr);
@@ -1437,10 +1380,8 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
 
 err_fences:
 	if (fences) {
-		while (cur_fence) {
-			/* FIXME: Rewind the previous binds? */
+		while (cur_fence)
 			dma_fence_put(fences[--cur_fence]);
-		}
 		kfree(fences);
 	}
 
@@ -1511,100 +1452,24 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
 
 err_fences:
 	if (fences) {
-		while (cur_fence) {
-			/* FIXME: Rewind the previous binds? */
+		while (cur_fence)
 			dma_fence_put(fences[--cur_fence]);
-		}
 		kfree(fences);
 	}
 
 	return ERR_PTR(err);
 }
 
-struct async_op_fence {
-	struct dma_fence fence;
-	struct dma_fence *wait_fence;
-	struct dma_fence_cb cb;
-	struct xe_vm *vm;
-	wait_queue_head_t wq;
-	bool started;
-};
-
-static const char *async_op_fence_get_driver_name(struct dma_fence *dma_fence)
-{
-	return "xe";
-}
-
-static const char *
-async_op_fence_get_timeline_name(struct dma_fence *dma_fence)
+static bool xe_vm_sync_mode(struct xe_vm *vm, struct xe_engine *e)
 {
-	return "async_op_fence";
-}
-
-static const struct dma_fence_ops async_op_fence_ops = {
-	.get_driver_name = async_op_fence_get_driver_name,
-	.get_timeline_name = async_op_fence_get_timeline_name,
-};
-
-static void async_op_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
-{
-	struct async_op_fence *afence =
-		container_of(cb, struct async_op_fence, cb);
-
-	afence->fence.error = afence->wait_fence->error;
-	dma_fence_signal(&afence->fence);
-	xe_vm_put(afence->vm);
-	dma_fence_put(afence->wait_fence);
-	dma_fence_put(&afence->fence);
-}
-
-static void add_async_op_fence_cb(struct xe_vm *vm,
-				  struct dma_fence *fence,
-				  struct async_op_fence *afence)
-{
-	int ret;
-
-	if (!xe_vm_no_dma_fences(vm)) {
-		afence->started = true;
-		smp_wmb();
-		wake_up_all(&afence->wq);
-	}
-
-	afence->wait_fence = dma_fence_get(fence);
-	afence->vm = xe_vm_get(vm);
-	dma_fence_get(&afence->fence);
-	ret = dma_fence_add_callback(fence, &afence->cb, async_op_fence_cb);
-	if (ret == -ENOENT) {
-		afence->fence.error = afence->wait_fence->error;
-		dma_fence_signal(&afence->fence);
-	}
-	if (ret) {
-		xe_vm_put(vm);
-		dma_fence_put(afence->wait_fence);
-		dma_fence_put(&afence->fence);
-	}
-	XE_WARN_ON(ret && ret != -ENOENT);
-}
-
-int xe_vm_async_fence_wait_start(struct dma_fence *fence)
-{
-	if (fence->ops == &async_op_fence_ops) {
-		struct async_op_fence *afence =
-			container_of(fence, struct async_op_fence, fence);
-
-		XE_BUG_ON(xe_vm_no_dma_fences(afence->vm));
-
-		smp_rmb();
-		return wait_event_interruptible(afence->wq, afence->started);
-	}
-
-	return 0;
+	return e ? !(e->flags & ENGINE_FLAG_VM_ASYNC) :
+		!(vm->flags & XE_VM_FLAG_ASYNC_DEFAULT);
 }
 
 static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
 			struct xe_engine *e, struct xe_sync_entry *syncs,
-			u32 num_syncs, struct async_op_fence *afence,
-			bool immediate, bool first_op, bool last_op)
+			u32 num_syncs, bool immediate, bool first_op,
+			bool last_op)
 {
 	struct dma_fence *fence;
 
@@ -1624,17 +1489,18 @@ static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
 		for (i = 0; last_op && i < num_syncs; i++)
 			xe_sync_entry_signal(&syncs[i], NULL, fence);
 	}
-	if (afence)
-		add_async_op_fence_cb(vm, fence, afence);
 
+	if (last_op && xe_vm_sync_mode(vm, e))
+		dma_fence_wait(fence, true);
 	dma_fence_put(fence);
+
 	return 0;
 }
 
 static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_engine *e,
 		      struct xe_bo *bo, struct xe_sync_entry *syncs,
-		      u32 num_syncs, struct async_op_fence *afence,
-		      bool immediate, bool first_op, bool last_op)
+		      u32 num_syncs, bool immediate, bool first_op,
+		      bool last_op)
 {
 	int err;
 
@@ -1647,14 +1513,13 @@ static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_engine *e,
 			return err;
 	}
 
-	return __xe_vm_bind(vm, vma, e, syncs, num_syncs, afence, immediate,
-			    first_op, last_op);
+	return __xe_vm_bind(vm, vma, e, syncs, num_syncs, immediate, first_op,
+			    last_op);
 }
 
 static int xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 			struct xe_engine *e, struct xe_sync_entry *syncs,
-			u32 num_syncs, struct async_op_fence *afence,
-			bool first_op, bool last_op)
+			u32 num_syncs, bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
 
@@ -1664,100 +1529,18 @@ static int xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 	fence = xe_vm_unbind_vma(vma, e, syncs, num_syncs, first_op, last_op);
 	if (IS_ERR(fence))
 		return PTR_ERR(fence);
-	if (afence)
-		add_async_op_fence_cb(vm, fence, afence);
 
 	xe_vma_destroy(vma, fence);
+	if (last_op && xe_vm_sync_mode(vm, e))
+		dma_fence_wait(fence, true);
 	dma_fence_put(fence);
 
 	return 0;
 }
 
-static int vm_set_error_capture_address(struct xe_device *xe, struct xe_vm *vm,
-					u64 value)
-{
-	if (XE_IOCTL_ERR(xe, !value))
-		return -EINVAL;
-
-	if (XE_IOCTL_ERR(xe, !(vm->flags & XE_VM_FLAG_ASYNC_BIND_OPS)))
-		return -ENOTSUPP;
-
-	if (XE_IOCTL_ERR(xe, vm->async_ops.error_capture.addr))
-		return -ENOTSUPP;
-
-	vm->async_ops.error_capture.mm = current->mm;
-	vm->async_ops.error_capture.addr = value;
-	init_waitqueue_head(&vm->async_ops.error_capture.wq);
-
-	return 0;
-}
-
-typedef int (*xe_vm_set_property_fn)(struct xe_device *xe, struct xe_vm *vm,
-				     u64 value);
-
-static const xe_vm_set_property_fn vm_set_property_funcs[] = {
-	[XE_VM_PROPERTY_BIND_OP_ERROR_CAPTURE_ADDRESS] =
-		vm_set_error_capture_address,
-};
-
-static int vm_user_ext_set_property(struct xe_device *xe, struct xe_vm *vm,
-				    u64 extension)
-{
-	u64 __user *address = u64_to_user_ptr(extension);
-	struct drm_xe_ext_vm_set_property ext;
-	int err;
-
-	err = __copy_from_user(&ext, address, sizeof(ext));
-	if (XE_IOCTL_ERR(xe, err))
-		return -EFAULT;
-
-	if (XE_IOCTL_ERR(xe, ext.property >=
-			 ARRAY_SIZE(vm_set_property_funcs)))
-		return -EINVAL;
-
-	return vm_set_property_funcs[ext.property](xe, vm, ext.value);
-}
-
-typedef int (*xe_vm_user_extension_fn)(struct xe_device *xe, struct xe_vm *vm,
-				       u64 extension);
-
-static const xe_vm_set_property_fn vm_user_extension_funcs[] = {
-	[XE_VM_EXTENSION_SET_PROPERTY] = vm_user_ext_set_property,
-};
-
-#define MAX_USER_EXTENSIONS	16
-static int vm_user_extensions(struct xe_device *xe, struct xe_vm *vm,
-			      u64 extensions, int ext_number)
-{
-	u64 __user *address = u64_to_user_ptr(extensions);
-	struct xe_user_extension ext;
-	int err;
-
-	if (XE_IOCTL_ERR(xe, ext_number >= MAX_USER_EXTENSIONS))
-		return -E2BIG;
-
-	err = __copy_from_user(&ext, address, sizeof(ext));
-	if (XE_IOCTL_ERR(xe, err))
-		return -EFAULT;
-
-	if (XE_IOCTL_ERR(xe, ext.name >=
-			 ARRAY_SIZE(vm_user_extension_funcs)))
-		return -EINVAL;
-
-	err = vm_user_extension_funcs[ext.name](xe, vm, extensions);
-	if (XE_IOCTL_ERR(xe, err))
-		return err;
-
-	if (ext.next_extension)
-		return vm_user_extensions(xe, vm, ext.next_extension,
-					  ++ext_number);
-
-	return 0;
-}
-
 #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_SCRATCH_PAGE | \
 				    DRM_XE_VM_CREATE_COMPUTE_MODE | \
-				    DRM_XE_VM_CREATE_ASYNC_BIND_OPS | \
+				    DRM_XE_VM_CREATE_ASYNC_DEFAULT | \
 				    DRM_XE_VM_CREATE_FAULT_MODE)
 
 int xe_vm_create_ioctl(struct drm_device *dev, void *data,
@@ -1794,12 +1577,15 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
 			 !xe->info.supports_usm))
 		return -EINVAL;
 
+	if (XE_IOCTL_ERR(xe, args->extensions))
+		return -EINVAL;
+
 	if (args->flags & DRM_XE_VM_CREATE_SCRATCH_PAGE)
 		flags |= XE_VM_FLAG_SCRATCH_PAGE;
 	if (args->flags & DRM_XE_VM_CREATE_COMPUTE_MODE)
 		flags |= XE_VM_FLAG_COMPUTE_MODE;
-	if (args->flags & DRM_XE_VM_CREATE_ASYNC_BIND_OPS)
-		flags |= XE_VM_FLAG_ASYNC_BIND_OPS;
+	if (args->flags & DRM_XE_VM_CREATE_ASYNC_DEFAULT)
+		flags |= XE_VM_FLAG_ASYNC_DEFAULT;
 	if (args->flags & DRM_XE_VM_CREATE_FAULT_MODE)
 		flags |= XE_VM_FLAG_FAULT_MODE;
 
@@ -1807,14 +1593,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(vm))
 		return PTR_ERR(vm);
 
-	if (args->extensions) {
-		err = vm_user_extensions(xe, vm, args->extensions, 0);
-		if (XE_IOCTL_ERR(xe, err)) {
-			xe_vm_close_and_put(vm);
-			return err;
-		}
-	}
-
 	mutex_lock(&xef->vm.lock);
 	err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
 	mutex_unlock(&xef->vm.lock);
@@ -1884,8 +1662,7 @@ static const u32 region_to_mem_type[] = {
 static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 			  struct xe_engine *e, u32 region,
 			  struct xe_sync_entry *syncs, u32 num_syncs,
-			  struct async_op_fence *afence, bool first_op,
-			  bool last_op)
+			  bool first_op, bool last_op)
 {
 	int err;
 
@@ -1899,7 +1676,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 
 	if (vma->gt_mask != (vma->gt_present & ~vma->usm.gt_invalidated)) {
 		return xe_vm_bind(vm, vma, e, xe_vma_bo(vma), syncs, num_syncs,
-				  afence, true, first_op, last_op);
+				  true, first_op, last_op);
 	} else {
 		int i;
 
@@ -1907,54 +1684,17 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 		for (i = 0; last_op && i < num_syncs; i++)
 			xe_sync_entry_signal(&syncs[i], NULL,
 					     dma_fence_get_stub());
-		if (afence)
-			dma_fence_signal(&afence->fence);
+
 		return 0;
 	}
 }
 
 #define VM_BIND_OP(op)	(op & 0xffff)
 
-static void vm_set_async_error(struct xe_vm *vm, int err)
+static void xe_vm_set_error(struct xe_vm *vm, int err)
 {
 	lockdep_assert_held(&vm->lock);
-	vm->async_ops.error = err;
-}
-
-static int vm_bind_ioctl_lookup_vma(struct xe_vm *vm, struct xe_bo *bo,
-				    u64 addr, u64 range, u32 op)
-{
-	struct xe_device *xe = xe_vm_device(vm);
-	struct xe_vma *vma;
-	bool async = !!(op & XE_VM_BIND_FLAG_ASYNC);
-
-	lockdep_assert_held(&vm->lock);
-
-	switch (VM_BIND_OP(op)) {
-	case XE_VM_BIND_OP_MAP:
-	case XE_VM_BIND_OP_MAP_USERPTR:
-		vma = xe_vm_find_overlapping_vma(vm, addr, range);
-		if (XE_IOCTL_ERR(xe, vma && !async))
-			return -EBUSY;
-		break;
-	case XE_VM_BIND_OP_UNMAP:
-	case XE_VM_BIND_OP_PREFETCH:
-		vma = xe_vm_find_overlapping_vma(vm, addr, range);
-		if (XE_IOCTL_ERR(xe, !vma) ||
-		    XE_IOCTL_ERR(xe, (xe_vma_start(vma) != addr ||
-				 xe_vma_end(vma) != addr + range) && !async))
-			return -EINVAL;
-		break;
-	case XE_VM_BIND_OP_UNMAP_ALL:
-		if (XE_IOCTL_ERR(xe, list_empty(&bo->ttm.base.gpuva.list)))
-			return -ENODATA;
-		break;
-	default:
-		XE_BUG_ON("NOT POSSIBLE");
-		return -EINVAL;
-	}
-
-	return 0;
+	vm->ops_state.error = err;
 }
 
 static void prep_vma_destroy(struct xe_vm *vm, struct xe_vma *vma,
@@ -2162,41 +1902,20 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 {
 	struct xe_vma_op *last_op = NULL;
 	struct list_head *async_list = NULL;
-	struct async_op_fence *fence = NULL;
-	int err, i;
+	int i;
 
 	lockdep_assert_held_write(&vm->lock);
-	XE_BUG_ON(num_ops_list > 1 && !async);
-
-	if (num_syncs && async) {
-		u64 seqno;
-
-		fence = kmalloc(sizeof(*fence), GFP_KERNEL);
-		if (!fence)
-			return -ENOMEM;
-
-		seqno = e ? ++e->bind.fence_seqno : ++vm->async_ops.fence.seqno;
-		dma_fence_init(&fence->fence, &async_op_fence_ops,
-			       &vm->async_ops.lock, e ? e->bind.fence_ctx :
-			       vm->async_ops.fence.context, seqno);
-
-		if (!xe_vm_no_dma_fences(vm)) {
-			fence->vm = vm;
-			fence->started = false;
-			init_waitqueue_head(&fence->wq);
-		}
-	}
 
 	for (i = 0; i < num_ops_list; ++i) {
 		struct drm_gpuva_ops *__ops = ops[i];
 		struct drm_gpuva_op *__op;
+		bool has_ops = false;
 
 		drm_gpuva_for_each_op(__op, __ops) {
 			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 			bool first = !async_list;
 
-			XE_BUG_ON(!first && !async);
-
+			has_ops = true;
 			INIT_LIST_HEAD(&op->link);
 			if (first)
 				async_list = ops_list;
@@ -2218,10 +1937,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 				vma = new_vma(vm, &op->base.map,
 					      op->gt_mask, op->map.read_only,
 					      op->map.null );
-				if (IS_ERR(vma)) {
-					err = PTR_ERR(vma);
-					goto free_fence;
-				}
+				if (IS_ERR(vma))
+					return PTR_ERR(vma);
 
 				op->map.vma = vma;
 				break;
@@ -2246,10 +1963,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 					vma = new_vma(vm, op->base.remap.prev,
 						      op->gt_mask, read_only,
 						      null);
-					if (IS_ERR(vma)) {
-						err = PTR_ERR(vma);
-						goto free_fence;
-					}
+					if (IS_ERR(vma))
+						return PTR_ERR(vma);
 
 					op->remap.prev = vma;
 
@@ -2281,10 +1996,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 					vma = new_vma(vm, op->base.remap.next,
 						      op->gt_mask, read_only,
 						      null);
-					if (IS_ERR(vma)) {
-						err = PTR_ERR(vma);
-						goto free_fence;
-					}
+					if (IS_ERR(vma))
+						return PTR_ERR(vma);
 
 					op->remap.next = vma;
 					op->remap.skip_next = !xe_vma_is_userptr(old) &&
@@ -2307,21 +2020,22 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
 			last_op = op;
 		}
 
-		last_op->ops = __ops;
+		if (has_ops) {
+			last_op->ops = __ops;
+		} else {
+			drm_gpuva_ops_free(&vm->mgr, __ops);
+			ops[i] = NULL;
+		}
 	}
 
-	XE_BUG_ON(!last_op);	/* FIXME: This is not an error, handle */
+	if (!last_op)
+		return -ENODATA;
 
 	last_op->flags |= XE_VMA_OP_LAST;
 	last_op->num_syncs = num_syncs;
 	last_op->syncs = syncs;
-	last_op->fence = fence;
 
 	return 0;
-
-free_fence:
-	kfree(fence);
-	return err;
 }
 
 static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
@@ -2401,7 +2115,7 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
 		err = xe_vm_bind(vm, vma, op->engine, xe_vma_bo(vma),
-				 op->syncs, op->num_syncs, op->fence,
+				 op->syncs, op->num_syncs,
 				 op->map.immediate || !xe_vm_in_fault_mode(vm),
 				 op->flags & XE_VMA_OP_FIRST,
 				 op->flags & XE_VMA_OP_LAST);
@@ -2413,15 +2127,14 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 
 		if (!op->remap.unmap_done) {
 			if (prev || next) {
-				vm->async_ops.munmap_rebind_inflight = true;
+				vm->ops_state.munmap_rebind_inflight = true;
 				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
 			}
 			err = xe_vm_unbind(vm, vma, op->engine, op->syncs,
 					   op->num_syncs,
-					   !prev && !next ? op->fence : NULL,
 					   op->flags & XE_VMA_OP_FIRST,
-					   op->flags & XE_VMA_OP_LAST && !prev &&
-					   !next);
+					   op->flags & XE_VMA_OP_LAST &&
+					   !prev && !next);
 			if (err)
 				break;
 			op->remap.unmap_done = true;
@@ -2431,8 +2144,7 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			op->remap.prev->gpuva.flags |= XE_VMA_LAST_REBIND;
 			err = xe_vm_bind(vm, op->remap.prev, op->engine,
 					 xe_vma_bo(op->remap.prev), op->syncs,
-					 op->num_syncs,
-					 !next ? op->fence : NULL, true, false,
+					 op->num_syncs, true, false,
 					 op->flags & XE_VMA_OP_LAST && !next);
 			op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND;
 			if (err)
@@ -2445,26 +2157,25 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			err = xe_vm_bind(vm, op->remap.next, op->engine,
 					 xe_vma_bo(op->remap.next),
 					 op->syncs, op->num_syncs,
-					 op->fence, true, false,
+					 true, false,
 					 op->flags & XE_VMA_OP_LAST);
 			op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND;
 			if (err)
 				break;
 			op->remap.next = NULL;
 		}
-		vm->async_ops.munmap_rebind_inflight = false;
+		vm->ops_state.munmap_rebind_inflight = false;
 
 		break;
 	}
 	case DRM_GPUVA_OP_UNMAP:
 		err = xe_vm_unbind(vm, vma, op->engine, op->syncs,
-				   op->num_syncs, op->fence,
-				   op->flags & XE_VMA_OP_FIRST,
+				   op->num_syncs, op->flags & XE_VMA_OP_FIRST,
 				   op->flags & XE_VMA_OP_LAST);
 		break;
 	case DRM_GPUVA_OP_PREFETCH:
 		err = xe_vm_prefetch(vm, vma, op->engine, op->prefetch.region,
-				     op->syncs, op->num_syncs, op->fence,
+				     op->syncs, op->num_syncs,
 				     op->flags & XE_VMA_OP_FIRST,
 				     op->flags & XE_VMA_OP_LAST);
 		break;
@@ -2538,20 +2249,17 @@ static void xe_vma_op_cleanup(struct xe_vm *vm, struct xe_vma_op *op)
 {
 	bool last = op->flags & XE_VMA_OP_LAST;
 
+	lockdep_assert_held_write(&vm->lock);
+
 	if (last) {
 		while (op->num_syncs--)
 			xe_sync_entry_cleanup(&op->syncs[op->num_syncs]);
 		kfree(op->syncs);
 		if (op->engine)
 			xe_engine_put(op->engine);
-		if (op->fence)
-			dma_fence_put(&op->fence->fence);
 	}
-	if (!list_empty(&op->link)) {
-		spin_lock_irq(&vm->async_ops.lock);
+	if (!list_empty(&op->link))
 		list_del(&op->link);
-		spin_unlock_irq(&vm->async_ops.lock);
-	}
 	if (op->ops)
 		drm_gpuva_ops_free(&vm->mgr, op->ops);
 	if (last)
@@ -2606,127 +2314,23 @@ static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
 	}
 }
 
-static struct xe_vma_op *next_vma_op(struct xe_vm *vm)
-{
-	return list_first_entry_or_null(&vm->async_ops.pending,
-					struct xe_vma_op, link);
-}
-
-static void xe_vma_op_work_func(struct work_struct *w)
-{
-	struct xe_vm *vm = container_of(w, struct xe_vm, async_ops.work);
-
-	for (;;) {
-		struct xe_vma_op *op;
-		int err;
-
-		if (vm->async_ops.error && !xe_vm_is_closed(vm))
-			break;
-
-		spin_lock_irq(&vm->async_ops.lock);
-		op = next_vma_op(vm);
-		spin_unlock_irq(&vm->async_ops.lock);
-
-		if (!op)
-			break;
-
-		if (!xe_vm_is_closed(vm)) {
-			down_write(&vm->lock);
-			err = xe_vma_op_execute(vm, op);
-			if (err) {
-				drm_warn(&xe_vm_device(vm)->drm,
-					 "Async VM op(%d) failed with %d",
-					 op->base.op, err);
-				vm_set_async_error(vm, err);
-				up_write(&vm->lock);
-
-				if (vm->async_ops.error_capture.addr)
-					vm_error_capture(vm, err, 0, 0, 0);
-				break;
-			}
-			up_write(&vm->lock);
-		} else {
-			struct xe_vma *vma;
-
-			switch (op->base.op) {
-			case DRM_GPUVA_OP_REMAP:
-				vma = gpuva_to_vma(op->base.remap.unmap->va);
-				trace_xe_vma_flush(vma);
-
-				down_write(&vm->lock);
-				xe_vma_destroy_unlocked(vma);
-				up_write(&vm->lock);
-				break;
-			case DRM_GPUVA_OP_UNMAP:
-				vma = gpuva_to_vma(op->base.unmap.va);
-				trace_xe_vma_flush(vma);
-
-				down_write(&vm->lock);
-				xe_vma_destroy_unlocked(vma);
-				up_write(&vm->lock);
-				break;
-			default:
-				/* Nothing to do */
-				break;
-			}
-
-			if (op->fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-						   &op->fence->fence.flags)) {
-				if (!xe_vm_no_dma_fences(vm)) {
-					op->fence->started = true;
-					smp_wmb();
-					wake_up_all(&op->fence->wq);
-				}
-				dma_fence_signal(&op->fence->fence);
-			}
-		}
-
-		xe_vma_op_cleanup(vm, op);
-	}
-}
-
-/*
- * Commit operations list, this step cannot fail in async mode, can fail if the
- * bind operation fails in sync mode.
- */
 static int vm_bind_ioctl_ops_commit(struct xe_vm *vm,
-				    struct list_head *ops_list, bool async)
+				    struct list_head *ops_list,
+				    bool reclaim)
 {
-	struct xe_vma_op *op, *last_op, *next;
+	struct xe_vma_op *op, *next;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
 
 	list_for_each_entry(op, ops_list, link) {
-		last_op = op;
 		err = xe_vma_op_commit(vm, op);
 		if (err)
 			goto unwind;
 	}
 
-	if (!async) {
-		err = xe_vma_op_execute(vm, last_op);
-		if (err)
-			goto unwind;
-		xe_vma_op_cleanup(vm, last_op);
-	} else {
-		int i;
-		bool installed = false;
-
-		for (i = 0; i < last_op->num_syncs; i++)
-			installed |= xe_sync_entry_signal(&last_op->syncs[i],
-							  NULL,
-							  &last_op->fence->fence);
-		if (!installed && last_op->fence)
-			dma_fence_signal(&last_op->fence->fence);
-
-		spin_lock_irq(&vm->async_ops.lock);
-		list_splice_tail(ops_list, &vm->async_ops.pending);
-		spin_unlock_irq(&vm->async_ops.lock);
-
-		if (!vm->async_ops.error)
-			queue_work(system_unbound_wq, &vm->async_ops.work);
-	}
+	if (!reclaim)
+		list_splice_tail(ops_list, &vm->ops_state.pending);
 
 	return 0;
 
@@ -2764,15 +2368,49 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 	}
 }
 
+static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
+				     struct list_head *ops_list)
+{
+	struct xe_vma_op *op, *next;
+	int err;
+
+	lockdep_assert_held_write(&vm->lock);
+
+	list_for_each_entry_safe(op, next, ops_list, link) {
+		err = xe_vma_op_execute(vm, op);
+		if (err) {
+			drm_warn(&xe_vm_device(vm)->drm,
+				 "Async VM op(%d) failed with %d",
+				 op->base.op, err);
+			xe_vm_set_error(vm, err);
+			return -ENOSPC;
+		}
+		xe_vma_op_cleanup(vm, op);
+	}
+
+	return 0;
+}
+
+static void vm_bind_ioctl_ops_cleanup(struct xe_vm *vm)
+{
+	struct xe_vma_op *op, *next;
+
+	lockdep_assert_held_write(&vm->lock);
+
+	list_for_each_entry_safe(op, next, &vm->ops_state.pending, link)
+		xe_vma_op_cleanup(vm, op);
+}
+
 #ifdef TEST_VM_ASYNC_OPS_ERROR
 #define SUPPORTED_FLAGS	\
 	(FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \
 	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \
-	 XE_VM_BIND_FLAG_NULL | 0xffff)
+	 XE_VM_BIND_FLAG_NULL | XE_VM_BIND_FLAG_RECLAIM | 0xffff)
 #else
 #define SUPPORTED_FLAGS	\
 	(XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \
-	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff)
+	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | \
+	 XE_VM_BIND_FLAG_RECLAIM | 0xffff)
 #endif
 #define XE_64K_PAGE_MASK 0xffffull
 
@@ -2781,7 +2419,7 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 static int vm_bind_ioctl_check_args(struct xe_device *xe,
 				    struct drm_xe_vm_bind *args,
 				    struct drm_xe_vm_bind_op **bind_ops,
-				    bool *async)
+				    bool *async, bool *reclaim)
 {
 	int err;
 	int i;
@@ -2822,31 +2460,31 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 
 		if (i == 0) {
 			*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
-		} else if (XE_IOCTL_ERR(xe, !*async) ||
-			   XE_IOCTL_ERR(xe, !(op & XE_VM_BIND_FLAG_ASYNC)) ||
+			*reclaim = !!(op & XE_VM_BIND_FLAG_RECLAIM);
+			if (XE_IOCTL_ERR(xe, !*async && args->num_syncs) ||
+			    XE_IOCTL_ERR(xe, *async && *reclaim)) {
+				err = -EINVAL;
+				goto free_bind_ops;
+			}
+		} else if (XE_IOCTL_ERR(xe, *async !=
+					!!(op & XE_VM_BIND_FLAG_ASYNC)) ||
+			   XE_IOCTL_ERR(xe, *reclaim !=
+					!!(op & XE_VM_BIND_FLAG_RECLAIM)) ||
 			   XE_IOCTL_ERR(xe, VM_BIND_OP(op) ==
 					XE_VM_BIND_OP_RESTART)) {
 			err = -EINVAL;
 			goto free_bind_ops;
 		}
 
-		if (XE_IOCTL_ERR(xe, !*async &&
-				 VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL)) {
-			err = -EINVAL;
-			goto free_bind_ops;
-		}
-
-		if (XE_IOCTL_ERR(xe, !*async &&
-				 VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH)) {
-			err = -EINVAL;
-			goto free_bind_ops;
-		}
-
 		if (XE_IOCTL_ERR(xe, VM_BIND_OP(op) >
 				 XE_VM_BIND_OP_PREFETCH) ||
 		    XE_IOCTL_ERR(xe, op & ~SUPPORTED_FLAGS) ||
 		    XE_IOCTL_ERR(xe, obj && null) ||
 		    XE_IOCTL_ERR(xe, obj_offset && null) ||
+		    XE_IOCTL_ERR(xe, VM_BIND_OP(op) != XE_VM_BIND_OP_UNMAP &&
+				 VM_BIND_OP(op) != XE_VM_BIND_OP_UNMAP_ALL &&
+				 VM_BIND_OP(op) != XE_VM_BIND_OP_RESTART &&
+				 *reclaim) ||
 		    XE_IOCTL_ERR(xe, VM_BIND_OP(op) != XE_VM_BIND_OP_MAP &&
 				 null) ||
 		    XE_IOCTL_ERR(xe, !obj &&
@@ -2904,11 +2542,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct xe_sync_entry *syncs = NULL;
 	struct drm_xe_vm_bind_op *bind_ops;
 	LIST_HEAD(ops_list);
-	bool async;
+	bool async, reclaim, restart = false;
 	int err;
 	int i;
 
-	err = vm_bind_ioctl_check_args(xe, args, &bind_ops, &async);
+	err = vm_bind_ioctl_check_args(xe, args, &bind_ops, &async, &reclaim);
 	if (err)
 		return err;
 
@@ -2924,6 +2562,35 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto put_vm;
 	}
 
+	if (XE_IOCTL_ERR(xe, vm->ops_state.error && !reclaim)) {
+		err = -EALREADY;
+		goto free_objs;
+	}
+
+	if (XE_IOCTL_ERR(xe, !vm->ops_state.error && reclaim)) {
+		err = -EINVAL;
+		goto free_objs;
+	}
+
+	if (VM_BIND_OP(bind_ops[0].op) == XE_VM_BIND_OP_RESTART) {
+		if (XE_IOCTL_ERR(xe, args->num_syncs))
+			err = -EINVAL;
+		if (XE_IOCTL_ERR(xe, !err && !vm->ops_state.error))
+			err = -EPROTO;
+		if (err)
+			goto put_vm;
+
+		err = down_write_killable(&vm->lock);
+		if (err)
+			goto put_vm;
+
+		trace_xe_vm_restart(vm);
+		xe_vm_set_error(vm, 0);
+		restart = true;
+
+		goto execute;
+	}
+
 	if (args->engine_id) {
 		e = xe_engine_lookup(xef, args->engine_id);
 		if (XE_IOCTL_ERR(xe, !e)) {
@@ -2934,37 +2601,15 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			err = -EINVAL;
 			goto put_engine;
 		}
-	}
-
-	if (VM_BIND_OP(bind_ops[0].op) == XE_VM_BIND_OP_RESTART) {
-		if (XE_IOCTL_ERR(xe, !(vm->flags & XE_VM_FLAG_ASYNC_BIND_OPS)))
-			err = -ENOTSUPP;
-		if (XE_IOCTL_ERR(xe, !err && args->num_syncs))
-			err = EINVAL;
-		if (XE_IOCTL_ERR(xe, !err && !vm->async_ops.error))
-			err = -EPROTO;
-
-		if (!err) {
-			down_write(&vm->lock);
-			trace_xe_vm_restart(vm);
-			vm_set_async_error(vm, 0);
-			up_write(&vm->lock);
-
-			queue_work(system_unbound_wq, &vm->async_ops.work);
-
-			/* Rebinds may have been blocked, give worker a kick */
-			if (xe_vm_in_compute_mode(vm))
-				queue_work(xe_vm_device(vm)->ordered_wq,
-					   &vm->preempt.rebind_work);
+		if (XE_IOCTL_ERR(xe, async !=
+				 !!(e->flags & ENGINE_FLAG_VM_ASYNC))) {
+			err = -EINVAL;
+			goto put_engine;
 		}
-
-		goto put_engine;
-	}
-
-	if (XE_IOCTL_ERR(xe, !vm->async_ops.error &&
-			 async != !!(vm->flags & XE_VM_FLAG_ASYNC_BIND_OPS))) {
-		err = -ENOTSUPP;
-		goto put_engine;
+	} else if (XE_IOCTL_ERR(xe, async !=
+				!!(vm->flags & XE_VM_FLAG_ASYNC_DEFAULT))) {
+		err = -EINVAL;
+		goto put_vm;
 	}
 
 	for (i = 0; i < args->num_binds; ++i) {
@@ -3056,17 +2701,6 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		goto free_syncs;
 
-	/* Do some error checking first to make the unwind easier */
-	for (i = 0; i < args->num_binds; ++i) {
-		u64 range = bind_ops[i].range;
-		u64 addr = bind_ops[i].addr;
-		u32 op = bind_ops[i].op;
-
-		err = vm_bind_ioctl_lookup_vma(vm, bos[i], addr, range, op);
-		if (err)
-			goto release_vm_lock;
-	}
-
 	for (i = 0; i < args->num_binds; ++i) {
 		u64 range = bind_ops[i].range;
 		u64 addr = bind_ops[i].addr;
@@ -3090,10 +2724,29 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		goto unwind_ops;
 
-	err = vm_bind_ioctl_ops_commit(vm, &ops_list, async);
+	xe_vm_get(vm);
+	if (e)
+		xe_engine_get(e);
+	err = vm_bind_ioctl_ops_commit(vm, &ops_list, reclaim);
+
+execute:
+	if (!err)
+		err = vm_bind_ioctl_ops_execute(vm, reclaim && !restart ?
+						&ops_list :
+						&vm->ops_state.pending);
+
+	/* Rebinds may have been blocked, give worker a kick */
+	if (!err && restart && xe_vm_in_compute_mode(vm))
+		queue_work(xe_vm_device(vm)->ordered_wq,
+			   &vm->preempt.rebind_work);
+
 	up_write(&vm->lock);
 
-	for (i = 0; i < args->num_binds; ++i)
+	if (e)
+		xe_engine_put(e);
+	xe_vm_put(vm);
+
+	for (i = 0; bos && i < args->num_binds; ++i)
 		xe_bo_put(bos[i]);
 
 	kfree(bos);
@@ -3105,9 +2758,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 unwind_ops:
 	vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
-release_vm_lock:
 	up_write(&vm->lock);
 free_syncs:
+	for (i = 0; err == -ENODATA && i < num_syncs; i++)
+		xe_sync_entry_signal(&syncs[i], NULL, dma_fence_get_stub());
 	while (num_syncs--)
 		xe_sync_entry_cleanup(&syncs[num_syncs]);
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 47b981d9fc04..908e0c764968 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -171,8 +171,6 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
 
 int xe_vm_invalidate_vma(struct xe_vma *vma);
 
-int xe_vm_async_fence_wait_start(struct dma_fence *fence);
-
 extern struct ttm_device_funcs xe_ttm_funcs;
 
 struct ttm_buffer_object *xe_vm_ttm_bo(struct xe_vm *vm);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index ce1260b8d3ef..b48d0eaa8939 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -15,7 +15,6 @@
 #include "xe_device_types.h"
 #include "xe_pt_types.h"
 
-struct async_op_fence;
 struct xe_bo;
 struct xe_sync_entry;
 struct xe_vm;
@@ -143,7 +142,7 @@ struct xe_vm {
 	/** @flags: flags for this VM, statically setup a creation time */
 #define XE_VM_FLAGS_64K			BIT(0)
 #define XE_VM_FLAG_COMPUTE_MODE		BIT(1)
-#define XE_VM_FLAG_ASYNC_BIND_OPS	BIT(2)
+#define XE_VM_FLAG_ASYNC_DEFAULT	BIT(2)
 #define XE_VM_FLAG_MIGRATION		BIT(3)
 #define XE_VM_FLAG_SCRATCH_PAGE		BIT(4)
 #define XE_VM_FLAG_FAULT_MODE		BIT(5)
@@ -179,40 +178,18 @@ struct xe_vm {
 	 */
 	struct work_struct destroy_work;
 
-	/** @async_ops: async VM operations (bind / unbinds) */
+	/** @ops_state: VM operations (bind / unbinds) state */
 	struct {
-		/** @list: list of pending async VM ops */
+		/** @pending: list of pending VM ops */
 		struct list_head pending;
-		/** @work: worker to execute async VM ops */
-		struct work_struct work;
-		/** @lock: protects list of pending async VM ops and fences */
-		spinlock_t lock;
-		/** @error_capture: error capture state */
-		struct {
-			/** @mm: user MM */
-			struct mm_struct *mm;
-			/**
-			 * @addr: user pointer to copy error capture state too
-			 */
-			u64 addr;
-			/** @wq: user fence wait queue for VM errors */
-			wait_queue_head_t wq;
-		} error_capture;
-		/** @fence: fence state */
-		struct {
-			/** @context: context of async fence */
-			u64 context;
-			/** @seqno: seqno of async fence */
-			u32 seqno;
-		} fence;
-		/** @error: error state for async VM ops */
+		/** @error: error state for VM ops */
 		int error;
 		/**
 		 * @munmap_rebind_inflight: an munmap style VM bind is in the
 		 * middle of a set of ops which requires a rebind at the end.
 		 */
 		bool munmap_rebind_inflight;
-	} async_ops;
+	} ops_state;
 
 	/** @userptr: user pointer state */
 	struct {
@@ -366,10 +343,6 @@ struct xe_vma_op {
 	u32 num_syncs;
 	/** @link: async operation link */
 	struct list_head link;
-	/**
-	 * @fence: async operation fence, signaled on last operation complete
-	 */
-	struct async_op_fence *fence;
 	/** @gt_mask: gt mask for this operation */
 	u8 gt_mask;
 	/** @flags: operation flags */
diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wait_user_fence.c
index 15c2e5aa08d2..3256850e253c 100644
--- a/drivers/gpu/drm/xe/xe_wait_user_fence.c
+++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c
@@ -12,7 +12,6 @@
 #include "xe_device.h"
 #include "xe_gt.h"
 #include "xe_macros.h"
-#include "xe_vm.h"
 
 static int do_compare(u64 addr, u64 value, u64 mask, u16 op)
 {
@@ -80,8 +79,7 @@ static int check_hw_engines(struct xe_device *xe,
 }
 
 #define VALID_FLAGS	(DRM_XE_UFENCE_WAIT_SOFT_OP | \
-			 DRM_XE_UFENCE_WAIT_ABSTIME | \
-			 DRM_XE_UFENCE_WAIT_VM_ERROR)
+			 DRM_XE_UFENCE_WAIT_ABSTIME)
 #define MAX_OP		DRM_XE_UFENCE_WAIT_LTE
 
 int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
@@ -93,11 +91,9 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
 	struct drm_xe_engine_class_instance eci[XE_HW_ENGINE_MAX_INSTANCE];
 	struct drm_xe_engine_class_instance __user *user_eci =
 		u64_to_user_ptr(args->instances);
-	struct xe_vm *vm = NULL;
 	u64 addr = args->addr;
 	int err;
-	bool no_engines = args->flags & DRM_XE_UFENCE_WAIT_SOFT_OP ||
-		args->flags & DRM_XE_UFENCE_WAIT_VM_ERROR;
+	bool no_engines = args->flags & DRM_XE_UFENCE_WAIT_SOFT_OP;
 	unsigned long timeout = args->timeout;
 
 	if (XE_IOCTL_ERR(xe, args->extensions))
@@ -116,8 +112,7 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
 	if (XE_IOCTL_ERR(xe, !no_engines && !args->num_engines))
 		return -EINVAL;
 
-	if (XE_IOCTL_ERR(xe, !(args->flags & DRM_XE_UFENCE_WAIT_VM_ERROR) &&
-			 addr & 0x7))
+	if (XE_IOCTL_ERR(xe, addr & 0x7))
 		return -EINVAL;
 
 	if (!no_engines) {
@@ -132,22 +127,6 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
 			return -EINVAL;
 	}
 
-	if (args->flags & DRM_XE_UFENCE_WAIT_VM_ERROR) {
-		if (XE_IOCTL_ERR(xe, args->vm_id >> 32))
-			return -EINVAL;
-
-		vm = xe_vm_lookup(to_xe_file(file), args->vm_id);
-		if (XE_IOCTL_ERR(xe, !vm))
-			return -ENOENT;
-
-		if (XE_IOCTL_ERR(xe, !vm->async_ops.error_capture.addr)) {
-			xe_vm_put(vm);
-			return -ENOTSUPP;
-		}
-
-		addr = vm->async_ops.error_capture.addr;
-	}
-
 	if (XE_IOCTL_ERR(xe, timeout > MAX_SCHEDULE_TIMEOUT))
 		return -EINVAL;
 
@@ -157,15 +136,8 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
 	 * hardware engine. Open coding as 'do_compare' can sleep which doesn't
 	 * work with the wait_event_* macros.
 	 */
-	if (vm)
-		add_wait_queue(&vm->async_ops.error_capture.wq, &w_wait);
-	else
-		add_wait_queue(&xe->ufence_wq, &w_wait);
+	add_wait_queue(&xe->ufence_wq, &w_wait);
 	for (;;) {
-		if (vm && xe_vm_is_closed(vm)) {
-			err = -ENODEV;
-			break;
-		}
 		err = do_compare(addr, args->value, args->mask, args->op);
 		if (err <= 0)
 			break;
@@ -182,12 +154,7 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
 
 		timeout = wait_woken(&w_wait, TASK_INTERRUPTIBLE, timeout);
 	}
-	if (vm) {
-		remove_wait_queue(&vm->async_ops.error_capture.wq, &w_wait);
-		xe_vm_put(vm);
-	} else {
-		remove_wait_queue(&xe->ufence_wq, &w_wait);
-	}
+	remove_wait_queue(&xe->ufence_wq, &w_wait);
 	if (XE_IOCTL_ERR(xe, err < 0))
 		return err;
 	else if (XE_IOCTL_ERR(xe, !timeout))
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 27c51946fadd..cb4debe4ebda 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -145,10 +145,11 @@ struct drm_xe_engine_class_instance {
 #define DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE	3
 #define DRM_XE_ENGINE_CLASS_COMPUTE		4
 	/*
-	 * Kernel only class (not actual hardware engine class). Used for
+	 * Kernel only classes (not actual hardware engine class). Used for
 	 * creating ordered queues of VM bind operations.
 	 */
-#define DRM_XE_ENGINE_CLASS_VM_BIND		5
+#define DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC	5
+#define DRM_XE_ENGINE_CLASS_VM_BIND_SYNC	6
 
 	__u16 engine_instance;
 	__u16 gt_id;
@@ -312,39 +313,8 @@ struct drm_xe_gem_mmap_offset {
 	__u64 reserved[2];
 };
 
-/**
- * struct drm_xe_vm_bind_op_error_capture - format of VM bind op error capture
- */
-struct drm_xe_vm_bind_op_error_capture {
-	/** @error: errno that occured */
-	__s32 error;
-	/** @op: operation that encounter an error */
-	__u32 op;
-	/** @addr: address of bind op */
-	__u64 addr;
-	/** @size: size of bind */
-	__u64 size;
-};
-
-/** struct drm_xe_ext_vm_set_property - VM set property extension */
-struct drm_xe_ext_vm_set_property {
-	/** @base: base user extension */
-	struct xe_user_extension base;
-
-	/** @property: property to set */
-#define XE_VM_PROPERTY_BIND_OP_ERROR_CAPTURE_ADDRESS		0
-	__u32 property;
-
-	/** @value: property value */
-	__u64 value;
-
-	/** @reserved: Reserved */
-	__u64 reserved[2];
-};
-
 struct drm_xe_vm_create {
 	/** @extensions: Pointer to the first extension struct, if any */
-#define XE_VM_EXTENSION_SET_PROPERTY	0
 	__u64 extensions;
 
 	/** @flags: Flags */
@@ -352,7 +322,7 @@ struct drm_xe_vm_create {
 
 #define DRM_XE_VM_CREATE_SCRATCH_PAGE	(0x1 << 0)
 #define DRM_XE_VM_CREATE_COMPUTE_MODE	(0x1 << 1)
-#define DRM_XE_VM_CREATE_ASYNC_BIND_OPS	(0x1 << 2)
+#define DRM_XE_VM_CREATE_ASYNC_DEFAULT	(0x1 << 2)
 #define DRM_XE_VM_CREATE_FAULT_MODE	(0x1 << 3)
 
 	/** @vm_id: Returned VM ID */
@@ -417,30 +387,6 @@ struct drm_xe_vm_bind_op {
 #define XE_VM_BIND_OP_PREFETCH		0x5
 
 #define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)
-	/*
-	 * A bind ops completions are always async, hence the support for out
-	 * sync. This flag indicates the allocation of the memory for new page
-	 * tables and the job to program the pages tables is asynchronous
-	 * relative to the IOCTL. That part of a bind operation can fail under
-	 * memory pressure, the job in practice can't fail unless the system is
-	 * totally shot.
-	 *
-	 * If this flag is clear and the IOCTL doesn't return an error, in
-	 * practice the bind op is good and will complete.
-	 *
-	 * If this flag is set and doesn't return return an error, the bind op
-	 * can still fail and recovery is needed. If configured, the bind op that
-	 * caused the error will be captured in drm_xe_vm_bind_op_error_capture.
-	 * Once the user sees the error (via a ufence +
-	 * XE_VM_PROPERTY_BIND_OP_ERROR_CAPTURE_ADDRESS), it should free memory
-	 * via non-async unbinds, and then restart all queue'd async binds op via
-	 * XE_VM_BIND_OP_RESTART. Or alternatively the user should destroy the
-	 * VM.
-	 *
-	 * This flag is only allowed when DRM_XE_VM_CREATE_ASYNC_BIND_OPS is
-	 * configured in the VM and must be set if the VM is configured with
-	 * DRM_XE_VM_CREATE_ASYNC_BIND_OPS and not in an error state.
-	 */
 #define XE_VM_BIND_FLAG_ASYNC		(0x1 << 17)
 	/*
 	 * Valid on a faulting VM only, do the MAP operation immediately rather
@@ -455,6 +401,7 @@ struct drm_xe_vm_bind_op {
 	 * VK sparse bindings.
 	 */
 #define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
+#define XE_VM_BIND_FLAG_RECLAIM		(0x1 << 20)
 
 	/** @reserved: Reserved */
 	__u64 reserved[2];
@@ -702,17 +649,10 @@ struct drm_xe_mmio {
 struct drm_xe_wait_user_fence {
 	/** @extensions: Pointer to the first extension struct, if any */
 	__u64 extensions;
-	union {
-		/**
-		 * @addr: user pointer address to wait on, must qword aligned
-		 */
-		__u64 addr;
-		/**
-		 * @vm_id: The ID of the VM which encounter an error used with
-		 * DRM_XE_UFENCE_WAIT_VM_ERROR. Upper 32 bits must be clear.
-		 */
-		__u64 vm_id;
-	};
+	/**
+	 * @addr: user pointer address to wait on, must be qword aligned
+	 */
+	__u64 addr;
 	/** @op: wait operation (type of comparison) */
 #define DRM_XE_UFENCE_WAIT_EQ	0
 #define DRM_XE_UFENCE_WAIT_NEQ	1
@@ -724,7 +664,6 @@ struct drm_xe_wait_user_fence {
 	/** @flags: wait flags */
 #define DRM_XE_UFENCE_WAIT_SOFT_OP	(1 << 0)	/* e.g. Wait on VM bind */
 #define DRM_XE_UFENCE_WAIT_ABSTIME	(1 << 1)
-#define DRM_XE_UFENCE_WAIT_VM_ERROR	(1 << 2)
 	__u16 flags;
 	/** @value: compare value */
 	__u64 value;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (29 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 30/31] drm/xe: remove async worker, sync binds, new error handling Matthew Brost
@ 2023-05-02  0:17 ` Matthew Brost
  2023-05-05 19:45   ` Rodrigo Vivi
  2023-05-02  0:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2) Patchwork
  2023-05-03 12:37 ` [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Thomas Hellström
  32 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-02  0:17 UTC (permalink / raw)
  To: intel-xe

Try to explain how VM bind works in Xe.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 include/uapi/drm/xe_drm.h | 45 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index cb4debe4ebda..c7137db2cbe8 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -148,7 +148,16 @@ struct drm_xe_engine_class_instance {
 	 * Kernel only classes (not actual hardware engine class). Used for
 	 * creating ordered queues of VM bind operations.
 	 */
+	/**
+	 * @DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC: VM bind engines that are allowed
+	 * to use in / out syncs. The out-sync indicates bind op(s) completion.
+	 */
 #define DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC	5
+	/**
+	 * @DRM_XE_ENGINE_CLASS_VM_BIND_SYNC: VM bind engines that are not
+	 * allowed to use in / out syncs. The IOCTL return indicates bind op(s)
+	 * completion.
+	 */
 #define DRM_XE_ENGINE_CLASS_VM_BIND_SYNC	6
 
 	__u16 engine_instance;
@@ -322,6 +331,7 @@ struct drm_xe_vm_create {
 
 #define DRM_XE_VM_CREATE_SCRATCH_PAGE	(0x1 << 0)
 #define DRM_XE_VM_CREATE_COMPUTE_MODE	(0x1 << 1)
+	/** @DRM_XE_VM_CREATE_ASYNC_DEFAULT: Default VM bind engine is async */
 #define DRM_XE_VM_CREATE_ASYNC_DEFAULT	(0x1 << 2)
 #define DRM_XE_VM_CREATE_FAULT_MODE	(0x1 << 3)
 
@@ -379,21 +389,44 @@ struct drm_xe_vm_bind_op {
 	/** @mem_region: Memory region to prefetch VMA to, instance not a mask */
 	__u32 region;
 
+	/** @XE_VM_BIND_OP_MAP: Map a buffer object */
 #define XE_VM_BIND_OP_MAP		0x0
+	/** @XE_VM_BIND_OP_UNMAP: Unmap a buffer object or userptr */
 #define XE_VM_BIND_OP_UNMAP		0x1
+	/** @XE_VM_BIND_OP_MAP_USERPTR: Map a userptr */
 #define XE_VM_BIND_OP_MAP_USERPTR	0x2
+	/**
+	 * @XE_VM_BIND_OP_RESTART: Restart last bind operation that failed with
+	 * -ENOSPC
+	 */
 #define XE_VM_BIND_OP_RESTART		0x3
+	/**
+	 * @XE_VM_BIND_OP_UNMAP_ALL: Unmap all mappings associated with a
+	 * buffer object
+	 */
 #define XE_VM_BIND_OP_UNMAP_ALL		0x4
+	/**
+	 * @XE_VM_BIND_OP_PREFETCH: For a deferred bind (faulting VM)
+	 * validate buffer object and (re)bind
+	 */
 #define XE_VM_BIND_OP_PREFETCH		0x5
-
+	/** @XE_VM_BIND_FLAG_READONLY: Set mapping to read only */
 #define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)
+	/**
+	 * @XE_VM_BIND_FLAG_ASYNC: Sanity check: when using an async bind
+	 * engine (in / out syncs), this flag must be set.
+	 */
 #define XE_VM_BIND_FLAG_ASYNC		(0x1 << 17)
-	/*
+	/**
+	 * @XE_VM_BIND_FLAG_IMMEDIATE:
+	 *
 	 * Valid on a faulting VM only, do the MAP operation immediately rather
 	 * than differing the MAP to the page fault handler.
 	 */
 #define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 18)
-	/*
+	/**
+	 * @XE_VM_BIND_FLAG_NULL:
+	 *
 	 * When the NULL flag is set, the page tables are setup with a special
 	 * bit which indicates writes are dropped and all reads return zero. The
 	 * NULL flags is only valid for XE_VM_BIND_OP_MAP operations, the BO
@@ -401,6 +434,12 @@ struct drm_xe_vm_bind_op {
 	 * VK sparse bindings.
 	 */
 #define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
+	/**
+	 * @XE_VM_BIND_FLAG_RECLAIM: Should be set when a VM is in an error
+	 * state (a bind op returned -ENOSPC). Used with sync bind engines to
+	 * issue UNMAP operations which hopefully free enough memory so that,
+	 * when restarted via @XE_VM_BIND_OP_RESTART, the failed binds succeed.
+	 */
 #define XE_VM_BIND_FLAG_RECLAIM		(0x1 << 20)
 
 	/** @reserved: Reserved */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [Intel-xe] ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2)
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (30 preceding siblings ...)
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc Matthew Brost
@ 2023-05-02  0:20 ` Patchwork
  2023-05-02  1:54   ` Christopher Snowhill (kode54)
  2023-05-02  1:59   ` Christopher Snowhill (kode54)
  2023-05-03 12:37 ` [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Thomas Hellström
  32 siblings, 2 replies; 126+ messages in thread
From: Patchwork @ 2023-05-02  0:20 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Upstreaming prep / all of mbrosts patches (rev2)
URL   : https://patchwork.freedesktop.org/series/117156/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 9a7a643c2 drm/xe: Fix media detection for pre-GMD_ID platforms
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_pt.c:778
error: drivers/gpu/drm/xe/xe_pt.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add run_wq argument to drm_sched_init
Applying: drm/sched: Move schedule policy to scheduler
Applying: drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
Applying: drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
Applying: drm/xe: Long running job update
Applying: drm/xe: Ensure LR engines are not persistent
Applying: drm/xe: Only try to lock external BOs in VM bind
Applying: drm/xe: VM LRU bulk move
Applying: drm/xe/guc: Read HXG fields from DW1 of G2H response
Applying: drm/xe/guc: Return the lower part of blocking H2G message
Applying: drm/xe/guc: Use doorbells for submission if possible
Applying: drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
Applying: maple_tree: split up MA_STATE() macro
Applying: maple_tree: Export mas_preallocate
Applying: drm: manager to keep track of GPUs VA mappings
Applying: drm/xe: Port Xe to GPUVA
Patch failed at 0016 drm/xe: Port Xe to GPUVA
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe]  ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2)
  2023-05-02  0:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2) Patchwork
@ 2023-05-02  1:54   ` Christopher Snowhill (kode54)
  2023-05-02  1:59   ` Christopher Snowhill (kode54)
  1 sibling, 0 replies; 126+ messages in thread
From: Christopher Snowhill (kode54) @ 2023-05-02  1:54 UTC (permalink / raw)
  To: intel-xe



Sent from my iPad

> On May 1, 2023, at 5:20 PM, Patchwork <patchwork@emeril.freedesktop.org> wrote:
> 
> == Series Details ==
> 
> Series: Upstreaming prep / all of mbrosts patches (rev2)
> URL   : https://patchwork.freedesktop.org/series/117156/
> State : failure
> 
> == Summary ==
> 
> === Applying kernel patches on branch 'drm-xe-next' with base: ===
> Base commit: 9a7a643c2 drm/xe: Fix media detection for pre-GMD_ID platforms
> === git am output follows ===
> error: patch failed: drivers/gpu/drm/xe/xe_pt.c:778
> error: drivers/gpu/drm/xe/xe_pt.c: patch does not apply
> hint: Use 'git am --show-current-patch' to see the failed patch
> Applying: drm/sched: Add run_wq argument to drm_sched_init
> Applying: drm/sched: Move schedule policy to scheduler
> Applying: drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> Applying: drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
> Applying: drm/xe: Long running job update
> Applying: drm/xe: Ensure LR engines are not persistent
> Applying: drm/xe: Only try to lock external BOs in VM bind
> Applying: drm/xe: VM LRU bulk move
> Applying: drm/xe/guc: Read HXG fields from DW1 of G2H response
> Applying: drm/xe/guc: Return the lower part of blocking H2G message
> Applying: drm/xe/guc: Use doorbells for submission if possible
> Applying: drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
> Applying: maple_tree: split up MA_STATE() macro
> Applying: maple_tree: Export mas_preallocate
> Applying: drm: manager to keep track of GPUs VA mappings
> Applying: drm/xe: Port Xe to GPUVA
> Patch failed at 0016 drm/xe: Port Xe to GPUVA
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe]  ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2)
  2023-05-02  0:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2) Patchwork
  2023-05-02  1:54   ` Christopher Snowhill (kode54)
@ 2023-05-02  1:59   ` Christopher Snowhill (kode54)
  1 sibling, 0 replies; 126+ messages in thread
From: Christopher Snowhill (kode54) @ 2023-05-02  1:59 UTC (permalink / raw)
  To: intel-xe



> On May 1, 2023, at 5:20 PM, Patchwork <patchwork@emeril.freedesktop.org> wrote:
> 
> == Series Details ==
> 
> Series: Upstreaming prep / all of mbrosts patches (rev2)
> URL   : https://patchwork.freedesktop.org/series/117156/
> State : failure
> 
> == Summary ==
> 
> === Applying kernel patches on branch 'drm-xe-next' with base: ===
> Base commit: 9a7a643c2 drm/xe: Fix media detection for pre-GMD_ID platforms
> === git am output follows ===
> error: patch failed: drivers/gpu/drm/xe/xe_pt.c:778
> error: drivers/gpu/drm/xe/xe_pt.c: patch does not apply
> hint: Use 'git am --show-current-patch' to see the failed patch
> Applying: drm/sched: Add run_wq argument to drm_sched_init
> Applying: drm/sched: Move schedule policy to scheduler
> Applying: drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> Applying: drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
> Applying: drm/xe: Long running job update
> Applying: drm/xe: Ensure LR engines are not persistent
> Applying: drm/xe: Only try to lock external BOs in VM bind
> Applying: drm/xe: VM LRU bulk move
> Applying: drm/xe/guc: Read HXG fields from DW1 of G2H response
> Applying: drm/xe/guc: Return the lower part of blocking H2G message
> Applying: drm/xe/guc: Use doorbells for submission if possible
> Applying: drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
> Applying: maple_tree: split up MA_STATE() macro
> Applying: maple_tree: Export mas_preallocate
> Applying: drm: manager to keep track of GPUs VA mappings
> Applying: drm/xe: Port Xe to GPUVA
> Patch failed at 0016 drm/xe: Port Xe to GPUVA
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

Damn, I hate Apple Mail. I wanted to select which message in the thread I was replying to, but it naturally assumes I want to quote and reply to the most recent one.

Anyway, is there a Mesa branch that works with this series yet, or will the current Xe branch work here? I already needed to use a special flavor of the branch because of shared object issues that hadn't been merged into the main MR, and I'm using @mlankhorst's uAPI header and validation, plus my own one-line change to the compat ioctl function to make 32-bit work.

Otherwise, if there's no working userspace for this yet, I'll just skip testing it for now.

I did notice some typos in the above comments or commit messages, though, like "doorbels". I'll have to look it over again to see whether any more are worth noting.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init Matthew Brost
@ 2023-05-03 12:03   ` Thomas Hellström
  2023-05-03 15:06     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-03 12:03 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi,

On 5/2/23 02:16, Matthew Brost wrote:
> We will have this argument upstream, lets pull into the Xe repo.

Please rephrase the commit message: add an explanation, use imperative
wording, and remove mentions of the Xe repo (nobody cares about that once
it goes upstream).

/Thomas

>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  2 +-
>   drivers/gpu/drm/lima/lima_sched.c          |  2 +-
>   drivers/gpu/drm/msm/msm_ringbuffer.c       |  2 +-
>   drivers/gpu/drm/panfrost/panfrost_job.c    |  2 +-
>   drivers/gpu/drm/scheduler/sched_main.c     |  4 +++-
>   drivers/gpu/drm/v3d/v3d_sched.c            | 10 +++++-----
>   drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
>   drivers/gpu/drm/xe/xe_guc_submit.c         |  2 +-
>   include/drm/gpu_scheduler.h                |  1 +
>   10 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 902f9b5ff82c..fe28f6b71fe3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2364,7 +2364,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>   			break;
>   		}
>   
> -		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> +		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
>   				   ring->num_hw_submission, amdgpu_job_hang_limit,
>   				   timeout, adev->reset_domain->wq,
>   				   ring->sched_score, ring->name,
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 1ae87dfd19c4..8486a2923f1b 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -133,7 +133,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>   {
>   	int ret;
>   
> -	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
> +	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
>   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>   			     msecs_to_jiffies(500), NULL, NULL,
>   			     dev_name(gpu->dev), gpu->dev);
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index ff003403fbbc..54f53bece27c 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
>   
>   	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
>   
> -	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
> +	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
>   			      lima_job_hang_limit,
>   			      msecs_to_jiffies(timeout), NULL,
>   			      NULL, name, pipe->ldev->dev);
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index 57a8e9564540..5879fc262047 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -95,7 +95,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
>   	 /* currently managing hangcheck ourselves: */
>   	sched_timeout = MAX_SCHEDULE_TIMEOUT;
>   
> -	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
> +	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
>   			num_hw_submissions, 0, sched_timeout,
>   			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
>   	if (ret) {
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index dbc597ab46fb..f48b07056a16 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -815,7 +815,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>   		js->queue[j].fence_context = dma_fence_context_alloc(1);
>   
>   		ret = drm_sched_init(&js->queue[j].sched,
> -				     &panfrost_sched_ops,
> +				     &panfrost_sched_ops, NULL,
>   				     nentries, 0,
>   				     msecs_to_jiffies(JOB_TIMEOUT_MS),
>   				     pfdev->reset.wq,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index cfd8a838e283..e79b9c760efe 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1182,6 +1182,7 @@ static void drm_sched_main(struct work_struct *w)
>    *
>    * @sched: scheduler instance
>    * @ops: backend operations for this scheduler
> + * @run_wq: workqueue to use for run work. If NULL, the system_wq is used
>    * @hw_submission: number of hw submissions that can be in flight
>    * @hang_limit: number of times to allow a job to hang before dropping it
>    * @timeout: timeout value in jiffies for the scheduler
> @@ -1195,6 +1196,7 @@ static void drm_sched_main(struct work_struct *w)
>    */
>   int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   const struct drm_sched_backend_ops *ops,
> +		   struct workqueue_struct *run_wq,
>   		   unsigned hw_submission, unsigned hang_limit,
>   		   long timeout, struct workqueue_struct *timeout_wq,
>   		   atomic_t *score, const char *name, struct device *dev)
> @@ -1203,9 +1205,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   	sched->ops = ops;
>   	sched->hw_submission_limit = hw_submission;
>   	sched->name = name;
> +	sched->run_wq = run_wq ? : system_wq;
>   	sched->timeout = timeout;
>   	sched->timeout_wq = timeout_wq ? : system_wq;
> -	sched->run_wq = system_wq;	/* FIXME: Let user pass this in */
>   	sched->hang_limit = hang_limit;
>   	sched->score = score ? score : &sched->_score;
>   	sched->dev = dev;
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 06238e6d7f5c..38e092ea41e6 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>   	int ret;
>   
>   	ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
> -			     &v3d_bin_sched_ops,
> +			     &v3d_bin_sched_ops, NULL,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms), NULL,
>   			     NULL, "v3d_bin", v3d->drm.dev);
> @@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>   		return ret;
>   
>   	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
> -			     &v3d_render_sched_ops,
> +			     &v3d_render_sched_ops, NULL,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms), NULL,
>   			     NULL, "v3d_render", v3d->drm.dev);
> @@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>   		goto fail;
>   
>   	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> -			     &v3d_tfu_sched_ops,
> +			     &v3d_tfu_sched_ops, NULL,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms), NULL,
>   			     NULL, "v3d_tfu", v3d->drm.dev);
> @@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>   
>   	if (v3d_has_csd(v3d)) {
>   		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
> -				     &v3d_csd_sched_ops,
> +				     &v3d_csd_sched_ops, NULL,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms), NULL,
>   				     NULL, "v3d_csd", v3d->drm.dev);
> @@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			goto fail;
>   
>   		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
> -				     &v3d_cache_clean_sched_ops,
> +				     &v3d_cache_clean_sched_ops, NULL,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms), NULL,
>   				     NULL, "v3d_cache_clean", v3d->drm.dev);
> diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
> index de4f0044b211..d6d60ebf3d5f 100644
> --- a/drivers/gpu/drm/xe/xe_execlist.c
> +++ b/drivers/gpu/drm/xe/xe_execlist.c
> @@ -336,7 +336,7 @@ static int execlist_engine_init(struct xe_engine *e)
>   
>   	exl->engine = e;
>   
> -	err = drm_sched_init(&exl->sched, &drm_sched_ops,
> +	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
>   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
>   			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
>   			     NULL, NULL, e->hwe->name,
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index e857013070b9..735f31257f3a 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1081,7 +1081,7 @@ static int guc_engine_init(struct xe_engine *e)
>   	init_waitqueue_head(&ge->suspend_wait);
>   
>   	timeout = xe_vm_no_dma_fences(e->vm) ? MAX_SCHEDULE_TIMEOUT : HZ * 5;
> -	err = drm_sched_init(&ge->sched, &drm_sched_ops,
> +	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
>   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
>   			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
>   			     e->name, gt_to_xe(e->gt)->drm.dev);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index cf85f93218fc..09bc39840dc8 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -539,6 +539,7 @@ struct drm_gpu_scheduler {
>   
>   int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   const struct drm_sched_backend_ops *ops,
> +		   struct workqueue_struct *run_wq,
>   		   uint32_t hw_submission, unsigned hang_limit,
>   		   long timeout, struct workqueue_struct *timeout_wq,
>   		   atomic_t *score, const char *name, struct device *dev);

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler Matthew Brost
@ 2023-05-03 12:13   ` Thomas Hellström
  2023-05-03 15:11     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-03 12:13 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:16, Matthew Brost wrote:
> Rather than a global modparam for scheduling policy, move the scheduling
> policy to scheduler so user can control each scheduler policy.

Could you add some more info here about why this is done and how the
scheduler policy is supposed to be controlled? Should it say "driver can
control" rather than "user can control" at this stage?


>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  3 ++-
>   drivers/gpu/drm/lima/lima_sched.c          |  3 ++-
>   drivers/gpu/drm/msm/msm_ringbuffer.c       |  3 ++-
>   drivers/gpu/drm/panfrost/panfrost_job.c    |  3 ++-
>   drivers/gpu/drm/scheduler/sched_entity.c   | 24 ++++++++++++++++++----
>   drivers/gpu/drm/scheduler/sched_main.c     | 21 ++++++++++++++-----
>   drivers/gpu/drm/v3d/v3d_sched.c            | 15 +++++++++-----
>   drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
>   drivers/gpu/drm/xe/xe_guc_submit.c         |  3 ++-
>   include/drm/gpu_scheduler.h                | 20 ++++++++++++------
>   11 files changed, 72 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index fe28f6b71fe3..577ea5b98cd5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>   				   ring->num_hw_submission, amdgpu_job_hang_limit,
>   				   timeout, adev->reset_domain->wq,
>   				   ring->sched_score, ring->name,
> +				   DRM_SCHED_POLICY_DEFAULT,
>   				   adev->dev);
>   		if (r) {
>   			DRM_ERROR("Failed to create scheduler on ring %s.\n",
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 8486a2923f1b..61204a3f8b0b 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>   	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
>   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>   			     msecs_to_jiffies(500), NULL, NULL,
> -			     dev_name(gpu->dev), gpu->dev);
> +			     dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
> +			     gpu->dev);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index 54f53bece27c..33042ba6ae93 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
>   	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
>   			      lima_job_hang_limit,
>   			      msecs_to_jiffies(timeout), NULL,
> -			      NULL, name, pipe->ldev->dev);
> +			      NULL, name, DRM_SCHED_POLICY_DEFAULT,
> +			      pipe->ldev->dev);
>   }
>   
>   void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index 5879fc262047..f408a9097315 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
>   
>   	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
>   			num_hw_submissions, 0, sched_timeout,
> -			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> +			NULL, NULL, to_msm_bo(ring->bo)->name,
> +			DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
>   	if (ret) {
>   		goto fail;
>   	}
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index f48b07056a16..effa48b33dce 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>   				     nentries, 0,
>   				     msecs_to_jiffies(JOB_TIMEOUT_MS),
>   				     pfdev->reset.wq,
> -				     NULL, "pan_js", pfdev->dev);
> +				     NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
> +				     pfdev->dev);
>   		if (ret) {
>   			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
>   			goto err_sched;
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 15d04a0ec623..2300b2fc06ab 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -33,6 +33,20 @@
>   #define to_drm_sched_job(sched_job)		\
>   		container_of((sched_job), struct drm_sched_job, queue_node)
>   
> +static bool bad_policies(struct drm_gpu_scheduler **sched_list,
> +			 unsigned int num_sched_list)
> +{
> +	enum drm_sched_policy sched_policy = sched_list[0]->sched_policy;
> +	unsigned int i;
> +
> +	/* All scdedule policies must match */
s/scdedule/schedule/
> +	for (i = 1; i < num_sched_list; ++i)
> +		if (sched_policy != sched_list[i]->sched_policy)
> +			return true;
> +
> +	return false;
> +}
> +
>   /**
>    * drm_sched_entity_init - Init a context entity used by scheduler when
>    * submit to HW ring.
> @@ -62,7 +76,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   			  unsigned int num_sched_list,
>   			  atomic_t *guilty)
>   {
> -	if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
> +	if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])) ||
> +	    bad_policies(sched_list, num_sched_list))
>   		return -EINVAL;
>   
>   	memset(entity, 0, sizeof(struct drm_sched_entity));
> @@ -75,8 +90,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   	entity->last_scheduled = NULL;
>   	RB_CLEAR_NODE(&entity->rb_tree_node);
>   
> -	if(num_sched_list)
> +	if(num_sched_list) {
>   		entity->rq = &sched_list[0]->sched_rq[entity->priority];
> +	}
Why are brackets added here?
>   
>   	init_completion(&entity->entity_idle);
>   
> @@ -440,7 +456,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   	 * Update the entity's location in the min heap according to
>   	 * the timestamp of the next job, if any.
>   	 */
> -	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
> +	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
>   		struct drm_sched_job *next;
>   
>   		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> @@ -528,7 +544,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   		drm_sched_rq_add_entity(entity->rq, entity);
>   		spin_unlock(&entity->rq_lock);
>   
> -		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> +		if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
>   			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
>   
>   		drm_sched_wakeup(entity->rq->sched);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index e79b9c760efe..6777a2db554f 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -62,14 +62,14 @@
>   #define to_drm_sched_job(sched_job)		\
>   		container_of((sched_job), struct drm_sched_job, queue_node)
>   
> -int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> +int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
>   
>   /**
>    * DOC: sched_policy (int)
>    * Used to override default entities scheduling policy in a run queue.
>    */
>   MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
> -module_param_named(sched_policy, drm_sched_policy, int, 0444);
> +module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
>   
>   static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
>   							    const struct rb_node *b)
> @@ -173,7 +173,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>   	if (rq->current_entity == entity)
>   		rq->current_entity = NULL;
>   
> -	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> +	if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
>   		drm_sched_rq_remove_fifo_locked(entity);
>   
>   	spin_unlock(&rq->lock);
> @@ -956,7 +956,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   
>   	/* Kernel run queue has higher priority than normal run queue*/
>   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> -		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
> +		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
>   			drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
>   			drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
>   		if (entity)
> @@ -1190,6 +1190,7 @@ static void drm_sched_main(struct work_struct *w)
>    *		used
>    * @score: optional score atomic shared with other schedulers
>    * @name: name used for debugging
> + * @sched_policy: schedule policy
>    * @dev: target &struct device
>    *
>    * Return 0 on success, otherwise error code.
> @@ -1199,9 +1200,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   struct workqueue_struct *run_wq,
>   		   unsigned hw_submission, unsigned hang_limit,
>   		   long timeout, struct workqueue_struct *timeout_wq,
> -		   atomic_t *score, const char *name, struct device *dev)
> +		   atomic_t *score, const char *name,
> +		   enum drm_sched_policy sched_policy,
> +		   struct device *dev)
>   {
>   	int i;
> +
> +	if (sched_policy >= DRM_SCHED_POLICY_COUNT)
> +		return -EINVAL;
> +
>   	sched->ops = ops;
>   	sched->hw_submission_limit = hw_submission;
>   	sched->name = name;
> @@ -1211,6 +1218,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   	sched->hang_limit = hang_limit;
>   	sched->score = score ? score : &sched->_score;
>   	sched->dev = dev;
> +	if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
> +		sched->sched_policy = default_drm_sched_policy;
> +	else
> +		sched->sched_policy = sched_policy;
>   	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>   		drm_sched_rq_init(sched, &sched->sched_rq[i]);
>   
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 38e092ea41e6..5e3fe77fa991 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -391,7 +391,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_bin_sched_ops, NULL,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms), NULL,
> -			     NULL, "v3d_bin", v3d->drm.dev);
> +			     NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
> +			     v3d->drm.dev);
>   	if (ret)
>   		return ret;
>   
> @@ -399,7 +400,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_render_sched_ops, NULL,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms), NULL,
> -			     NULL, "v3d_render", v3d->drm.dev);
> +			     NULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> +			     v3d->drm.dev);
>   	if (ret)
>   		goto fail;
>   
> @@ -407,7 +409,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_tfu_sched_ops, NULL,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms), NULL,
> -			     NULL, "v3d_tfu", v3d->drm.dev);
> +			     NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
> +			     v3d->drm.dev);
>   	if (ret)
>   		goto fail;
>   
> @@ -416,7 +419,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   				     &v3d_csd_sched_ops, NULL,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms), NULL,
> -				     NULL, "v3d_csd", v3d->drm.dev);
> +				     NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
> +				     v3d->drm.dev);
>   		if (ret)
>   			goto fail;
>   
> @@ -424,7 +428,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   				     &v3d_cache_clean_sched_ops, NULL,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms), NULL,
> -				     NULL, "v3d_cache_clean", v3d->drm.dev);
> +				     NULL, "v3d_cache_clean",
> +				     DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
>   		if (ret)
>   			goto fail;
>   	}
> diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
> index d6d60ebf3d5f..48060d14547a 100644
> --- a/drivers/gpu/drm/xe/xe_execlist.c
> +++ b/drivers/gpu/drm/xe/xe_execlist.c
> @@ -339,7 +339,7 @@ static int execlist_engine_init(struct xe_engine *e)
>   	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
>   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
>   			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
> -			     NULL, NULL, e->hwe->name,
> +			     NULL, NULL, e->hwe->name, DRM_SCHED_POLICY_DEFAULT,
>   			     gt_to_xe(e->gt)->drm.dev);
>   	if (err)
>   		goto err_free;
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 735f31257f3a..9d3fadca43be 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1084,7 +1084,8 @@ static int guc_engine_init(struct xe_engine *e)
>   	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
>   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
>   			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
> -			     e->name, gt_to_xe(e->gt)->drm.dev);
> +			     e->name, DRM_SCHED_POLICY_DEFAULT,
> +			     gt_to_xe(e->gt)->drm.dev);
>   	if (err)
>   		goto err_free;
>   
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 09bc39840dc8..3df801401028 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -63,11 +63,15 @@ enum drm_sched_priority {
>   	DRM_SCHED_PRIORITY_UNSET = -2
>   };
>   
> -/* Used to chose between FIFO and RR jobs scheduling */
> -extern int drm_sched_policy;
> -
> -#define DRM_SCHED_POLICY_RR    0
> -#define DRM_SCHED_POLICY_FIFO  1
> +/* Used to chose default scheduling policy*/
> +extern int default_drm_sched_policy;
> +
> +enum drm_sched_policy {
> +	DRM_SCHED_POLICY_DEFAULT,
> +	DRM_SCHED_POLICY_RR,
> +	DRM_SCHED_POLICY_FIFO,
> +	DRM_SCHED_POLICY_COUNT,
> +};
>   
>   /**
>    * struct drm_sched_entity - A wrapper around a job queue (typically
> @@ -505,6 +509,7 @@ struct drm_sched_backend_ops {
>    *              guilty and it will no longer be considered for scheduling.
>    * @score: score to help loadbalancer pick a idle sched
>    * @_score: score used when the driver doesn't provide one
> + * @sched_policy: Schedule policy for scheduler
>    * @ready: marks if the underlying HW is ready to work
>    * @free_guilty: A hit to time out handler to free the guilty job.
>    * @pause_run_wq: pause queuing of @work_run on @run_wq
> @@ -531,6 +536,7 @@ struct drm_gpu_scheduler {
>   	int				hang_limit;
>   	atomic_t                        *score;
>   	atomic_t                        _score;
> +	enum drm_sched_policy		sched_policy;
>   	bool				ready;
>   	bool				free_guilty;
>   	bool				pause_run_wq;
> @@ -542,7 +548,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   struct workqueue_struct *run_wq,
>   		   uint32_t hw_submission, unsigned hang_limit,
>   		   long timeout, struct workqueue_struct *timeout_wq,
> -		   atomic_t *score, const char *name, struct device *dev);
> +		   atomic_t *score, const char *name,
> +		   enum drm_sched_policy sched_policy,
> +		   struct device *dev);
>   
>   void drm_sched_fini(struct drm_gpu_scheduler *sched);
>   int drm_sched_job_init(struct drm_sched_job *job,

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches
  2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
                   ` (31 preceding siblings ...)
  2023-05-02  0:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2) Patchwork
@ 2023-05-03 12:37 ` Thomas Hellström
  2023-05-03 15:27   ` Matthew Brost
  32 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-03 12:37 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi, Matthew

On 5/2/23 02:16, Matthew Brost wrote:
> Series includes:
>
> - DRM scheduler changes for firmware backends (1 to 1 entity to scheduler)
> - LR workload story
> - VM LRU handling
> - GuC doorbell submission
> - Basic GPUVA
> - Sparse binding support
> - GPUVA + extobj + drm exec (collaboration with dakr + Francois Dugast)
> - GPUVA + userptr (minimal, more can be once Nouveua has userptr)
> - Fix fencing rules for compute / fault mode
> - Remove async worker for VM + error handling updates
> - Kernel doc for VM bind

It would be beneficial to the reviewer if you could make separate series
where applicable, with links to previous discussions, information about
whether all design issues/discussions were resolved or anything is still
outstanding, and brief guidance as to what is in each patch of the
series.

/Thomas




> Series is not fully ready for upstream and some of these things need to
> get merged upstream first but overall it is largely correct and
> certainly step in the right direction. Based on its size and the fact it
> took me 8 hours to rebase this today I'd say let's get this tree and
> fixup everything else in place.
>
> Minor uAPI breakage, IGT series:
> https://patchwork.freedesktop.org/series/117177/
>
> gitlab link:
> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/344
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>
> Christian König (1):
>    drm: execution context for GEM buffers v3
>
> Danilo Krummrich (2):
>    maple_tree: split up MA_STATE() macro
>    drm: manager to keep track of GPUs VA mappings
>
> Matthew Brost (28):
>    drm/sched: Add run_wq argument to drm_sched_init
>    drm/sched: Move schedule policy to scheduler
>    drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
>    drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
>    drm/xe: Long running job update
>    drm/xe: Ensure LR engines are not persistent
>    drm/xe: Only try to lock external BOs in VM bind
>    drm/xe: VM LRU bulk move
>    drm/xe/guc: Read HXG fields from DW1 of G2H response
>    drm/xe/guc: Return the lower part of blocking H2G message
>    drm/xe/guc: Use doorbells for submission if possible
>    drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
>    maple_tree: Export mas_preallocate
>    drm/xe: Port Xe to GPUVA
>    drm/xe: NULL binding implementation
>    drm/xe: Avoid doing rebinds
>    drm/xe: Reduce the number list links in xe_vma
>    drm/xe: Optimize size of xe_vma allocation
>    drm/gpuva: Add drm device to GPUVA manager
>    drm/gpuva: Move dma-resv to GPUVA manager
>    drm/gpuva: Add support for extobj
>    drm/xe: Userptr refactor
>    drm/exec: Always compile drm_exec
>    drm/xe: Use drm_exec for locking rather than TTM exec helpers
>    drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
>    drm/xe: Allow compute VMs to output dma-fences on binds
>    drm/xe: remove async worker, sync binds, new error handling
>    drm/xe/uapi: Add some VM bind kernel doc
>
>   Documentation/gpu/drm-mm.rst                 |   43 +
>   drivers/gpu/drm/Kconfig                      |    6 +
>   drivers/gpu/drm/Makefile                     |    4 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   |    3 +-
>   drivers/gpu/drm/drm_debugfs.c                |   41 +
>   drivers/gpu/drm/drm_exec.c                   |  248 ++
>   drivers/gpu/drm/drm_gem.c                    |    3 +
>   drivers/gpu/drm/drm_gpuva_mgr.c              | 1779 ++++++++++++
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c      |    5 +-
>   drivers/gpu/drm/i915/display/intel_display.c |    6 +-
>   drivers/gpu/drm/lima/lima_sched.c            |    5 +-
>   drivers/gpu/drm/msm/msm_ringbuffer.c         |    5 +-
>   drivers/gpu/drm/panfrost/panfrost_job.c      |    5 +-
>   drivers/gpu/drm/scheduler/sched_entity.c     |   84 +-
>   drivers/gpu/drm/scheduler/sched_fence.c      |    2 +-
>   drivers/gpu/drm/scheduler/sched_main.c       |   88 +-
>   drivers/gpu/drm/v3d/v3d_sched.c              |   25 +-
>   drivers/gpu/drm/xe/Kconfig                   |    1 +
>   drivers/gpu/drm/xe/regs/xe_guc_regs.h        |    1 +
>   drivers/gpu/drm/xe/tests/xe_bo.c             |   26 +-
>   drivers/gpu/drm/xe/tests/xe_migrate.c        |    6 +-
>   drivers/gpu/drm/xe/xe_bo.c                   |  100 +-
>   drivers/gpu/drm/xe/xe_bo.h                   |   13 +-
>   drivers/gpu/drm/xe/xe_bo_evict.c             |   24 +-
>   drivers/gpu/drm/xe/xe_bo_types.h             |    1 -
>   drivers/gpu/drm/xe/xe_device.c               |    2 +-
>   drivers/gpu/drm/xe/xe_dma_buf.c              |    2 +-
>   drivers/gpu/drm/xe/xe_engine.c               |   50 +-
>   drivers/gpu/drm/xe/xe_engine.h               |    4 +
>   drivers/gpu/drm/xe/xe_engine_types.h         |    1 +
>   drivers/gpu/drm/xe/xe_exec.c                 |  117 +-
>   drivers/gpu/drm/xe/xe_execlist.c             |    3 +-
>   drivers/gpu/drm/xe/xe_gt_pagefault.c         |   84 +-
>   drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c  |   14 +-
>   drivers/gpu/drm/xe/xe_guc.c                  |    6 +
>   drivers/gpu/drm/xe/xe_guc_ct.c               |   12 +-
>   drivers/gpu/drm/xe/xe_guc_engine_types.h     |    9 +
>   drivers/gpu/drm/xe/xe_guc_pc.c               |    6 +-
>   drivers/gpu/drm/xe/xe_guc_submit.c           |  398 ++-
>   drivers/gpu/drm/xe/xe_guc_submit.h           |    1 +
>   drivers/gpu/drm/xe/xe_guc_types.h            |    4 +
>   drivers/gpu/drm/xe/xe_huc.c                  |    2 +-
>   drivers/gpu/drm/xe/xe_lrc.c                  |    8 +-
>   drivers/gpu/drm/xe/xe_migrate.c              |   31 +-
>   drivers/gpu/drm/xe/xe_pt.c                   |  198 +-
>   drivers/gpu/drm/xe/xe_sync.c                 |   26 +-
>   drivers/gpu/drm/xe/xe_sync.h                 |    2 +-
>   drivers/gpu/drm/xe/xe_trace.h                |   20 +-
>   drivers/gpu/drm/xe/xe_vm.c                   | 2567 +++++++-----------
>   drivers/gpu/drm/xe/xe_vm.h                   |  135 +-
>   drivers/gpu/drm/xe/xe_vm_madvise.c           |  125 +-
>   drivers/gpu/drm/xe/xe_vm_types.h             |  324 ++-
>   drivers/gpu/drm/xe/xe_wait_user_fence.c      |   43 +-
>   include/drm/drm_debugfs.h                    |   24 +
>   include/drm/drm_drv.h                        |    7 +
>   include/drm/drm_exec.h                       |  115 +
>   include/drm/drm_gem.h                        |   75 +
>   include/drm/drm_gpuva_mgr.h                  |  759 ++++++
>   include/drm/gpu_scheduler.h                  |   29 +-
>   include/linux/maple_tree.h                   |    7 +-
>   include/uapi/drm/xe_drm.h                    |  128 +-
>   lib/maple_tree.c                             |    1 +
>   62 files changed, 5543 insertions(+), 2320 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>   create mode 100644 include/drm/drm_exec.h
>   create mode 100644 include/drm/drm_gpuva_mgr.h
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init
  2023-05-03 12:03   ` Thomas Hellström
@ 2023-05-03 15:06     ` Matthew Brost
  2023-05-05 18:24       ` Rodrigo Vivi
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-03 15:06 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Wed, May 03, 2023 at 02:03:49PM +0200, Thomas Hellström wrote:
> Hi,
> 
> On 5/2/23 02:16, Matthew Brost wrote:
> > We will have this argument upstream, lets pull into the Xe repo.
> 
> Please rephrase the commit message: add an explanation, use imperative
> wording, and remove mentions of the Xe repo (nobody cares about that once
> it goes upstream).
> 
> /Thomas

This patch gets squashed into an earlier one in our repo to match the
upstream version:
https://patchwork.freedesktop.org/patch/530648/?series=116054&rev=1

I can reword it if you think it is a big deal, but this is just a placeholder.
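
If it helps to see the intent, here is a rough sketch of how a driver
would use the new run_wq argument with the signature from the quoted
diff. This is illustrative only; everything except drm_sched_init() and
alloc_ordered_workqueue() is a made-up placeholder:

#include <linux/workqueue.h>
#include <drm/gpu_scheduler.h>

static int my_engine_sched_init(struct drm_gpu_scheduler *sched,
				const struct drm_sched_backend_ops *ops,
				struct device *dev)
{
	/*
	 * One ordered workqueue per engine serializes the scheduler's
	 * run work; passing NULL instead falls back to system_wq.
	 */
	struct workqueue_struct *run_wq =
		alloc_ordered_workqueue("my-engine-run", 0);

	if (!run_wq)
		return -ENOMEM;

	return drm_sched_init(sched, ops, run_wq,
			      64 /* hw_submission */, 5 /* hang_limit */,
			      5 * HZ /* timeout */, NULL /* timeout_wq */,
			      NULL /* score */, "my-engine", dev);
}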

Matt
 

> 
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  2 +-
> >   drivers/gpu/drm/lima/lima_sched.c          |  2 +-
> >   drivers/gpu/drm/msm/msm_ringbuffer.c       |  2 +-
> >   drivers/gpu/drm/panfrost/panfrost_job.c    |  2 +-
> >   drivers/gpu/drm/scheduler/sched_main.c     |  4 +++-
> >   drivers/gpu/drm/v3d/v3d_sched.c            | 10 +++++-----
> >   drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
> >   drivers/gpu/drm/xe/xe_guc_submit.c         |  2 +-
> >   include/drm/gpu_scheduler.h                |  1 +
> >   10 files changed, 16 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 902f9b5ff82c..fe28f6b71fe3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2364,7 +2364,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >   			break;
> >   		}
> > -		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> > +		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
> >   				   ring->num_hw_submission, amdgpu_job_hang_limit,
> >   				   timeout, adev->reset_domain->wq,
> >   				   ring->sched_score, ring->name,
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index 1ae87dfd19c4..8486a2923f1b 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -133,7 +133,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> >   {
> >   	int ret;
> > -	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
> > +	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> >   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> >   			     msecs_to_jiffies(500), NULL, NULL,
> >   			     dev_name(gpu->dev), gpu->dev);
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index ff003403fbbc..54f53bece27c 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
> >   	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
> > -	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
> > +	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> >   			      lima_job_hang_limit,
> >   			      msecs_to_jiffies(timeout), NULL,
> >   			      NULL, name, pipe->ldev->dev);
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index 57a8e9564540..5879fc262047 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -95,7 +95,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> >   	 /* currently managing hangcheck ourselves: */
> >   	sched_timeout = MAX_SCHEDULE_TIMEOUT;
> > -	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
> > +	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> >   			num_hw_submissions, 0, sched_timeout,
> >   			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> >   	if (ret) {
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index dbc597ab46fb..f48b07056a16 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -815,7 +815,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> >   		js->queue[j].fence_context = dma_fence_context_alloc(1);
> >   		ret = drm_sched_init(&js->queue[j].sched,
> > -				     &panfrost_sched_ops,
> > +				     &panfrost_sched_ops, NULL,
> >   				     nentries, 0,
> >   				     msecs_to_jiffies(JOB_TIMEOUT_MS),
> >   				     pfdev->reset.wq,
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index cfd8a838e283..e79b9c760efe 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1182,6 +1182,7 @@ static void drm_sched_main(struct work_struct *w)
> >    *
> >    * @sched: scheduler instance
> >    * @ops: backend operations for this scheduler
> > + * @run_wq: workqueue to use for run work. If NULL, the system_wq is used
> >    * @hw_submission: number of hw submissions that can be in flight
> >    * @hang_limit: number of times to allow a job to hang before dropping it
> >    * @timeout: timeout value in jiffies for the scheduler
> > @@ -1195,6 +1196,7 @@ static void drm_sched_main(struct work_struct *w)
> >    */
> >   int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   		   const struct drm_sched_backend_ops *ops,
> > +		   struct workqueue_struct *run_wq,
> >   		   unsigned hw_submission, unsigned hang_limit,
> >   		   long timeout, struct workqueue_struct *timeout_wq,
> >   		   atomic_t *score, const char *name, struct device *dev)
> > @@ -1203,9 +1205,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   	sched->ops = ops;
> >   	sched->hw_submission_limit = hw_submission;
> >   	sched->name = name;
> > +	sched->run_wq = run_wq ? : system_wq;
> >   	sched->timeout = timeout;
> >   	sched->timeout_wq = timeout_wq ? : system_wq;
> > -	sched->run_wq = system_wq;	/* FIXME: Let user pass this in */
> >   	sched->hang_limit = hang_limit;
> >   	sched->score = score ? score : &sched->_score;
> >   	sched->dev = dev;
> > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> > index 06238e6d7f5c..38e092ea41e6 100644
> > --- a/drivers/gpu/drm/v3d/v3d_sched.c
> > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> > @@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   	int ret;
> >   	ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
> > -			     &v3d_bin_sched_ops,
> > +			     &v3d_bin_sched_ops, NULL,
> >   			     hw_jobs_limit, job_hang_limit,
> >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> >   			     NULL, "v3d_bin", v3d->drm.dev);
> > @@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   		return ret;
> >   	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
> > -			     &v3d_render_sched_ops,
> > +			     &v3d_render_sched_ops, NULL,
> >   			     hw_jobs_limit, job_hang_limit,
> >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> >   			     NULL, "v3d_render", v3d->drm.dev);
> > @@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   		goto fail;
> >   	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> > -			     &v3d_tfu_sched_ops,
> > +			     &v3d_tfu_sched_ops, NULL,
> >   			     hw_jobs_limit, job_hang_limit,
> >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> >   			     NULL, "v3d_tfu", v3d->drm.dev);
> > @@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   	if (v3d_has_csd(v3d)) {
> >   		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
> > -				     &v3d_csd_sched_ops,
> > +				     &v3d_csd_sched_ops, NULL,
> >   				     hw_jobs_limit, job_hang_limit,
> >   				     msecs_to_jiffies(hang_limit_ms), NULL,
> >   				     NULL, "v3d_csd", v3d->drm.dev);
> > @@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   			goto fail;
> >   		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
> > -				     &v3d_cache_clean_sched_ops,
> > +				     &v3d_cache_clean_sched_ops, NULL,
> >   				     hw_jobs_limit, job_hang_limit,
> >   				     msecs_to_jiffies(hang_limit_ms), NULL,
> >   				     NULL, "v3d_cache_clean", v3d->drm.dev);
> > diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
> > index de4f0044b211..d6d60ebf3d5f 100644
> > --- a/drivers/gpu/drm/xe/xe_execlist.c
> > +++ b/drivers/gpu/drm/xe/xe_execlist.c
> > @@ -336,7 +336,7 @@ static int execlist_engine_init(struct xe_engine *e)
> >   	exl->engine = e;
> > -	err = drm_sched_init(&exl->sched, &drm_sched_ops,
> > +	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
> >   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> >   			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
> >   			     NULL, NULL, e->hwe->name,
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index e857013070b9..735f31257f3a 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1081,7 +1081,7 @@ static int guc_engine_init(struct xe_engine *e)
> >   	init_waitqueue_head(&ge->suspend_wait);
> >   	timeout = xe_vm_no_dma_fences(e->vm) ? MAX_SCHEDULE_TIMEOUT : HZ * 5;
> > -	err = drm_sched_init(&ge->sched, &drm_sched_ops,
> > +	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
> >   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> >   			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
> >   			     e->name, gt_to_xe(e->gt)->drm.dev);
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index cf85f93218fc..09bc39840dc8 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -539,6 +539,7 @@ struct drm_gpu_scheduler {
> >   int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   		   const struct drm_sched_backend_ops *ops,
> > +		   struct workqueue_struct *run_wq,
> >   		   uint32_t hw_submission, unsigned hang_limit,
> >   		   long timeout, struct workqueue_struct *timeout_wq,
> >   		   atomic_t *score, const char *name, struct device *dev);

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler
  2023-05-03 12:13   ` Thomas Hellström
@ 2023-05-03 15:11     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-03 15:11 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Wed, May 03, 2023 at 02:13:15PM +0200, Thomas Hellström wrote:
> 
> On 5/2/23 02:16, Matthew Brost wrote:
> > Rather than a global modparam for scheduling policy, move the scheduling
> > policy to scheduler so user can control each scheduler policy.
> 
> Could you add some more info here about why this is done and how the
> scheduler policy is supposed to be controlled? Should it say "driver can
> control" rather than "user can control" at this stage?
> 

Rather than a global modparam for scheduling policy, move the scheduling
policy to the scheduler so the driver can control each scheduler's policy.
This is required because certain scheduling policies may not be allowed in
some drivers, so the driver must set the policy itself rather than relying
on the modparam.

^^^ How about that?
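
To make the intended use concrete, here is a sketch against the
prototype this patch introduces. Only drm_sched_init() and the
DRM_SCHED_POLICY_* values come from the patch; the other names are
placeholders:

#include <drm/gpu_scheduler.h>

static int my_ring_sched_init(struct drm_gpu_scheduler *sched,
			      const struct drm_sched_backend_ops *ops,
			      struct device *dev)
{
	/*
	 * A driver that requires FIFO ordering on this scheduler pins
	 * the policy at init time instead of relying on the global
	 * sched_policy modparam; DRM_SCHED_POLICY_DEFAULT keeps the
	 * modparam-selected behavior.
	 */
	return drm_sched_init(sched, ops, NULL /* run_wq */,
			      64 /* hw_submission */, 5 /* hang_limit */,
			      5 * HZ /* timeout */, NULL /* timeout_wq */,
			      NULL /* score */, "my-ring",
			      DRM_SCHED_POLICY_FIFO, dev);
}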

Again, this is a placeholder patch, as it should land upstream first.

Matt

> 
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  3 ++-
> >   drivers/gpu/drm/lima/lima_sched.c          |  3 ++-
> >   drivers/gpu/drm/msm/msm_ringbuffer.c       |  3 ++-
> >   drivers/gpu/drm/panfrost/panfrost_job.c    |  3 ++-
> >   drivers/gpu/drm/scheduler/sched_entity.c   | 24 ++++++++++++++++++----
> >   drivers/gpu/drm/scheduler/sched_main.c     | 21 ++++++++++++++-----
> >   drivers/gpu/drm/v3d/v3d_sched.c            | 15 +++++++++-----
> >   drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
> >   drivers/gpu/drm/xe/xe_guc_submit.c         |  3 ++-
> >   include/drm/gpu_scheduler.h                | 20 ++++++++++++------
> >   11 files changed, 72 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index fe28f6b71fe3..577ea5b98cd5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >   				   ring->num_hw_submission, amdgpu_job_hang_limit,
> >   				   timeout, adev->reset_domain->wq,
> >   				   ring->sched_score, ring->name,
> > +				   DRM_SCHED_POLICY_DEFAULT,
> >   				   adev->dev);
> >   		if (r) {
> >   			DRM_ERROR("Failed to create scheduler on ring %s.\n",
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index 8486a2923f1b..61204a3f8b0b 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> >   	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> >   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> >   			     msecs_to_jiffies(500), NULL, NULL,
> > -			     dev_name(gpu->dev), gpu->dev);
> > +			     dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
> > +			     gpu->dev);
> >   	if (ret)
> >   		return ret;
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index 54f53bece27c..33042ba6ae93 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
> >   	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> >   			      lima_job_hang_limit,
> >   			      msecs_to_jiffies(timeout), NULL,
> > -			      NULL, name, pipe->ldev->dev);
> > +			      NULL, name, DRM_SCHED_POLICY_DEFAULT,
> > +			      pipe->ldev->dev);
> >   }
> >   void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index 5879fc262047..f408a9097315 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> >   	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> >   			num_hw_submissions, 0, sched_timeout,
> > -			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> > +			NULL, NULL, to_msm_bo(ring->bo)->name,
> > +			DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
> >   	if (ret) {
> >   		goto fail;
> >   	}
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index f48b07056a16..effa48b33dce 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> >   				     nentries, 0,
> >   				     msecs_to_jiffies(JOB_TIMEOUT_MS),
> >   				     pfdev->reset.wq,
> > -				     NULL, "pan_js", pfdev->dev);
> > +				     NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
> > +				     pfdev->dev);
> >   		if (ret) {
> >   			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
> >   			goto err_sched;
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 15d04a0ec623..2300b2fc06ab 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -33,6 +33,20 @@
> >   #define to_drm_sched_job(sched_job)		\
> >   		container_of((sched_job), struct drm_sched_job, queue_node)
> > +static bool bad_policies(struct drm_gpu_scheduler **sched_list,
> > +			 unsigned int num_sched_list)
> > +{
> > +	enum drm_sched_policy sched_policy = sched_list[0]->sched_policy;
> > +	unsigned int i;
> > +
> > +	/* All scdedule policies must match */
> s/scdedule/schedule/
> > +	for (i = 1; i < num_sched_list; ++i)
> > +		if (sched_policy != sched_list[i]->sched_policy)
> > +			return true;
> > +
> > +	return false;
> > +}
> > +
> >   /**
> >    * drm_sched_entity_init - Init a context entity used by scheduler when
> >    * submit to HW ring.
> > @@ -62,7 +76,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> >   			  unsigned int num_sched_list,
> >   			  atomic_t *guilty)
> >   {
> > -	if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
> > +	if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])) ||
> > +	    bad_policies(sched_list, num_sched_list))
> >   		return -EINVAL;
> >   	memset(entity, 0, sizeof(struct drm_sched_entity));
> > @@ -75,8 +90,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> >   	entity->last_scheduled = NULL;
> >   	RB_CLEAR_NODE(&entity->rb_tree_node);
> > -	if(num_sched_list)
> > +	if(num_sched_list) {
> >   		entity->rq = &sched_list[0]->sched_rq[entity->priority];
> > +	}
> Why are brackets added here?
> >   	init_completion(&entity->entity_idle);
> > @@ -440,7 +456,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >   	 * Update the entity's location in the min heap according to
> >   	 * the timestamp of the next job, if any.
> >   	 */
> > -	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
> > +	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
> >   		struct drm_sched_job *next;
> >   		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > @@ -528,7 +544,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> >   		drm_sched_rq_add_entity(entity->rq, entity);
> >   		spin_unlock(&entity->rq_lock);
> > -		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > +		if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> >   			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
> >   		drm_sched_wakeup(entity->rq->sched);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index e79b9c760efe..6777a2db554f 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -62,14 +62,14 @@
> >   #define to_drm_sched_job(sched_job)		\
> >   		container_of((sched_job), struct drm_sched_job, queue_node)
> > -int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> > +int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> >   /**
> >    * DOC: sched_policy (int)
> >    * Used to override default entities scheduling policy in a run queue.
> >    */
> >   MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
> > -module_param_named(sched_policy, drm_sched_policy, int, 0444);
> > +module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
> >   static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
> >   							    const struct rb_node *b)
> > @@ -173,7 +173,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> >   	if (rq->current_entity == entity)
> >   		rq->current_entity = NULL;
> > -	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > +	if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> >   		drm_sched_rq_remove_fifo_locked(entity);
> >   	spin_unlock(&rq->lock);
> > @@ -956,7 +956,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> >   	/* Kernel run queue has higher priority than normal run queue*/
> >   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> > -		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
> > +		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> >   			drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
> >   			drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
> >   		if (entity)
> > @@ -1190,6 +1190,7 @@ static void drm_sched_main(struct work_struct *w)
> >    *		used
> >    * @score: optional score atomic shared with other schedulers
> >    * @name: name used for debugging
> > + * @sched_policy: schedule policy
> >    * @dev: target &struct device
> >    *
> >    * Return 0 on success, otherwise error code.
> > @@ -1199,9 +1200,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   		   struct workqueue_struct *run_wq,
> >   		   unsigned hw_submission, unsigned hang_limit,
> >   		   long timeout, struct workqueue_struct *timeout_wq,
> > -		   atomic_t *score, const char *name, struct device *dev)
> > +		   atomic_t *score, const char *name,
> > +		   enum drm_sched_policy sched_policy,
> > +		   struct device *dev)
> >   {
> >   	int i;
> > +
> > +	if (sched_policy >= DRM_SCHED_POLICY_COUNT)
> > +		return -EINVAL;
> > +
> >   	sched->ops = ops;
> >   	sched->hw_submission_limit = hw_submission;
> >   	sched->name = name;
> > @@ -1211,6 +1218,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   	sched->hang_limit = hang_limit;
> >   	sched->score = score ? score : &sched->_score;
> >   	sched->dev = dev;
> > +	if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
> > +		sched->sched_policy = default_drm_sched_policy;
> > +	else
> > +		sched->sched_policy = sched_policy;
> >   	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> >   		drm_sched_rq_init(sched, &sched->sched_rq[i]);
> > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> > index 38e092ea41e6..5e3fe77fa991 100644
> > --- a/drivers/gpu/drm/v3d/v3d_sched.c
> > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> > @@ -391,7 +391,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   			     &v3d_bin_sched_ops, NULL,
> >   			     hw_jobs_limit, job_hang_limit,
> >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> > -			     NULL, "v3d_bin", v3d->drm.dev);
> > +			     NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
> > +			     v3d->drm.dev);
> >   	if (ret)
> >   		return ret;
> > @@ -399,7 +400,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   			     &v3d_render_sched_ops, NULL,
> >   			     hw_jobs_limit, job_hang_limit,
> >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> > -			     NULL, "v3d_render", v3d->drm.dev);
> > +			     NULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> > +			     v3d->drm.dev);
> >   	if (ret)
> >   		goto fail;
> > @@ -407,7 +409,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   			     &v3d_tfu_sched_ops, NULL,
> >   			     hw_jobs_limit, job_hang_limit,
> >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> > -			     NULL, "v3d_tfu", v3d->drm.dev);
> > +			     NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
> > +			     v3d->drm.dev);
> >   	if (ret)
> >   		goto fail;
> > @@ -416,7 +419,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   				     &v3d_csd_sched_ops, NULL,
> >   				     hw_jobs_limit, job_hang_limit,
> >   				     msecs_to_jiffies(hang_limit_ms), NULL,
> > -				     NULL, "v3d_csd", v3d->drm.dev);
> > +				     NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
> > +				     v3d->drm.dev);
> >   		if (ret)
> >   			goto fail;
> > @@ -424,7 +428,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> >   				     &v3d_cache_clean_sched_ops, NULL,
> >   				     hw_jobs_limit, job_hang_limit,
> >   				     msecs_to_jiffies(hang_limit_ms), NULL,
> > -				     NULL, "v3d_cache_clean", v3d->drm.dev);
> > +				     NULL, "v3d_cache_clean",
> > +				     DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
> >   		if (ret)
> >   			goto fail;
> >   	}
> > diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
> > index d6d60ebf3d5f..48060d14547a 100644
> > --- a/drivers/gpu/drm/xe/xe_execlist.c
> > +++ b/drivers/gpu/drm/xe/xe_execlist.c
> > @@ -339,7 +339,7 @@ static int execlist_engine_init(struct xe_engine *e)
> >   	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
> >   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> >   			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
> > -			     NULL, NULL, e->hwe->name,
> > +			     NULL, NULL, e->hwe->name, DRM_SCHED_POLICY_DEFAULT,
> >   			     gt_to_xe(e->gt)->drm.dev);
> >   	if (err)
> >   		goto err_free;
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 735f31257f3a..9d3fadca43be 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1084,7 +1084,8 @@ static int guc_engine_init(struct xe_engine *e)
> >   	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
> >   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> >   			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
> > -			     e->name, gt_to_xe(e->gt)->drm.dev);
> > +			     e->name, DRM_SCHED_POLICY_DEFAULT,
> > +			     gt_to_xe(e->gt)->drm.dev);
> >   	if (err)
> >   		goto err_free;
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 09bc39840dc8..3df801401028 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -63,11 +63,15 @@ enum drm_sched_priority {
> >   	DRM_SCHED_PRIORITY_UNSET = -2
> >   };
> > -/* Used to chose between FIFO and RR jobs scheduling */
> > -extern int drm_sched_policy;
> > -
> > -#define DRM_SCHED_POLICY_RR    0
> > -#define DRM_SCHED_POLICY_FIFO  1
> > +/* Used to chose default scheduling policy*/
> > +extern int default_drm_sched_policy;
> > +
> > +enum drm_sched_policy {
> > +	DRM_SCHED_POLICY_DEFAULT,
> > +	DRM_SCHED_POLICY_RR,
> > +	DRM_SCHED_POLICY_FIFO,
> > +	DRM_SCHED_POLICY_COUNT,
> > +};
> >   /**
> >    * struct drm_sched_entity - A wrapper around a job queue (typically
> > @@ -505,6 +509,7 @@ struct drm_sched_backend_ops {
> >    *              guilty and it will no longer be considered for scheduling.
> >    * @score: score to help loadbalancer pick a idle sched
> >    * @_score: score used when the driver doesn't provide one
> > + * @sched_policy: Schedule policy for scheduler
> >    * @ready: marks if the underlying HW is ready to work
> >    * @free_guilty: A hit to time out handler to free the guilty job.
> >    * @pause_run_wq: pause queuing of @work_run on @run_wq
> > @@ -531,6 +536,7 @@ struct drm_gpu_scheduler {
> >   	int				hang_limit;
> >   	atomic_t                        *score;
> >   	atomic_t                        _score;
> > +	enum drm_sched_policy		sched_policy;
> >   	bool				ready;
> >   	bool				free_guilty;
> >   	bool				pause_run_wq;
> > @@ -542,7 +548,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   		   struct workqueue_struct *run_wq,
> >   		   uint32_t hw_submission, unsigned hang_limit,
> >   		   long timeout, struct workqueue_struct *timeout_wq,
> > -		   atomic_t *score, const char *name, struct device *dev);
> > +		   atomic_t *score, const char *name,
> > +		   enum drm_sched_policy sched_policy,
> > +		   struct device *dev);
> >   void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >   int drm_sched_job_init(struct drm_sched_job *job,

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches
  2023-05-03 12:37 ` [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Thomas Hellström
@ 2023-05-03 15:27   ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-03 15:27 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Wed, May 03, 2023 at 02:37:14PM +0200, Thomas Hellström wrote:
> Hi, Matthew
> 
> On 5/2/23 02:16, Matthew Brost wrote:
> > Series includes:
> > 
> > - DRM scheduler changes for firmware backends (1 to 1 entity to scheduler)
> > - LR workload story
> > - VM LRU handling
> > - GuC doorbell submission
> > - Basic GPUVA
> > - Sparse binding support
> > - GPUVA + extobj + drm exec (collaboration with dakr + Francois Dugast)
> > - GPUVA + userptr (minimal, more can be once Nouveua has userptr)
> > - Fix fencing rules for compute / fault mode
> > - Remove async worker for VM + error handling updates
> > - Kernel doc for VM bind
> 
> It would be beneficial to the reviewer if you could make separate series
> where applicable, with links to previous discussions, information about
> whether all design issues / discussions were resolved or anything is
> still open, and brief guidance as to what is in each patch of the
> series.
> 

Basically nothing has been reviewed; it's sad, but this is where we are.
Most of this code is 2-3 months old too, and I can't maintain something
like 5 branches of out-of-tree code and keep everything stable, hence 1
large series. Going forward I'd ask for timely reviews; I make an effort
to review code actively, and if I'm asked to review code I do it in a
timely manner. In any case, I think the team is just going to have to
get through this. Again, this isn't truly upstream either, we just need
to get this into our repo. Any of the common code will also go through
reviews on dri-devel before we go upstream.

Matt 

> /Thomas
> 
> 
> 
> 
> > Series is not fully ready for upstream and some of these things need to
> > get merged upstream first but overall it is largely correct and
> > certainly step in the right direction. Based on its size and the fact it
> > took me 8 hours to rebase this today I'd say let's get this tree and
> > fixup everything else in place.
> > 
> > Minor uAPI breakage, IGT series:
> > https://patchwork.freedesktop.org/series/117177/
> > 
> > gitlab link:
> > https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/344
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > 
> > Christian König (1):
> >    drm: execution context for GEM buffers v3
> > 
> > Danilo Krummrich (2):
> >    maple_tree: split up MA_STATE() macro
> >    drm: manager to keep track of GPUs VA mappings
> > 
> > Matthew Brost (28):
> >    drm/sched: Add run_wq argument to drm_sched_init
> >    drm/sched: Move schedule policy to scheduler
> >    drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> >    drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
> >    drm/xe: Long running job update
> >    drm/xe: Ensure LR engines are not persistent
> >    drm/xe: Only try to lock external BOs in VM bind
> >    drm/xe: VM LRU bulk move
> >    drm/xe/guc: Read HXG fields from DW1 of G2H response
> >    drm/xe/guc: Return the lower part of blocking H2G message
> >    drm/xe/guc: Use doorbells for submission if possible
> >    drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
> >    maple_tree: Export mas_preallocate
> >    drm/xe: Port Xe to GPUVA
> >    drm/xe: NULL binding implementation
> >    drm/xe: Avoid doing rebinds
> >    drm/xe: Reduce the number list links in xe_vma
> >    drm/xe: Optimize size of xe_vma allocation
> >    drm/gpuva: Add drm device to GPUVA manager
> >    drm/gpuva: Move dma-resv to GPUVA manager
> >    drm/gpuva: Add support for extobj
> >    drm/xe: Userptr refactor
> >    drm/exec: Always compile drm_exec
> >    drm/xe: Use drm_exec for locking rather than TTM exec helpers
> >    drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
> >    drm/xe: Allow compute VMs to output dma-fences on binds
> >    drm/xe: remove async worker, sync binds, new error handling
> >    drm/xe/uapi: Add some VM bind kernel doc
> > 
> >   Documentation/gpu/drm-mm.rst                 |   43 +
> >   drivers/gpu/drm/Kconfig                      |    6 +
> >   drivers/gpu/drm/Makefile                     |    4 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   |    3 +-
> >   drivers/gpu/drm/drm_debugfs.c                |   41 +
> >   drivers/gpu/drm/drm_exec.c                   |  248 ++
> >   drivers/gpu/drm/drm_gem.c                    |    3 +
> >   drivers/gpu/drm/drm_gpuva_mgr.c              | 1779 ++++++++++++
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c      |    5 +-
> >   drivers/gpu/drm/i915/display/intel_display.c |    6 +-
> >   drivers/gpu/drm/lima/lima_sched.c            |    5 +-
> >   drivers/gpu/drm/msm/msm_ringbuffer.c         |    5 +-
> >   drivers/gpu/drm/panfrost/panfrost_job.c      |    5 +-
> >   drivers/gpu/drm/scheduler/sched_entity.c     |   84 +-
> >   drivers/gpu/drm/scheduler/sched_fence.c      |    2 +-
> >   drivers/gpu/drm/scheduler/sched_main.c       |   88 +-
> >   drivers/gpu/drm/v3d/v3d_sched.c              |   25 +-
> >   drivers/gpu/drm/xe/Kconfig                   |    1 +
> >   drivers/gpu/drm/xe/regs/xe_guc_regs.h        |    1 +
> >   drivers/gpu/drm/xe/tests/xe_bo.c             |   26 +-
> >   drivers/gpu/drm/xe/tests/xe_migrate.c        |    6 +-
> >   drivers/gpu/drm/xe/xe_bo.c                   |  100 +-
> >   drivers/gpu/drm/xe/xe_bo.h                   |   13 +-
> >   drivers/gpu/drm/xe/xe_bo_evict.c             |   24 +-
> >   drivers/gpu/drm/xe/xe_bo_types.h             |    1 -
> >   drivers/gpu/drm/xe/xe_device.c               |    2 +-
> >   drivers/gpu/drm/xe/xe_dma_buf.c              |    2 +-
> >   drivers/gpu/drm/xe/xe_engine.c               |   50 +-
> >   drivers/gpu/drm/xe/xe_engine.h               |    4 +
> >   drivers/gpu/drm/xe/xe_engine_types.h         |    1 +
> >   drivers/gpu/drm/xe/xe_exec.c                 |  117 +-
> >   drivers/gpu/drm/xe/xe_execlist.c             |    3 +-
> >   drivers/gpu/drm/xe/xe_gt_pagefault.c         |   84 +-
> >   drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c  |   14 +-
> >   drivers/gpu/drm/xe/xe_guc.c                  |    6 +
> >   drivers/gpu/drm/xe/xe_guc_ct.c               |   12 +-
> >   drivers/gpu/drm/xe/xe_guc_engine_types.h     |    9 +
> >   drivers/gpu/drm/xe/xe_guc_pc.c               |    6 +-
> >   drivers/gpu/drm/xe/xe_guc_submit.c           |  398 ++-
> >   drivers/gpu/drm/xe/xe_guc_submit.h           |    1 +
> >   drivers/gpu/drm/xe/xe_guc_types.h            |    4 +
> >   drivers/gpu/drm/xe/xe_huc.c                  |    2 +-
> >   drivers/gpu/drm/xe/xe_lrc.c                  |    8 +-
> >   drivers/gpu/drm/xe/xe_migrate.c              |   31 +-
> >   drivers/gpu/drm/xe/xe_pt.c                   |  198 +-
> >   drivers/gpu/drm/xe/xe_sync.c                 |   26 +-
> >   drivers/gpu/drm/xe/xe_sync.h                 |    2 +-
> >   drivers/gpu/drm/xe/xe_trace.h                |   20 +-
> >   drivers/gpu/drm/xe/xe_vm.c                   | 2567 +++++++-----------
> >   drivers/gpu/drm/xe/xe_vm.h                   |  135 +-
> >   drivers/gpu/drm/xe/xe_vm_madvise.c           |  125 +-
> >   drivers/gpu/drm/xe/xe_vm_types.h             |  324 ++-
> >   drivers/gpu/drm/xe/xe_wait_user_fence.c      |   43 +-
> >   include/drm/drm_debugfs.h                    |   24 +
> >   include/drm/drm_drv.h                        |    7 +
> >   include/drm/drm_exec.h                       |  115 +
> >   include/drm/drm_gem.h                        |   75 +
> >   include/drm/drm_gpuva_mgr.h                  |  759 ++++++
> >   include/drm/gpu_scheduler.h                  |   29 +-
> >   include/linux/maple_tree.h                   |    7 +-
> >   include/uapi/drm/xe_drm.h                    |  128 +-
> >   lib/maple_tree.c                             |    1 +
> >   62 files changed, 5543 insertions(+), 2320 deletions(-)
> >   create mode 100644 drivers/gpu/drm/drm_exec.c
> >   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> >   create mode 100644 include/drm/drm_exec.h
> >   create mode 100644 include/drm/drm_gpuva_mgr.h
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init
  2023-05-03 15:06     ` Matthew Brost
@ 2023-05-05 18:24       ` Rodrigo Vivi
  0 siblings, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:24 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Wed, May 03, 2023 at 03:06:06PM +0000, Matthew Brost wrote:
> On Wed, May 03, 2023 at 02:03:49PM +0200, Thomas Hellström wrote:
> > Hi,
> > 
> > On 5/2/23 02:16, Matthew Brost wrote:
> > > We will have this argument upstream, lets pull into the Xe repo.
> > 
> > Please rephrase the commit message. Add explanation, imperative wording and
> > remove mentions of the Xe repo (nobody cares about that once it goes
> > upstream)
> > 
> > /Thomas
> 
> This patch gets squashed into an earlier one in our repo to match the
> upstream version:
> https://patchwork.freedesktop.org/patch/530648/?series=116054&rev=1

If the patch is to be squashed into an existing one, please create it
with

git commit --fixup=<hash of commit to be changed>

so we can push it to the top of the branch, and during the cleanup
rebases that I'm running it will get squashed automatically
(git rebase --autosquash).

Without the fixup! annotation I don't have the visibility to know which
patches should be squashed.

> 
> I can reword it if you think it is a big deal, but this is just a
> placeholder.
> 
> Matt
>  
> 
> > 
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
> > >   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  2 +-
> > >   drivers/gpu/drm/lima/lima_sched.c          |  2 +-
> > >   drivers/gpu/drm/msm/msm_ringbuffer.c       |  2 +-
> > >   drivers/gpu/drm/panfrost/panfrost_job.c    |  2 +-
> > >   drivers/gpu/drm/scheduler/sched_main.c     |  4 +++-
> > >   drivers/gpu/drm/v3d/v3d_sched.c            | 10 +++++-----
> > >   drivers/gpu/drm/xe/xe_execlist.c           |  2 +-
> > >   drivers/gpu/drm/xe/xe_guc_submit.c         |  2 +-
> > >   include/drm/gpu_scheduler.h                |  1 +
> > >   10 files changed, 16 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 902f9b5ff82c..fe28f6b71fe3 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -2364,7 +2364,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> > >   			break;
> > >   		}
> > > -		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> > > +		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
> > >   				   ring->num_hw_submission, amdgpu_job_hang_limit,
> > >   				   timeout, adev->reset_domain->wq,
> > >   				   ring->sched_score, ring->name,
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > index 1ae87dfd19c4..8486a2923f1b 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > @@ -133,7 +133,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> > >   {
> > >   	int ret;
> > > -	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
> > > +	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> > >   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> > >   			     msecs_to_jiffies(500), NULL, NULL,
> > >   			     dev_name(gpu->dev), gpu->dev);
> > > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > > index ff003403fbbc..54f53bece27c 100644
> > > --- a/drivers/gpu/drm/lima/lima_sched.c
> > > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > > @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
> > >   	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
> > > -	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
> > > +	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> > >   			      lima_job_hang_limit,
> > >   			      msecs_to_jiffies(timeout), NULL,
> > >   			      NULL, name, pipe->ldev->dev);
> > > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > > index 57a8e9564540..5879fc262047 100644
> > > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > > @@ -95,7 +95,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> > >   	 /* currently managing hangcheck ourselves: */
> > >   	sched_timeout = MAX_SCHEDULE_TIMEOUT;
> > > -	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
> > > +	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> > >   			num_hw_submissions, 0, sched_timeout,
> > >   			NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> > >   	if (ret) {
> > > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > > index dbc597ab46fb..f48b07056a16 100644
> > > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > > @@ -815,7 +815,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> > >   		js->queue[j].fence_context = dma_fence_context_alloc(1);
> > >   		ret = drm_sched_init(&js->queue[j].sched,
> > > -				     &panfrost_sched_ops,
> > > +				     &panfrost_sched_ops, NULL,
> > >   				     nentries, 0,
> > >   				     msecs_to_jiffies(JOB_TIMEOUT_MS),
> > >   				     pfdev->reset.wq,
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index cfd8a838e283..e79b9c760efe 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1182,6 +1182,7 @@ static void drm_sched_main(struct work_struct *w)
> > >    *
> > >    * @sched: scheduler instance
> > >    * @ops: backend operations for this scheduler
> > > + * @run_wq: workqueue to use for run work. If NULL, the system_wq is used
> > >    * @hw_submission: number of hw submissions that can be in flight
> > >    * @hang_limit: number of times to allow a job to hang before dropping it
> > >    * @timeout: timeout value in jiffies for the scheduler
> > > @@ -1195,6 +1196,7 @@ static void drm_sched_main(struct work_struct *w)
> > >    */
> > >   int drm_sched_init(struct drm_gpu_scheduler *sched,
> > >   		   const struct drm_sched_backend_ops *ops,
> > > +		   struct workqueue_struct *run_wq,
> > >   		   unsigned hw_submission, unsigned hang_limit,
> > >   		   long timeout, struct workqueue_struct *timeout_wq,
> > >   		   atomic_t *score, const char *name, struct device *dev)
> > > @@ -1203,9 +1205,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > >   	sched->ops = ops;
> > >   	sched->hw_submission_limit = hw_submission;
> > >   	sched->name = name;
> > > +	sched->run_wq = run_wq ? : system_wq;
> > >   	sched->timeout = timeout;
> > >   	sched->timeout_wq = timeout_wq ? : system_wq;
> > > -	sched->run_wq = system_wq;	/* FIXME: Let user pass this in */
> > >   	sched->hang_limit = hang_limit;
> > >   	sched->score = score ? score : &sched->_score;
> > >   	sched->dev = dev;
> > > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> > > index 06238e6d7f5c..38e092ea41e6 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_sched.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> > > @@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> > >   	int ret;
> > >   	ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
> > > -			     &v3d_bin_sched_ops,
> > > +			     &v3d_bin_sched_ops, NULL,
> > >   			     hw_jobs_limit, job_hang_limit,
> > >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> > >   			     NULL, "v3d_bin", v3d->drm.dev);
> > > @@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> > >   		return ret;
> > >   	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
> > > -			     &v3d_render_sched_ops,
> > > +			     &v3d_render_sched_ops, NULL,
> > >   			     hw_jobs_limit, job_hang_limit,
> > >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> > >   			     NULL, "v3d_render", v3d->drm.dev);
> > > @@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> > >   		goto fail;
> > >   	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> > > -			     &v3d_tfu_sched_ops,
> > > +			     &v3d_tfu_sched_ops, NULL,
> > >   			     hw_jobs_limit, job_hang_limit,
> > >   			     msecs_to_jiffies(hang_limit_ms), NULL,
> > >   			     NULL, "v3d_tfu", v3d->drm.dev);
> > > @@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> > >   	if (v3d_has_csd(v3d)) {
> > >   		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
> > > -				     &v3d_csd_sched_ops,
> > > +				     &v3d_csd_sched_ops, NULL,
> > >   				     hw_jobs_limit, job_hang_limit,
> > >   				     msecs_to_jiffies(hang_limit_ms), NULL,
> > >   				     NULL, "v3d_csd", v3d->drm.dev);
> > > @@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> > >   			goto fail;
> > >   		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
> > > -				     &v3d_cache_clean_sched_ops,
> > > +				     &v3d_cache_clean_sched_ops, NULL,
> > >   				     hw_jobs_limit, job_hang_limit,
> > >   				     msecs_to_jiffies(hang_limit_ms), NULL,
> > >   				     NULL, "v3d_cache_clean", v3d->drm.dev);
> > > diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
> > > index de4f0044b211..d6d60ebf3d5f 100644
> > > --- a/drivers/gpu/drm/xe/xe_execlist.c
> > > +++ b/drivers/gpu/drm/xe/xe_execlist.c
> > > @@ -336,7 +336,7 @@ static int execlist_engine_init(struct xe_engine *e)
> > >   	exl->engine = e;
> > > -	err = drm_sched_init(&exl->sched, &drm_sched_ops,
> > > +	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
> > >   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> > >   			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
> > >   			     NULL, NULL, e->hwe->name,
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index e857013070b9..735f31257f3a 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -1081,7 +1081,7 @@ static int guc_engine_init(struct xe_engine *e)
> > >   	init_waitqueue_head(&ge->suspend_wait);
> > >   	timeout = xe_vm_no_dma_fences(e->vm) ? MAX_SCHEDULE_TIMEOUT : HZ * 5;
> > > -	err = drm_sched_init(&ge->sched, &drm_sched_ops,
> > > +	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
> > >   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> > >   			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
> > >   			     e->name, gt_to_xe(e->gt)->drm.dev);
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index cf85f93218fc..09bc39840dc8 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -539,6 +539,7 @@ struct drm_gpu_scheduler {
> > >   int drm_sched_init(struct drm_gpu_scheduler *sched,
> > >   		   const struct drm_sched_backend_ops *ops,
> > > +		   struct workqueue_struct *run_wq,
> > >   		   uint32_t hw_submission, unsigned hang_limit,
> > >   		   long timeout, struct workqueue_struct *timeout_wq,
> > >   		   atomic_t *score, const char *name, struct device *dev);

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update Matthew Brost
@ 2023-05-05 18:36   ` Rodrigo Vivi
  2023-05-08  1:14     ` Matthew Brost
  2023-05-08 13:14   ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:36 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:01PM -0700, Matthew Brost wrote:
> Flow control + write ring in exec, return NULL in run_job, siganl

typo: s/siganl/signal

> xe_hw_fence immediately, and override TDR for LR jobs.

So, this would likely be the recommendation on how to deal with
the lack of completion fence right?! Could you please put a more
descriptive text that we could convert to a documentation later?

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
>  drivers/gpu/drm/xe/xe_engine.h           |  4 +
>  drivers/gpu/drm/xe/xe_exec.c             |  8 ++
>  drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
>  drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
>  drivers/gpu/drm/xe/xe_trace.h            |  5 ++
>  6 files changed, 137 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> index 094ec17d3004..d1e84d7adbd4 100644
> --- a/drivers/gpu/drm/xe/xe_engine.c
> +++ b/drivers/gpu/drm/xe/xe_engine.c
> @@ -18,6 +18,7 @@
>  #include "xe_macros.h"
>  #include "xe_migrate.h"
>  #include "xe_pm.h"
> +#include "xe_ring_ops_types.h"
>  #include "xe_trace.h"
>  #include "xe_vm.h"
>  
> @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
>  	up_write(&e->vm->lock);
>  }
>  
> +/**
> + * xe_engine_is_lr() - Whether an engine is long-running
> + * @e: The engine
> + *
> + * Return: True if the engine is long-running, false otherwise.
> + */
> +bool xe_engine_is_lr(struct xe_engine *e)
> +{
> +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
> +		!(e->flags & ENGINE_FLAG_VM);

Why do we have this ENGINE_FLAG_VM here?

> +}
> +
> +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
> +{
> +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
> +}
> +
> +/**
> + * xe_engine_ring_full() - Whether an engine's ring is full
> + * @e: The engine
> + *
> + * Return: True if the engine's ring is full, false otherwise.
> + */
> +bool xe_engine_ring_full(struct xe_engine *e)
> +{
> +	struct xe_lrc *lrc = e->lrc;
> +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
> +
> +	return xe_engine_num_job_inflight(e) >= max_job;
> +}
> +
>  /**
>   * xe_engine_is_idle() - Whether an engine is idle.
>   * @engine: The engine
> diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
> index a49cf2ab405e..2e60f6d90226 100644
> --- a/drivers/gpu/drm/xe/xe_engine.h
> +++ b/drivers/gpu/drm/xe/xe_engine.h
> @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
>  	return engine->width > 1;
>  }
>  
> +bool xe_engine_is_lr(struct xe_engine *e);
> +
> +bool xe_engine_ring_full(struct xe_engine *e);
> +
>  bool xe_engine_is_idle(struct xe_engine *engine);
>  
>  void xe_engine_kill(struct xe_engine *e);
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index ea869f2452ef..44ea9bcd0066 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -13,6 +13,7 @@
>  #include "xe_device.h"
>  #include "xe_engine.h"
>  #include "xe_macros.h"
> +#include "xe_ring_ops_types.h"
>  #include "xe_sched_job.h"
>  #include "xe_sync.h"
>  #include "xe_vm.h"
> @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		goto err_engine_end;
>  	}
>  
> +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
> +		err = -EWOULDBLOCK;
> +		goto err_engine_end;
> +	}
> +
>  	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
>  				  addresses : &args->address);
>  	if (IS_ERR(job)) {
> @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		xe_sync_entry_signal(&syncs[i], job,
>  				     &job->drm.s_fence->finished);
>  
> +	if (xe_engine_is_lr(engine))
> +		engine->ring_ops->emit_job(job);
>  	xe_sched_job_push(job);
>  	xe_vm_reactivate_rebind(vm);
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> index cbfb13026ec1..5d83132034a6 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> @@ -31,6 +31,8 @@ struct xe_guc_engine {
>  	 */
>  #define MAX_STATIC_MSG_TYPE	3
>  	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
> +	/** @lr_tdr: long running TDR worker */
> +	struct work_struct lr_tdr;
>  	/** @fini_async: do final fini async from this worker */
>  	struct work_struct fini_async;
>  	/** @resume_time: time of last resume */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 68d09e7a4cc0..0a41f5d04f6d 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
>  		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
>  	}
>  
> +	/*
> +	 * We must keep a reference for LR engines if engine is registered with
> +	 * the GuC as jobs signal immediately and can't destroy an engine if the
> +	 * GuC has a reference to it.
> +	 */
> +	if (xe_engine_is_lr(e))
> +		xe_engine_get(e);
> +
>  	set_engine_registered(e);
>  	trace_xe_engine_register(e);
>  	if (xe_engine_is_parallel(e))
> @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>  {
>  	struct xe_sched_job *job = to_xe_sched_job(drm_job);
>  	struct xe_engine *e = job->engine;
> +	bool lr = xe_engine_is_lr(e);
>  
>  	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
>  		  !engine_banned(e) && !engine_suspended(e));
> @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>  	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
>  		if (!engine_registered(e))
>  			register_engine(e);
> -		e->ring_ops->emit_job(job);
> +		if (!lr)	/* Written in IOCTL */
> +			e->ring_ops->emit_job(job);
>  		submit_engine(e);
>  	}
>  
> -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
> +	if (lr) {
> +		xe_sched_job_set_error(job, -ENOTSUPP);
> +		return NULL;
> +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
>  		return job->fence;
> -	else
> +	} else {
>  		return dma_fence_get(job->fence);
> +	}
>  }
>  
>  static void guc_engine_free_job(struct drm_sched_job *drm_job)
> @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
>  }
>  #endif
>  
> +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
> +{
> +	struct xe_guc *guc = engine_to_guc(e);
> +
> +	if (xe_engine_is_lr(e))
> +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
> +	else
> +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +}
> +
> +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
> +{
> +	struct xe_guc_engine *ge =
> +		container_of(w, struct xe_guc_engine, lr_tdr);
> +	struct xe_engine *e = ge->engine;
> +	struct drm_gpu_scheduler *sched = &ge->sched;
> +
> +	XE_BUG_ON(!xe_engine_is_lr(e));
> +	trace_xe_engine_lr_cleanup(e);
> +
> +	/* Kill the run_job / process_msg entry points */
> +	drm_sched_run_wq_stop(sched);
> +
> +	/* Engine state now stable, disable scheduling / deregister if needed */
> +	if (engine_registered(e)) {
> +		struct xe_guc *guc = engine_to_guc(e);
> +		int ret;
> +
> +		set_engine_banned(e);
> +		xe_engine_get(e);
> +		disable_scheduling_deregister(guc, e);
> +
> +		/*
> +		 * Must wait for scheduling to be disabled before signalling
> +		 * any fences, if GT broken the GT reset code should signal us.
> +		 */
> +		smp_rmb();
> +		ret = wait_event_timeout(guc->ct.wq,
> +					 !engine_pending_disable(e) ||
> +					 guc_read_stopped(guc), HZ * 5);
> +		if (!ret) {
> +			XE_WARN_ON("Schedule disable failed to respond");
> +			drm_sched_run_wq_start(sched);
> +			xe_gt_reset_async(e->gt);
> +			return;
> +		}
> +	}
> +
> +	drm_sched_run_wq_start(sched);
> +}
> +
>  static enum drm_gpu_sched_stat
>  guc_engine_timedout_job(struct drm_sched_job *drm_job)
>  {
> @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>  			err = -EIO;
>  		set_engine_banned(e);
>  		xe_engine_get(e);
> -		disable_scheduling_deregister(engine_to_guc(e), e);
> +		disable_scheduling_deregister(guc, e);
>  
>  		/*
>  		 * Must wait for scheduling to be disabled before signalling
> @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>  	 */
>  	list_add(&drm_job->list, &sched->pending_list);
>  	drm_sched_run_wq_start(sched);
> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +	xe_guc_engine_trigger_cleanup(e);
>  
>  	/* Mark all outstanding jobs as bad, thus completing them */
>  	spin_lock(&sched->job_list_lock);
> @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
>  
>  	trace_xe_engine_destroy(e);
>  
> +	if (xe_engine_is_lr(e))
> +		cancel_work_sync(&ge->lr_tdr);
>  	if (e->flags & ENGINE_FLAG_PERSISTENT)
>  		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
>  	release_guc_id(guc, e);
> @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
>  	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
>  
>  	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
> -	queue_work(system_unbound_wq, &e->guc->fini_async);
> +	queue_work(system_wq, &e->guc->fini_async);
>  
>  	/* We must block on kernel engines so slabs are empty on driver unload */
>  	if (kernel) {
> @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
>  	if (err)
>  		goto err_free;
>  
> +
>  	sched = &ge->sched;
>  	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
>  				    &sched, 1, NULL);
>  	if (err)
>  		goto err_sched;
>  
> +	if (xe_engine_is_lr(e))
> +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
> +
>  	mutex_lock(&guc->submission_state.lock);
>  
>  	err = alloc_guc_id(guc, e);
> @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
>  {
>  	trace_xe_engine_kill(e);
>  	set_engine_killed(e);
> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +	xe_guc_engine_trigger_cleanup(e);
>  }
>  
>  static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
> @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
>  	/* Stop scheduling + flush any DRM scheduler operations */
>  	drm_sched_run_wq_stop(sched);
>  
> +	if (engine_registered(e) && xe_engine_is_lr(e))
> +		xe_engine_put(e);
> +
>  	/* Clean up lost G2H + reset engine state */
>  	if (engine_destroyed(e) && engine_registered(e)) {
>  		if (engine_banned(e))
> @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>  	trace_xe_engine_deregister_done(e);
>  
>  	clear_engine_registered(e);
> +	if (xe_engine_is_lr(e))
> +		xe_engine_put(e);
> +
>  	if (engine_banned(e))
>  		xe_engine_put(e);
>  	else
> @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
>  	 */
>  	set_engine_reset(e);
>  	if (!engine_banned(e))
> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +		xe_guc_engine_trigger_cleanup(e);
>  
>  	return 0;
>  }
> @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>  	/* Treat the same as engine reset */
>  	set_engine_reset(e);
>  	if (!engine_banned(e))
> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +		xe_guc_engine_trigger_cleanup(e);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 2f8eb7ebe9a7..02861c26e145 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
>  	     TP_ARGS(e)
>  );
>  
> +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
> +	     TP_PROTO(struct xe_engine *e),
> +	     TP_ARGS(e)
> +);
> +
>  DECLARE_EVENT_CLASS(xe_sched_job,
>  		    TP_PROTO(struct xe_sched_job *job),
>  		    TP_ARGS(job),
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent Matthew Brost
@ 2023-05-05 18:38   ` Rodrigo Vivi
  2023-05-08  1:03     ` Matthew Brost
  2023-05-09 12:21   ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:38 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:02PM -0700, Matthew Brost wrote:
> With our ref counting scheme LR engines only close properly if not
> persistent, ensure that LR engines are non-persistent.

It would be better to spell out "long running" somewhere here...

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_engine.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> index d1e84d7adbd4..91600b1e8249 100644
> --- a/drivers/gpu/drm/xe/xe_engine.c
> +++ b/drivers/gpu/drm/xe/xe_engine.c
> @@ -596,7 +596,9 @@ int xe_engine_create_ioctl(struct drm_device *dev, void *data,
>  			return -ENOENT;
>  
>  		e = xe_engine_create(xe, vm, logical_mask,
> -				     args->width, hwe, ENGINE_FLAG_PERSISTENT);
> +				     args->width, hwe,
> +				     xe_vm_no_dma_fences(vm) ? 0 :

Shouldn't we use the existing function xe_engine_is_lr() instead of this?

> +				     ENGINE_FLAG_PERSISTENT);
>  		xe_vm_put(vm);
>  		if (IS_ERR(e))
>  			return PTR_ERR(e);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind Matthew Brost
@ 2023-05-05 18:40   ` Rodrigo Vivi
  2023-05-08  1:08     ` Matthew Brost
  2023-05-08  1:17   ` Christopher Snowhill
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:40 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Matthew Brost

On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
> Not needed and causes some issues with bulk LRU moves.

I'm confused by this explanation and the code below.
Could you please provide a bit more wording here?

> 
> Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 272f0f7f24fe..6c427ff92c44 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
>  		 */
>  		xe_bo_get(vbo);
>  
> -		tv_bo.bo = &vbo->ttm;
> -		tv_bo.num_shared = 1;
> -		list_add(&tv_bo.head, &objs);
> +		if (!vbo->vm) {
> +			tv_bo.bo = &vbo->ttm;
> +			tv_bo.num_shared = 1;
> +			list_add(&tv_bo.head, &objs);
> +		}
>  	}
>  
>  again:
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response Matthew Brost
@ 2023-05-05 18:50   ` Rodrigo Vivi
  2023-05-09 12:49   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:50 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:05PM -0700, Matthew Brost wrote:
> The HXG fields are DW1 not DW0, fix this.

It took me a while to understand how the msg is constructed before I
could agree with this... I believe we should transform this into a
struct with properly named embedded u32 fields to make it clear and
avoid future mistakes like this.
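
Something along these lines, purely as a sketch (the struct and field
names are made up, this is not an existing definition in the GuC ABI
headers):

	struct guc_ctb_hxg_msg {
		u32 ctb_header;	/* GUC_CTB_MSG_0_* fields (DW0) */
		u32 hxg_header;	/* the GUC_HXG_*_MSG_0_* fields the
				 * patch now reads from msg[1] (DW1) */
		u32 payload[];	/* rest of the HXG message */
	};

The FIELD_GET()s below would then operate on msg->hxg_header instead of
a bare msg[1] index.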

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

I also believe this patch and any others like it should be resent
individually so we get the fixes in place already.

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 9055ff133a7c..6abf1dee95af 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -782,13 +782,13 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>  	if (type == GUC_HXG_TYPE_RESPONSE_FAILURE) {
>  		g2h_fence->fail = true;
>  		g2h_fence->error =
> -			FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[0]);
> +			FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]);
>  		g2h_fence->hint =
> -			FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[0]);
> +			FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1]);
>  	} else if (type == GUC_HXG_TYPE_NO_RESPONSE_RETRY) {
>  		g2h_fence->retry = true;
>  		g2h_fence->reason =
> -			FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[0]);
> +			FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[1]);
>  	} else if (g2h_fence->response_buffer) {
>  		g2h_fence->response_len = response_len;
>  		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message Matthew Brost
@ 2023-05-05 18:52   ` Rodrigo Vivi
  2023-05-08  1:10     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:52 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:06PM -0700, Matthew Brost wrote:
> The upper layers may need this data, an example of this is allocating
> DIST doorbell.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++++-
>  drivers/gpu/drm/xe/xe_guc_pc.c | 6 ++++--
>  drivers/gpu/drm/xe/xe_huc.c    | 2 +-
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 6abf1dee95af..60b69fcfac9f 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -25,6 +25,7 @@
>  struct g2h_fence {
>  	u32 *response_buffer;
>  	u32 seqno;
> +	u32 status;
>  	u16 response_len;
>  	u16 error;
>  	u16 hint;
> @@ -727,7 +728,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>  		ret = -EIO;
>  	}
>  
> -	return ret > 0 ? 0 : ret;
> +	return ret > 0 ? g2h_fence.status : ret;

The problem I see here is how the upper level could differentiate
between an error and a status.

should we convert the functions to have an &status argument passed in?
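
A sketch of what that could look like (just the idea; the extra
parameter is hypothetical, not the current signature):

	/* returns < 0 on error, 0 on success; *status gets RESPONSE DATA0 */
	int xe_guc_ct_send_block(struct xe_guc_ct *ct, const u32 *action,
				 u32 len, u32 *status);

Callers that don't need it could pass a NULL status and keep treating
the return value as a plain error code.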

>  }
>  
>  int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
> @@ -793,6 +794,9 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>  		g2h_fence->response_len = response_len;
>  		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,
>  		       response_len * sizeof(u32));
> +	} else {
> +		g2h_fence->status =
> +			FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, msg[1]);
>  	}
>  
>  	g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
> diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
> index 72d460d5323b..3d2ea723a4a7 100644
> --- a/drivers/gpu/drm/xe/xe_guc_pc.c
> +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
> @@ -204,11 +204,13 @@ static int pc_action_query_task_state(struct xe_guc_pc *pc)
>  
>  	/* Blocking here to ensure the results are ready before reading them */
>  	ret = xe_guc_ct_send_block(ct, action, ARRAY_SIZE(action));
> -	if (ret)
> +	if (ret < 0) {
>  		drm_err(&pc_to_xe(pc)->drm,
>  			"GuC PC query task state failed: %pe", ERR_PTR(ret));
> +		return ret;
> +	}
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static int pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value)
> diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c
> index 55dcaab34ea4..9c48c3075410 100644
> --- a/drivers/gpu/drm/xe/xe_huc.c
> +++ b/drivers/gpu/drm/xe/xe_huc.c
> @@ -39,7 +39,7 @@ int xe_huc_init(struct xe_huc *huc)
>  
>  	huc->fw.type = XE_UC_FW_TYPE_HUC;
>  	ret = xe_uc_fw_init(&huc->fw);
> -	if (ret)
> +	if (ret < 0)
>  		goto out;
>  
>  	xe_uc_fw_change_status(&huc->fw, XE_UC_FIRMWARE_LOADABLE);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry Matthew Brost
@ 2023-05-05 18:55   ` Rodrigo Vivi
  2023-05-09 13:01     ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 18:55 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:08PM -0700, Matthew Brost wrote:
> This information is helpful so print it.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 1b6f36b04cd1..880f480c6d5f 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -2016,6 +2016,8 @@ static void guc_engine_print(struct xe_engine *e, struct drm_printer *p)
>  	drm_printf(p, "\tTimeslice: %u (us)\n", e->sched_props.timeslice_us);
>  	drm_printf(p, "\tPreempt timeout: %u (us)\n",
>  		   e->sched_props.preempt_timeout_us);
> +	drm_printf(p, "\tDoorbell ID: %u\n",
> +		   e->guc->doorbell_id);
>  	for (i = 0; i < e->width; ++i ) {
>  		struct xe_lrc *lrc = e->lrc + i;
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation Matthew Brost
@ 2023-05-05 19:37   ` Rodrigo Vivi
  2023-05-08  1:21     ` Matthew Brost
  2023-05-11  9:05   ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 19:37 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:16PM -0700, Matthew Brost wrote:
> Reduce gt_mask to a u8 from a u64, only allocate userptr state if VMA is
> a userptr, and union of destroy callback and worker.

Too many different things in one patch; could you please split it up?

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c       | 14 +++--
>  drivers/gpu/drm/xe/xe_vm_types.h | 88 +++++++++++++++++---------------
>  2 files changed, 57 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index e5f2fffb2aec..e8d9939ee535 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -814,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  				    u64 bo_offset_or_userptr,
>  				    u64 start, u64 end,
>  				    bool read_only, bool null,
> -				    u64 gt_mask)
> +				    u8 gt_mask)
>  {
>  	struct xe_vma *vma;
>  	struct xe_gt *gt;
> @@ -823,7 +823,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  	XE_BUG_ON(start >= end);
>  	XE_BUG_ON(end >= vm->size);
>  
> -	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> +	if (!bo && !null)	/* userptr */
> +		vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> +	else
> +		vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
> +			      GFP_KERNEL);
>  	if (!vma) {
>  		vma = ERR_PTR(-ENOMEM);
>  		return vma;
> @@ -2149,7 +2153,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
>  static struct drm_gpuva_ops *
>  vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  			 u64 bo_offset_or_userptr, u64 addr, u64 range,
> -			 u32 operation, u64 gt_mask, u32 region)
> +			 u32 operation, u8 gt_mask, u32 region)
>  {
>  	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
>  	struct ww_acquire_ctx ww;
> @@ -2234,7 +2238,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  }
>  
>  static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> -			      u64 gt_mask, bool read_only, bool null)
> +			      u8 gt_mask, bool read_only, bool null)
>  {
>  	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>  	struct xe_vma *vma;
> @@ -3217,8 +3221,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		u64 addr = bind_ops[i].addr;
>  		u32 op = bind_ops[i].op;
>  		u64 obj_offset = bind_ops[i].obj_offset;
> -		u64 gt_mask = bind_ops[i].gt_mask;
>  		u32 region = bind_ops[i].region;
> +		u8 gt_mask = bind_ops[i].gt_mask;
>  
>  		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
>  						  addr, range, op, gt_mask,
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 22def5483c12..df4797ec4d7f 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -34,22 +34,34 @@ struct xe_vm;
>  #define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
>  #define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
>  
> +/** struct xe_userptr - User pointer */
> +struct xe_userptr {
> +	/**
> +	 * @notifier: MMU notifier for user pointer (invalidation call back)
> +	 */
> +	struct mmu_interval_notifier notifier;
> +	/** @sgt: storage for a scatter gather table */
> +	struct sg_table sgt;
> +	/** @sg: allocated scatter gather table */
> +	struct sg_table *sg;
> +	/** @notifier_seq: notifier sequence number */
> +	unsigned long notifier_seq;
> +	/**
> +	 * @initial_bind: user pointer has been bound at least once.
> +	 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
> +	 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
> +	 */
> +	bool initial_bind;
> +#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> +	u32 divisor;
> +#endif
> +};
> +
> +/** xe_vma - Virtual memory address */
>  struct xe_vma {
>  	/** @gpuva: Base GPUVA object */
>  	struct drm_gpuva gpuva;
>  
> -	/** @gt_mask: GT mask of where to create binding for this VMA */
> -	u64 gt_mask;
> -
> -	/**
> -	 * @gt_present: GT mask of binding are present for this VMA.
> -	 * protected by vm->lock, vm->resv and for userptrs,
> -	 * vm->userptr.notifier_lock for writing. Needs either for reading,
> -	 * but if reading is done under the vm->lock only, it needs to be held
> -	 * in write mode.
> -	 */
> -	u64 gt_present;
> -
>  	union {
>  		/** @userptr_link: link into VM repin list if userptr */
>  		struct list_head userptr_link;
> @@ -77,16 +89,29 @@ struct xe_vma {
>  		} notifier;
>  	};
>  
> -	/** @destroy_cb: callback to destroy VMA when unbind job is done */
> -	struct dma_fence_cb destroy_cb;
> +	union {
> +		/** @destroy_cb: callback to destroy VMA when unbind job is done */
> +		struct dma_fence_cb destroy_cb;
> +		/** @destroy_work: worker to destroy this BO */
> +		struct work_struct destroy_work;
> +	};
>  
> -	/** @destroy_work: worker to destroy this BO */
> -	struct work_struct destroy_work;
> +	/** @gt_mask: GT mask of where to create binding for this VMA */
> +	u8 gt_mask;
> +
> +	/**
> +	 * @gt_present: GT mask of binding are present for this VMA.
> +	 * protected by vm->lock, vm->resv and for userptrs,
> +	 * vm->userptr.notifier_lock for writing. Needs either for reading,
> +	 * but if reading is done under the vm->lock only, it needs to be held
> +	 * in write mode.
> +	 */
> +	u8 gt_present;
>  
>  	/** @usm: unified shared memory state */
>  	struct {
>  		/** @gt_invalidated: VMA has been invalidated */
> -		u64 gt_invalidated;
> +		u8 gt_invalidated;
>  	} usm;
>  
>  	struct {
> @@ -97,28 +122,11 @@ struct xe_vma {
>  		struct list_head link;
>  	} extobj;
>  
> -	/** @userptr: user pointer state */
> -	struct {
> -		/**
> -		 * @notifier: MMU notifier for user pointer (invalidation call back)
> -		 */
> -		struct mmu_interval_notifier notifier;
> -		/** @sgt: storage for a scatter gather table */
> -		struct sg_table sgt;
> -		/** @sg: allocated scatter gather table */
> -		struct sg_table *sg;
> -		/** @notifier_seq: notifier sequence number */
> -		unsigned long notifier_seq;
> -		/**
> -		 * @initial_bind: user pointer has been bound at least once.
> -		 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
> -		 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
> -		 */
> -		bool initial_bind;
> -#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> -		u32 divisor;
> -#endif
> -	} userptr;
> +	/**
> +	 * @userptr: user pointer state, only allocated for VMAs that are
> +	 * user pointers
> +	 */
> +	struct xe_userptr userptr;
>  };
>  
>  struct xe_device;
> @@ -387,7 +395,7 @@ struct xe_vma_op {
>  	 */
>  	struct async_op_fence *fence;
>  	/** @gt_mask: gt mask for this operation */
> -	u64 gt_mask;
> +	u8 gt_mask;
>  	/** @flags: operation flags */
>  	enum xe_vma_op_flags flags;
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager Matthew Brost
@ 2023-05-05 19:39   ` Rodrigo Vivi
  2023-05-11  9:06     ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 19:39 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:17PM -0700, Matthew Brost wrote:
> This is the logical place for this, will help with upcoming changes too.

Please split the xe changes from the drm stuff into different patches,
and a few more words on the why would be better.

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/drm_gpuva_mgr.c  |  3 +++
>  drivers/gpu/drm/xe/xe_migrate.c  | 10 +++++-----
>  drivers/gpu/drm/xe/xe_pt.c       | 18 +++++++++---------
>  drivers/gpu/drm/xe/xe_vm.c       | 31 +++++++++++++++----------------
>  drivers/gpu/drm/xe/xe_vm.h       | 10 ++++++++++
>  drivers/gpu/drm/xe/xe_vm_types.h |  2 --
>  include/drm/drm_gpuva_mgr.h      |  4 ++++
>  7 files changed, 46 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> index bd7d27ee44bb..137322945e91 100644
> --- a/drivers/gpu/drm/drm_gpuva_mgr.c
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -413,6 +413,7 @@ static void __drm_gpuva_remove(struct drm_gpuva *va);
>  /**
>   * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
>   * @mgr: pointer to the &drm_gpuva_manager to initialize
> + * @drm: drm device
>   * @name: the name of the GPU VA space
>   * @start_offset: the start offset of the GPU VA space
>   * @range: the size of the GPU VA space
> @@ -427,6 +428,7 @@ static void __drm_gpuva_remove(struct drm_gpuva *va);
>   */
>  void
>  drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
> +		       struct drm_device *drm,
>  		       const char *name,
>  		       u64 start_offset, u64 range,
>  		       u64 reserve_offset, u64 reserve_range,
> @@ -437,6 +439,7 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>  	mgr->mm_start = start_offset;
>  	mgr->mm_range = range;
>  
> +	mgr->drm = drm;
>  	mgr->name = name ? name : "unknown";
>  	mgr->ops = ops;
>  
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index b44aa094a466..0a393c5772e5 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -129,7 +129,7 @@ static u64 xe_migrate_vram_ofs(u64 addr)
>  static int xe_migrate_create_cleared_bo(struct xe_migrate *m, struct xe_vm *vm)
>  {
>  	struct xe_gt *gt = m->gt;
> -	struct xe_device *xe = vm->xe;
> +	struct xe_device *xe = xe_vm_device(vm);
>  	size_t cleared_size;
>  	u64 vram_addr;
>  	bool is_vram;
> @@ -175,7 +175,7 @@ static int xe_migrate_prepare_vm(struct xe_gt *gt, struct xe_migrate *m,
>  	/* Need to be sure everything fits in the first PT, or create more */
>  	XE_BUG_ON(m->batch_base_ofs + batch->size >= SZ_2M);
>  
> -	bo = xe_bo_create_pin_map(vm->xe, m->gt, vm,
> +	bo = xe_bo_create_pin_map(xe_vm_device(vm), m->gt, vm,
>  				  num_entries * XE_PAGE_SIZE,
>  				  ttm_bo_type_kernel,
>  				  XE_BO_CREATE_VRAM_IF_DGFX(m->gt) |
> @@ -1051,7 +1051,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
>  
>  	if (wait_vm && !dma_resv_test_signaled(&vm->resv,
>  					       DMA_RESV_USAGE_BOOKKEEP)) {
> -		vm_dbg(&vm->xe->drm, "wait on VM for munmap");
> +		vm_dbg(&xe_vm_device(vm)->drm, "wait on VM for munmap");
>  		return ERR_PTR(-ETIME);
>  	}
>  
> @@ -1069,7 +1069,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
>  
>  	if (vm) {
>  		trace_xe_vm_cpu_bind(vm);
> -		xe_device_wmb(vm->xe);
> +		xe_device_wmb(xe_vm_device(vm));
>  	}
>  
>  	fence = dma_fence_get_stub();
> @@ -1263,7 +1263,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>  	 * trigger preempts before moving forward
>  	 */
>  	if (first_munmap_rebind) {
> -		vm_dbg(&vm->xe->drm, "wait on first_munmap_rebind");
> +		vm_dbg(&xe_vm_device(vm)->drm, "wait on first_munmap_rebind");
>  		err = job_add_deps(job, &vm->resv,
>  				   DMA_RESV_USAGE_BOOKKEEP);
>  		if (err)
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 8eab8e1bbaf0..4167f666d98d 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -218,7 +218,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_gt *gt,
>  	if (!pt)
>  		return ERR_PTR(-ENOMEM);
>  
> -	bo = xe_bo_create_pin_map(vm->xe, gt, vm, SZ_4K,
> +	bo = xe_bo_create_pin_map(xe_vm_device(vm), gt, vm, SZ_4K,
>  				  ttm_bo_type_kernel,
>  				  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
>  				  XE_BO_CREATE_IGNORE_MIN_PAGE_SIZE_BIT |
> @@ -264,11 +264,11 @@ void xe_pt_populate_empty(struct xe_gt *gt, struct xe_vm *vm,
>  		 * FIXME: Some memory is allocated already allocated to zero?
>  		 * Find out which memory that is and avoid this memset...
>  		 */
> -		xe_map_memset(vm->xe, map, 0, 0, SZ_4K);
> +		xe_map_memset(xe_vm_device(vm), map, 0, 0, SZ_4K);
>  	} else {
>  		empty = __xe_pt_empty_pte(gt, vm, pt->level);
>  		for (i = 0; i < XE_PDES; i++)
> -			xe_pt_write(vm->xe, map, i, empty);
> +			xe_pt_write(xe_vm_device(vm), map, i, empty);
>  	}
>  }
>  
> @@ -355,7 +355,7 @@ int xe_pt_create_scratch(struct xe_device *xe, struct xe_gt *gt,
>  	if (IS_ERR(vm->scratch_bo[id]))
>  		return PTR_ERR(vm->scratch_bo[id]);
>  
> -	xe_map_memset(vm->xe, &vm->scratch_bo[id]->vmap, 0, 0,
> +	xe_map_memset(xe_vm_device(vm), &vm->scratch_bo[id]->vmap, 0, 0,
>  		      vm->scratch_bo[id]->size);
>  
>  	for (i = 0; i < vm->pt_root[id]->level; i++) {
> @@ -538,7 +538,7 @@ xe_pt_insert_entry(struct xe_pt_stage_bind_walk *xe_walk, struct xe_pt *parent,
>  		if (unlikely(xe_child))
>  			parent->drm.dir->entries[offset] = &xe_child->drm;
>  
> -		xe_pt_write(xe_walk->vm->xe, map, offset, pte);
> +		xe_pt_write(xe_vm_device(xe_walk->vm), map, offset, pte);
>  		parent->num_live++;
>  	} else {
>  		/* Shared pt. Stage update. */
> @@ -1337,7 +1337,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  	xe_vm_assert_held(vm);
>  	XE_BUG_ON(xe_gt_is_media_type(gt));
>  
> -	vm_dbg(&xe_vma_vm(vma)->xe->drm,
> +	vm_dbg(&xe_vma_device(vma)->drm,
>  	       "Preparing bind, with range [%llx...%llx) engine %p.\n",
>  	       xe_vma_start(vma), xe_vma_end(vma) - 1, e);
>  
> @@ -1366,7 +1366,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  
>  
>  		if (last_munmap_rebind)
> -			vm_dbg(&vm->xe->drm, "last_munmap_rebind");
> +			vm_dbg(&xe_vm_device(vm)->drm, "last_munmap_rebind");
>  
>  		/* TLB invalidation must be done before signaling rebind */
>  		if (rebind && !xe_vm_no_dma_fences(xe_vma_vm(vma))) {
> @@ -1401,7 +1401,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  			xe_bo_put_commit(&deferred);
>  		}
>  		if (!rebind && last_munmap_rebind && xe_vm_in_compute_mode(vm))
> -			queue_work(vm->xe->ordered_wq,
> +			queue_work(xe_vm_device(vm)->ordered_wq,
>  				   &vm->preempt.rebind_work);
>  	} else {
>  		kfree(ifence);
> @@ -1664,7 +1664,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  	xe_vm_assert_held(vm);
>  	XE_BUG_ON(xe_gt_is_media_type(gt));
>  
> -	vm_dbg(&xe_vma_vm(vma)->xe->drm,
> +	vm_dbg(&xe_vma_device(vma)->drm,
>  	       "Preparing unbind, with range [%llx...%llx) engine %p.\n",
>  	       xe_vma_start(vma), xe_vma_end(vma) - 1, e);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index e8d9939ee535..688130c509a4 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -50,7 +50,7 @@ int xe_vma_userptr_check_repin(struct xe_vma *vma)
>  int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>  {
>  	struct xe_vm *vm = xe_vma_vm(vma);
> -	struct xe_device *xe = vm->xe;
> +	struct xe_device *xe = xe_vm_device(vm);
>  	const unsigned long num_pages = xe_vma_size(vma) >> PAGE_SHIFT;
>  	struct page **pages;
>  	bool in_kthread = !current->mm;
> @@ -852,12 +852,12 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  	if (gt_mask) {
>  		vma->gt_mask = gt_mask;
>  	} else {
> -		for_each_gt(gt, vm->xe, id)
> +		for_each_gt(gt, xe_vm_device(vm), id)
>  			if (!xe_gt_is_media_type(gt))
>  				vma->gt_mask |= 0x1 << id;
>  	}
>  
> -	if (vm->xe->info.platform == XE_PVC)
> +	if (xe_vm_device(vm)->info.platform == XE_PVC)
>  		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>  
>  	if (bo) {
> @@ -904,7 +904,7 @@ static void vm_remove_extobj(struct xe_vma *vma)
>  static void xe_vma_destroy_late(struct xe_vma *vma)
>  {
>  	struct xe_vm *vm = xe_vma_vm(vma);
> -	struct xe_device *xe = vm->xe;
> +	struct xe_device *xe = xe_vm_device(vm);
>  	bool read_only = xe_vma_read_only(vma);
>  
>  	if (xe_vma_is_userptr(vma)) {
> @@ -1084,7 +1084,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  	if (!vm)
>  		return ERR_PTR(-ENOMEM);
>  
> -	vm->xe = xe;
>  	kref_init(&vm->refcount);
>  	dma_resv_init(&vm->resv);
>  
> @@ -1125,7 +1124,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  	if (err)
>  		goto err_put;
>  
> -	drm_gpuva_manager_init(&vm->mgr, "Xe VM", 0, vm->size, 0, 0,
> +	drm_gpuva_manager_init(&vm->mgr, &xe->drm, "Xe VM", 0, vm->size, 0, 0,
>  			       &gpuva_ops);
>  	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
>  		vm->flags |= XE_VM_FLAGS_64K;
> @@ -1284,7 +1283,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  {
>  	struct list_head contested;
>  	struct ww_acquire_ctx ww;
> -	struct xe_device *xe = vm->xe;
> +	struct xe_device *xe = xe_vm_device(vm);
>  	struct xe_gt *gt;
>  	struct xe_vma *vma, *next_vma;
>  	struct drm_gpuva *gpuva;
> @@ -1387,7 +1386,7 @@ static void vm_destroy_work_func(struct work_struct *w)
>  	struct xe_vm *vm =
>  		container_of(w, struct xe_vm, destroy_work);
>  	struct ww_acquire_ctx ww;
> -	struct xe_device *xe = vm->xe;
> +	struct xe_device *xe = xe_vm_device(vm);
>  	struct xe_gt *gt;
>  	u8 id;
>  	void *lookup;
> @@ -1481,7 +1480,7 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_engine *e,
>  			return ERR_PTR(-ENOMEM);
>  	}
>  
> -	for_each_gt(gt, vm->xe, id) {
> +	for_each_gt(gt, xe_vm_device(vm), id) {
>  		if (!(vma->gt_present & BIT(id)))
>  			goto next;
>  
> @@ -1555,7 +1554,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
>  			return ERR_PTR(-ENOMEM);
>  	}
>  
> -	for_each_gt(gt, vm->xe, id) {
> +	for_each_gt(gt, xe_vm_device(vm), id) {
>  		if (!(vma->gt_mask & BIT(id)))
>  			goto next;
>  
> @@ -2061,7 +2060,7 @@ static int vm_insert_extobj(struct xe_vm *vm, struct xe_vma *vma)
>  static int vm_bind_ioctl_lookup_vma(struct xe_vm *vm, struct xe_bo *bo,
>  				    u64 addr, u64 range, u32 op)
>  {
> -	struct xe_device *xe = vm->xe;
> +	struct xe_device *xe = xe_vm_device(vm);
>  	struct xe_vma *vma;
>  	bool async = !!(op & XE_VM_BIND_FLAG_ASYNC);
>  
> @@ -2164,7 +2163,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> -	vm_dbg(&vm->xe->drm,
> +	vm_dbg(&xe_vm_device(vm)->drm,
>  	       "op=%d, addr=0x%016llx, range=0x%016llx, bo_offset_or_userptr=0x%016llx",
>  	       VM_BIND_OP(operation), addr, range, bo_offset_or_userptr);
>  
> @@ -2232,7 +2231,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  
>  	if (!IS_ERR(ops))
>  		drm_gpuva_for_each_op(__op, ops)
> -			print_op(vm->xe, __op);
> +			print_op(xe_vm_device(vm), __op);
>  
>  	return ops;
>  }
> @@ -2783,7 +2782,7 @@ static void xe_vma_op_work_func(struct work_struct *w)
>  			down_write(&vm->lock);
>  			err = xe_vma_op_execute(vm, op);
>  			if (err) {
> -				drm_warn(&vm->xe->drm,
> +				drm_warn(&xe_vm_device(vm)->drm,
>  					 "Async VM op(%d) failed with %d",
>  					 op->base.op, err);
>  				vm_set_async_error(vm, err);
> @@ -3103,7 +3102,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  
>  			/* Rebinds may have been blocked, give worker a kick */
>  			if (xe_vm_in_compute_mode(vm))
> -				queue_work(vm->xe->ordered_wq,
> +				queue_work(xe_vm_device(vm)->ordered_wq,
>  					   &vm->preempt.rebind_work);
>  		}
>  
> @@ -3315,7 +3314,7 @@ void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
>   */
>  int xe_vm_invalidate_vma(struct xe_vma *vma)
>  {
> -	struct xe_device *xe = xe_vma_vm(vma)->xe;
> +	struct xe_device *xe = xe_vm_device(xe_vma_vm(vma));
>  	struct xe_gt *gt;
>  	u32 gt_needs_invalidate = 0;
>  	int seqno[XE_MAX_GT];
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 96e2c6b07bf8..cbbe95d6291f 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -52,6 +52,11 @@ static inline bool xe_vm_is_closed(struct xe_vm *vm)
>  struct xe_vma *
>  xe_vm_find_overlapping_vma(struct xe_vm *vm, u64 start, u64 range);
>  
> +static inline struct xe_device *xe_vm_device(struct xe_vm *vm)
> +{
> +	return container_of(vm->mgr.drm, struct xe_device, drm);
> +}
> +
>  static inline struct xe_vm *gpuva_to_vm(struct drm_gpuva *gpuva)
>  {
>  	return container_of(gpuva->mgr, struct xe_vm, mgr);
> @@ -102,6 +107,11 @@ static inline struct xe_vm *xe_vma_vm(struct xe_vma *vma)
>  	return container_of(vma->gpuva.mgr, struct xe_vm, mgr);
>  }
>  
> +static inline struct xe_device *xe_vma_device(struct xe_vma *vma)
> +{
> +	return xe_vm_device(xe_vma_vm(vma));
> +}
> +
>  static inline bool xe_vma_read_only(struct xe_vma *vma)
>  {
>  	return vma->gpuva.flags & XE_VMA_READ_ONLY;
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index df4797ec4d7f..fca42910dcae 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -137,8 +137,6 @@ struct xe_vm {
>  	/** @mgr: base GPUVA used to track VMAs */
>  	struct drm_gpuva_manager mgr;
>  
> -	struct xe_device *xe;
> -
>  	struct kref refcount;
>  
>  	/* engine used for (un)binding vma's */
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> index 62169d850098..55b0acfdcc44 100644
> --- a/include/drm/drm_gpuva_mgr.h
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -169,6 +169,9 @@ static inline bool drm_gpuva_evicted(struct drm_gpuva *va)
>   * There should be one manager instance per GPU virtual address space.
>   */
>  struct drm_gpuva_manager {
> +	/** @drm: drm device */
> +	struct drm_device *drm;
> +
>  	/**
>  	 * @name: the name of the DRM GPU VA space
>  	 */
> @@ -204,6 +207,7 @@ struct drm_gpuva_manager {
>  };
>  
>  void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
> +			    struct drm_device *drm,
>  			    const char *name,
>  			    u64 start_offset, u64 range,
>  			    u64 reserve_offset, u64 reserve_range,
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor Matthew Brost
@ 2023-05-05 19:41   ` Rodrigo Vivi
  2023-05-11  9:46   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 19:41 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:20PM -0700, Matthew Brost wrote:
> Add GPUVA userptr flag, add GPUVA userptr sub-struct, and drop sg
> pointer. A larger follow-on cleanup may push more of the userptr
> implementation to GPUVA.

Here as well, please put the xe and drm changes in separate patches.
But the final result looks good.
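
For reference, the shape of the new interface condensed from the hunks
below (nothing new here): a userptr is now a first-class GPUVA flag with
its own sub-struct, so the xe helpers become flag tests instead of NULL
checks:

	static inline bool xe_vma_is_userptr(struct xe_vma *vma)
	{
		return vma->gpuva.flags & DRM_GPUVA_USERPTR;
	}

	static inline u64 xe_vma_userptr(struct xe_vma *vma)
	{
		return vma->gpuva.userptr.address;	/* was gpuva.gem.offset */
	}

	/* the dropped sg pointer becomes a flag on the gpuva */
	static inline bool xe_vma_userptr_sg_mapped(struct xe_vma *vma)
	{
		return vma->gpuva.flags & XE_VMA_USERPTR_SG_MAPPED;
	}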

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c       |  6 +--
>  drivers/gpu/drm/xe/xe_vm.c       | 41 +++++++++++----------
>  drivers/gpu/drm/xe/xe_vm.h       | 23 +++++++-----
>  drivers/gpu/drm/xe/xe_vm_types.h | 20 +++++-----
>  include/drm/drm_gpuva_mgr.h      | 63 +++++++++++++++++++++-----------
>  5 files changed, 89 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 0f40f1950686..964baa24eba3 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -92,8 +92,8 @@ static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
>  		page = offset >> PAGE_SHIFT;
>  		offset &= (PAGE_SIZE - 1);
>  
> -		xe_res_first_sg(vma->userptr.sg, page << PAGE_SHIFT, page_size,
> -				&cur);
> +		xe_res_first_sg(&vma->userptr.sgt, page << PAGE_SHIFT,
> +				page_size, &cur);
>  		return xe_res_dma(&cur) + offset;
>  	} else {
>  		return xe_bo_addr(xe_vma_bo(vma), offset, page_size, is_vram);
> @@ -813,7 +813,7 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
>  	xe_bo_assert_held(bo);
>  	if (!xe_vma_is_null(vma)) {
>  		if (xe_vma_is_userptr(vma))
> -			xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
> +			xe_res_first_sg(&vma->userptr.sgt, 0, xe_vma_size(vma),
>  					&curs);
>  		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
>  			xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 336e21c710a5..4d734ec4d6ab 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -73,13 +73,13 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>  	if (!pages)
>  		return -ENOMEM;
>  
> -	if (vma->userptr.sg) {
> +	if (xe_vma_userptr_sg_mapped(vma)) {
>  		dma_unmap_sgtable(xe->drm.dev,
> -				  vma->userptr.sg,
> +				  &vma->userptr.sgt,
>  				  read_only ? DMA_TO_DEVICE :
>  				  DMA_BIDIRECTIONAL, 0);
> -		sg_free_table(vma->userptr.sg);
> -		vma->userptr.sg = NULL;
> +		sg_free_table(&vma->userptr.sgt);
> +		vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
>  	}
>  
>  	pinned = ret = 0;
> @@ -119,19 +119,19 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>  					0, (u64)pinned << PAGE_SHIFT,
>  					GFP_KERNEL);
>  	if (ret) {
> -		vma->userptr.sg = NULL;
> +		vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
>  		goto out;
>  	}
> -	vma->userptr.sg = &vma->userptr.sgt;
> +	vma->gpuva.flags |= XE_VMA_USERPTR_SG_MAPPED;
>  
> -	ret = dma_map_sgtable(xe->drm.dev, vma->userptr.sg,
> +	ret = dma_map_sgtable(xe->drm.dev, &vma->userptr.sgt,
>  			      read_only ? DMA_TO_DEVICE :
>  			      DMA_BIDIRECTIONAL,
>  			      DMA_ATTR_SKIP_CPU_SYNC |
>  			      DMA_ATTR_NO_KERNEL_MAPPING);
>  	if (ret) {
> -		sg_free_table(vma->userptr.sg);
> -		vma->userptr.sg = NULL;
> +		sg_free_table(&vma->userptr.sgt);
> +		vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
>  		goto out;
>  	}
>  
> @@ -820,15 +820,13 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  	 */
>  	INIT_LIST_HEAD(&vma->rebind_link);
>  
> -	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
> -	INIT_LIST_HEAD(&vma->gpuva.gem.extobj_link);
>  	vma->gpuva.mgr = &vm->mgr;
>  	vma->gpuva.va.addr = start;
>  	vma->gpuva.va.range = end - start + 1;
>  	if (read_only)
>  		vma->gpuva.flags |= XE_VMA_READ_ONLY;
>  	if (null)
> -		vma->gpuva.flags |= XE_VMA_NULL;
> +		vma->gpuva.flags |= DRM_GPUVA_SPARSE;
>  
>  	if (gt_mask) {
>  		vma->gt_mask = gt_mask;
> @@ -845,6 +843,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  		xe_bo_assert_held(bo);
>  
>  		drm_gem_object_get(&bo->ttm.base);
> +		INIT_LIST_HEAD(&vma->gpuva.gem.entry);
> +		INIT_LIST_HEAD(&vma->gpuva.gem.extobj_link);
>  		vma->gpuva.gem.obj = &bo->ttm.base;
>  		vma->gpuva.gem.offset = bo_offset_or_userptr;
>  		if (!bo->vm)
> @@ -855,7 +855,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  			u64 size = end - start + 1;
>  			int err;
>  
> -			vma->gpuva.gem.offset = bo_offset_or_userptr;
> +			vma->gpuva.flags |= DRM_GPUVA_USERPTR;
> +			vma->gpuva.userptr.address = bo_offset_or_userptr;
>  			err = mmu_interval_notifier_insert(&vma->userptr.notifier,
>  							   current->mm,
>  							   xe_vma_userptr(vma),
> @@ -883,13 +884,13 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
>  	bool read_only = xe_vma_read_only(vma);
>  
>  	if (xe_vma_is_userptr(vma)) {
> -		if (vma->userptr.sg) {
> +		if (xe_vma_userptr_sg_mapped(vma)) {
>  			dma_unmap_sgtable(xe->drm.dev,
> -					  vma->userptr.sg,
> +					  &vma->userptr.sgt,
>  					  read_only ? DMA_TO_DEVICE :
>  					  DMA_BIDIRECTIONAL, 0);
> -			sg_free_table(vma->userptr.sg);
> -			vma->userptr.sg = NULL;
> +			sg_free_table(&vma->userptr.sgt);
> +			vma->gpuva.flags &= ~XE_VMA_USERPTR_SG_MAPPED;
>  		}
>  
>  		/*
> @@ -2309,7 +2310,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  						XE_VMA_READ_ONLY;
>  					bool null =
>  						op->base.remap.unmap->va->flags &
> -						XE_VMA_NULL;
> +						DRM_GPUVA_SPARSE;
>  
>  					vma = new_vma(vm, op->base.remap.prev,
>  						      op->gt_mask, read_only,
> @@ -2344,7 +2345,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  
>  					bool null =
>  						op->base.remap.unmap->va->flags &
> -						XE_VMA_NULL;
> +						DRM_GPUVA_SPARSE;
>  
>  					vma = new_vma(vm, op->base.remap.next,
>  						      op->gt_mask, read_only,
> @@ -3320,7 +3321,7 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
>  		} else if (is_userptr) {
>  			struct xe_res_cursor cur;
>  
> -			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, &cur);
> +			xe_res_first_sg(&vma->userptr.sgt, 0, XE_PAGE_SIZE, &cur);
>  			addr = xe_res_dma(&cur);
>  		} else {
>  			addr = xe_bo_addr(xe_vma_bo(vma), 0, XE_PAGE_SIZE, &is_vram);
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 12de652d8d1c..f279fa622260 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -101,12 +101,6 @@ static inline u64 xe_vma_bo_offset(struct xe_vma *vma)
>  	return vma->gpuva.gem.offset;
>  }
>  
> -static inline struct xe_bo *xe_vma_bo(struct xe_vma *vma)
> -{
> -	return !vma->gpuva.gem.obj ? NULL :
> -		container_of(vma->gpuva.gem.obj, struct xe_bo, ttm.base);
> -}
> -
>  static inline struct xe_vm *xe_vma_vm(struct xe_vma *vma)
>  {
>  	return container_of(vma->gpuva.mgr, struct xe_vm, mgr);
> @@ -129,7 +123,7 @@ static inline bool xe_vma_read_only(struct xe_vma *vma)
>  
>  static inline u64 xe_vma_userptr(struct xe_vma *vma)
>  {
> -	return vma->gpuva.gem.offset;
> +	return vma->gpuva.userptr.address;
>  }
>  
>  #define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->mgr.resv)
> @@ -197,12 +191,18 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
>  
>  static inline bool xe_vma_is_null(struct xe_vma *vma)
>  {
> -	return vma->gpuva.flags & XE_VMA_NULL;
> +	return vma->gpuva.flags & DRM_GPUVA_SPARSE;
>  }
>  
>  static inline bool xe_vma_is_userptr(struct xe_vma *vma)
>  {
> -	return !xe_vma_bo(vma) && !xe_vma_is_null(vma);
> +	return vma->gpuva.flags & DRM_GPUVA_USERPTR;
> +}
> +
> +static inline struct xe_bo *xe_vma_bo(struct xe_vma *vma)
> +{
> +	return xe_vma_is_null(vma) || xe_vma_is_userptr(vma) ? NULL :
> +		container_of(vma->gpuva.gem.obj, struct xe_bo, ttm.base);
>  }
>  
>  static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
> @@ -210,6 +210,11 @@ static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
>  	return !xe_vma_bo(vma);
>  }
>  
> +static inline bool xe_vma_userptr_sg_mapped(struct xe_vma *vma)
> +{
> +	return vma->gpuva.flags & XE_VMA_USERPTR_SG_MAPPED;
> +}
> +
>  int xe_vma_userptr_pin_pages(struct xe_vma *vma);
>  
>  int xe_vma_userptr_check_repin(struct xe_vma *vma);
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 0b59bde3bc4e..ce1260b8d3ef 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -23,15 +23,15 @@ struct xe_vm;
>  #define TEST_VM_ASYNC_OPS_ERROR
>  #define FORCE_ASYNC_OP_ERROR	BIT(31)
>  
> -#define XE_VMA_READ_ONLY	DRM_GPUVA_USERBITS
> -#define XE_VMA_DESTROYED	(DRM_GPUVA_USERBITS << 1)
> -#define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
> -#define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
> -#define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
> -#define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
> -#define XE_VMA_PTE_4K		(DRM_GPUVA_USERBITS << 6)
> -#define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
> -#define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
> +#define XE_VMA_READ_ONLY		DRM_GPUVA_USERBITS
> +#define XE_VMA_DESTROYED		(DRM_GPUVA_USERBITS << 1)
> +#define XE_VMA_ATOMIC_PTE_BIT		(DRM_GPUVA_USERBITS << 2)
> +#define XE_VMA_FIRST_REBIND		(DRM_GPUVA_USERBITS << 3)
> +#define XE_VMA_LAST_REBIND		(DRM_GPUVA_USERBITS << 4)
> +#define XE_VMA_USERPTR_SG_MAPPED	(DRM_GPUVA_USERBITS << 5)
> +#define XE_VMA_PTE_4K			(DRM_GPUVA_USERBITS << 6)
> +#define XE_VMA_PTE_2M			(DRM_GPUVA_USERBITS << 7)
> +#define XE_VMA_PTE_1G			(DRM_GPUVA_USERBITS << 8)
>  
>  /** struct xe_userptr - User pointer */
>  struct xe_userptr {
> @@ -41,8 +41,6 @@ struct xe_userptr {
>  	struct mmu_interval_notifier notifier;
>  	/** @sgt: storage for a scatter gather table */
>  	struct sg_table sgt;
> -	/** @sg: allocated scatter gather table */
> -	struct sg_table *sg;
>  	/** @notifier_seq: notifier sequence number */
>  	unsigned long notifier_seq;
>  	/**
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> index 57861a7ed504..943c8fcda533 100644
> --- a/include/drm/drm_gpuva_mgr.h
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -62,10 +62,17 @@ enum drm_gpuva_flags {
>  	 */
>  	DRM_GPUVA_EXTOBJ = (1 << 2),
>  
> +	/**
> +	 * @DRM_GPUVA_USERPTR:
> +	 *
> +	 * Flag indicating that the &drm_gpuva is a user pointer mapping.
> +	 */
> +	DRM_GPUVA_USERPTR = (1 << 3),
> +
>  	/**
>  	 * @DRM_GPUVA_USERBITS: user defined bits
>  	 */
> -	DRM_GPUVA_USERBITS = (1 << 3),
> +	DRM_GPUVA_USERBITS = (1 << 4),
>  };
>  
>  /**
> @@ -102,31 +109,45 @@ struct drm_gpuva {
>  		u64 range;
>  	} va;
>  
> -	/**
> -	 * @gem: structure containing the &drm_gem_object and it's offset
> -	 */
> -	struct {
> -		/**
> -		 * @offset: the offset within the &drm_gem_object
> -		 */
> -		u64 offset;
> -
> -		/**
> -		 * @obj: the mapped &drm_gem_object
> -		 */
> -		struct drm_gem_object *obj;
> -
> +	union {
>  		/**
> -		 * @entry: the &list_head to attach this object to a &drm_gem_object
> +		 * @gem: structure containing the &drm_gem_object and it's
> +		 * offset
>  		 */
> -		struct list_head entry;
> +		struct {
> +			/**
> +			 * @offset: the offset within the &drm_gem_object
> +			 */
> +			u64 offset;
> +
> +			/**
> +			 * @obj: the mapped &drm_gem_object
> +			 */
> +			struct drm_gem_object *obj;
> +
> +			/**
> +			 * @entry: the &list_head to attach this object to a
> +			 * &drm_gem_object
> +			 */
> +			struct list_head entry;
> +
> +			/**
> +			 * @extobj_link: the &list_head to attach this object to
> +			 * a @drm_gpuva_manager.extobj.list
> +			 */
> +			struct list_head extobj_link;
> +		} gem;
>  
>  		/**
> -		 * @extobj_link: the &list_head to attach this object to a
> -		 * @drm_gpuva_manager.extobj.list
> +		 * @userptr: structure containing user pointer state
>  		 */
> -		struct list_head extobj_link;
> -	} gem;
> +		struct {
> +			/**
> +			 * @address: user pointer address
> +			 */
> +			u64 address;
> +		} userptr;
> +	};
>  };
>  
>  void drm_gpuva_link(struct drm_gpuva *va);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers Matthew Brost
@ 2023-05-05 19:42   ` Rodrigo Vivi
  2023-05-11 10:01   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 19:42 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Danilo Krummrich, intel-xe

On Mon, May 01, 2023 at 05:17:23PM -0700, Matthew Brost wrote:
> drm_exec is intended to replace the TTM exec helpers, so use drm_exec. Also
> combine parts of drm_exec with gpuva where it makes sense (locking,
> fence installation).

Here again... too many things in one patch, with xe and drm changes mixed.

We need to split this up...
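
For reference, the new locking pattern condensed from
drm_gpuva_manager_lock() in the hunks below (nothing new here): the
ttm_validate_buffer list is replaced by a drm_exec retry loop over the
VM's private object plus all external objects:

	drm_exec_init(exec, intr);
	drm_exec_while_not_all_locked(exec) {
		ret = drm_exec_prepare_obj(exec, mgr_obj, num_fences);
		drm_exec_continue_on_contention(exec);
		if (ret && ret != -EALREADY)
			goto err_exec;

		drm_gpuva_for_each_extobj(gpuva, mgr) {
			ret = drm_exec_prepare_obj(exec, gpuva->gem.obj,
						   num_fences);
			drm_exec_break_on_contention(exec);
			if (ret && ret != -EALREADY)
				goto err_exec;
		}
	}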

> 
> Suggested-by: Danilo Krummrich <dakr@redhat.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
>  drivers/gpu/drm/drm_gpuva_mgr.c              |  67 ++++-
>  drivers/gpu/drm/i915/display/intel_display.c |   6 +-
>  drivers/gpu/drm/xe/Kconfig                   |   1 +
>  drivers/gpu/drm/xe/tests/xe_bo.c             |  26 +-
>  drivers/gpu/drm/xe/tests/xe_migrate.c        |   6 +-
>  drivers/gpu/drm/xe/xe_bo.c                   |  56 ++--
>  drivers/gpu/drm/xe/xe_bo.h                   |   6 +-
>  drivers/gpu/drm/xe/xe_bo_evict.c             |  24 +-
>  drivers/gpu/drm/xe/xe_bo_types.h             |   1 -
>  drivers/gpu/drm/xe/xe_engine.c               |   7 +-
>  drivers/gpu/drm/xe/xe_exec.c                 |  37 +--
>  drivers/gpu/drm/xe/xe_gt_pagefault.c         |  55 +---
>  drivers/gpu/drm/xe/xe_lrc.c                  |   8 +-
>  drivers/gpu/drm/xe/xe_migrate.c              |  13 +-
>  drivers/gpu/drm/xe/xe_vm.c                   | 283 ++++++++-----------
>  drivers/gpu/drm/xe/xe_vm.h                   |  27 +-
>  drivers/gpu/drm/xe/xe_vm_madvise.c           |  37 +--
>  include/drm/drm_gpuva_mgr.h                  |  16 +-
>  18 files changed, 315 insertions(+), 361 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> index e8cd6e154336..93c912c34211 100644
> --- a/drivers/gpu/drm/drm_gpuva_mgr.c
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -483,6 +483,50 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
>  }
>  EXPORT_SYMBOL(drm_gpuva_manager_destroy);
>  
> +/**
> + * TODO
> + */
> +int drm_gpuva_manager_lock(struct drm_gpuva_manager *mgr, struct drm_exec *exec,
> +			   struct drm_gem_object *mgr_obj, bool intr,
> +			   unsigned int num_fences)
> +{
> +	struct drm_gpuva *gpuva;
> +	int ret;
> +
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		ret = drm_exec_prepare_obj(exec, mgr_obj, num_fences);
> +		drm_exec_continue_on_contention(exec);
> +		if (ret && ret != -EALREADY)
> +			goto err_exec;
> +
> +		drm_gpuva_for_each_extobj(gpuva, mgr) {
> +			ret = drm_exec_prepare_obj(exec, gpuva->gem.obj,
> +						   num_fences);
> +			drm_exec_break_on_contention(exec);
> +			if (ret && ret != -EALREADY)
> +				goto err_exec;
> +		}
> +	}
> +
> +	return 0;
> +
> +err_exec:
> +	drm_exec_fini(exec);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_lock);
> +
> +/**
> + * TODO
> + */
> +void drm_gpuva_manager_unlock(struct drm_gpuva_manager *mgr,
> +			      struct drm_exec *exec)
> +{
> +	drm_exec_fini(exec);
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_unlock);
> +
>  static inline bool
>  drm_gpuva_in_mm_range(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>  {
> @@ -888,7 +932,7 @@ drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>  EXPORT_SYMBOL(drm_gpuva_interval_empty);
>  
>  /**
> - * drm_gpuva_add_fence - add fence to private and all extobj dma-resv
> + * drm_gpuva_manager_add_fence - add fence to private and all extobj dma-resv
>   * @mgr: the &drm_gpuva_manager to add a fence to
>   * @fence: fence to add
>   * @private_usage: private dma-resv usage
> @@ -896,17 +940,24 @@ EXPORT_SYMBOL(drm_gpuva_interval_empty);
>   *
>   * Returns: true if the interval is empty, false otherwise
>   */
> -void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
> -			 enum dma_resv_usage private_usage,
> -			 enum dma_resv_usage extobj_usage)
> +void drm_gpuva_manager_add_fence(struct drm_gpuva_manager *mgr,
> +				 struct drm_exec *exec,
> +				 struct dma_fence *fence,
> +				 enum dma_resv_usage private_usage,
> +				 enum dma_resv_usage extobj_usage)
>  {
> -	struct drm_gpuva *gpuva;
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	dma_resv_assert_held(&mgr->resv);
>  
>  	dma_resv_add_fence(&mgr->resv, fence, private_usage);
> -	drm_gpuva_for_each_extobj(gpuva, mgr)
> -		dma_resv_add_fence(gpuva->gem.obj->resv, fence, extobj_usage);
> +	drm_exec_for_each_locked_object(exec, index, obj)
> +		if (likely(&mgr->resv != obj->resv))
> +			dma_resv_add_fence(obj->resv, fence, extobj_usage);
>  }
> -EXPORT_SYMBOL(drm_gpuva_add_fence);
> +EXPORT_SYMBOL(drm_gpuva_manager_add_fence);
> +
>  
>  /**
>   * drm_gpuva_map - helper to insert a &drm_gpuva from &drm_gpuva_fn_ops
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 28a227450329..aab1a3a0f06d 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -7340,11 +7340,11 @@ static int i915_gem_object_read_from_page(struct xe_bo *bo,
>  	void *virtual;
>  	bool is_iomem;
>  	int ret;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  
>  	XE_BUG_ON(size != 8);
>  
> -	ret = xe_bo_lock(bo, &ww, 0, true);
> +	ret = xe_bo_lock(bo, &exec, 0, true);
>  	if (ret)
>  		return ret;
>  
> @@ -7361,7 +7361,7 @@ static int i915_gem_object_read_from_page(struct xe_bo *bo,
>  
>  	ttm_bo_kunmap(&map);
>  out_unlock:
> -	xe_bo_unlock(bo, &ww);
> +	xe_bo_unlock(bo, &exec);
>  	return ret;
>  }
>  #endif
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index f6f3b491d162..bbcc9b64b776 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -8,6 +8,7 @@ config DRM_XE
>  	select SHMEM
>  	select TMPFS
>  	select DRM_BUDDY
> +	select DRM_EXEC
>  	select DRM_KMS_HELPER
>  	select DRM_PANEL
>  	select DRM_SUBALLOC_HELPER
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index 9bd381e5b7a6..316c6cf2bb86 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -175,17 +175,17 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>  	unsigned int bo_flags = XE_BO_CREATE_USER_BIT |
>  		XE_BO_CREATE_VRAM_IF_DGFX(gt);
>  	struct xe_vm *vm = xe_migrate_get_vm(xe->gt[0].migrate);
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	int err, i;
>  
>  	kunit_info(test, "Testing device %s gt id %u vram id %u\n",
>  		   dev_name(xe->drm.dev), gt->info.id, gt->info.vram_id);
>  
>  	for (i = 0; i < 2; ++i) {
> -		xe_vm_lock(vm, &ww, 0, false);
> +		xe_vm_lock(vm, &exec, 0, false);
>  		bo = xe_bo_create(xe, NULL, vm, 0x10000, ttm_bo_type_device,
>  				  bo_flags);
> -		xe_vm_unlock(vm, &ww);
> +		xe_vm_unlock(vm, &exec);
>  		if (IS_ERR(bo)) {
>  			KUNIT_FAIL(test, "bo create err=%pe\n", bo);
>  			break;
> @@ -198,9 +198,9 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>  			goto cleanup_bo;
>  		}
>  
> -		xe_bo_lock(external, &ww, 0, false);
> +		xe_bo_lock(external, &exec, 0, false);
>  		err = xe_bo_pin_external(external);
> -		xe_bo_unlock(external, &ww);
> +		xe_bo_unlock(external, &exec);
>  		if (err) {
>  			KUNIT_FAIL(test, "external bo pin err=%pe\n",
>  				   ERR_PTR(err));
> @@ -240,18 +240,18 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>  
>  		if (i) {
>  			down_read(&vm->lock);
> -			xe_vm_lock(vm, &ww, 0, false);
> +			xe_vm_lock(vm, &exec, 0, false);
>  			err = xe_bo_validate(bo, bo->vm, false);
> -			xe_vm_unlock(vm, &ww);
> +			xe_vm_unlock(vm, &exec);
>  			up_read(&vm->lock);
>  			if (err) {
>  				KUNIT_FAIL(test, "bo valid err=%pe\n",
>  					   ERR_PTR(err));
>  				goto cleanup_all;
>  			}
> -			xe_bo_lock(external, &ww, 0, false);
> +			xe_bo_lock(external, &exec, 0, false);
>  			err = xe_bo_validate(external, NULL, false);
> -			xe_bo_unlock(external, &ww);
> +			xe_bo_unlock(external, &exec);
>  			if (err) {
>  				KUNIT_FAIL(test, "external bo valid err=%pe\n",
>  					   ERR_PTR(err));
> @@ -259,18 +259,18 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>  			}
>  		}
>  
> -		xe_bo_lock(external, &ww, 0, false);
> +		xe_bo_lock(external, &exec, 0, false);
>  		xe_bo_unpin_external(external);
> -		xe_bo_unlock(external, &ww);
> +		xe_bo_unlock(external, &exec);
>  
>  		xe_bo_put(external);
>  		xe_bo_put(bo);
>  		continue;
>  
>  cleanup_all:
> -		xe_bo_lock(external, &ww, 0, false);
> +		xe_bo_lock(external, &exec, 0, false);
>  		xe_bo_unpin_external(external);
> -		xe_bo_unlock(external, &ww);
> +		xe_bo_unlock(external, &exec);
>  cleanup_external:
>  		xe_bo_put(external);
>  cleanup_bo:
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index 0f4371ad1fd9..e1482b4491b1 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -394,14 +394,14 @@ static int migrate_test_run_device(struct xe_device *xe)
>  
>  	for_each_gt(gt, xe, id) {
>  		struct xe_migrate *m = gt->migrate;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		kunit_info(test, "Testing gt id %d.\n", id);
> -		xe_vm_lock(m->eng->vm, &ww, 0, true);
> +		xe_vm_lock(m->eng->vm, &exec, 0, true);
>  		xe_device_mem_access_get(xe);
>  		xe_migrate_sanity_test(m, test);
>  		xe_device_mem_access_put(xe);
> -		xe_vm_unlock(m->eng->vm, &ww);
> +		xe_vm_unlock(m->eng->vm, &exec);
>  	}
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index e0422ffb6327..a427edbf486b 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -8,6 +8,7 @@
>  #include <linux/dma-buf.h>
>  
>  #include <drm/drm_drv.h>
> +#include <drm/drm_exec.h>
>  #include <drm/drm_gem_ttm_helper.h>
>  #include <drm/ttm/ttm_device.h>
>  #include <drm/ttm/ttm_placement.h>
> @@ -991,13 +992,13 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  
>  	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		XE_BUG_ON(!xe_bo_is_user(bo));
>  
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>  		ttm_bo_set_bulk_move(&bo->ttm, NULL);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  }
>  
> @@ -1402,11 +1403,6 @@ int xe_bo_pin_external(struct xe_bo *bo)
>  	}
>  
>  	ttm_bo_pin(&bo->ttm);
> -
> -	/*
> -	 * FIXME: If we always use the reserve / unreserve functions for locking
> -	 * we do not need this.
> -	 */
>  	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>  
>  	return 0;
> @@ -1461,11 +1457,6 @@ int xe_bo_pin(struct xe_bo *bo)
>  	}
>  
>  	ttm_bo_pin(&bo->ttm);
> -
> -	/*
> -	 * FIXME: If we always use the reserve / unreserve functions for locking
> -	 * we do not need this.
> -	 */
>  	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>  
>  	return 0;
> @@ -1496,11 +1487,6 @@ void xe_bo_unpin_external(struct xe_bo *bo)
>  	}
>  
>  	ttm_bo_unpin(&bo->ttm);
> -
> -	/*
> -	 * FIXME: If we always use the reserve / unreserve functions for locking
> -	 * we do not need this.
> -	 */
>  	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>  }
>  
> @@ -1650,7 +1636,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  	struct xe_device *xe = to_xe_device(dev);
>  	struct xe_file *xef = to_xe_file(file);
>  	struct drm_xe_gem_create *args = data;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_vm *vm = NULL;
>  	struct xe_bo *bo;
>  	unsigned bo_flags = XE_BO_CREATE_USER_BIT;
> @@ -1686,7 +1672,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  		vm = xe_vm_lookup(xef, args->vm_id);
>  		if (XE_IOCTL_ERR(xe, !vm))
>  			return -ENOENT;
> -		err = xe_vm_lock(vm, &ww, 0, true);
> +		err = xe_vm_lock(vm, &exec, 0, true);
>  		if (err) {
>  			xe_vm_put(vm);
>  			return err;
> @@ -1703,7 +1689,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
>  			  bo_flags);
>  	if (vm) {
> -		xe_vm_unlock(vm, &ww);
> +		xe_vm_unlock(vm, &exec);
>  		xe_vm_put(vm);
>  	}
>  
> @@ -1744,26 +1730,30 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
>  	return 0;
>  }
>  
> -int xe_bo_lock(struct xe_bo *bo, struct ww_acquire_ctx *ww,
> +int xe_bo_lock(struct xe_bo *bo, struct drm_exec *exec,
>  	       int num_resv, bool intr)
>  {
> -	struct ttm_validate_buffer tv_bo;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> +	int err;
>  
> -	XE_BUG_ON(!ww);
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		err = drm_exec_prepare_obj(exec, &bo->ttm.base,
> +					   num_resv);
> +		drm_exec_continue_on_contention(exec);
> +		if (err && err != -EALREADY)
> +			goto out_err;
> +	}
>  
> -	tv_bo.num_shared = num_resv;
> -	tv_bo.bo = &bo->ttm;;
> -	list_add_tail(&tv_bo.head, &objs);
> +	return 0;
>  
> -	return ttm_eu_reserve_buffers(ww, &objs, intr, &dups);
> +out_err:
> +	drm_exec_fini(exec);
> +	return err;
>  }
>  
> -void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww)
> +void xe_bo_unlock(struct xe_bo *bo, struct drm_exec *exec)
>  {
> -	dma_resv_unlock(bo->ttm.base.resv);
> -	ww_acquire_fini(ww);
> +	drm_exec_fini(exec);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 9b401d30a130..5a80ebf72d10 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -75,6 +75,7 @@
>  
>  #define XE_BO_PROPS_INVALID	(-1)
>  
> +struct drm_exec;
>  struct sg_table;
>  
>  struct xe_bo *xe_bo_alloc(void);
> @@ -142,10 +143,9 @@ static inline void xe_bo_assert_held(struct xe_bo *bo)
>  		dma_resv_assert_held((bo)->ttm.base.resv);
>  }
>  
> -int xe_bo_lock(struct xe_bo *bo, struct ww_acquire_ctx *ww,
> +int xe_bo_lock(struct xe_bo *bo, struct drm_exec *exec,
>  	       int num_resv, bool intr);
> -
> -void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww);
> +void xe_bo_unlock(struct xe_bo *bo, struct drm_exec *exec);
>  
>  static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
>  {
> diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
> index 6642c5f52009..46d9d9eb110c 100644
> --- a/drivers/gpu/drm/xe/xe_bo_evict.c
> +++ b/drivers/gpu/drm/xe/xe_bo_evict.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2022 Intel Corporation
>   */
>  
> +#include <drm/drm_exec.h>
> +
>  #include "xe_bo_evict.h"
>  
>  #include "xe_bo.h"
> @@ -27,7 +29,7 @@
>  int xe_bo_evict_all(struct xe_device *xe)
>  {
>  	struct ttm_device *bdev = &xe->ttm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
>  	struct xe_gt *gt;
>  	struct list_head still_in_list;
> @@ -62,9 +64,9 @@ int xe_bo_evict_all(struct xe_device *xe)
>  		list_move_tail(&bo->pinned_link, &still_in_list);
>  		spin_unlock(&xe->pinned.lock);
>  
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>  		ret = xe_bo_evict_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  		xe_bo_put(bo);
>  		if (ret) {
>  			spin_lock(&xe->pinned.lock);
> @@ -96,9 +98,9 @@ int xe_bo_evict_all(struct xe_device *xe)
>  		list_move_tail(&bo->pinned_link, &xe->pinned.evicted);
>  		spin_unlock(&xe->pinned.lock);
>  
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>  		ret = xe_bo_evict_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  		xe_bo_put(bo);
>  		if (ret)
>  			return ret;
> @@ -123,7 +125,7 @@ int xe_bo_evict_all(struct xe_device *xe)
>   */
>  int xe_bo_restore_kernel(struct xe_device *xe)
>  {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
>  	int ret;
>  
> @@ -140,9 +142,9 @@ int xe_bo_restore_kernel(struct xe_device *xe)
>  		list_move_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
>  		spin_unlock(&xe->pinned.lock);
>  
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>  		ret = xe_bo_restore_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  		if (ret) {
>  			xe_bo_put(bo);
>  			return ret;
> @@ -182,7 +184,7 @@ int xe_bo_restore_kernel(struct xe_device *xe)
>   */
>  int xe_bo_restore_user(struct xe_device *xe)
>  {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
>  	struct xe_gt *gt;
>  	struct list_head still_in_list;
> @@ -204,9 +206,9 @@ int xe_bo_restore_user(struct xe_device *xe)
>  		xe_bo_get(bo);
>  		spin_unlock(&xe->pinned.lock);
>  
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>  		ret = xe_bo_restore_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  		xe_bo_put(bo);
>  		if (ret) {
>  			spin_lock(&xe->pinned.lock);
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index 06de3330211d..2ba34a8c9b66 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -11,7 +11,6 @@
>  #include <drm/drm_mm.h>
>  #include <drm/ttm/ttm_bo.h>
>  #include <drm/ttm/ttm_device.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
>  #include <drm/ttm/ttm_placement.h>
>  
>  struct xe_device;
> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> index 91600b1e8249..8b425b777259 100644
> --- a/drivers/gpu/drm/xe/xe_engine.c
> +++ b/drivers/gpu/drm/xe/xe_engine.c
> @@ -8,6 +8,7 @@
>  #include <linux/nospec.h>
>  
>  #include <drm/drm_device.h>
> +#include <drm/drm_exec.h>
>  #include <drm/drm_file.h>
>  #include <drm/xe_drm.h>
>  
> @@ -89,18 +90,18 @@ struct xe_engine *xe_engine_create(struct xe_device *xe, struct xe_vm *vm,
>  				   u32 logical_mask, u16 width,
>  				   struct xe_hw_engine *hwe, u32 flags)
>  {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_engine *e;
>  	int err;
>  
>  	if (vm) {
> -		err = xe_vm_lock(vm, &ww, 0, true);
> +		err = xe_vm_lock(vm, &exec, 0, true);
>  		if (err)
>  			return ERR_PTR(err);
>  	}
>  	e = __xe_engine_create(xe, vm, logical_mask, width, hwe, flags);
>  	if (vm)
> -		xe_vm_unlock(vm, &ww);
> +		xe_vm_unlock(vm, &exec);
>  
>  	return e;
>  }
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 2ae02f1500d5..9f7f1088c403 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -6,6 +6,7 @@
>  #include "xe_exec.h"
>  
>  #include <drm/drm_device.h>
> +#include <drm/drm_exec.h>
>  #include <drm/drm_file.h>
>  #include <drm/xe_drm.h>
>  
> @@ -92,21 +93,16 @@
>   *	Unlock all
>   */
>  
> -static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
> -			 struct ttm_validate_buffer tv_onstack[],
> -			 struct ttm_validate_buffer **tv,
> -			 struct list_head *objs)
> +static int xe_exec_begin(struct xe_engine *e, struct drm_exec *exec)
>  {
>  	struct xe_vm *vm = e->vm;
>  	struct xe_vma *vma;
> -	LIST_HEAD(dups);
>  	int err;
>  
> -	*tv = NULL;
>  	if (xe_vm_no_dma_fences(e->vm))
>  		return 0;
>  
> -	err = xe_vm_lock_dma_resv(vm, ww, tv_onstack, tv, objs, true, 1);
> +	err = xe_vm_lock_dma_resv(vm, exec, true, 1);
>  	if (err)
>  		return err;
>  
> @@ -123,8 +119,7 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
>  
>  		err = xe_bo_validate(xe_vma_bo(vma), vm, false);
>  		if (err) {
> -			xe_vm_unlock_dma_resv(vm, tv_onstack, *tv, ww, objs);
> -			*tv = NULL;
> +			xe_vm_unlock_dma_resv(vm, exec);
>  			return err;
>  		}
>  	}
> @@ -132,14 +127,10 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
>  	return 0;
>  }
>  
> -static void xe_exec_end(struct xe_engine *e,
> -			struct ttm_validate_buffer *tv_onstack,
> -			struct ttm_validate_buffer *tv,
> -			struct ww_acquire_ctx *ww,
> -			struct list_head *objs)
> +static void xe_exec_end(struct xe_engine *e, struct drm_exec *exec)
>  {
>  	if (!xe_vm_no_dma_fences(e->vm))
> -		xe_vm_unlock_dma_resv(e->vm, tv_onstack, tv, ww, objs);
> +		xe_vm_unlock_dma_resv(e->vm, exec);
>  }
>  
>  int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> @@ -149,17 +140,14 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	struct drm_xe_exec *args = data;
>  	struct drm_xe_sync __user *syncs_user = u64_to_user_ptr(args->syncs);
>  	u64 __user *addresses_user = u64_to_user_ptr(args->address);
> +	struct drm_exec exec;
>  	struct xe_engine *engine;
>  	struct xe_sync_entry *syncs = NULL;
>  	u64 addresses[XE_HW_ENGINE_MAX_INSTANCE];
> -	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
> -	struct ttm_validate_buffer *tv = NULL;
>  	u32 i, num_syncs = 0;
>  	struct xe_sched_job *job;
>  	struct dma_fence *rebind_fence;
>  	struct xe_vm *vm;
> -	struct ww_acquire_ctx ww;
> -	struct list_head objs;
>  	bool write_locked;
>  	int err = 0;
>  
> @@ -270,7 +258,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  			goto err_unlock_list;
>  	}
>  
> -	err = xe_exec_begin(engine, &ww, tv_onstack, &tv, &objs);
> +	err = xe_exec_begin(engine, &exec);
>  	if (err)
>  		goto err_unlock_list;
>  
> @@ -361,9 +349,10 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	 * are written as we don't pass in a read / write list.
>  	 */
>  	if (!xe_vm_no_dma_fences(vm))
> -		drm_gpuva_add_fence(&vm->mgr, &job->drm.s_fence->finished,
> -				    DMA_RESV_USAGE_BOOKKEEP,
> -				    DMA_RESV_USAGE_WRITE);
> +		drm_gpuva_manager_add_fence(&vm->mgr, &exec,
> +					    &job->drm.s_fence->finished,
> +					    DMA_RESV_USAGE_BOOKKEEP,
> +					    DMA_RESV_USAGE_WRITE);
>  
>  	for (i = 0; i < num_syncs; i++)
>  		xe_sync_entry_signal(&syncs[i], job,
> @@ -387,7 +376,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	if (err)
>  		xe_sched_job_put(job);
>  err_engine_end:
> -	xe_exec_end(engine, tv_onstack, tv, &ww, &objs);
> +	xe_exec_end(engine, &exec);
>  err_unlock_list:
>  	if (write_locked)
>  		up_write(&vm->lock);
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index d7bf6b0a0697..1145c6eaa17d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -9,7 +9,7 @@
>  #include <linux/circ_buf.h>
>  
>  #include <drm/drm_managed.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
> +#include <drm/drm_exec.h>
>  
>  #include "xe_bo.h"
>  #include "xe_gt.h"
> @@ -84,11 +84,6 @@ static bool vma_matches(struct xe_vma *vma, u64 page_addr)
>  	return true;
>  }
>  
> -static bool only_needs_bo_lock(struct xe_bo *bo)
> -{
> -	return bo && bo->vm;
> -}
> -
>  static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
>  {
>  	struct xe_vma *vma = NULL;
> @@ -109,10 +104,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>  	struct xe_vm *vm;
>  	struct xe_vma *vma = NULL;
>  	struct xe_bo *bo;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> -	struct ttm_validate_buffer tv_bo, tv_vm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct dma_fence *fence;
>  	bool write_locked;
>  	int ret = 0;
> @@ -170,20 +162,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>  
>  	/* Lock VM and BOs dma-resv */
>  	bo = xe_vma_bo(vma);
> -	if (only_needs_bo_lock(bo)) {
> -		/* This path ensures the BO's LRU is updated */
> -		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
> -	} else {
> -		tv_vm.num_shared = xe->info.tile_count;
> -		tv_vm.bo = xe_vm_ttm_bo(vm);
> -		list_add(&tv_vm.head, &objs);
> -		if (bo) {
> -			tv_bo.bo = &bo->ttm;
> -			tv_bo.num_shared = xe->info.tile_count;
> -			list_add(&tv_bo.head, &objs);
> -		}
> -		ret = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
> -	}
> +	ret = xe_vm_bo_lock(vm, bo, &exec, xe->info.tile_count, false);
>  	if (ret)
>  		goto unlock_vm;
>  
> @@ -226,10 +205,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>  	vma->usm.gt_invalidated &= ~BIT(gt->info.id);
>  
>  unlock_dma_resv:
> -	if (only_needs_bo_lock(bo))
> -		xe_bo_unlock(bo, &ww);
> -	else
> -		ttm_eu_backoff_reservation(&ww, &objs);
> +	xe_vm_bo_unlock(vm, bo, &exec, true);
>  unlock_vm:
>  	if (!ret)
>  		vm->usm.last_fault_vma = vma;
> @@ -496,10 +472,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>  	struct xe_vm *vm;
>  	struct xe_vma *vma;
>  	struct xe_bo *bo;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> -	struct ttm_validate_buffer tv_bo, tv_vm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	int ret = 0;
>  
>  	/* We only support ACC_TRIGGER at the moment */
> @@ -532,28 +505,14 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>  
>  	/* Lock VM and BOs dma-resv */
>  	bo = xe_vma_bo(vma);
> -	if (only_needs_bo_lock(bo)) {
> -		/* This path ensures the BO's LRU is updated */
> -		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
> -	} else {
> -		tv_vm.num_shared = xe->info.tile_count;
> -		tv_vm.bo = xe_vm_ttm_bo(vm);
> -		list_add(&tv_vm.head, &objs);
> -		tv_bo.bo = &bo->ttm;
> -		tv_bo.num_shared = xe->info.tile_count;
> -		list_add(&tv_bo.head, &objs);
> -		ret = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
> -	}
> +	ret = xe_vm_bo_lock(vm, bo, &exec, xe->info.tile_count, false);
>  	if (ret)
>  		goto unlock_vm;
>  
>  	/* Migrate to VRAM, move should invalidate the VMA first */
>  	ret = xe_bo_migrate(bo, XE_PL_VRAM0 + gt->info.vram_id);
>  
> -	if (only_needs_bo_lock(bo))
> -		xe_bo_unlock(bo, &ww);
> -	else
> -		ttm_eu_backoff_reservation(&ww, &objs);
> +	xe_vm_bo_unlock(vm, bo, &exec, true);
>  unlock_vm:
>  	up_read(&vm->lock);
>  	xe_vm_put(vm);
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index ae605e7805de..3cc34efe8dd8 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2021 Intel Corporation
>   */
>  
> +#include <drm/drm_exec.h>
> +
>  #include "xe_lrc.h"
>  
>  #include "regs/xe_engine_regs.h"
> @@ -712,16 +714,16 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>  
>  void xe_lrc_finish(struct xe_lrc *lrc)
>  {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  
>  	xe_hw_fence_ctx_finish(&lrc->fence_ctx);
>  	if (lrc->bo->vm)
> -		xe_vm_lock(lrc->bo->vm, &ww, 0, false);
> +		xe_vm_lock(lrc->bo->vm, &exec, 0, false);
>  	else
>  		xe_bo_lock_no_vm(lrc->bo, NULL);
>  	xe_bo_unpin(lrc->bo);
>  	if (lrc->bo->vm)
> -		xe_vm_unlock(lrc->bo->vm, &ww);
> +		xe_vm_unlock(lrc->bo->vm, &exec);
>  	else
>  		xe_bo_unlock_no_vm(lrc->bo);
>  	xe_bo_put(lrc->bo);
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 91a06c925a1e..1dd497252640 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -9,6 +9,7 @@
>  #include <linux/sizes.h>
>  
>  #include <drm/drm_managed.h>
> +#include <drm/drm_exec.h>
>  #include <drm/ttm/ttm_tt.h>
>  #include <drm/xe_drm.h>
>  
> @@ -86,13 +87,13 @@ struct xe_engine *xe_gt_migrate_engine(struct xe_gt *gt)
>  static void xe_migrate_fini(struct drm_device *dev, void *arg)
>  {
>  	struct xe_migrate *m = arg;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  
> -	xe_vm_lock(m->eng->vm, &ww, 0, false);
> +	xe_vm_lock(m->eng->vm, &exec, 0, false);
>  	xe_bo_unpin(m->pt_bo);
>  	if (m->cleared_bo)
>  		xe_bo_unpin(m->cleared_bo);
> -	xe_vm_unlock(m->eng->vm, &ww);
> +	xe_vm_unlock(m->eng->vm, &exec);
>  
>  	dma_fence_put(m->fence);
>  	if (m->cleared_bo)
> @@ -315,7 +316,7 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_migrate *m;
>  	struct xe_vm *vm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	int err;
>  
>  	XE_BUG_ON(xe_gt_is_media_type(gt));
> @@ -332,9 +333,9 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
>  	if (IS_ERR(vm))
>  		return ERR_CAST(vm);
>  
> -	xe_vm_lock(vm, &ww, 0, false);
> +	xe_vm_lock(vm, &exec, 0, false);
>  	err = xe_migrate_prepare_vm(gt, m, vm);
> -	xe_vm_unlock(vm, &ww);
> +	xe_vm_unlock(vm, &exec);
>  	if (err) {
>  		xe_vm_close_and_put(vm);
>  		return ERR_PTR(err);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 4d734ec4d6ab..55cced8870e6 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -7,7 +7,7 @@
>  
>  #include <linux/dma-fence-array.h>
>  
> -#include <drm/ttm/ttm_execbuf_util.h>
> +#include <drm/drm_exec.h>
>  #include <drm/ttm/ttm_tt.h>
>  #include <drm/xe_drm.h>
>  #include <linux/kthread.h>
> @@ -260,10 +260,10 @@ static void arm_preempt_fences(struct xe_vm *vm, struct list_head *list)
>  static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
>  {
>  	struct xe_engine *e;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	int err;
>  
> -	err = xe_bo_lock(bo, &ww, vm->preempt.num_engines, true);
> +	err = xe_bo_lock(bo, &exec, vm->preempt.num_engines, true);
>  	if (err)
>  		return err;
>  
> @@ -274,11 +274,12 @@ static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
>  					   DMA_RESV_USAGE_BOOKKEEP);
>  		}
>  
> -	xe_bo_unlock(bo, &ww);
> +	xe_bo_unlock(bo, &exec);
>  	return 0;
>  }
>  
> -static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
> +static void resume_and_reinstall_preempt_fences(struct xe_vm *vm,
> +						struct drm_exec *exec)
>  {
>  	struct xe_engine *e;
>  
> @@ -288,18 +289,15 @@ static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
>  	list_for_each_entry(e, &vm->preempt.engines, compute.link) {
>  		e->ops->resume(e);
>  
> -		drm_gpuva_add_fence(&vm->mgr, e->compute.pfence,
> -				    DMA_RESV_USAGE_BOOKKEEP,
> -				    DMA_RESV_USAGE_BOOKKEEP);
> +		drm_gpuva_manager_add_fence(&vm->mgr, exec, e->compute.pfence,
> +					    DMA_RESV_USAGE_BOOKKEEP,
> +					    DMA_RESV_USAGE_BOOKKEEP);
>  	}
>  }
>  
>  int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>  {
> -	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
> -	struct ttm_validate_buffer *tv;
> -	struct ww_acquire_ctx ww;
> -	struct list_head objs;
> +	struct drm_exec exec;
>  	struct dma_fence *pfence;
>  	int err;
>  	bool wait;
> @@ -308,7 +306,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>  
>  	down_write(&vm->lock);
>  
> -	err = xe_vm_lock_dma_resv(vm, &ww, tv_onstack, &tv, &objs, true, 1);
> +	err = xe_vm_lock_dma_resv(vm, &exec, true, 1);
>  	if (err)
>  		goto out_unlock_outer;
>  
> @@ -325,9 +323,9 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>  
>  	down_read(&vm->userptr.notifier_lock);
>  
> -	drm_gpuva_add_fence(&vm->mgr, pfence,
> -			    DMA_RESV_USAGE_BOOKKEEP,
> -			    DMA_RESV_USAGE_BOOKKEEP);
> +	drm_gpuva_manager_add_fence(&vm->mgr, &exec, pfence,
> +				    DMA_RESV_USAGE_BOOKKEEP,
> +				    DMA_RESV_USAGE_BOOKKEEP);
>  
>  	/*
>  	 * Check to see if a preemption on VM is in flight or userptr
> @@ -341,7 +339,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>  	up_read(&vm->userptr.notifier_lock);
>  
>  out_unlock:
> -	xe_vm_unlock_dma_resv(vm, tv_onstack, tv, &ww, &objs);
> +	xe_vm_unlock_dma_resv(vm, &exec);
>  out_unlock_outer:
>  	up_write(&vm->lock);
>  
> @@ -367,25 +365,24 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
>  		list_empty(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
>  }
>  
> +static struct drm_gem_object *xe_vm_gem(struct xe_vm *vm)
> +{
> +	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
> +		XE_VM_FLAG_GT_ID(vm->flags) : 0;
> +
> +	/* Safe to use index 0 as all BO in the VM share a single dma-resv lock */
> +	return &vm->pt_root[idx]->bo->ttm.base;
> +}
> +
>  /**
>   * xe_vm_lock_dma_resv() - Lock the vm dma_resv object and the dma_resv
>   * objects of the vm's external buffer objects.
> - * @vm: The vm.
> - * @ww: Pointer to a struct ww_acquire_ctx locking context.
> - * @tv_onstack: Array size XE_ONSTACK_TV of storage for the struct
> - * ttm_validate_buffers used for locking.
> - * @tv: Pointer to a pointer that on output contains the actual storage used.
> - * @objs: List head for the buffer objects locked.
> + * @vm: The vm
>   * @intr: Whether to lock interruptible.
>   * @num_shared: Number of dma-fence slots to reserve in the locked objects.
>   *
>   * Locks the vm dma-resv objects and all the dma-resv objects of the
> - * buffer objects on the vm external object list. The TTM utilities require
> - * a list of struct ttm_validate_buffers pointing to the actual buffer
> - * objects to lock. Storage for those struct ttm_validate_buffers should
> - * be provided in @tv_onstack, and is typically reserved on the stack
> - * of the caller. If the size of @tv_onstack isn't sufficient, then
> - * storage will be allocated internally using kvmalloc().
> + * buffer objects on the vm external object list.
>   *
>   * The function performs deadlock handling internally, and after a
>   * successful return the ww locking transaction should be considered
> @@ -395,46 +392,18 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
>   * @intr is set to true, -EINTR or -ERESTARTSYS may be returned. In case
>   * of error, any locking performed has been reverted.
>   */
> -int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> -			struct ttm_validate_buffer *tv_onstack,
> -			struct ttm_validate_buffer **tv,
> -			struct list_head *objs,
> -			bool intr,
> +int xe_vm_lock_dma_resv(struct xe_vm *vm, struct drm_exec *exec, bool intr,
>  			unsigned int num_shared)
>  {
> -	struct ttm_validate_buffer *tv_vm, *tv_bo;
>  	struct xe_vma *vma, *next;
> -	struct drm_gpuva *gpuva;
> -	LIST_HEAD(dups);
>  	int err;
>  
>  	lockdep_assert_held(&vm->lock);
>  
> -	if (vm->mgr.extobj.entries < XE_ONSTACK_TV) {
> -		tv_vm = tv_onstack;
> -	} else {
> -		tv_vm = kvmalloc_array(vm->mgr.extobj.entries + 1,
> -				       sizeof(*tv_vm),
> -				       GFP_KERNEL);
> -		if (!tv_vm)
> -			return -ENOMEM;
> -	}
> -	tv_bo = tv_vm + 1;
> -
> -	INIT_LIST_HEAD(objs);
> -	drm_gpuva_for_each_extobj(gpuva, &vm->mgr) {
> -		tv_bo->num_shared = num_shared;
> -		tv_bo->bo = &gem_to_xe_bo(gpuva->gem.obj)->ttm;
> -
> -		list_add_tail(&tv_bo->head, objs);
> -		tv_bo++;
> -	}
> -	tv_vm->num_shared = num_shared;
> -	tv_vm->bo = xe_vm_ttm_bo(vm);
> -	list_add_tail(&tv_vm->head, objs);
> -	err = ttm_eu_reserve_buffers(ww, objs, intr, &dups);
> +	err = drm_gpuva_manager_lock(&vm->mgr, exec, xe_vm_gem(vm), intr,
> +				     num_shared);
>  	if (err)
> -		goto out_err;
> +		return err;
>  
>  	spin_lock(&vm->notifier.list_lock);
>  	list_for_each_entry_safe(vma, next, &vm->notifier.rebind_list,
> @@ -447,34 +416,22 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
>  	}
>  	spin_unlock(&vm->notifier.list_lock);
>  
> -	*tv = tv_vm;
>  	return 0;
> -
> -out_err:
> -	if (tv_vm != tv_onstack)
> -		kvfree(tv_vm);
> -
> -	return err;
>  }
>  
>  /**
>   * xe_vm_unlock_dma_resv() - Unlock reservation objects locked by
>   * xe_vm_lock_dma_resv()
>   * @vm: The vm.
> - * @tv_onstack: The @tv_onstack array given to xe_vm_lock_dma_resv().
> - * @tv: The value of *@tv given by xe_vm_lock_dma_resv().
> - * @ww: The ww_acquire_context used for locking.
> - * @objs: The list returned from xe_vm_lock_dma_resv().
>   *
>   * Unlocks the reservation objects and frees any memory allocated by
>   * xe_vm_lock_dma_resv().
>   */
> -void xe_vm_unlock_dma_resv(struct xe_vm *vm,
> -			   struct ttm_validate_buffer *tv_onstack,
> -			   struct ttm_validate_buffer *tv,
> -			   struct ww_acquire_ctx *ww,
> -			   struct list_head *objs)
> +void xe_vm_unlock_dma_resv(struct xe_vm *vm, struct drm_exec *exec)
>  {
> +	struct drm_gem_object *obj, *skip = xe_vm_gem(vm);
> +	unsigned long index;
> +
>  	/*
>  	 * Nothing should've been able to enter the list while we were locked,
>  	 * since we've held the dma-resvs of all the vm's external objects,
> @@ -483,19 +440,20 @@ void xe_vm_unlock_dma_resv(struct xe_vm *vm,
>  	 */
>  	XE_WARN_ON(!list_empty(&vm->notifier.rebind_list));
>  
> -	ttm_eu_backoff_reservation(ww, objs);
> -	if (tv && tv != tv_onstack)
> -		kvfree(tv);
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		struct xe_bo *bo = gem_to_xe_bo(obj);
> +
> +		if (obj != skip)
> +			ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> +	}
> +	drm_gpuva_manager_unlock(&vm->mgr, exec);
>  }
>  
>  static void preempt_rebind_work_func(struct work_struct *w)
>  {
>  	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
> +	struct drm_exec exec;
>  	struct xe_vma *vma;
> -	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
> -	struct ttm_validate_buffer *tv;
> -	struct ww_acquire_ctx ww;
> -	struct list_head objs;
>  	struct dma_fence *rebind_fence;
>  	unsigned int fence_count = 0;
>  	LIST_HEAD(preempt_fences);
> @@ -536,8 +494,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  			goto out_unlock_outer;
>  	}
>  
> -	err = xe_vm_lock_dma_resv(vm, &ww, tv_onstack, &tv, &objs,
> -				  false, vm->preempt.num_engines);
> +	err = xe_vm_lock_dma_resv(vm, &exec, false, vm->preempt.num_engines);
>  	if (err)
>  		goto out_unlock_outer;
>  
> @@ -608,11 +565,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  
>  	/* Point of no return. */
>  	arm_preempt_fences(vm, &preempt_fences);
> -	resume_and_reinstall_preempt_fences(vm);
> +	resume_and_reinstall_preempt_fences(vm, &exec);
>  	up_read(&vm->userptr.notifier_lock);
>  
>  out_unlock:
> -	xe_vm_unlock_dma_resv(vm, tv_onstack, tv, &ww, &objs);
> +	xe_vm_unlock_dma_resv(vm, &exec);
>  out_unlock_outer:
>  	if (err == -EAGAIN) {
>  		trace_xe_vm_rebind_worker_retry(vm);
> @@ -963,27 +920,16 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
>  
>  static void xe_vma_destroy_unlocked(struct xe_vma *vma)
>  {
> -	struct ttm_validate_buffer tv[2];
> -	struct ww_acquire_ctx ww;
> +	struct xe_vm *vm = xe_vma_vm(vma);
>  	struct xe_bo *bo = xe_vma_bo(vma);
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> +	struct drm_exec exec;
>  	int err;
>  
> -	memset(tv, 0, sizeof(tv));
> -	tv[0].bo = xe_vm_ttm_bo(xe_vma_vm(vma));
> -	list_add(&tv[0].head, &objs);
> -
> -	if (bo) {
> -		tv[1].bo = &xe_bo_get(bo)->ttm;
> -		list_add(&tv[1].head, &objs);
> -	}
> -	err = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
> +	err = xe_vm_bo_lock(vm, xe_bo_get(bo), &exec, 0, false);
>  	XE_WARN_ON(err);
> -
>  	xe_vma_destroy(vma, NULL);
> +	xe_vm_bo_unlock(vm, bo, &exec, false);
>  
> -	ttm_eu_backoff_reservation(&ww, &objs);
>  	if (bo)
>  		xe_bo_put(bo);
>  }
> @@ -1254,7 +1200,7 @@ static void vm_error_capture(struct xe_vm *vm, int err,
>  void xe_vm_close_and_put(struct xe_vm *vm)
>  {
>  	struct list_head contested;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_device *xe = xe_vm_device(vm);
>  	struct xe_gt *gt;
>  	struct xe_vma *vma, *next_vma;
> @@ -1281,7 +1227,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  	}
>  
>  	down_write(&vm->lock);
> -	xe_vm_lock(vm, &ww, 0, false);
> +	xe_vm_lock(vm, &exec, 0, false);
>  	drm_gpuva_iter_for_each(gpuva, it) {
>  		vma = gpuva_to_vma(gpuva);
>  
> @@ -1323,7 +1269,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  					      NULL);
>  		}
>  	}
> -	xe_vm_unlock(vm, &ww);
> +	xe_vm_unlock(vm, &exec);
>  
>  	/*
>  	 * VM is now dead, cannot re-add nodes to vm->vmas if it's NULL
> @@ -1356,7 +1302,7 @@ static void vm_destroy_work_func(struct work_struct *w)
>  {
>  	struct xe_vm *vm =
>  		container_of(w, struct xe_vm, destroy_work);
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct xe_device *xe = xe_vm_device(vm);
>  	struct xe_gt *gt;
>  	u8 id;
> @@ -1382,14 +1328,14 @@ static void vm_destroy_work_func(struct work_struct *w)
>  	 * is needed for xe_vm_lock to work. If we remove that dependency this
>  	 * can be moved to xe_vm_close_and_put.
>  	 */
> -	xe_vm_lock(vm, &ww, 0, false);
> +	xe_vm_lock(vm, &exec, 0, false);
>  	for_each_gt(gt, xe, id) {
>  		if (vm->pt_root[id]) {
>  			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
>  			vm->pt_root[id] = NULL;
>  		}
>  	}
> -	xe_vm_unlock(vm, &ww);
> +	xe_vm_unlock(vm, &exec);
>  
>  	trace_xe_vm_free(vm);
>  	dma_fence_put(vm->rebind_fence);
> @@ -1969,21 +1915,6 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
>  
>  #define VM_BIND_OP(op)	(op & 0xffff)
>  
> -struct ttm_buffer_object *xe_vm_ttm_bo(struct xe_vm *vm)
> -{
> -	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
> -		XE_VM_FLAG_GT_ID(vm->flags) : 0;
> -
> -	/* Safe to use index 0 as all BO in the VM share a single dma-resv lock */
> -	return &vm->pt_root[idx]->bo->ttm;
> -}
> -
> -static void xe_vm_tv_populate(struct xe_vm *vm, struct ttm_validate_buffer *tv)
> -{
> -	tv->num_shared = 1;
> -	tv->bo = xe_vm_ttm_bo(vm);
> -}
> -
>  static void vm_set_async_error(struct xe_vm *vm, int err)
>  {
>  	lockdep_assert_held(&vm->lock);
> @@ -2088,7 +2019,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  			 u32 operation, u8 gt_mask, u32 region)
>  {
>  	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	struct drm_gpuva_ops *ops;
>  	struct drm_gpuva_op *__op;
>  	struct xe_vma_op *op;
> @@ -2136,11 +2067,11 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  	case XE_VM_BIND_OP_UNMAP_ALL:
>  		XE_BUG_ON(!bo);
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return ERR_PTR(err);
>  		ops = drm_gpuva_gem_unmap_ops_create(&vm->mgr, obj);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  
>  		drm_gpuva_for_each_op(__op, ops) {
>  			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> @@ -2174,13 +2105,13 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  {
>  	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>  	struct xe_vma *vma;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>  	int err;
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
>  	if (bo) {
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return ERR_PTR(err);
>  	}
> @@ -2189,7 +2120,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  			    op->va.range - 1, read_only, null,
>  			    gt_mask);
>  	if (bo)
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  
>  	if (xe_vma_is_userptr(vma)) {
>  		err = xe_vma_userptr_pin_pages(vma);
> @@ -2441,19 +2372,15 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
>  static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>  			       struct xe_vma_op *op)
>  {
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> -	struct ttm_validate_buffer tv_bo, tv_vm;
> -	struct ww_acquire_ctx ww;
>  	struct xe_bo *vbo;
> +	struct drm_exec exec;
>  	int err;
> +	bool lru_update = op->base.op != DRM_GPUVA_OP_UNMAP;
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> -	xe_vm_tv_populate(vm, &tv_vm);
> -	list_add_tail(&tv_vm.head, &objs);
>  	vbo = xe_vma_bo(vma);
> -	if (vbo) {
> +	if (vbo)
>  		/*
>  		 * An unbind can drop the last reference to the BO and
>  		 * the BO is needed for ttm_eu_backoff_reservation so
> @@ -2461,22 +2388,15 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>  		 */
>  		xe_bo_get(vbo);
>  
> -		if (!vbo->vm) {
> -			tv_bo.bo = &vbo->ttm;
> -			tv_bo.num_shared = 1;
> -			list_add(&tv_bo.head, &objs);
> -		}
> -	}
> -
>  again:
> -	err = ttm_eu_reserve_buffers(&ww, &objs, true, &dups);
> +	err = xe_vm_bo_lock(vm, vbo, &exec, 1, false);
>  	if (err) {
>  		xe_bo_put(vbo);
>  		return err;
>  	}
>  
>  	xe_vm_assert_held(vm);
> -	xe_bo_assert_held(xe_vma_bo(vma));
> +	xe_bo_assert_held(vbo);
>  
>  	switch (op->base.op) {
>  	case DRM_GPUVA_OP_MAP:
> @@ -2552,7 +2472,7 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>  		XE_BUG_ON("NOT POSSIBLE");
>  	}
>  
> -	ttm_eu_backoff_reservation(&ww, &objs);
> +	xe_vm_bo_unlock(vm, vbo, &exec, lru_update);
>  	if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
>  		lockdep_assert_held_write(&vm->lock);
>  		err = xe_vma_userptr_pin_pages(vma);
> @@ -3208,30 +3128,67 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	return err == -ENODATA ? 0 : err;
>  }
>  
> -/*
> - * XXX: Using the TTM wrappers for now, likely can call into dma-resv code
> - * directly to optimize. Also this likely should be an inline function.
> - */
> -int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> +int xe_vm_lock(struct xe_vm *vm, struct drm_exec *exec,
>  	       int num_resv, bool intr)
>  {
> -	struct ttm_validate_buffer tv_vm;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> +	int err;
>  
> -	XE_BUG_ON(!ww);
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm),
> +					   num_resv);
> +		drm_exec_continue_on_contention(exec);
> +		if (err && err != -EALREADY)
> +			goto out_err;
> +	}
>  
> -	tv_vm.num_shared = num_resv;
> -	tv_vm.bo = xe_vm_ttm_bo(vm);;
> -	list_add_tail(&tv_vm.head, &objs);
> +	return 0;
>  
> -	return ttm_eu_reserve_buffers(ww, &objs, intr, &dups);
> +out_err:
> +	drm_exec_fini(exec);
> +	return err;
>  }
>  
> -void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
> +void xe_vm_unlock(struct xe_vm *vm, struct drm_exec *exec)
>  {
> -	dma_resv_unlock(xe_vm_resv(vm));
> -	ww_acquire_fini(ww);
> +	drm_exec_fini(exec);
> +}
> +
> +int xe_vm_bo_lock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		  int num_resv, bool intr)
> +{
> +	int err;
> +
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm),
> +					   num_resv);
> +		drm_exec_continue_on_contention(exec);
> +		if (err && err != -EALREADY)
> +			goto out_err;
> +
> +		if (bo && !bo->vm) {
> +			err = drm_exec_prepare_obj(exec, &bo->ttm.base,
> +						   num_resv);
> +			drm_exec_continue_on_contention(exec);
> +			if (err && err != -EALREADY)
> +				goto out_err;
> +		}
> +	}
> +
> +	return 0;
> +
> +out_err:
> +	drm_exec_fini(exec);
> +	return err;
> +}
> +
> +void xe_vm_bo_unlock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		     bool lru_update)
> +{
> +	if (lru_update && bo && (!bo->vm || xe_vm_no_dma_fences(vm)))
> +		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> +	drm_exec_fini(exec);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index f279fa622260..47b981d9fc04 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -12,6 +12,7 @@
>  #include "xe_vm_types.h"
>  
>  struct drm_device;
> +struct drm_exec;
>  struct drm_printer;
>  struct drm_file;
>  
> @@ -38,10 +39,14 @@ static inline void xe_vm_put(struct xe_vm *vm)
>  	kref_put(&vm->refcount, xe_vm_free);
>  }
>  
> -int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> +int xe_vm_lock(struct xe_vm *vm, struct drm_exec *exec,
>  	       int num_resv, bool intr);
> +void xe_vm_unlock(struct xe_vm *vm, struct drm_exec *exec);
>  
> -void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww);
> +int xe_vm_bo_lock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		  int num_resv, bool intr);
> +void xe_vm_bo_unlock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		     bool lru_update);
>  
>  static inline bool xe_vm_is_closed(struct xe_vm *vm)
>  {
> @@ -219,23 +224,9 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma);
>  
>  int xe_vma_userptr_check_repin(struct xe_vma *vma);
>  
> -/*
> - * XE_ONSTACK_TV is used to size the tv_onstack array that is input
> - * to xe_vm_lock_dma_resv() and xe_vm_unlock_dma_resv().
> - */
> -#define XE_ONSTACK_TV 20
> -int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> -			struct ttm_validate_buffer *tv_onstack,
> -			struct ttm_validate_buffer **tv,
> -			struct list_head *objs,
> -			bool intr,
> +int xe_vm_lock_dma_resv(struct xe_vm *vm, struct drm_exec *exec, bool intr,
>  			unsigned int num_shared);
> -
> -void xe_vm_unlock_dma_resv(struct xe_vm *vm,
> -			   struct ttm_validate_buffer *tv_onstack,
> -			   struct ttm_validate_buffer *tv,
> -			   struct ww_acquire_ctx *ww,
> -			   struct list_head *objs);
> +void xe_vm_unlock_dma_resv(struct xe_vm *vm, struct drm_exec *exec);
>  
>  int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 03508645fa08..a68bc6fec1de 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -7,6 +7,7 @@
>  
>  #include <linux/nospec.h>
>  
> +#include <drm/drm_exec.h>
>  #include <drm/ttm/ttm_tt.h>
>  #include <drm/xe_drm.h>
>  
> @@ -28,16 +29,16 @@ static int madvise_preferred_mem_class(struct xe_device *xe, struct xe_vm *vm,
>  
>  	for (i = 0; i < num_vmas; ++i) {
>  		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		bo = xe_vma_bo(vmas[i]);
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return err;
>  		bo->props.preferred_mem_class = value;
>  		xe_bo_placement_for_flags(xe, bo, bo->flags);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  
>  	return 0;
> @@ -53,16 +54,16 @@ static int madvise_preferred_gt(struct xe_device *xe, struct xe_vm *vm,
>  
>  	for (i = 0; i < num_vmas; ++i) {
>  		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		bo = xe_vma_bo(vmas[i]);
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return err;
>  		bo->props.preferred_gt = value;
>  		xe_bo_placement_for_flags(xe, bo, bo->flags);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  
>  	return 0;
> @@ -89,17 +90,17 @@ static int madvise_preferred_mem_class_gt(struct xe_device *xe,
>  
>  	for (i = 0; i < num_vmas; ++i) {
>  		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		bo = xe_vma_bo(vmas[i]);
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return err;
>  		bo->props.preferred_mem_class = mem_class;
>  		bo->props.preferred_gt = gt_id;
>  		xe_bo_placement_for_flags(xe, bo, bo->flags);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  
>  	return 0;
> @@ -112,13 +113,13 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
>  
>  	for (i = 0; i < num_vmas; ++i) {
>  		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		bo = xe_vma_bo(vmas[i]);
>  		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_SYSTEM_BIT)))
>  			return -EINVAL;
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return err;
>  		bo->props.cpu_atomic = !!value;
> @@ -130,7 +131,7 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
>  		 */
>  		if (bo->props.cpu_atomic)
>  			ttm_bo_unmap_virtual(&bo->ttm);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  
>  	return 0;
> @@ -143,18 +144,18 @@ static int madvise_device_atomic(struct xe_device *xe, struct xe_vm *vm,
>  
>  	for (i = 0; i < num_vmas; ++i) {
>  		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		bo = xe_vma_bo(vmas[i]);
>  		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_VRAM0_BIT) &&
>  				 !(bo->flags & XE_BO_CREATE_VRAM1_BIT)))
>  			return -EINVAL;
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return err;
>  		bo->props.device_atomic = !!value;
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  
>  	return 0;
> @@ -174,16 +175,16 @@ static int madvise_priority(struct xe_device *xe, struct xe_vm *vm,
>  
>  	for (i = 0; i < num_vmas; ++i) {
>  		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>  
>  		bo = xe_vma_bo(vmas[i]);
>  
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>  		if (err)
>  			return err;
>  		bo->ttm.priority = value;
>  		ttm_bo_move_to_lru_tail(&bo->ttm);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>  	}
>  
>  	return 0;
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> index 943c8fcda533..a2f6d90ac899 100644
> --- a/include/drm/drm_gpuva_mgr.h
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -32,6 +32,8 @@
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
>  
> +#include <drm/drm_exec.h>
> +
>  struct drm_gpuva_manager;
>  struct drm_gpuva_fn_ops;
>  struct drm_gpuva_prealloc;
> @@ -169,9 +171,17 @@ struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
>  
>  bool drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range);
>  
> -void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
> -			 enum dma_resv_usage private_usage,
> -			 enum dma_resv_usage extobj_usage);
> +int drm_gpuva_manager_lock(struct drm_gpuva_manager *mgr, struct drm_exec *exec,
> +			   struct drm_gem_object *mgr_obj, bool intr,
> +			   unsigned int num_fences);
> +void drm_gpuva_manager_unlock(struct drm_gpuva_manager *mgr,
> +			      struct drm_exec *exec);
> +
> +void drm_gpuva_manager_add_fence(struct drm_gpuva_manager *mgr,
> +				 struct drm_exec *exec,
> +				 struct dma_fence *fence,
> +				 enum dma_resv_usage private_usage,
> +				 enum dma_resv_usage extobj_usage);
>  
>  /**
>   * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM Matthew Brost
@ 2023-05-05 19:43   ` Rodrigo Vivi
  2023-05-08  1:19     ` Matthew Brost
  2023-05-11 10:03   ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 19:43 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:24PM -0700, Matthew Brost wrote:
> This is allowed per the dma-fencing rules.

it would be good to have a word saying 'why' we are doing this.
just because we can doesn't mean we should...

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_sync.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> index 99f1ed87196d..1e4e4acb2c4a 100644
> --- a/drivers/gpu/drm/xe/xe_sync.c
> +++ b/drivers/gpu/drm/xe/xe_sync.c
> @@ -105,6 +105,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>  {
>  	struct drm_xe_sync sync_in;
>  	int err;
> +	bool signal;
>  
>  	if (copy_from_user(&sync_in, sync_user, sizeof(*sync_user)))
>  		return -EFAULT;
> @@ -113,9 +114,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>  			 ~(SYNC_FLAGS_TYPE_MASK | DRM_XE_SYNC_SIGNAL)))
>  		return -EINVAL;
>  
> +	signal = sync_in.flags & DRM_XE_SYNC_SIGNAL;
>  	switch (sync_in.flags & SYNC_FLAGS_TYPE_MASK) {
>  	case DRM_XE_SYNC_SYNCOBJ:
> -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
>  			return -ENOTSUPP;
>  
>  		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> @@ -125,7 +127,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>  		if (XE_IOCTL_ERR(xe, !sync->syncobj))
>  			return -ENOENT;
>  
> -		if (!(sync_in.flags & DRM_XE_SYNC_SIGNAL)) {
> +		if (!signal) {
>  			sync->fence = drm_syncobj_fence_get(sync->syncobj);
>  			if (XE_IOCTL_ERR(xe, !sync->fence))
>  				return -EINVAL;
> @@ -133,7 +135,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>  		break;
>  
>  	case DRM_XE_SYNC_TIMELINE_SYNCOBJ:
> -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
>  			return -ENOTSUPP;
>  
>  		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> @@ -146,7 +148,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>  		if (XE_IOCTL_ERR(xe, !sync->syncobj))
>  			return -ENOENT;
>  
> -		if (sync_in.flags & DRM_XE_SYNC_SIGNAL) {
> +		if (signal) {
>  			sync->chain_fence = dma_fence_chain_alloc();
>  			if (!sync->chain_fence)
>  				return -ENOMEM;
> @@ -168,7 +170,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>  		break;
>  
>  	case DRM_XE_SYNC_USER_FENCE:
> -		if (XE_IOCTL_ERR(xe, !(sync_in.flags & DRM_XE_SYNC_SIGNAL)))
> +		if (XE_IOCTL_ERR(xe, !signal))
>  			return -ENOTSUPP;
>  
>  		if (XE_IOCTL_ERR(xe, sync_in.addr & 0x7))
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc Matthew Brost
@ 2023-05-05 19:45   ` Rodrigo Vivi
  2023-05-11 10:14     ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-05 19:45 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:27PM -0700, Matthew Brost wrote:
> Try to explain how VM bind works in Xe.

We will need more doc, likely with examples and all...
but this is already something we need.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  include/uapi/drm/xe_drm.h | 45 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index cb4debe4ebda..c7137db2cbe8 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -148,7 +148,16 @@ struct drm_xe_engine_class_instance {
>  	 * Kernel only classes (not actual hardware engine class). Used for
>  	 * creating ordered queues of VM bind operations.
>  	 */
> +	/**
> +	 * @DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC: VM bind engine which are allowed
> +	 * to use in / out syncs. The out sync indicates bind op(s) completion.
> +	 */
>  #define DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC	5
> +	/**
> +	 * @DRM_XE_ENGINE_CLASS_VM_BIND_SYNC: VM bind engine which are not
> +	 * allowed to use in / out syncs, The IOCTL return indicates bind op(s)
> +	 * completion.
> +	 */
>  #define DRM_XE_ENGINE_CLASS_VM_BIND_SYNC	6
>  
>  	__u16 engine_instance;
> @@ -322,6 +331,7 @@ struct drm_xe_vm_create {
>  
>  #define DRM_XE_VM_CREATE_SCRATCH_PAGE	(0x1 << 0)
>  #define DRM_XE_VM_CREATE_COMPUTE_MODE	(0x1 << 1)
> +	/** @DRM_XE_VM_CREATE_ASYNC_DEFAULT: Default VM bind engine is async */
>  #define DRM_XE_VM_CREATE_ASYNC_DEFAULT	(0x1 << 2)
>  #define DRM_XE_VM_CREATE_FAULT_MODE	(0x1 << 3)
>  
> @@ -379,21 +389,44 @@ struct drm_xe_vm_bind_op {
>  	/** @mem_region: Memory region to prefetch VMA to, instance not a mask */
>  	__u32 region;
>  
> +	/** @XE_VM_BIND_OP_MAP: Map a buffer object */
>  #define XE_VM_BIND_OP_MAP		0x0
> +	/** @XE_VM_BIND_OP_UNMAP: Unmap a buffer object or userptr */
>  #define XE_VM_BIND_OP_UNMAP		0x1
> +	/** @XE_VM_BIND_OP_MAP_USERPTR: Map a userptr */
>  #define XE_VM_BIND_OP_MAP_USERPTR	0x2
> +	/**
> +	 * @XE_VM_BIND_OP_RESTART: Restart last bind operation that failed with
> +	 * -ENOSPC
> +	 */
>  #define XE_VM_BIND_OP_RESTART		0x3
> +	/**
> +	 * @XE_VM_BIND_OP_UNMAP_ALL: Unmap all mappings associated with a
> +	 * buffer ibject
> +	 */
>  #define XE_VM_BIND_OP_UNMAP_ALL		0x4
> +	/**
> +	 * @XE_VM_BIND_OP_PREFETCH: For a deferred bind (faulting VM)
> +	 * validate buffer object and (re)bind
> +	 */
>  #define XE_VM_BIND_OP_PREFETCH		0x5
> -
> +	/** @XE_VM_BIND_FLAG_READONLY: Set mapping to read only */
>  #define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)
> +	/**
> +	 * @XE_VM_BIND_FLAG_ASYNC: Sanity check for if using async bind engine
> +	 * (in / out syncs) this set needs to be set.
> +	 */
>  #define XE_VM_BIND_FLAG_ASYNC		(0x1 << 17)
> -	/*
> +	/**
> +	 * @XE_VM_BIND_FLAG_IMMEDIATE:
> +	 *
>  	 * Valid on a faulting VM only, do the MAP operation immediately rather
>  	 * than differing the MAP to the page fault handler.
>  	 */
>  #define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 18)
> -	/*
> +	/**
> +	 * @XE_VM_BIND_FLAG_NULL:
> +	 *
>  	 * When the NULL flag is set, the page tables are setup with a special
>  	 * bit which indicates writes are dropped and all reads return zero. The
>  	 * NULL flags is only valid for XE_VM_BIND_OP_MAP operations, the BO
> @@ -401,6 +434,12 @@ struct drm_xe_vm_bind_op {
>  	 * VK sparse bindings.
>  	 */
>  #define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
> +	/**
> +	 * @XE_VM_BIND_FLAG_RECLAIM: Should be set when a VM is in an error
> +	 * state (bind op returns -ENOSPC), used with sync bind engines to issue
> +	 * UNMAP operations which hopefully free enough memory so when VM is
> +	 * restarted via @XE_VM_BIND_OP_RESTART the failed bind ops succeed.
> +	 */
>  #define XE_VM_BIND_FLAG_RECLAIM		(0x1 << 20)
>  
>  	/** @reserved: Reserved */
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent
  2023-05-05 18:38   ` Rodrigo Vivi
@ 2023-05-08  1:03     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-08  1:03 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Fri, May 05, 2023 at 02:38:45PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:02PM -0700, Matthew Brost wrote:
> > With our ref counting scheme LR engines only close properly if not
> > persistent, ensure that LR engines are non-persistent.
> 
> Better to spell out long running somewhere here...
> 

Will do.

> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_engine.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> > index d1e84d7adbd4..91600b1e8249 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.c
> > +++ b/drivers/gpu/drm/xe/xe_engine.c
> > @@ -596,7 +596,9 @@ int xe_engine_create_ioctl(struct drm_device *dev, void *data,
> >  			return -ENOENT;
> >  
> >  		e = xe_engine_create(xe, vm, logical_mask,
> > -				     args->width, hwe, ENGINE_FLAG_PERSISTENT);
> > +				     args->width, hwe,
> > +				     xe_vm_no_dma_fences(vm) ? 0 :
> 
> shouldn't we use the existing function xe_engine_is_lr instead of this?
>

The engine isn't created yet... So no, this seems to be correct, as an LR
engine is a user engine with a VM that doesn't allow DMA fences.
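
For reference, xe_engine_is_lr() (added in patch 05) takes an engine
that already exists, which is exactly what we don't have at this point
in xe_engine_create_ioctl():

	bool xe_engine_is_lr(struct xe_engine *e)
	{
		return e->vm && xe_vm_no_dma_fences(e->vm) &&
			!(e->flags & ENGINE_FLAG_VM);
	}

So at creation time the only thing we can look at is the VM, hence the
inline xe_vm_no_dma_fences(vm) check above.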

Matt

> > +				     ENGINE_FLAG_PERSISTENT);
> >  		xe_vm_put(vm);
> >  		if (IS_ERR(e))
> >  			return PTR_ERR(e);
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-05 18:40   ` Rodrigo Vivi
@ 2023-05-08  1:08     ` Matthew Brost
  2023-05-08  1:15       ` Christopher Snowhill
  2023-05-08 21:34       ` Rodrigo Vivi
  0 siblings, 2 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-08  1:08 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe, Matthew Brost

On Fri, May 05, 2023 at 02:40:40PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
> > Not needed and causes some issues with bulk LRU moves.
> 
> I'm confused with this explanation and the code below.
> could you please provide a bit more wording here?
> 

We only need to try to lock a BO if it external as non-external BOs
share the dma-resv with the already locked VM. Trying to lock
non-external BOs caused an issue (list corruption) in an upcoming patch
which adds bulk LRU move. Since this code isn't needed, remove it.

^^^ How about this.
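
In code form the rule is tiny (sketch only; add_ext_lock() is a made-up
name for illustration, the real hunk is in the patch below):

	/* bo->vm set: BO shares the VM's dma-resv, which is already locked */
	/* bo->vm NULL: external BO with its own dma-resv, needs an entry */
	if (vbo && !vbo->vm)
		add_ext_lock(&objs, vbo);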

> > 
> > Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 272f0f7f24fe..6c427ff92c44 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
> >  		 */
> >  		xe_bo_get(vbo);
> >  
> > -		tv_bo.bo = &vbo->ttm;
> > -		tv_bo.num_shared = 1;
> > -		list_add(&tv_bo.head, &objs);
> > +		if (!vbo->vm) {
> > +			tv_bo.bo = &vbo->ttm;
> > +			tv_bo.num_shared = 1;
> > +			list_add(&tv_bo.head, &objs);
> > +		}
> >  	}
> >  
> >  again:
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message
  2023-05-05 18:52   ` Rodrigo Vivi
@ 2023-05-08  1:10     ` Matthew Brost
  2023-05-08  9:20       ` Michal Wajdeczko
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-08  1:10 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Fri, May 05, 2023 at 02:52:45PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:06PM -0700, Matthew Brost wrote:
> > The upper layers may need this data, an example of this is allocating
> > DIST doorbell.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++++-
> >  drivers/gpu/drm/xe/xe_guc_pc.c | 6 ++++--
> >  drivers/gpu/drm/xe/xe_huc.c    | 2 +-
> >  3 files changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index 6abf1dee95af..60b69fcfac9f 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -25,6 +25,7 @@
> >  struct g2h_fence {
> >  	u32 *response_buffer;
> >  	u32 seqno;
> > +	u32 status;
> >  	u16 response_len;
> >  	u16 error;
> >  	u16 hint;
> > @@ -727,7 +728,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
> >  		ret = -EIO;
> >  	}
> >  
> > -	return ret > 0 ? 0 : ret;
> > +	return ret > 0 ? g2h_fence.status : ret;
> 
> The problem I see here is how the upper level could differentiate
> between an error and a status.
> 

g2h_fence.status is 16 bits (can't be negative), so 0 or greater is a good
return.

> should we convert the functions to have an &status argument passed in?
>

I like it the way it is but don't really care either way. I'll change
this if you like.
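
If we do switch, I'd imagine something roughly like this (sketch only,
hypothetical wrapper name, assuming xe_guc_ct_send_block() keeps
returning DATA0 on success and a negative errno on error):

	static int send_block_status(struct xe_guc_ct *ct, const u32 *action,
				     u32 len, u32 *status)
	{
		int ret = xe_guc_ct_send_block(ct, action, len);

		if (ret < 0)
			return ret;		/* genuine error */
		if (status)
			*status = ret;		/* DATA0 of the G2H response */
		return 0;
	}

Callers that don't care about the status could then just pass NULL.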

Matt 

> >  }
> >  
> >  int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
> > @@ -793,6 +794,9 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
> >  		g2h_fence->response_len = response_len;
> >  		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,
> >  		       response_len * sizeof(u32));
> > +	} else {
> > +		g2h_fence->status =
> > +			FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, msg[1]);
> >  	}
> >  
> >  	g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
> > index 72d460d5323b..3d2ea723a4a7 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_pc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
> > @@ -204,11 +204,13 @@ static int pc_action_query_task_state(struct xe_guc_pc *pc)
> >  
> >  	/* Blocking here to ensure the results are ready before reading them */
> >  	ret = xe_guc_ct_send_block(ct, action, ARRAY_SIZE(action));
> > -	if (ret)
> > +	if (ret < 0) {
> >  		drm_err(&pc_to_xe(pc)->drm,
> >  			"GuC PC query task state failed: %pe", ERR_PTR(ret));
> > +		return ret;
> > +	}
> >  
> > -	return ret;
> > +	return 0;
> >  }
> >  
> >  static int pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value)
> > diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c
> > index 55dcaab34ea4..9c48c3075410 100644
> > --- a/drivers/gpu/drm/xe/xe_huc.c
> > +++ b/drivers/gpu/drm/xe/xe_huc.c
> > @@ -39,7 +39,7 @@ int xe_huc_init(struct xe_huc *huc)
> >  
> >  	huc->fw.type = XE_UC_FW_TYPE_HUC;
> >  	ret = xe_uc_fw_init(&huc->fw);
> > -	if (ret)
> > +	if (ret < 0)
> >  		goto out;
> >  
> >  	xe_uc_fw_change_status(&huc->fw, XE_UC_FIRMWARE_LOADABLE);
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-05 18:36   ` Rodrigo Vivi
@ 2023-05-08  1:14     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-08  1:14 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Fri, May 05, 2023 at 02:36:58PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:01PM -0700, Matthew Brost wrote:
> > Flow control + write ring in exec, return NULL in run_job, siganl
> 
> typo: s/siganl/signal
> 

Yep.

> > xe_hw_fence immediately, and override TDR for LR jobs.
> 
> So, this would likely be the recommendation on how to deal with
> the lack of completion fence right?! Could you please put a more
> descriptive text that we could convert to a documentation later?
>

Sure, let me write something a bit longer here. In my next drm scheduler
series upstream I'll likely need a bit of DOC in drm scheduler anyways.
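
In the meantime, the short version for LR jobs (lightly condensed from
the hunks below, not new code): flow control and the ring write move
into the exec IOCTL, run_job() only submits, and the job's fence is
errored so it signals immediately instead of acting as a completion
fence; cleanup then goes through the lr_tdr worker rather than the
normal TDR timeout.

	/* exec IOCTL */
	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
		err = -EWOULDBLOCK;
		goto err_engine_end;
	}
	...
	if (xe_engine_is_lr(engine))
		engine->ring_ops->emit_job(job);

	/* guc_engine_run_job(): no completion fence for LR jobs */
	if (lr) {
		xe_sched_job_set_error(job, -ENOTSUPP);
		return NULL;
	}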

Matt

> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
> >  drivers/gpu/drm/xe/xe_engine.h           |  4 +
> >  drivers/gpu/drm/xe/xe_exec.c             |  8 ++
> >  drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
> >  drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
> >  drivers/gpu/drm/xe/xe_trace.h            |  5 ++
> >  6 files changed, 137 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> > index 094ec17d3004..d1e84d7adbd4 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.c
> > +++ b/drivers/gpu/drm/xe/xe_engine.c
> > @@ -18,6 +18,7 @@
> >  #include "xe_macros.h"
> >  #include "xe_migrate.h"
> >  #include "xe_pm.h"
> > +#include "xe_ring_ops_types.h"
> >  #include "xe_trace.h"
> >  #include "xe_vm.h"
> >  
> > @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
> >  	up_write(&e->vm->lock);
> >  }
> >  
> > +/**
> > + * xe_engine_is_lr() - Whether an engine is long-running
> > + * @e: The engine
> > + *
> > + * Return: True if the engine is long-running, false otherwise.
> > + */
> > +bool xe_engine_is_lr(struct xe_engine *e)
> > +{
> > +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
> > +		!(e->flags & ENGINE_FLAG_VM);
> 
> Why do we have this ENGINE_FLAG_VM here?
> 
> > +}
> > +
> > +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
> > +{
> > +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
> > +}
> > +
> > +/**
> > + * xe_engine_ring_full() - Whether an engine's ring is full
> > + * @e: The engine
> > + *
> > + * Return: True if the engine's ring is full, false otherwise.
> > + */
> > +bool xe_engine_ring_full(struct xe_engine *e)
> > +{
> > +	struct xe_lrc *lrc = e->lrc;
> > +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
> > +
> > +	return xe_engine_num_job_inflight(e) >= max_job;
> > +}
> > +
> >  /**
> >   * xe_engine_is_idle() - Whether an engine is idle.
> >   * @engine: The engine
> > diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
> > index a49cf2ab405e..2e60f6d90226 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.h
> > +++ b/drivers/gpu/drm/xe/xe_engine.h
> > @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
> >  	return engine->width > 1;
> >  }
> >  
> > +bool xe_engine_is_lr(struct xe_engine *e);
> > +
> > +bool xe_engine_ring_full(struct xe_engine *e);
> > +
> >  bool xe_engine_is_idle(struct xe_engine *engine);
> >  
> >  void xe_engine_kill(struct xe_engine *e);
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index ea869f2452ef..44ea9bcd0066 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -13,6 +13,7 @@
> >  #include "xe_device.h"
> >  #include "xe_engine.h"
> >  #include "xe_macros.h"
> > +#include "xe_ring_ops_types.h"
> >  #include "xe_sched_job.h"
> >  #include "xe_sync.h"
> >  #include "xe_vm.h"
> > @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >  		goto err_engine_end;
> >  	}
> >  
> > +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
> > +		err = -EWOULDBLOCK;
> > +		goto err_engine_end;
> > +	}
> > +
> >  	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
> >  				  addresses : &args->address);
> >  	if (IS_ERR(job)) {
> > @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >  		xe_sync_entry_signal(&syncs[i], job,
> >  				     &job->drm.s_fence->finished);
> >  
> > +	if (xe_engine_is_lr(engine))
> > +		engine->ring_ops->emit_job(job);
> >  	xe_sched_job_push(job);
> >  	xe_vm_reactivate_rebind(vm);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > index cbfb13026ec1..5d83132034a6 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > @@ -31,6 +31,8 @@ struct xe_guc_engine {
> >  	 */
> >  #define MAX_STATIC_MSG_TYPE	3
> >  	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
> > +	/** @lr_tdr: long running TDR worker */
> > +	struct work_struct lr_tdr;
> >  	/** @fini_async: do final fini async from this worker */
> >  	struct work_struct fini_async;
> >  	/** @resume_time: time of last resume */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 68d09e7a4cc0..0a41f5d04f6d 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
> >  		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
> >  	}
> >  
> > +	/*
> > +	 * We must keep a reference for LR engines if engine is registered with
> > +	 * the GuC as jobs signal immediately and can't destroy an engine if the
> > +	 * GuC has a reference to it.
> > +	 */
> > +	if (xe_engine_is_lr(e))
> > +		xe_engine_get(e);
> > +
> >  	set_engine_registered(e);
> >  	trace_xe_engine_register(e);
> >  	if (xe_engine_is_parallel(e))
> > @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >  {
> >  	struct xe_sched_job *job = to_xe_sched_job(drm_job);
> >  	struct xe_engine *e = job->engine;
> > +	bool lr = xe_engine_is_lr(e);
> >  
> >  	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
> >  		  !engine_banned(e) && !engine_suspended(e));
> > @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >  	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> >  		if (!engine_registered(e))
> >  			register_engine(e);
> > -		e->ring_ops->emit_job(job);
> > +		if (!lr)	/* Written in IOCTL */
> > +			e->ring_ops->emit_job(job);
> >  		submit_engine(e);
> >  	}
> >  
> > -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
> > +	if (lr) {
> > +		xe_sched_job_set_error(job, -ENOTSUPP);
> > +		return NULL;
> > +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
> >  		return job->fence;
> > -	else
> > +	} else {
> >  		return dma_fence_get(job->fence);
> > +	}
> >  }
> >  
> >  static void guc_engine_free_job(struct drm_sched_job *drm_job)
> > @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
> >  }
> >  #endif
> >  
> > +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
> > +{
> > +	struct xe_guc *guc = engine_to_guc(e);
> > +
> > +	if (xe_engine_is_lr(e))
> > +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
> > +	else
> > +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +}
> > +
> > +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
> > +{
> > +	struct xe_guc_engine *ge =
> > +		container_of(w, struct xe_guc_engine, lr_tdr);
> > +	struct xe_engine *e = ge->engine;
> > +	struct drm_gpu_scheduler *sched = &ge->sched;
> > +
> > +	XE_BUG_ON(!xe_engine_is_lr(e));
> > +	trace_xe_engine_lr_cleanup(e);
> > +
> > +	/* Kill the run_job / process_msg entry points */
> > +	drm_sched_run_wq_stop(sched);
> > +
> > +	/* Engine state now stable, disable scheduling / deregister if needed */
> > +	if (engine_registered(e)) {
> > +		struct xe_guc *guc = engine_to_guc(e);
> > +		int ret;
> > +
> > +		set_engine_banned(e);
> > +		xe_engine_get(e);
> > +		disable_scheduling_deregister(guc, e);
> > +
> > +		/*
> > +		 * Must wait for scheduling to be disabled before signalling
> > +		 * any fences, if GT broken the GT reset code should signal us.
> > +		 */
> > +		smp_rmb();
> > +		ret = wait_event_timeout(guc->ct.wq,
> > +					 !engine_pending_disable(e) ||
> > +					 guc_read_stopped(guc), HZ * 5);
> > +		if (!ret) {
> > +			XE_WARN_ON("Schedule disable failed to respond");
> > +			drm_sched_run_wq_start(sched);
> > +			xe_gt_reset_async(e->gt);
> > +			return;
> > +		}
> > +	}
> > +
> > +	drm_sched_run_wq_start(sched);
> > +}
> > +
> >  static enum drm_gpu_sched_stat
> >  guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >  {
> > @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >  			err = -EIO;
> >  		set_engine_banned(e);
> >  		xe_engine_get(e);
> > -		disable_scheduling_deregister(engine_to_guc(e), e);
> > +		disable_scheduling_deregister(guc, e);
> >  
> >  		/*
> >  		 * Must wait for scheduling to be disabled before signalling
> > @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >  	 */
> >  	list_add(&drm_job->list, &sched->pending_list);
> >  	drm_sched_run_wq_start(sched);
> > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +	xe_guc_engine_trigger_cleanup(e);
> >  
> >  	/* Mark all outstanding jobs as bad, thus completing them */
> >  	spin_lock(&sched->job_list_lock);
> > @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
> >  
> >  	trace_xe_engine_destroy(e);
> >  
> > +	if (xe_engine_is_lr(e))
> > +		cancel_work_sync(&ge->lr_tdr);
> >  	if (e->flags & ENGINE_FLAG_PERSISTENT)
> >  		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> >  	release_guc_id(guc, e);
> > @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
> >  	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
> >  
> >  	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
> > -	queue_work(system_unbound_wq, &e->guc->fini_async);
> > +	queue_work(system_wq, &e->guc->fini_async);
> >  
> >  	/* We must block on kernel engines so slabs are empty on driver unload */
> >  	if (kernel) {
> > @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
> >  	if (err)
> >  		goto err_free;
> >  
> > +
> >  	sched = &ge->sched;
> >  	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
> >  				    &sched, 1, NULL);
> >  	if (err)
> >  		goto err_sched;
> >  
> > +	if (xe_engine_is_lr(e))
> > +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
> > +
> >  	mutex_lock(&guc->submission_state.lock);
> >  
> >  	err = alloc_guc_id(guc, e);
> > @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
> >  {
> >  	trace_xe_engine_kill(e);
> >  	set_engine_killed(e);
> > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +	xe_guc_engine_trigger_cleanup(e);
> >  }
> >  
> >  static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
> > @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
> >  	/* Stop scheduling + flush any DRM scheduler operations */
> >  	drm_sched_run_wq_stop(sched);
> >  
> > +	if (engine_registered(e) && xe_engine_is_lr(e))
> > +		xe_engine_put(e);
> > +
> >  	/* Clean up lost G2H + reset engine state */
> >  	if (engine_destroyed(e) && engine_registered(e)) {
> >  		if (engine_banned(e))
> > @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >  	trace_xe_engine_deregister_done(e);
> >  
> >  	clear_engine_registered(e);
> > +	if (xe_engine_is_lr(e))
> > +		xe_engine_put(e);
> > +
> >  	if (engine_banned(e))
> >  		xe_engine_put(e);
> >  	else
> > @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >  	 */
> >  	set_engine_reset(e);
> >  	if (!engine_banned(e))
> > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +		xe_guc_engine_trigger_cleanup(e);
> >  
> >  	return 0;
> >  }
> > @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> >  	/* Treat the same as engine reset */
> >  	set_engine_reset(e);
> >  	if (!engine_banned(e))
> > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +		xe_guc_engine_trigger_cleanup(e);
> >  
> >  	return 0;
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > index 2f8eb7ebe9a7..02861c26e145 100644
> > --- a/drivers/gpu/drm/xe/xe_trace.h
> > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
> >  	     TP_ARGS(e)
> >  );
> >  
> > +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
> > +	     TP_PROTO(struct xe_engine *e),
> > +	     TP_ARGS(e)
> > +);
> > +
> >  DECLARE_EVENT_CLASS(xe_sched_job,
> >  		    TP_PROTO(struct xe_sched_job *job),
> >  		    TP_ARGS(job),
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-08  1:08     ` Matthew Brost
@ 2023-05-08  1:15       ` Christopher Snowhill
  2023-05-08 21:34       ` Rodrigo Vivi
  1 sibling, 0 replies; 126+ messages in thread
From: Christopher Snowhill @ 2023-05-08  1:15 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Rodrigo Vivi, intel-xe, Matthew Brost

[-- Attachment #1: Type: text/plain, Size: 1758 bytes --]

s/it external/it's external/

Otherwise, the description looks good.

On Sun, May 7, 2023 at 6:08 PM Matthew Brost <matthew.brost@intel.com>
wrote:

> On Fri, May 05, 2023 at 02:40:40PM -0400, Rodrigo Vivi wrote:
> > On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
> > > Not needed and causes some issues with bulk LRU moves.
> >
> > I'm confused with this explanation and the code below.
> > could you please provide a bit more wording here?
> >
>
> We only need to try to lock a BO if it external as non-external BOs
> share the dma-resv with the already locked VM. Trying to lock
> non-external BOs caused an issue (list corruption) in an upcoming patch
> which adds bulk LRU move. Since this code isn't needed, remove it.
>
> ^^^ How about this.
>
> > >
> > > Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 272f0f7f24fe..6c427ff92c44 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm,
> struct xe_vma *vma,
> > >              */
> > >             xe_bo_get(vbo);
> > >
> > > -           tv_bo.bo = &vbo->ttm;
> > > -           tv_bo.num_shared = 1;
> > > -           list_add(&tv_bo.head, &objs);
> > > +           if (!vbo->vm) {
> > > +                   tv_bo.bo = &vbo->ttm;
> > > +                   tv_bo.num_shared = 1;
> > > +                   list_add(&tv_bo.head, &objs);
> > > +           }
> > >     }
> > >
> > >  again:
> > > --
> > > 2.34.1
> > >
>

[-- Attachment #2: Type: text/html, Size: 2676 bytes --]

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind Matthew Brost
  2023-05-05 18:40   ` Rodrigo Vivi
@ 2023-05-08  1:17   ` Christopher Snowhill
  1 sibling, 0 replies; 126+ messages in thread
From: Christopher Snowhill @ 2023-05-08  1:17 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Matthew Brost

[-- Attachment #1: Type: text/plain, Size: 1215 bytes --]

Also, you've typoed your email address as mattthew.brost@intel.com in a few
places here, including the Signed-off-by. It has also ended up in the CC
list for reply-all.

On Mon, May 1, 2023 at 5:17 PM Matthew Brost <matthew.brost@intel.com>
wrote:

> Not needed and causes some issues with bulk LRU moves.
>
> Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 272f0f7f24fe..6c427ff92c44 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct
> xe_vma *vma,
>                  */
>                 xe_bo_get(vbo);
>
> -               tv_bo.bo = &vbo->ttm;
> -               tv_bo.num_shared = 1;
> -               list_add(&tv_bo.head, &objs);
> +               if (!vbo->vm) {
> +                       tv_bo.bo = &vbo->ttm;
> +                       tv_bo.num_shared = 1;
> +                       list_add(&tv_bo.head, &objs);
> +               }
>         }
>
>  again:
> --
> 2.34.1
>
>

[-- Attachment #2: Type: text/html, Size: 1930 bytes --]

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
  2023-05-05 19:43   ` Rodrigo Vivi
@ 2023-05-08  1:19     ` Matthew Brost
  2023-05-08 21:29       ` Rodrigo Vivi
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-08  1:19 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Fri, May 05, 2023 at 03:43:15PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:24PM -0700, Matthew Brost wrote:
> > This is allowed per the dma-fencing rules.
> 
> it would be good to have a word saying 'why' we are doing this.
> just because we can doesn't mean we should...
> 

This is allowed and encouraged by the dma-fencing rules. This, along with
allowing compute VMs to export dma-fences on binds, will result in a
simpler compute UMD.
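
The rule the patch encodes is deliberately small: on a no_dma_fences
(compute / faulting) VM a dma-fence based sync is only rejected when it
is an out-sync, i.e.

	if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
		return -ENOTSUPP;

so waiting on dma-fences as inputs keeps working.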

Sound ok?

Matt
  
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_sync.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> > index 99f1ed87196d..1e4e4acb2c4a 100644
> > --- a/drivers/gpu/drm/xe/xe_sync.c
> > +++ b/drivers/gpu/drm/xe/xe_sync.c
> > @@ -105,6 +105,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> >  {
> >  	struct drm_xe_sync sync_in;
> >  	int err;
> > +	bool signal;
> >  
> >  	if (copy_from_user(&sync_in, sync_user, sizeof(*sync_user)))
> >  		return -EFAULT;
> > @@ -113,9 +114,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> >  			 ~(SYNC_FLAGS_TYPE_MASK | DRM_XE_SYNC_SIGNAL)))
> >  		return -EINVAL;
> >  
> > +	signal = sync_in.flags & DRM_XE_SYNC_SIGNAL;
> >  	switch (sync_in.flags & SYNC_FLAGS_TYPE_MASK) {
> >  	case DRM_XE_SYNC_SYNCOBJ:
> > -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> > +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
> >  			return -ENOTSUPP;
> >  
> >  		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> > @@ -125,7 +127,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> >  		if (XE_IOCTL_ERR(xe, !sync->syncobj))
> >  			return -ENOENT;
> >  
> > -		if (!(sync_in.flags & DRM_XE_SYNC_SIGNAL)) {
> > +		if (!signal) {
> >  			sync->fence = drm_syncobj_fence_get(sync->syncobj);
> >  			if (XE_IOCTL_ERR(xe, !sync->fence))
> >  				return -EINVAL;
> > @@ -133,7 +135,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> >  		break;
> >  
> >  	case DRM_XE_SYNC_TIMELINE_SYNCOBJ:
> > -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> > +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
> >  			return -ENOTSUPP;
> >  
> >  		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> > @@ -146,7 +148,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> >  		if (XE_IOCTL_ERR(xe, !sync->syncobj))
> >  			return -ENOENT;
> >  
> > -		if (sync_in.flags & DRM_XE_SYNC_SIGNAL) {
> > +		if (signal) {
> >  			sync->chain_fence = dma_fence_chain_alloc();
> >  			if (!sync->chain_fence)
> >  				return -ENOMEM;
> > @@ -168,7 +170,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> >  		break;
> >  
> >  	case DRM_XE_SYNC_USER_FENCE:
> > -		if (XE_IOCTL_ERR(xe, !(sync_in.flags & DRM_XE_SYNC_SIGNAL)))
> > +		if (XE_IOCTL_ERR(xe, !signal))
> >  			return -ENOTSUPP;
> >  
> >  		if (XE_IOCTL_ERR(xe, sync_in.addr & 0x7))
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation
  2023-05-05 19:37   ` Rodrigo Vivi
@ 2023-05-08  1:21     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-08  1:21 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Fri, May 05, 2023 at 03:37:02PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:16PM -0700, Matthew Brost wrote:
> > Reduce gt_mask to a u8 from a u64, only allocate userptr state if the VMA
> > is a userptr, and make the destroy callback and worker a union.
> 
> too many different things in one patch. could you please split the patch?
> 

Yes, will do.

Matt

> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_vm.c       | 14 +++--
> >  drivers/gpu/drm/xe/xe_vm_types.h | 88 +++++++++++++++++---------------
> >  2 files changed, 57 insertions(+), 45 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index e5f2fffb2aec..e8d9939ee535 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -814,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> >  				    u64 bo_offset_or_userptr,
> >  				    u64 start, u64 end,
> >  				    bool read_only, bool null,
> > -				    u64 gt_mask)
> > +				    u8 gt_mask)
> >  {
> >  	struct xe_vma *vma;
> >  	struct xe_gt *gt;
> > @@ -823,7 +823,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> >  	XE_BUG_ON(start >= end);
> >  	XE_BUG_ON(end >= vm->size);
> >  
> > -	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> > +	if (!bo && !null)	/* userptr */
> > +		vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> > +	else
> > +		vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
> > +			      GFP_KERNEL);
> >  	if (!vma) {
> >  		vma = ERR_PTR(-ENOMEM);
> >  		return vma;
> > @@ -2149,7 +2153,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
> >  static struct drm_gpuva_ops *
> >  vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> >  			 u64 bo_offset_or_userptr, u64 addr, u64 range,
> > -			 u32 operation, u64 gt_mask, u32 region)
> > +			 u32 operation, u8 gt_mask, u32 region)
> >  {
> >  	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
> >  	struct ww_acquire_ctx ww;
> > @@ -2234,7 +2238,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> >  }
> >  
> >  static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> > -			      u64 gt_mask, bool read_only, bool null)
> > +			      u8 gt_mask, bool read_only, bool null)
> >  {
> >  	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
> >  	struct xe_vma *vma;
> > @@ -3217,8 +3221,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >  		u64 addr = bind_ops[i].addr;
> >  		u32 op = bind_ops[i].op;
> >  		u64 obj_offset = bind_ops[i].obj_offset;
> > -		u64 gt_mask = bind_ops[i].gt_mask;
> >  		u32 region = bind_ops[i].region;
> > +		u8 gt_mask = bind_ops[i].gt_mask;
> >  
> >  		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> >  						  addr, range, op, gt_mask,
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 22def5483c12..df4797ec4d7f 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -34,22 +34,34 @@ struct xe_vm;
> >  #define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
> >  #define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
> >  
> > +/** struct xe_userptr - User pointer */
> > +struct xe_userptr {
> > +	/**
> > +	 * @notifier: MMU notifier for user pointer (invalidation call back)
> > +	 */
> > +	struct mmu_interval_notifier notifier;
> > +	/** @sgt: storage for a scatter gather table */
> > +	struct sg_table sgt;
> > +	/** @sg: allocated scatter gather table */
> > +	struct sg_table *sg;
> > +	/** @notifier_seq: notifier sequence number */
> > +	unsigned long notifier_seq;
> > +	/**
> > +	 * @initial_bind: user pointer has been bound at least once.
> > +	 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
> > +	 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
> > +	 */
> > +	bool initial_bind;
> > +#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> > +	u32 divisor;
> > +#endif
> > +};
> > +
> > +/** xe_vma - Virtual memory address */
> >  struct xe_vma {
> >  	/** @gpuva: Base GPUVA object */
> >  	struct drm_gpuva gpuva;
> >  
> > -	/** @gt_mask: GT mask of where to create binding for this VMA */
> > -	u64 gt_mask;
> > -
> > -	/**
> > -	 * @gt_present: GT mask of binding are present for this VMA.
> > -	 * protected by vm->lock, vm->resv and for userptrs,
> > -	 * vm->userptr.notifier_lock for writing. Needs either for reading,
> > -	 * but if reading is done under the vm->lock only, it needs to be held
> > -	 * in write mode.
> > -	 */
> > -	u64 gt_present;
> > -
> >  	union {
> >  		/** @userptr_link: link into VM repin list if userptr */
> >  		struct list_head userptr_link;
> > @@ -77,16 +89,29 @@ struct xe_vma {
> >  		} notifier;
> >  	};
> >  
> > -	/** @destroy_cb: callback to destroy VMA when unbind job is done */
> > -	struct dma_fence_cb destroy_cb;
> > +	union {
> > +		/** @destroy_cb: callback to destroy VMA when unbind job is done */
> > +		struct dma_fence_cb destroy_cb;
> > +		/** @destroy_work: worker to destroy this BO */
> > +		struct work_struct destroy_work;
> > +	};
> >  
> > -	/** @destroy_work: worker to destroy this BO */
> > -	struct work_struct destroy_work;
> > +	/** @gt_mask: GT mask of where to create binding for this VMA */
> > +	u8 gt_mask;
> > +
> > +	/**
> > +	 * @gt_present: GT mask of binding are present for this VMA.
> > +	 * protected by vm->lock, vm->resv and for userptrs,
> > +	 * vm->userptr.notifier_lock for writing. Needs either for reading,
> > +	 * but if reading is done under the vm->lock only, it needs to be held
> > +	 * in write mode.
> > +	 */
> > +	u8 gt_present;
> >  
> >  	/** @usm: unified shared memory state */
> >  	struct {
> >  		/** @gt_invalidated: VMA has been invalidated */
> > -		u64 gt_invalidated;
> > +		u8 gt_invalidated;
> >  	} usm;
> >  
> >  	struct {
> > @@ -97,28 +122,11 @@ struct xe_vma {
> >  		struct list_head link;
> >  	} extobj;
> >  
> > -	/** @userptr: user pointer state */
> > -	struct {
> > -		/**
> > -		 * @notifier: MMU notifier for user pointer (invalidation call back)
> > -		 */
> > -		struct mmu_interval_notifier notifier;
> > -		/** @sgt: storage for a scatter gather table */
> > -		struct sg_table sgt;
> > -		/** @sg: allocated scatter gather table */
> > -		struct sg_table *sg;
> > -		/** @notifier_seq: notifier sequence number */
> > -		unsigned long notifier_seq;
> > -		/**
> > -		 * @initial_bind: user pointer has been bound at least once.
> > -		 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
> > -		 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
> > -		 */
> > -		bool initial_bind;
> > -#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> > -		u32 divisor;
> > -#endif
> > -	} userptr;
> > +	/**
> > +	 * @userptr: user pointer state, only allocated for VMAs that are
> > +	 * user pointers
> > +	 */
> > +	struct xe_userptr userptr;
> >  };
> >  
> >  struct xe_device;
> > @@ -387,7 +395,7 @@ struct xe_vma_op {
> >  	 */
> >  	struct async_op_fence *fence;
> >  	/** @gt_mask: gt mask for this operation */
> > -	u64 gt_mask;
> > +	u8 gt_mask;
> >  	/** @flags: operation flags */
> >  	enum xe_vma_op_flags flags;
> >  
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message
  2023-05-08  1:10     ` Matthew Brost
@ 2023-05-08  9:20       ` Michal Wajdeczko
  0 siblings, 0 replies; 126+ messages in thread
From: Michal Wajdeczko @ 2023-05-08  9:20 UTC (permalink / raw)
  To: Matthew Brost, Rodrigo Vivi; +Cc: intel-xe



On 08.05.2023 03:10, Matthew Brost wrote:
> On Fri, May 05, 2023 at 02:52:45PM -0400, Rodrigo Vivi wrote:
>> On Mon, May 01, 2023 at 05:17:06PM -0700, Matthew Brost wrote:
>>> The upper layers may need this data, an example of this is allocating
>>> DIST doorbell.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++++-
>>>  drivers/gpu/drm/xe/xe_guc_pc.c | 6 ++++--
>>>  drivers/gpu/drm/xe/xe_huc.c    | 2 +-
>>>  3 files changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>>> index 6abf1dee95af..60b69fcfac9f 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>>> @@ -25,6 +25,7 @@
>>>  struct g2h_fence {
>>>  	u32 *response_buffer;
>>>  	u32 seqno;
>>> +	u32 status;

if the purpose of this field is to hold data from the success reply, then
why is it given the misleading name 'status'?

in our spec we call it 'data0' see [1]

[1] https://www.kernel.org/doc/html/latest/gpu/i915.html?#hxg-response

>>>  	u16 response_len;
>>>  	u16 error;
>>>  	u16 hint;

btw, is there any real benefit to decomposing HXG replies into these
fields in g2h_fence? done/retry/fail modes are mutually exclusive,
while separate flags here can't guarantee that.
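
A minimal sketch of that alternative, with made-up names, just to
illustrate the point; the actual layout is of course the driver's call:

enum g2h_reply_state {
	G2H_REPLY_PENDING,
	G2H_REPLY_DONE,		/* success: data0 is valid */
	G2H_REPLY_RETRY,	/* retry: reason is valid */
	G2H_REPLY_FAIL,		/* failure: error/hint are valid */
};

struct g2h_fence {
	u32 *response_buffer;
	u32 seqno;
	u16 response_len;
	enum g2h_reply_state state;	/* mutually exclusive by construction */
	union {
		u32 data0;		/* 28-bit payload of a success reply */
		u16 reason;		/* retry reason */
		struct {
			u16 error;
			u16 hint;
		};
	};
};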


>>> @@ -727,7 +728,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>>>  		ret = -EIO;
>>>  	}
>>>  
>>> -	return ret > 0 ? 0 : ret;
>>> +	return ret > 0 ? g2h_fence.status : ret;
>>
>> The problem I see here is how the upper level could differentiate
>> between and error and a status.
>>
> 
> g2h_fence.status is 16 bits (can't be negative), so 0 or greater is a good
> return.

to be precise, data0 from the success reply is 28-bit (see [1] above)

you probably mixed that with 16-bit error from failure reply (see [2])

[2] https://www.kernel.org/doc/html/latest/gpu/i915.html?#hxg-failure

> 
>> should we convert the functions to have an &status argument passed in?

it is not 'status'

>>
> 
> I like it the way it is but don't really care either way. I'll change
> this if you like.

The HXG spec allows reply messages longer than 1 dw (1 dw = HXG header only),
thus if we claim support for such a use case, as we do by accepting a
response_buffer, then we should return the actual length of the received
reply, rather than just data0, to allow the caller to parse flexible-size
replies:

 * Return: Non-negative response length (in dwords) or
 *         a negative error code on failure.

but since the majority, if not all, of our currently defined H2G messages
define just data0, without extra data1, data2, ..., then for such cases we
should rather provide a separate wrapper that makes sure the reply size is
exactly 1 (header only) and returns the extracted data0 from it.

and since data0 is just 28-bit we can still use a single return value
documented as:

 * Return: Non-negative data0 from the success response or
 *         a negative error code on failure.

(as I still hope proper documentation will be prepared for core
functions near term, when it's most desired)
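
A rough sketch of such a wrapper, assuming a send/recv variant that returns
the received reply length in dwords (the wrapper name below is hypothetical,
not the driver's):

/*
 * Accept only a header-only (1 dw) success reply and return its 28-bit
 * data0 field; longer replies should go through the response_buffer path.
 */
static int guc_ct_send_recv_data0(struct xe_guc_ct *ct, const u32 *action,
				  u32 len)
{
	u32 response[8];	/* arbitrary size for this sketch */
	int ret;

	ret = xe_guc_ct_send_recv(ct, action, len, response);
	if (ret < 0)
		return ret;	/* negative error code on failure */
	if (ret != 1)
		return -EPROTO;	/* only header-only replies expected here */

	/* data0 is 28 bits wide, so this is always non-negative */
	return FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, response[0]);
}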

> 
> Matt 
> 
>>>  }
>>>  
>>>  int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>>> @@ -793,6 +794,9 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>>>  		g2h_fence->response_len = response_len;
>>>  		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,
>>>  		       response_len * sizeof(u32));
>>> +	} else {
>>> +		g2h_fence->status =
>>> +			FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, msg[1]);
>>>  	}
>>>  
>>>  	g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
>>> index 72d460d5323b..3d2ea723a4a7 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_pc.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
>>> @@ -204,11 +204,13 @@ static int pc_action_query_task_state(struct xe_guc_pc *pc)
>>>  
>>>  	/* Blocking here to ensure the results are ready before reading them */
>>>  	ret = xe_guc_ct_send_block(ct, action, ARRAY_SIZE(action));
>>> -	if (ret)
>>> +	if (ret < 0) {

since almost all H2G messages expect data0 == 0 on success, this fix shouldn't
be necessary, unless you are hiding another issue ... which in this case
is the lack of proper initialization of the 'status/data0' field in
g2h_fence_init()

and btw, long term we should likely be better prepared for more
unexpected replies from the GuC, including non-zero data0 replies ...
but that should be another patch anyway
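
for reference, a minimal sketch of the initialization being asked for,
assuming a g2h_fence_init() helper that takes the fence and the optional
response buffer:

static void g2h_fence_init(struct g2h_fence *g2h_fence, u32 *response_buffer)
{
	/* zero everything, including the new status/data0 field */
	memset(g2h_fence, 0, sizeof(*g2h_fence));
	g2h_fence->response_buffer = response_buffer;
}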

>>>  		drm_err(&pc_to_xe(pc)->drm,
>>>  			"GuC PC query task state failed: %pe", ERR_PTR(ret));
>>> +		return ret;
>>> +	}
>>>  
>>> -	return ret;
>>> +	return 0;
>>>  }
>>>  
>>>  static int pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value)
>>> diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c
>>> index 55dcaab34ea4..9c48c3075410 100644
>>> --- a/drivers/gpu/drm/xe/xe_huc.c
>>> +++ b/drivers/gpu/drm/xe/xe_huc.c
>>> @@ -39,7 +39,7 @@ int xe_huc_init(struct xe_huc *huc)
>>>  
>>>  	huc->fw.type = XE_UC_FW_TYPE_HUC;
>>>  	ret = xe_uc_fw_init(&huc->fw);
>>> -	if (ret)
>>> +	if (ret < 0)

ditto

>>>  		goto out;
>>>  
>>>  	xe_uc_fw_change_status(&huc->fw, XE_UC_FIRMWARE_LOADABLE);
>>> -- 
>>> 2.34.1
>>>


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
  2023-05-02  0:16 ` [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
@ 2023-05-08 12:40   ` Thomas Hellström
  2023-05-22  1:16     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-08 12:40 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

A question below, with that addressed (possibly without change)

although I'm not a scheduler expert and we should ideally have 
additional reviewers,

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

On 5/2/23 02:16, Matthew Brost wrote:
> DRM_SCHED_POLICY_SINGLE_ENTITY creates a 1 to 1 relationship between
> scheduler and entity. No priorities or run queue used in this mode.
> Intended for devices with firmware schedulers.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 64 +++++++++++++++++++-----
>   drivers/gpu/drm/scheduler/sched_fence.c  |  2 +-
>   drivers/gpu/drm/scheduler/sched_main.c   | 63 ++++++++++++++++++++---
>   include/drm/gpu_scheduler.h              |  8 +++
>   4 files changed, 115 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 2300b2fc06ab..8b70900c54cc 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -83,6 +83,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   	memset(entity, 0, sizeof(struct drm_sched_entity));
>   	INIT_LIST_HEAD(&entity->list);
>   	entity->rq = NULL;
> +	entity->single_sched = NULL;
>   	entity->guilty = guilty;
>   	entity->num_sched_list = num_sched_list;
>   	entity->priority = priority;
> @@ -91,7 +92,15 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   	RB_CLEAR_NODE(&entity->rb_tree_node);
>   
>   	if(num_sched_list) {
> -		entity->rq = &sched_list[0]->sched_rq[entity->priority];
> +		if (sched_list[0]->sched_policy !=
> +		    DRM_SCHED_POLICY_SINGLE_ENTITY) {
> +			entity->rq = &sched_list[0]->sched_rq[entity->priority];
> +		} else {
> +			if (num_sched_list != 1 || sched_list[0]->single_entity)
> +				return -EINVAL;
> +			sched_list[0]->single_entity = entity;
> +			entity->single_sched = sched_list[0];
> +		}
>   	}
>   
>   	init_completion(&entity->entity_idle);
> @@ -125,7 +134,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>   				    struct drm_gpu_scheduler **sched_list,
>   				    unsigned int num_sched_list)
>   {
> -	WARN_ON(!num_sched_list || !sched_list);

Is there a way to get to the drm device so we can use drm_WARN_ON() here 
and below? I figure not?

Thanks,

Thomas


> +	WARN_ON(!num_sched_list || !sched_list ||
> +		!!entity->single_sched);
>   
>   	entity->sched_list = sched_list;
>   	entity->num_sched_list = num_sched_list;
> @@ -195,13 +205,15 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>   {
>   	struct drm_sched_job *job;
>   	struct dma_fence *prev;
> +	bool single_entity = !!entity->single_sched;
>   
> -	if (!entity->rq)
> +	if (!entity->rq && !single_entity)
>   		return;
>   
>   	spin_lock(&entity->rq_lock);
>   	entity->stopped = true;
> -	drm_sched_rq_remove_entity(entity->rq, entity);
> +	if (!single_entity)
> +		drm_sched_rq_remove_entity(entity->rq, entity);
>   	spin_unlock(&entity->rq_lock);
>   
>   	/* Make sure this entity is not used by the scheduler at the moment */
> @@ -223,6 +235,20 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>   	dma_fence_put(prev);
>   }
>   
> +/**
> + * drm_sched_entity_to_scheduler - Schedule entity to GPU scheduler
> + * @entity: scheduler entity
> + *
> + * Returns GPU scheduler for the entity
> + */
> +struct drm_gpu_scheduler *
> +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity)
> +{
> +	bool single_entity = !!entity->single_sched;
> +
> +	return single_entity ? entity->single_sched : entity->rq->sched;
> +}
> +
>   /**
>    * drm_sched_entity_flush - Flush a context entity
>    *
> @@ -240,11 +266,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>   	struct drm_gpu_scheduler *sched;
>   	struct task_struct *last_user;
>   	long ret = timeout;
> +	bool single_entity = !!entity->single_sched;
>   
> -	if (!entity->rq)
> +	if (!entity->rq && !single_entity)
>   		return 0;
>   
> -	sched = entity->rq->sched;
> +	sched = drm_sched_entity_to_scheduler(entity);
>   	/**
>   	 * The client will not queue more IBs during this fini, consume existing
>   	 * queued IBs or discard them on SIGKILL
> @@ -337,7 +364,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
>   		container_of(cb, struct drm_sched_entity, cb);
>   
>   	drm_sched_entity_clear_dep(f, cb);
> -	drm_sched_wakeup(entity->rq->sched);
> +	drm_sched_wakeup(drm_sched_entity_to_scheduler(entity));
>   }
>   
>   /**
> @@ -351,6 +378,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
>   void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>   				   enum drm_sched_priority priority)
>   {
> +	WARN_ON(!!entity->single_sched);
> +
>   	spin_lock(&entity->rq_lock);
>   	entity->priority = priority;
>   	spin_unlock(&entity->rq_lock);
> @@ -363,7 +392,7 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
>    */
>   static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>   {
> -	struct drm_gpu_scheduler *sched = entity->rq->sched;
> +	struct drm_gpu_scheduler *sched = drm_sched_entity_to_scheduler(entity);
>   	struct dma_fence *fence = entity->dependency;
>   	struct drm_sched_fence *s_fence;
>   
> @@ -456,7 +485,8 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   	 * Update the entity's location in the min heap according to
>   	 * the timestamp of the next job, if any.
>   	 */
> -	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
> +	if (drm_sched_entity_to_scheduler(entity)->sched_policy ==
> +	    DRM_SCHED_POLICY_FIFO) {
>   		struct drm_sched_job *next;
>   
>   		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> @@ -473,6 +503,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_rq *rq;
>   
> +	WARN_ON(!!entity->single_sched);
> +
>   	/* single possible engine and already selected */
>   	if (!entity->sched_list)
>   		return;
> @@ -522,16 +554,21 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>   void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   {
>   	struct drm_sched_entity *entity = sched_job->entity;
> +	bool single_entity = !!entity->single_sched;
>   	bool first;
>   
>   	trace_drm_sched_job(sched_job, entity);
> -	atomic_inc(entity->rq->sched->score);
> +	if (!single_entity)
> +		atomic_inc(entity->rq->sched->score);
>   	WRITE_ONCE(entity->last_user, current->group_leader);
>   	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>   	sched_job->submit_ts = ktime_get();
>   
>   	/* first job wakes up scheduler */
>   	if (first) {
> +		struct drm_gpu_scheduler *sched =
> +			drm_sched_entity_to_scheduler(entity);
> +
>   		/* Add the entity to the run queue */
>   		spin_lock(&entity->rq_lock);
>   		if (entity->stopped) {
> @@ -541,13 +578,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   			return;
>   		}
>   
> -		drm_sched_rq_add_entity(entity->rq, entity);
> +		if (!single_entity)
> +			drm_sched_rq_add_entity(entity->rq, entity);
>   		spin_unlock(&entity->rq_lock);
>   
> -		if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> +		if (sched->sched_policy == DRM_SCHED_POLICY_FIFO)
>   			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
>   
> -		drm_sched_wakeup(entity->rq->sched);
> +		drm_sched_wakeup(sched);
>   	}
>   }
>   EXPORT_SYMBOL(drm_sched_entity_push_job);
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 7fd869520ef2..1ba5056851dd 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -167,7 +167,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>   {
>   	unsigned seq;
>   
> -	fence->sched = entity->rq->sched;
> +	fence->sched = drm_sched_entity_to_scheduler(entity);
>   	seq = atomic_inc_return(&entity->fence_seq);
>   	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>   		       &fence->lock, entity->fence_context, seq);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 6777a2db554f..870568d94f1f 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -32,7 +32,8 @@
>    * backend operations to the scheduler like submitting a job to hardware run queue,
>    * returning the dependencies of a job etc.
>    *
> - * The organisation of the scheduler is the following:
> + * The organisation of the scheduler is the following for scheduling policies
> + * DRM_SCHED_POLICY_RR and DRM_SCHED_POLICY_FIFO:
>    *
>    * 1. Each hw run queue has one scheduler
>    * 2. Each scheduler has multiple run queues with different priorities
> @@ -41,7 +42,22 @@
>    * 4. Entities themselves maintain a queue of jobs that will be scheduled on
>    *    the hardware.
>    *
> - * The jobs in a entity are always scheduled in the order that they were pushed.
> + * The organisation of the scheduler is the following for scheduling policy
> + * DRM_SCHED_POLICY_SINGLE_ENTITY:
> + *
> + * 1. One to one relationship between scheduler and entity
> + * 2. No priorities implemented per scheduler (single job queue)
> + * 3. No run queues in scheduler rather jobs are directly dequeued from entity
> + * 4. The entity maintains a queue of jobs that will be scheduled on the
> + * hardware
> + *
> + * The jobs in a entity are always scheduled in the order that they were pushed
> + * regardless of scheduling policy.
> + *
> + * A policy of DRM_SCHED_POLICY_RR or DRM_SCHED_POLICY_FIFO is expected to used
> + * when the KMD is scheduling directly on the hardware while a scheduling policy
> + * of DRM_SCHED_POLICY_SINGLE_ENTITY is expected to be used when there is a
> + * firmare scheduler.
>    */
>   
>   #include <linux/wait.h>
> @@ -92,6 +108,8 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
>   
>   void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
>   {
> +	WARN_ON(!!entity->single_sched);
> +
>   	/*
>   	 * Both locks need to be grabbed, one to protect from entity->rq change
>   	 * for entity from within concurrent drm_sched_entity_select_rq and the
> @@ -122,6 +140,8 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
>   static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>   			      struct drm_sched_rq *rq)
>   {
> +	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
> +
>   	spin_lock_init(&rq->lock);
>   	INIT_LIST_HEAD(&rq->entities);
>   	rq->rb_tree_root = RB_ROOT_CACHED;
> @@ -140,6 +160,8 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>   void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>   			     struct drm_sched_entity *entity)
>   {
> +	WARN_ON(!!entity->single_sched);
> +
>   	if (!list_empty(&entity->list))
>   		return;
>   
> @@ -162,6 +184,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>   void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>   				struct drm_sched_entity *entity)
>   {
> +	WARN_ON(!!entity->single_sched);
> +
>   	if (list_empty(&entity->list))
>   		return;
>   
> @@ -691,7 +715,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		       struct drm_sched_entity *entity,
>   		       void *owner)
>   {
> -	if (!entity->rq)
> +	if (!entity->rq && !entity->single_sched)
>   		return -ENOENT;
>   
>   	job->entity = entity;
> @@ -724,13 +748,16 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>   {
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_entity *entity = job->entity;
> +	bool single_entity = !!entity->single_sched;
>   
>   	BUG_ON(!entity);
> -	drm_sched_entity_select_rq(entity);
> -	sched = entity->rq->sched;
> +	if (!single_entity)
> +		drm_sched_entity_select_rq(entity);
> +	sched = drm_sched_entity_to_scheduler(entity);
>   
>   	job->sched = sched;
> -	job->s_priority = entity->rq - sched->sched_rq;
> +	if (!single_entity)
> +		job->s_priority = entity->rq - sched->sched_rq;
>   	job->id = atomic64_inc_return(&sched->job_id_count);
>   
>   	drm_sched_fence_init(job->s_fence, job->entity);
> @@ -954,6 +981,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   	if (!drm_sched_ready(sched))
>   		return NULL;
>   
> +	if (sched->single_entity) {
> +		if (drm_sched_entity_is_ready(sched->single_entity))
> +			return sched->single_entity;
> +
> +		return NULL;
> +	}
> +
>   	/* Kernel run queue has higher priority than normal run queue*/
>   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>   		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> @@ -1210,6 +1244,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		return -EINVAL;
>   
>   	sched->ops = ops;
> +	sched->single_entity = NULL;
>   	sched->hw_submission_limit = hw_submission;
>   	sched->name = name;
>   	sched->run_wq = run_wq ? : system_wq;
> @@ -1222,7 +1257,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		sched->sched_policy = default_drm_sched_policy;
>   	else
>   		sched->sched_policy = sched_policy;
> -	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> +	for (i = DRM_SCHED_PRIORITY_MIN; sched_policy !=
> +	     DRM_SCHED_POLICY_SINGLE_ENTITY && i < DRM_SCHED_PRIORITY_COUNT;
> +	     i++)
>   		drm_sched_rq_init(sched, &sched->sched_rq[i]);
>   
>   	init_waitqueue_head(&sched->job_scheduled);
> @@ -1255,7 +1292,15 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
>   
>   	drm_sched_run_wq_stop(sched);
>   
> -	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> +	if (sched->single_entity) {
> +		spin_lock(&sched->single_entity->rq_lock);
> +		sched->single_entity->stopped = true;
> +		spin_unlock(&sched->single_entity->rq_lock);
> +	}
> +
> +	for (i = DRM_SCHED_PRIORITY_COUNT - 1; sched->sched_policy !=
> +	     DRM_SCHED_POLICY_SINGLE_ENTITY && i >= DRM_SCHED_PRIORITY_MIN;
> +	     i--) {
>   		struct drm_sched_rq *rq = &sched->sched_rq[i];
>   
>   		if (!rq)
> @@ -1299,6 +1344,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
>   	struct drm_sched_entity *entity;
>   	struct drm_gpu_scheduler *sched = bad->sched;
>   
> +	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
> +
>   	/* don't change @bad's karma if it's from KERNEL RQ,
>   	 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
>   	 * corrupt but keep in mind that kernel jobs always considered good.
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 3df801401028..669d6520cd3a 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -70,6 +70,7 @@ enum drm_sched_policy {
>   	DRM_SCHED_POLICY_DEFAULT,
>   	DRM_SCHED_POLICY_RR,
>   	DRM_SCHED_POLICY_FIFO,
> +	DRM_SCHED_POLICY_SINGLE_ENTITY,
>   	DRM_SCHED_POLICY_COUNT,
>   };
>   
> @@ -103,6 +104,9 @@ struct drm_sched_entity {
>   	 */
>   	struct drm_sched_rq		*rq;
>   
> +	/** @single_sched: Single scheduler */
> +	struct drm_gpu_scheduler	*single_sched;
> +
>   	/**
>   	 * @sched_list:
>   	 *
> @@ -488,6 +492,7 @@ struct drm_sched_backend_ops {
>    * struct drm_gpu_scheduler - scheduler instance-specific data
>    *
>    * @ops: backend operations provided by the driver.
> + * @single_entity: Single entity for the scheduler
>    * @hw_submission_limit: the max size of the hardware queue.
>    * @timeout: the time after which a job is removed from the scheduler.
>    * @name: name of the ring for which this scheduler is being used.
> @@ -519,6 +524,7 @@ struct drm_sched_backend_ops {
>    */
>   struct drm_gpu_scheduler {
>   	const struct drm_sched_backend_ops	*ops;
> +	struct drm_sched_entity		*single_entity;
>   	uint32_t			hw_submission_limit;
>   	long				timeout;
>   	const char			*name;
> @@ -604,6 +610,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   			  struct drm_gpu_scheduler **sched_list,
>   			  unsigned int num_sched_list,
>   			  atomic_t *guilty);
> +struct drm_gpu_scheduler *
> +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity);
>   long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
>   void drm_sched_entity_fini(struct drm_sched_entity *entity);
>   void drm_sched_entity_destroy(struct drm_sched_entity *entity);

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 04/31] drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 04/31] drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode Matthew Brost
@ 2023-05-08 12:41   ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-08 12:41 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> We create 1 GPU scheduler per entity in Xe, use
> DRM_SCHED_POLICY_SINGLE_ENTITY scheduling which is designed for that
> paradigm.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_execlist.c   | 3 ++-
>   drivers/gpu/drm/xe/xe_guc_submit.c | 3 +--
>   2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
> index 48060d14547a..79fb951c2965 100644
> --- a/drivers/gpu/drm/xe/xe_execlist.c
> +++ b/drivers/gpu/drm/xe/xe_execlist.c
> @@ -339,7 +339,8 @@ static int execlist_engine_init(struct xe_engine *e)
>   	err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL,
>   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
>   			     XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT,
> -			     NULL, NULL, e->hwe->name, DRM_SCHED_POLICY_DEFAULT,
> +			     NULL, NULL, e->hwe->name,
> +			     DRM_SCHED_POLICY_SINGLE_ENTITY,
>   			     gt_to_xe(e->gt)->drm.dev);
>   	if (err)
>   		goto err_free;
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 9d3fadca43be..68d09e7a4cc0 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1084,7 +1084,7 @@ static int guc_engine_init(struct xe_engine *e)
>   	err = drm_sched_init(&ge->sched, &drm_sched_ops, NULL,
>   			     e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
>   			     64, timeout, guc_to_gt(guc)->ordered_wq, NULL,
> -			     e->name, DRM_SCHED_POLICY_DEFAULT,
> +			     e->name, DRM_SCHED_POLICY_SINGLE_ENTITY,
>   			     gt_to_xe(e->gt)->drm.dev);
>   	if (err)
>   		goto err_free;
> @@ -1185,7 +1185,6 @@ static int guc_engine_set_priority(struct xe_engine *e,
>   	if (!msg)
>   		return -ENOMEM;
>   
> -	drm_sched_entity_set_priority(e->entity, priority);
>   	guc_engine_add_msg(e, msg, SET_SCHED_PROPS);
>   
>   	return 0;

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update Matthew Brost
  2023-05-05 18:36   ` Rodrigo Vivi
@ 2023-05-08 13:14   ` Thomas Hellström
  2023-05-09 14:56     ` Matthew Brost
  2023-05-09 22:21     ` Matthew Brost
  1 sibling, 2 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-08 13:14 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi, Matthew

In addition to Rodrigo's comments:

On 5/2/23 02:17, Matthew Brost wrote:
> Flow control + write ring in exec, return NULL in run_job, signal
> xe_hw_fence immediately, and override TDR for LR jobs.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
>   drivers/gpu/drm/xe/xe_engine.h           |  4 +
>   drivers/gpu/drm/xe/xe_exec.c             |  8 ++
>   drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
>   drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
>   drivers/gpu/drm/xe/xe_trace.h            |  5 ++
>   6 files changed, 137 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> index 094ec17d3004..d1e84d7adbd4 100644
> --- a/drivers/gpu/drm/xe/xe_engine.c
> +++ b/drivers/gpu/drm/xe/xe_engine.c
> @@ -18,6 +18,7 @@
>   #include "xe_macros.h"
>   #include "xe_migrate.h"
>   #include "xe_pm.h"
> +#include "xe_ring_ops_types.h"
>   #include "xe_trace.h"
>   #include "xe_vm.h"
>   
> @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
>   	up_write(&e->vm->lock);
>   }
>   
> +/**
> + * xe_engine_is_lr() - Whether an engine is long-running
> + * @e: The engine
> + *
> + * Return: True if the engine is long-running, false otherwise.
> + */
> +bool xe_engine_is_lr(struct xe_engine *e)
> +{
> +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
> +		!(e->flags & ENGINE_FLAG_VM);
> +}
> +
> +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
> +{
> +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
> +}
> +
> +/**
> + * xe_engine_ring_full() - Whether an engine's ring is full
> + * @e: The engine
> + *
> + * Return: True if the engine's ring is full, false otherwise.
> + */
> +bool xe_engine_ring_full(struct xe_engine *e)
> +{
> +	struct xe_lrc *lrc = e->lrc;
> +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
> +
> +	return xe_engine_num_job_inflight(e) >= max_job;
> +}
> +
>   /**
>    * xe_engine_is_idle() - Whether an engine is idle.
>    * @engine: The engine
> diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
> index a49cf2ab405e..2e60f6d90226 100644
> --- a/drivers/gpu/drm/xe/xe_engine.h
> +++ b/drivers/gpu/drm/xe/xe_engine.h
> @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
>   	return engine->width > 1;
>   }
>   
> +bool xe_engine_is_lr(struct xe_engine *e);
> +
> +bool xe_engine_ring_full(struct xe_engine *e);
> +
>   bool xe_engine_is_idle(struct xe_engine *engine);
>   
>   void xe_engine_kill(struct xe_engine *e);
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index ea869f2452ef..44ea9bcd0066 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -13,6 +13,7 @@
>   #include "xe_device.h"
>   #include "xe_engine.h"
>   #include "xe_macros.h"
> +#include "xe_ring_ops_types.h"
>   #include "xe_sched_job.h"
>   #include "xe_sync.h"
>   #include "xe_vm.h"
> @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		goto err_engine_end;
>   	}
>   
> +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
> +		err = -EWOULDBLOCK;
> +		goto err_engine_end;
> +	}
> +
>   	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
>   				  addresses : &args->address);
>   	if (IS_ERR(job)) {
> @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		xe_sync_entry_signal(&syncs[i], job,
>   				     &job->drm.s_fence->finished);
>   
> +	if (xe_engine_is_lr(engine))
> +		engine->ring_ops->emit_job(job);
>   	xe_sched_job_push(job);
>   	xe_vm_reactivate_rebind(vm);
>   
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> index cbfb13026ec1..5d83132034a6 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> @@ -31,6 +31,8 @@ struct xe_guc_engine {
>   	 */
>   #define MAX_STATIC_MSG_TYPE	3
>   	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
> +	/** @lr_tdr: long running TDR worker */
> +	struct work_struct lr_tdr;
>   	/** @fini_async: do final fini async from this worker */
>   	struct work_struct fini_async;
>   	/** @resume_time: time of last resume */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 68d09e7a4cc0..0a41f5d04f6d 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
>   		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
>   	}
>   
> +	/*
> +	 * We must keep a reference for LR engines if engine is registered with
> +	 * the GuC as jobs signal immediately and can't destroy an engine if the
> +	 * GuC has a reference to it.
> +	 */
> +	if (xe_engine_is_lr(e))
> +		xe_engine_get(e);
> +
>   	set_engine_registered(e);
>   	trace_xe_engine_register(e);
>   	if (xe_engine_is_parallel(e))
> @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>   {
>   	struct xe_sched_job *job = to_xe_sched_job(drm_job);
>   	struct xe_engine *e = job->engine;
> +	bool lr = xe_engine_is_lr(e);
>   
>   	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
>   		  !engine_banned(e) && !engine_suspended(e));
> @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>   	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
>   		if (!engine_registered(e))
>   			register_engine(e);
> -		e->ring_ops->emit_job(job);
> +		if (!lr)	/* Written in IOCTL */

Hmm? What does "Written in IOCTL" mean? Could you rephrase it as something
more descriptive?

> +			e->ring_ops->emit_job(job);
>   		submit_engine(e);
>   	}
>   
> -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
> +	if (lr) {
> +		xe_sched_job_set_error(job, -ENOTSUPP);
> +		return NULL;
> +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
>   		return job->fence;
> -	else
> +	} else {
>   		return dma_fence_get(job->fence);
> +	}
>   }
>   
>   static void guc_engine_free_job(struct drm_sched_job *drm_job)
> @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
>   }
>   #endif
>   
> +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
> +{
> +	struct xe_guc *guc = engine_to_guc(e);
> +
> +	if (xe_engine_is_lr(e))
> +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
> +	else
> +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +}
> +
> +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
> +{
> +	struct xe_guc_engine *ge =
> +		container_of(w, struct xe_guc_engine, lr_tdr);
> +	struct xe_engine *e = ge->engine;
> +	struct drm_gpu_scheduler *sched = &ge->sched;
> +
> +	XE_BUG_ON(!xe_engine_is_lr(e));
> +	trace_xe_engine_lr_cleanup(e);
> +
> +	/* Kill the run_job / process_msg entry points */
> +	drm_sched_run_wq_stop(sched);
> +
> +	/* Engine state now stable, disable scheduling / deregister if needed */
> +	if (engine_registered(e)) {
> +		struct xe_guc *guc = engine_to_guc(e);
> +		int ret;
> +
> +		set_engine_banned(e);
> +		xe_engine_get(e);
> +		disable_scheduling_deregister(guc, e);
> +
> +		/*
> +		 * Must wait for scheduling to be disabled before signalling
> +		 * any fences, if GT broken the GT reset code should signal us.
> +		 */
> +		smp_rmb();

wait_event() paired with the wake_up() family of functions typically sets the
necessary barriers to make sure anything written prior to wake_up() is
seen in wait_event(). So that smp_rmb() is most likely not needed. If it
still is, its pairing smp_wmb() should be documented and pointed to as
well. See the documentation of set_current_state() vs __set_current_state().
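
For illustration, a generic sketch of the pairing described above (not the
driver's code):

#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static bool done;

/* waker: the store to 'done' is ordered before the wakeup */
static void signal_done(void)
{
	done = true;
	wake_up(&wq);
}

/* waiter: wait_event_timeout() re-checks the condition with the ordering
 * implied by the wait/wake pairing, so typically no extra smp_rmb() is
 * needed on this side */
static long wait_done(void)
{
	return wait_event_timeout(wq, done, HZ * 5);
}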

> +		ret = wait_event_timeout(guc->ct.wq,
> +					 !engine_pending_disable(e) ||
> +					 guc_read_stopped(guc), HZ * 5);
> +		if (!ret) {
> +			XE_WARN_ON("Schedule disable failed to respond");
> +			drm_sched_run_wq_start(sched);
> +			xe_gt_reset_async(e->gt);
> +			return;
> +		}
> +	}
> +
> +	drm_sched_run_wq_start(sched);
> +}
> +
>   static enum drm_gpu_sched_stat
>   guc_engine_timedout_job(struct drm_sched_job *drm_job)
>   {
> @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>   			err = -EIO;
>   		set_engine_banned(e);
>   		xe_engine_get(e);
> -		disable_scheduling_deregister(engine_to_guc(e), e);
> +		disable_scheduling_deregister(guc, e);
>   
>   		/*
>   		 * Must wait for scheduling to be disabled before signalling
> @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>   	 */
>   	list_add(&drm_job->list, &sched->pending_list);
>   	drm_sched_run_wq_start(sched);
> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +	xe_guc_engine_trigger_cleanup(e);
>   
>   	/* Mark all outstanding jobs as bad, thus completing them */
>   	spin_lock(&sched->job_list_lock);
> @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
>   
>   	trace_xe_engine_destroy(e);
>   
> +	if (xe_engine_is_lr(e))
> +		cancel_work_sync(&ge->lr_tdr);
>   	if (e->flags & ENGINE_FLAG_PERSISTENT)
>   		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
>   	release_guc_id(guc, e);
> @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
>   	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
>   
>   	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
> -	queue_work(system_unbound_wq, &e->guc->fini_async);
> +	queue_work(system_wq, &e->guc->fini_async);
>   
>   	/* We must block on kernel engines so slabs are empty on driver unload */
>   	if (kernel) {
> @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
>   	if (err)
>   		goto err_free;
>   
> +

Unrelated whitespace?


>   	sched = &ge->sched;
>   	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
>   				    &sched, 1, NULL);
>   	if (err)
>   		goto err_sched;
>   
> +	if (xe_engine_is_lr(e))
> +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
> +
>   	mutex_lock(&guc->submission_state.lock);
>   
>   	err = alloc_guc_id(guc, e);
> @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
>   {
>   	trace_xe_engine_kill(e);
>   	set_engine_killed(e);
> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +	xe_guc_engine_trigger_cleanup(e);
>   }
>   
>   static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
> @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
>   	/* Stop scheduling + flush any DRM scheduler operations */
>   	drm_sched_run_wq_stop(sched);
>   
> +	if (engine_registered(e) && xe_engine_is_lr(e))
> +		xe_engine_put(e);
> +
>   	/* Clean up lost G2H + reset engine state */
>   	if (engine_destroyed(e) && engine_registered(e)) {
>   		if (engine_banned(e))
> @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>   	trace_xe_engine_deregister_done(e);
>   
>   	clear_engine_registered(e);
> +	if (xe_engine_is_lr(e))
> +		xe_engine_put(e);
> +
>   	if (engine_banned(e))
>   		xe_engine_put(e);
>   	else
> @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
>   	 */
>   	set_engine_reset(e);
>   	if (!engine_banned(e))
> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +		xe_guc_engine_trigger_cleanup(e);
>   
>   	return 0;
>   }
> @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>   	/* Treat the same as engine reset */
>   	set_engine_reset(e);
>   	if (!engine_banned(e))
> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> +		xe_guc_engine_trigger_cleanup(e);
>   
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 2f8eb7ebe9a7..02861c26e145 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
>   	     TP_ARGS(e)
>   );
>   
> +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
> +	     TP_PROTO(struct xe_engine *e),
> +	     TP_ARGS(e)
> +);
> +
>   DECLARE_EVENT_CLASS(xe_sched_job,
>   		    TP_PROTO(struct xe_sched_job *job),
>   		    TP_ARGS(job),


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
  2023-05-08  1:19     ` Matthew Brost
@ 2023-05-08 21:29       ` Rodrigo Vivi
  0 siblings, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-08 21:29 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Rodrigo Vivi, intel-xe

On Mon, May 08, 2023 at 01:19:48AM +0000, Matthew Brost wrote:
> On Fri, May 05, 2023 at 03:43:15PM -0400, Rodrigo Vivi wrote:
> > On Mon, May 01, 2023 at 05:17:24PM -0700, Matthew Brost wrote:
> > > This is allowed per the dma-fencing rules.
> > 
> > it would be good to have a word saying 'why' we are doing this.
> > just because we can doesn't mean we should...
> > 
> 
> This is allowed and encouraged by the dma-fencing rules. This, along with
> allowing compute VMs to export dma-fences on binds, will result in a
> simpler compute UMD.
> 
> Sound ok?

makes sense. with that

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


> 
> Matt
>   
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_sync.c | 12 +++++++-----
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> > > index 99f1ed87196d..1e4e4acb2c4a 100644
> > > --- a/drivers/gpu/drm/xe/xe_sync.c
> > > +++ b/drivers/gpu/drm/xe/xe_sync.c
> > > @@ -105,6 +105,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> > >  {
> > >  	struct drm_xe_sync sync_in;
> > >  	int err;
> > > +	bool signal;
> > >  
> > >  	if (copy_from_user(&sync_in, sync_user, sizeof(*sync_user)))
> > >  		return -EFAULT;
> > > @@ -113,9 +114,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> > >  			 ~(SYNC_FLAGS_TYPE_MASK | DRM_XE_SYNC_SIGNAL)))
> > >  		return -EINVAL;
> > >  
> > > +	signal = sync_in.flags & DRM_XE_SYNC_SIGNAL;
> > >  	switch (sync_in.flags & SYNC_FLAGS_TYPE_MASK) {
> > >  	case DRM_XE_SYNC_SYNCOBJ:
> > > -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> > > +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
> > >  			return -ENOTSUPP;
> > >  
> > >  		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> > > @@ -125,7 +127,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> > >  		if (XE_IOCTL_ERR(xe, !sync->syncobj))
> > >  			return -ENOENT;
> > >  
> > > -		if (!(sync_in.flags & DRM_XE_SYNC_SIGNAL)) {
> > > +		if (!signal) {
> > >  			sync->fence = drm_syncobj_fence_get(sync->syncobj);
> > >  			if (XE_IOCTL_ERR(xe, !sync->fence))
> > >  				return -EINVAL;
> > > @@ -133,7 +135,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> > >  		break;
> > >  
> > >  	case DRM_XE_SYNC_TIMELINE_SYNCOBJ:
> > > -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> > > +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
> > >  			return -ENOTSUPP;
> > >  
> > >  		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> > > @@ -146,7 +148,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> > >  		if (XE_IOCTL_ERR(xe, !sync->syncobj))
> > >  			return -ENOENT;
> > >  
> > > -		if (sync_in.flags & DRM_XE_SYNC_SIGNAL) {
> > > +		if (signal) {
> > >  			sync->chain_fence = dma_fence_chain_alloc();
> > >  			if (!sync->chain_fence)
> > >  				return -ENOMEM;
> > > @@ -168,7 +170,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> > >  		break;
> > >  
> > >  	case DRM_XE_SYNC_USER_FENCE:
> > > -		if (XE_IOCTL_ERR(xe, !(sync_in.flags & DRM_XE_SYNC_SIGNAL)))
> > > +		if (XE_IOCTL_ERR(xe, !signal))
> > >  			return -ENOTSUPP;
> > >  
> > >  		if (XE_IOCTL_ERR(xe, sync_in.addr & 0x7))
> > > -- 
> > > 2.34.1
> > > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-08  1:08     ` Matthew Brost
  2023-05-08  1:15       ` Christopher Snowhill
@ 2023-05-08 21:34       ` Rodrigo Vivi
  2023-05-09 12:29         ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-08 21:34 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Rodrigo Vivi, intel-xe, Matthew Brost

On Mon, May 08, 2023 at 01:08:10AM +0000, Matthew Brost wrote:
> On Fri, May 05, 2023 at 02:40:40PM -0400, Rodrigo Vivi wrote:
> > On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
> > > Not needed and causes some issues with bulk LRU moves.
> > 
> > I'm confused with this explanation and the code below.
> > could you please provide a bit more wording here?
> > 
> 
> We only need to try to lock a BO if it is external, as non-external BOs
> share the dma-resv with the already locked VM. Trying to lock
> non-external BOs caused an issue (list corruption) in an upcoming patch
> which adds bulk LRU moves. Since this code isn't needed, remove it.
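> 
> For context, a simplified sketch of why the VM lock is enough for
> non-external BOs: they are created sharing the VM's reservation object
> (argument list simplified from xe_bo_create_locked_range()):
> 
> 	/* VM-private BOs get the VM's dma_resv at creation, so xe_vm_lock()
> 	 * already locks them; external BOs (vbo->vm == NULL in the hunk
> 	 * below) have their own resv and still need a TTM list entry. */
> 	bo = __xe_bo_create_locked(xe, bo, gt,
> 				   vm ? &vm->resv : NULL,	/* shared resv */
> 				   size, type, flags);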

it makes more sense now. With this in the commit msg (but with Christopher's fix)


Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


> 
> ^^^ How about this.
> 
> > > 
> > > Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 272f0f7f24fe..6c427ff92c44 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
> > >  		 */
> > >  		xe_bo_get(vbo);
> > >  
> > > -		tv_bo.bo = &vbo->ttm;
> > > -		tv_bo.num_shared = 1;
> > > -		list_add(&tv_bo.head, &objs);
> > > +		if (!vbo->vm) {
> > > +			tv_bo.bo = &vbo->ttm;
> > > +			tv_bo.num_shared = 1;
> > > +			list_add(&tv_bo.head, &objs);
> > > +		}
> > >  	}
> > >  
> > >  again:
> > > -- 
> > > 2.34.1
> > > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move Matthew Brost
@ 2023-05-08 21:39   ` Rodrigo Vivi
  2023-05-09 22:09     ` Matthew Brost
  2023-05-09 12:47   ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-08 21:39 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:04PM -0700, Matthew Brost wrote:
> Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk move's
> LRU position on every exec.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
>  drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
>  drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
>  drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
>  drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
>  5 files changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 3ab404e33fae..da99ee53e7d7 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
>  	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
>  }
>  
> +static void xe_gem_object_close(struct drm_gem_object *obj,
> +				struct drm_file *file_priv)
> +{
> +	struct xe_bo *bo = gem_to_xe_bo(obj);
> +
> +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> +		struct ww_acquire_ctx ww;
> +
> +		XE_BUG_ON(!xe_bo_is_user(bo));

We really need to stop using BUG_ON and move towards using WARNs.
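
For example, assuming XE_WARN_ON evaluates to its condition the way
WARN_ON does, this could become a warn-and-bail:

	if (XE_WARN_ON(!xe_bo_is_user(bo)))
		return;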

But the rest of the patch looks good to me... I just believe it would be
good to get Thomas' review here.

> +
> +		xe_bo_lock(bo, &ww, 0, false);
> +		ttm_bo_set_bulk_move(&bo->ttm, NULL);
> +		xe_bo_unlock(bo, &ww);
> +	}
> +}
> +
> +
>  static bool should_migrate_to_system(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = xe_bo_device(bo);
> @@ -1040,6 +1057,7 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
>  
>  static const struct drm_gem_object_funcs xe_gem_object_funcs = {
>  	.free = xe_gem_object_free,
> +	.close = xe_gem_object_close,
>  	.mmap = drm_gem_ttm_mmap,
>  	.export = xe_gem_prime_export,
>  	.vm_ops = &xe_gem_vm_ops,
> @@ -1081,8 +1099,8 @@ void xe_bo_free(struct xe_bo *bo)
>  
>  struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				    struct xe_gt *gt, struct dma_resv *resv,
> -				    size_t size, enum ttm_bo_type type,
> -				    u32 flags)
> +				    struct ttm_lru_bulk_move *bulk, size_t size,
> +				    enum ttm_bo_type type, u32 flags)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = true,
> @@ -1149,7 +1167,10 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  		return ERR_PTR(err);
>  
>  	bo->created = true;
> -	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> +	if (bulk)
> +		ttm_bo_set_bulk_move(&bo->ttm, bulk);
> +	else
> +		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>  
>  	return bo;
>  }
> @@ -1219,7 +1240,10 @@ xe_bo_create_locked_range(struct xe_device *xe,
>  		}
>  	}
>  
> -	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL, size,
> +	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL,
> +				   vm && !xe_vm_no_dma_fences(vm) &&
> +				   flags & XE_BO_CREATE_USER_BIT ?
> +				   &vm->lru_bulk_move : NULL, size,
>  				   type, flags);
>  	if (IS_ERR(bo))
>  		return bo;
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 8354d05ccdf3..25457b3c757b 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -81,8 +81,8 @@ void xe_bo_free(struct xe_bo *bo);
>  
>  struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				    struct xe_gt *gt, struct dma_resv *resv,
> -				    size_t size, enum ttm_bo_type type,
> -				    u32 flags);
> +				    struct ttm_lru_bulk_move *bulk, size_t size,
> +				    enum ttm_bo_type type, u32 flags);
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
>  			  struct xe_gt *gt, struct xe_vm *vm,
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 9b252cc782b7..975dee1f770f 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -199,7 +199,7 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  	int ret;
>  
>  	dma_resv_lock(resv, NULL);
> -	bo = __xe_bo_create_locked(xe, storage, NULL, resv, dma_buf->size,
> +	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>  				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>  	if (IS_ERR(bo)) {
>  		ret = PTR_ERR(bo);
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 44ea9bcd0066..21a9c2fddf86 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -374,6 +374,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	xe_sched_job_push(job);
>  	xe_vm_reactivate_rebind(vm);
>  
> +	if (!err && !xe_vm_no_dma_fences(vm)) {
> +		spin_lock(&xe->ttm.lru_lock);
> +		ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
> +		spin_unlock(&xe->ttm.lru_lock);
> +	}
> +
>  err_repin:
>  	if (!xe_vm_no_dma_fences(vm))
>  		up_read(&vm->userptr.notifier_lock);
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index fada7896867f..d3e99f22510d 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -164,6 +164,9 @@ struct xe_vm {
>  	/** Protects @rebind_list and the page-table structures */
>  	struct dma_resv resv;
>  
> +	/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
> +	struct ttm_lru_bulk_move lru_bulk_move;
> +
>  	u64 size;
>  	struct rb_root vmas;
>  
> -- 
> 2.34.1
> 


* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible Matthew Brost
@ 2023-05-08 21:42   ` Rodrigo Vivi
  2023-05-10  0:49     ` Matthew Brost
  2023-05-09 13:00   ` Thomas Hellström
  2023-05-21 12:32   ` Oded Gabbay
  2 siblings, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-08 21:42 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Faith Ekstrand

On Mon, May 01, 2023 at 05:17:07PM -0700, Matthew Brost wrote:
> We have 256 doorbells (on most platforms) that we can allocate to bypass
> using the H2G channel for submission. This will avoid contention on the
> CT mutex.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> ---
>  drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
>  drivers/gpu/drm/xe/xe_guc.c              |   6 +
>  drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
>  drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
>  drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
>  drivers/gpu/drm/xe/xe_trace.h            |   5 +
>  7 files changed, 315 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> index 37e0ac550931..11b117293a62 100644
> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> @@ -109,6 +109,7 @@ struct guc_doorbell_info {
>  
>  #define DIST_DBS_POPULATED			XE_REG(0xd08)
>  #define   DOORBELLS_PER_SQIDI_MASK		REG_GENMASK(23, 16)
> +#define	  DOORBELLS_PER_SQIDI_SHIFT		16
>  #define   SQIDIS_DOORBELL_EXIST_MASK		REG_GENMASK(15, 0)
>  
>  #define GUC_BCS_RCS_IER				XE_REG(0xC550)
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 89d20faced19..0c87f78a868b 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
>   */
>  int xe_guc_init_post_hwconfig(struct xe_guc *guc)
>  {
> +	int ret;
> +
> +	ret = xe_guc_submit_init_post_hwconfig(guc);
> +	if (ret)
> +		return ret;
> +
>  	return xe_guc_ads_init_post_hwconfig(&guc->ads);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> index 5d83132034a6..420b7f53e649 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> @@ -12,6 +12,7 @@
>  #include <drm/gpu_scheduler.h>
>  
>  struct dma_fence;
> +struct xe_bo;
>  struct xe_engine;
>  
>  /**
> @@ -37,6 +38,10 @@ struct xe_guc_engine {
>  	struct work_struct fini_async;
>  	/** @resume_time: time of last resume */
>  	u64 resume_time;
> +	/** @doorbell_bo: BO for memory doorbell */
> +	struct xe_bo *doorbell_bo;
> +	/** @doorbell_offset: MMIO doorbell offset */
> +	u32 doorbell_offset;
>  	/** @state: GuC specific state for this xe_engine */
>  	atomic_t state;
>  	/** @wqi_head: work queue item tail */
> @@ -45,6 +50,8 @@ struct xe_guc_engine {
>  	u32 wqi_tail;
>  	/** @id: GuC id for this xe_engine */
>  	u16 id;
> +	/** @doorbell_id: doorbell id */
> +	u16 doorbell_id;
>  	/** @suspend_wait: wait queue used to wait on pending suspends */
>  	wait_queue_head_t suspend_wait;
>  	/** @suspend_pending: a suspend of the engine is pending */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 0a41f5d04f6d..1b6f36b04cd1 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -13,7 +13,10 @@
>  
>  #include <drm/drm_managed.h>
>  
> +#include "regs/xe_guc_regs.h"
>  #include "regs/xe_lrc_layout.h"
> +
> +#include "xe_bo.h"
>  #include "xe_device.h"
>  #include "xe_engine.h"
>  #include "xe_force_wake.h"
> @@ -26,12 +29,22 @@
>  #include "xe_lrc.h"
>  #include "xe_macros.h"
>  #include "xe_map.h"
> +#include "xe_mmio.h"
>  #include "xe_mocs.h"
>  #include "xe_ring_ops_types.h"
>  #include "xe_sched_job.h"
>  #include "xe_trace.h"
>  #include "xe_vm.h"
>  
> +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> +#define HAS_GUC_DIST_DB(xe) \
> +	(GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> +
> +#define GUC_NUM_HW_DOORBELLS 256
> +
> +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> +
>  static struct xe_gt *
>  guc_to_gt(struct xe_guc *guc)
>  {
> @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
>  #define ENGINE_STATE_SUSPENDED		(1 << 5)
>  #define ENGINE_STATE_RESET		(1 << 6)
>  #define ENGINE_STATE_KILLED		(1 << 7)
> +#define ENGINE_STATE_DB_REGISTERED	(1 << 8)
>  
>  static bool engine_registered(struct xe_engine *e)
>  {
> @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
>  	atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
>  }
>  
> +static bool engine_doorbell_registered(struct xe_engine *e)
> +{
> +	return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> +}
> +
> +static void set_engine_doorbell_registered(struct xe_engine *e)
> +{
> +	atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> +}
> +
>  static bool engine_killed_or_banned(struct xe_engine *e)
>  {
>  	return engine_killed(e) || engine_banned(e);
> @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
>  
>  	xa_destroy(&guc->submission_state.engine_lookup);
>  	ida_destroy(&guc->submission_state.guc_ids);
> +	ida_destroy(&guc->submission_state.doorbell_ids);
>  	bitmap_free(guc->submission_state.guc_ids_bitmap);
>  }
>  
> @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
>  	mutex_init(&guc->submission_state.lock);
>  	xa_init(&guc->submission_state.engine_lookup);
>  	ida_init(&guc->submission_state.guc_ids);
> +	ida_init(&guc->submission_state.doorbell_ids);
>  
>  	spin_lock_init(&guc->submission_state.suspend.lock);
>  	guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
>  	return 0;
>  }
>  
> +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> +{
> +	if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> +		u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> +					       DIST_DBS_POPULATED.reg);
> +		u32 num_sqidi =
> +			hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> +		u32 doorbells_per_sqidi =
> +			((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> +			 DOORBELLS_PER_SQIDI_MASK) + 1;
> +
> +		guc->submission_state.num_doorbells =
> +			num_sqidi * doorbells_per_sqidi;
> +	} else {
> +		guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	int ret;
> +
> +	lockdep_assert_held(&guc->submission_state.lock);
> +
> +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> +	ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> +			     guc->submission_state.num_doorbells, GFP_NOWAIT);
> +	if (ret < 0)
> +		return false;
> +
> +	e->guc->doorbell_id = ret;
> +
> +	return true;
> +}
> +
> +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	mutex_lock(&guc->submission_state.lock);
> +	ida_simple_remove(&guc->submission_state.doorbell_ids,
> +			  e->guc->doorbell_id);
> +	mutex_unlock(&guc->submission_state.lock);
> +
> +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> +}
> +
> +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> +			     u64 gpa, u32 gtt_addr)
> +{
> +	u32 action[] = {
> +		XE_GUC_ACTION_ALLOCATE_DOORBELL,
> +		guc_id,
> +		doorbell_id,
> +		lower_32_bits(gpa),
> +		upper_32_bits(gpa),
> +		gtt_addr
> +	};
> +
> +	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> +}
> +
> +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> +{
> +	u32 action[] = {
> +		XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> +		guc_id
> +	};
> +
> +	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> +}
> +
> +static bool has_doorbell(struct xe_engine *e)
> +{
> +	return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> +}
> +
> +#define doorbell_read(guc_, e_, field_) ({			\
> +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> +	xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,		\
> +				  struct guc_doorbell_info, field_); \
> +	})
> +#define doorbell_write(guc_, e_, field_, val_) ({		\
> +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> +	xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,		\
> +				  struct guc_doorbell_info, field_, val_); \
> +	})
> +
> +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	struct xe_device *xe = guc_to_xe(guc);
> +
> +	/* GuC does the initialization with distributed and MMIO doorbells */
> +	if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> +		doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> +		doorbell_write(guc, e, cookie, 0);
> +	}
> +}
> +
> +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> +	    xe_device_mem_access_ongoing(guc_to_xe(guc)))
> +		doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> +}
> +
> +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	if (has_doorbell(e)) {
> +		release_doorbell_id(guc, e);
> +		xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> +	}
> +}
> +
> +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	u32 cookie;
> +
> +	cookie = doorbell_read(guc, e, cookie);
> +	doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> +
> +	XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> +}
> +
> +#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
> +#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF

Is this part of the GuC ABI? Should it be in the GuC ABI files?

I feel that we need someone with deeper GuC knowledge on this review,
although based on what I followed of the discussion with Faith and others
it looks like a good move in general.
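
I.e. something like (sketch only; the exact header is an assumption,
wherever the rest of the doorbell ABI definitions live):

	/* e.g. in one of the abi/guc_*_abi.h headers */
	#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
	#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF

and then include that from xe_guc_submit.c instead of defining the magic
values locally.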

> +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> +{
> +	u32 db_value;
> +
> +	db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> +				  doorbell_offset);
> +
> +	/*
> +	 * The read from the doorbell page will return ack/nack. We don't remove
> +	 * doorbells from active clients so we don't expect to ever get a nack.
> +	 * XXX: if doorbell is lost, re-acquire it?
> +	 */
> +	XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> +	XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> +}
> +
> +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	XE_BUG_ON(!has_doorbell(e));
> +
> +	if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> +		ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> +	else
> +		ring_memory_doorbell(guc, e);
> +
> +	trace_xe_engine_ring_db(e);
> +}
> +
> +static void register_engine(struct xe_engine *e);
> +
> +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> +{
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	struct xe_device *xe = gt_to_xe(gt);
> +	u64 gpa;
> +	u32 gtt_addr;
> +	int ret;
> +
> +	XE_BUG_ON(!has_doorbell(e));
> +
> +	if (HAS_GUC_MMIO_DB(xe)) {
> +		e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
> +		gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> +		gtt_addr = 0;
> +	} else {
> +		struct xe_bo *bo;
> +
> +		if (!e->guc->doorbell_bo) {
> +			bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> +						  ttm_bo_type_kernel,
> +						  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> +						  XE_BO_CREATE_GGTT_BIT);
> +			if (IS_ERR(bo))
> +				return PTR_ERR(bo);
> +
> +			e->guc->doorbell_bo = bo;
> +		} else {
> +			bo = e->guc->doorbell_bo;
> +		}
> +
> +		init_doorbell(guc, e);
> +		gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> +		gtt_addr = xe_bo_ggtt_addr(bo);
> +	}
> +
> +	if (init && e->flags & ENGINE_FLAG_KERNEL)
> +		return 0;
> +
> +	register_engine(e);
> +	ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> +				gtt_addr);
> +	if (ret < 0) {
> +		fini_doorbell(guc, e);
> +		return ret;
> +	}
> +
> +	/*
> +	 * In distributed doorbells, guc is returning the cacheline selected
> +	 * by HW as part of the 7bit data from the allocate doorbell command:
> +	 *  bit [22]   - Cacheline allocated
> +	 *  bit [21:16] - Cacheline offset address
> +	 * (bit 21 must be zero, or our assumption of only using half a page is
> +	 * no longer correct).
> +	 */
> +	if (HAS_GUC_DIST_DB(xe)) {
> +		u32 dd_cacheline_info;
> +
> +		XE_WARN_ON(!(ret & BIT(22)));
> +		XE_WARN_ON(ret & BIT(21));
> +
> +		dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> +		e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
> +
> +		/* and verify db status was updated correctly by the guc fw */
> +		XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> +			   GUC_DOORBELL_ENABLED);
> +	}
> +
> +	set_engine_doorbell_registered(e);
> +
> +	return 0;
> +}
> +
>  static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
>  {
>  	int ret;
> @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
>  	u32 num_g2h = 0;
>  	int len = 0;
>  	bool extra_submit = false;
> +	bool enable = false;
>  
>  	XE_BUG_ON(!engine_registered(e));
>  
> @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
>  		num_g2h = 1;
>  		if (xe_engine_is_parallel(e))
>  			extra_submit = true;
> +		enable = true;
>  
>  		e->guc->resume_time = RESUME_PENDING;
>  		set_engine_pending_enable(e);
> @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
>  		trace_xe_engine_submit(e);
>  	}
>  
> -	xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> +	if (enable || !engine_doorbell_registered(e))
> +		xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> +	else
> +		ring_doorbell(guc, e);
>  
>  	if (extra_submit) {
>  		len = 0;
> @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>  	trace_xe_sched_job_run(job);
>  
>  	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> -		if (!engine_registered(e))
> -			register_engine(e);
> +		if (!engine_registered(e)) {
> +			if (has_doorbell(e)) {
> +				int err = create_doorbell(engine_to_guc(e), e,
> +							  false);
> +
> +				/* Not fatal, but let's warn */
> +				XE_WARN_ON(err);
> +			} else {
> +				register_engine(e);
> +			}
> +		}
>  		if (!lr)	/* Written in IOCTL */
>  			e->ring_ops->emit_job(job);
>  		submit_engine(e);
> @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>  	MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
>  	int ret;
>  
> +	if (has_doorbell(e)) {
> +		fini_doorbell(guc, e);
> +		deallocate_doorbell(guc, e->guc->id);
> +	}
> +
>  	set_min_preemption_timeout(guc, e);
>  	smp_rmb();
>  	ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
>  		cancel_work_sync(&ge->lr_tdr);
>  	if (e->flags & ENGINE_FLAG_PERSISTENT)
>  		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> +	destroy_doorbell(guc, e);
>  	release_guc_id(guc, e);
>  	drm_sched_entity_fini(&ge->entity);
>  	drm_sched_fini(&ge->sched);
> @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
>  	struct xe_guc_engine *ge;
>  	long timeout;
>  	int err;
> +	bool create_db = false;
>  
>  	XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
>  
> @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
>  	if (guc_read_stopped(guc))
>  		drm_sched_stop(sched, NULL);
>  
> +	create_db = alloc_doorbell_id(guc, e);
> +
>  	mutex_unlock(&guc->submission_state.lock);
>  
> +	if (create_db) {
> +		/* Error isn't fatal as we don't need a doorbell */
> +		err = create_doorbell(guc, e, true);
> +		if (err)
> +			release_doorbell_id(guc, e);
> +	}
> +
>  	switch (e->class) {
>  	case XE_ENGINE_CLASS_RENDER:
>  		sprintf(e->name, "rcs%d", e->guc->id);
> @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
>  {
>  	struct drm_gpu_scheduler *sched = &e->guc->sched;
>  
> -	XE_BUG_ON(engine_registered(e));
> +	XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
>  	XE_BUG_ON(engine_banned(e));
>  	XE_BUG_ON(engine_killed(e));
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index 8002734d6f24..bada6c02d6aa 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -13,6 +13,7 @@ struct xe_engine;
>  struct xe_guc;
>  
>  int xe_guc_submit_init(struct xe_guc *guc);
> +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
>  void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
>  
>  int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> index ac7eec28934d..9ee4d572f4e0 100644
> --- a/drivers/gpu/drm/xe/xe_guc_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> @@ -36,10 +36,14 @@ struct xe_guc {
>  		struct xarray engine_lookup;
>  		/** @guc_ids: used to allocate new guc_ids, single-lrc */
>  		struct ida guc_ids;
> +		/** @doorbell_ids: used to allocate new doorbells */
> +		struct ida doorbell_ids;
>  		/** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
>  		unsigned long *guc_ids_bitmap;
>  		/** @stopped: submissions are stopped */
>  		atomic_t stopped;
> +		/** @num_doorbells: number of doorbells */
> +		int num_doorbells;
>  		/** @lock: protects submission state */
>  		struct mutex lock;
>  		/** @suspend: suspend fence state */
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 02861c26e145..38e9d7c6197b 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
>  	     TP_ARGS(e)
>  );
>  
> +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> +	     TP_PROTO(struct xe_engine *e),
> +	     TP_ARGS(e)
> +);
> +
>  DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
>  	     TP_PROTO(struct xe_engine *e),
>  	     TP_ARGS(e)
> -- 
> 2.34.1
> 


* Re: [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma Matthew Brost
@ 2023-05-08 21:43   ` Rodrigo Vivi
  2023-05-11  8:38   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-08 21:43 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:15PM -0700, Matthew Brost wrote:
> 5 list links can be squashed into a union in xe_vma, as being on the
> various lists is mutually exclusive.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_gt_pagefault.c |  2 +-
>  drivers/gpu/drm/xe/xe_pt.c           |  5 +-
>  drivers/gpu/drm/xe/xe_vm.c           | 29 ++++++------
>  drivers/gpu/drm/xe/xe_vm_types.h     | 71 +++++++++++++++-------------
>  4 files changed, 55 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index cfffe3398fe4..d7bf6b0a0697 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -157,7 +157,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>  
>  	if (xe_vma_is_userptr(vma) && write_locked) {
>  		spin_lock(&vm->userptr.invalidated_lock);
> -		list_del_init(&vma->userptr.invalidate_link);
> +		list_del_init(&vma->invalidate_link);
>  		spin_unlock(&vm->userptr.invalidated_lock);
>  
>  		ret = xe_vma_userptr_pin_pages(vma);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 010f44260cda..8eab8e1bbaf0 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1116,8 +1116,7 @@ static int xe_pt_userptr_inject_eagain(struct xe_vma *vma)
>  
>  		vma->userptr.divisor = divisor << 1;
>  		spin_lock(&vm->userptr.invalidated_lock);
> -		list_move_tail(&vma->userptr.invalidate_link,
> -			       &vm->userptr.invalidated);
> +		list_move_tail(&vma->invalidate_link, &vm->userptr.invalidated);
>  		spin_unlock(&vm->userptr.invalidated_lock);
>  		return true;
>  	}
> @@ -1724,7 +1723,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  
>  		if (!vma->gt_present) {
>  			spin_lock(&vm->userptr.invalidated_lock);
> -			list_del_init(&vma->userptr.invalidate_link);
> +			list_del_init(&vma->invalidate_link);
>  			spin_unlock(&vm->userptr.invalidated_lock);
>  		}
>  		up_read(&vm->userptr.notifier_lock);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index e0ed7201aeb0..e5f2fffb2aec 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -677,8 +677,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
>  	if (!xe_vm_in_fault_mode(vm) &&
>  	    !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->gt_present) {
>  		spin_lock(&vm->userptr.invalidated_lock);
> -		list_move_tail(&vma->userptr.invalidate_link,
> -			       &vm->userptr.invalidated);
> +		list_move_tail(&vma->invalidate_link, &vm->userptr.invalidated);
>  		spin_unlock(&vm->userptr.invalidated_lock);
>  	}
>  
> @@ -726,8 +725,8 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
>  	/* Collect invalidated userptrs */
>  	spin_lock(&vm->userptr.invalidated_lock);
>  	list_for_each_entry_safe(vma, next, &vm->userptr.invalidated,
> -				 userptr.invalidate_link) {
> -		list_del_init(&vma->userptr.invalidate_link);
> +				 invalidate_link) {
> +		list_del_init(&vma->invalidate_link);
>  		list_move_tail(&vma->userptr_link, &vm->userptr.repin_list);
>  	}
>  	spin_unlock(&vm->userptr.invalidated_lock);
> @@ -830,12 +829,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  		return vma;
>  	}
>  
> -	/* FIXME: Way to many lists, should be able to reduce this */
> +	/*
> +	 * userptr_link, destroy_link, notifier.rebind_link,
> +	 * invalidate_link
> +	 */
>  	INIT_LIST_HEAD(&vma->rebind_link);
> -	INIT_LIST_HEAD(&vma->unbind_link);
> -	INIT_LIST_HEAD(&vma->userptr_link);
> -	INIT_LIST_HEAD(&vma->userptr.invalidate_link);
> -	INIT_LIST_HEAD(&vma->notifier.rebind_link);
>  	INIT_LIST_HEAD(&vma->extobj.link);
>  
>  	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
> @@ -953,15 +951,14 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
>  	struct xe_vm *vm = xe_vma_vm(vma);
>  
>  	lockdep_assert_held_write(&vm->lock);
> -	XE_BUG_ON(!list_empty(&vma->unbind_link));
>  
>  	if (xe_vma_is_userptr(vma)) {
>  		XE_WARN_ON(!(vma->gpuva.flags & XE_VMA_DESTROYED));
>  
>  		spin_lock(&vm->userptr.invalidated_lock);
> -		list_del_init(&vma->userptr.invalidate_link);
> +		if (!list_empty(&vma->invalidate_link))
> +			list_del_init(&vma->invalidate_link);
>  		spin_unlock(&vm->userptr.invalidated_lock);
> -		list_del(&vma->userptr_link);
>  	} else if (!xe_vma_is_null(vma)) {
>  		xe_bo_assert_held(xe_vma_bo(vma));
>  		drm_gpuva_unlink(&vma->gpuva);
> @@ -1328,7 +1325,9 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  			continue;
>  		}
>  
> -		list_add_tail(&vma->unbind_link, &contested);
> +		if (!list_empty(&vma->destroy_link))
> +			list_del_init(&vma->destroy_link);
> +		list_add_tail(&vma->destroy_link, &contested);
>  	}
>  
>  	/*
> @@ -1356,8 +1355,8 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  	 * Since we hold a refcount to the bo, we can remove and free
>  	 * the members safely without locking.
>  	 */
> -	list_for_each_entry_safe(vma, next_vma, &contested, unbind_link) {
> -		list_del_init(&vma->unbind_link);
> +	list_for_each_entry_safe(vma, next_vma, &contested, destroy_link) {
> +		list_del_init(&vma->destroy_link);
>  		xe_vma_destroy_unlocked(vma);
>  	}
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index d55ec8156caa..22def5483c12 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -50,21 +50,32 @@ struct xe_vma {
>  	 */
>  	u64 gt_present;
>  
> -	/** @userptr_link: link into VM repin list if userptr */
> -	struct list_head userptr_link;
> +	union {
> +		/** @userptr_link: link into VM repin list if userptr */
> +		struct list_head userptr_link;
>  
> -	/**
> -	 * @rebind_link: link into VM if this VMA needs rebinding, and
> -	 * if it's a bo (not userptr) needs validation after a possible
> -	 * eviction. Protected by the vm's resv lock.
> -	 */
> -	struct list_head rebind_link;
> +		/**
> +		 * @rebind_link: link into VM if this VMA needs rebinding, and
> +		 * if it's a bo (not userptr) needs validation after a possible
> +		 * eviction. Protected by the vm's resv lock.
> +		 */
> +		struct list_head rebind_link;
>  
> -	/**
> -	 * @unbind_link: link or list head if an unbind of multiple VMAs, in
> -	 * single unbind op, is being done.
> -	 */
> -	struct list_head unbind_link;
> +		/** @destroy_link: link for contested VMAs on VM close */
> +		struct list_head destroy_link;
> +
> +		/** @invalidate_link: Link for the vm::userptr.invalidated list */
> +		struct list_head invalidate_link;
> +
> +		struct {
> +			 /*
> +			  * @notifier.rebind_link: link for
> +			  * vm->notifier.rebind_list, protected by
> +			  * vm->notifier.list_lock
> +			  */
> +			struct list_head rebind_link;
> +		} notifier;
> +	};
>  
>  	/** @destroy_cb: callback to destroy VMA when unbind job is done */
>  	struct dma_fence_cb destroy_cb;
> @@ -72,10 +83,22 @@ struct xe_vma {
>  	/** @destroy_work: worker to destroy this BO */
>  	struct work_struct destroy_work;
>  
> +	/** @usm: unified shared memory state */
> +	struct {
> +		/** @gt_invalidated: VMA has been invalidated */
> +		u64 gt_invalidated;
> +	} usm;
> +
> +	struct {
> +		/**
> +		 * @extobj.link: Link into vm's external object list.
> +		 * protected by the vm lock.
> +		 */
> +		struct list_head link;
> +	} extobj;
> +
>  	/** @userptr: user pointer state */
>  	struct {
> -		/** @invalidate_link: Link for the vm::userptr.invalidated list */
> -		struct list_head invalidate_link;
>  		/**
>  		 * @notifier: MMU notifier for user pointer (invalidation call back)
>  		 */
> @@ -96,24 +119,6 @@ struct xe_vma {
>  		u32 divisor;
>  #endif
>  	} userptr;
> -
> -	/** @usm: unified shared memory state */
> -	struct {
> -		/** @gt_invalidated: VMA has been invalidated */
> -		u64 gt_invalidated;
> -	} usm;
> -
> -	struct {
> -		struct list_head rebind_link;
> -	} notifier;
> -
> -	struct {
> -		/**
> -		 * @extobj.link: Link into vm's external object list.
> -		 * protected by the vm lock.
> -		 */
> -		struct list_head link;
> -	} extobj;
>  };
>  
>  struct xe_device;
> -- 
> 2.34.1
> 


* Re: [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent Matthew Brost
  2023-05-05 18:38   ` Rodrigo Vivi
@ 2023-05-09 12:21   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 12:21 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> With our ref counting scheme, LR engines only close properly if they are
> not persistent, so ensure that LR engines are non-persistent.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

With Rodrigo's "Long-Running" comment addressed,

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>   drivers/gpu/drm/xe/xe_engine.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> index d1e84d7adbd4..91600b1e8249 100644
> --- a/drivers/gpu/drm/xe/xe_engine.c
> +++ b/drivers/gpu/drm/xe/xe_engine.c
> @@ -596,7 +596,9 @@ int xe_engine_create_ioctl(struct drm_device *dev, void *data,
>   			return -ENOENT;
>   
>   		e = xe_engine_create(xe, vm, logical_mask,
> -				     args->width, hwe, ENGINE_FLAG_PERSISTENT);
> +				     args->width, hwe,
> +				     xe_vm_no_dma_fences(vm) ? 0 :
> +				     ENGINE_FLAG_PERSISTENT);
>   		xe_vm_put(vm);
>   		if (IS_ERR(e))
>   			return PTR_ERR(e);


* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-08 21:34       ` Rodrigo Vivi
@ 2023-05-09 12:29         ` Thomas Hellström
  2023-05-10 23:25           ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 12:29 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Brost; +Cc: Rodrigo Vivi, intel-xe, Matthew Brost


On 5/8/23 23:34, Rodrigo Vivi wrote:
> On Mon, May 08, 2023 at 01:08:10AM +0000, Matthew Brost wrote:
>> On Fri, May 05, 2023 at 02:40:40PM -0400, Rodrigo Vivi wrote:
>>> On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
>>>> Not needed and causes some issues with bulk LRU moves.
>>> I'm confused with this explanation and the code below.
>>> could you please provide a bit more wording here?
>>>
>> We only need to try to lock a BO if it external as non-external BOs
>> share the dma-resv with the already locked VM. Trying to lock
>> non-external BOs caused an issue (list corruption) in an uncoming patch

s/uncoming/upcoming/

Also it's not clear to me how this could fix a list corruption in the 
bulk LRU moves? I mean, if it's a duplicate lock then it gets removed 
from the tv list and not touched again? Could you explain the mechanism 
of the fix?

Thanks,

Thomas


>> which adds bulk LRU move. Since this code isn't needed, remove it.
> It makes more sense now. With this in the commit msg (but with Christopher's fix)
>
>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>
>
>> ^^^ How about this.
>>
>>>> Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
>>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>> index 272f0f7f24fe..6c427ff92c44 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>> @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
>>>>   		 */
>>>>   		xe_bo_get(vbo);
>>>>   
>>>> -		tv_bo.bo = &vbo->ttm;
>>>> -		tv_bo.num_shared = 1;
>>>> -		list_add(&tv_bo.head, &objs);
>>>> +		if (!vbo->vm) {
>>>> +			tv_bo.bo = &vbo->ttm;
>>>> +			tv_bo.num_shared = 1;
>>>> +			list_add(&tv_bo.head, &objs);
>>>> +		}
>>>>   	}
>>>>   
>>>>   again:
>>>> -- 
>>>> 2.34.1
>>>>


* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move Matthew Brost
  2023-05-08 21:39   ` Rodrigo Vivi
@ 2023-05-09 12:47   ` Thomas Hellström
  2023-05-09 22:05     ` Matthew Brost
  1 sibling, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 12:47 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
> LRU position on every exec.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
>   drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
>   drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
>   drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
>   drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
>   5 files changed, 40 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 3ab404e33fae..da99ee53e7d7 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
>   	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
>   }
>   
> +static void xe_gem_object_close(struct drm_gem_object *obj,
> +				struct drm_file *file_priv)
> +{
> +	struct xe_bo *bo = gem_to_xe_bo(obj);
> +
> +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
Is there a reason we don't use bulk moves for LR vms? Admittedly bumping 
LRU doesn't make much sense when we support user-space command buffer 
chaining, but I think we should be doing it on exec at least, no?
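
I.e. on the exec path that could be as simple as dropping the dma-fence
check from the hunk in this patch (sketch only):

	if (!err) {
		spin_lock(&xe->ttm.lru_lock);
		ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
		spin_unlock(&xe->ttm.lru_lock);
	}
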
> +		struct ww_acquire_ctx ww;
> +
> +		XE_BUG_ON(!xe_bo_is_user(bo));

Also why can't we use this for kernel objects as well? At some point we 
want to get to evictable page-table objects? Could we do this in the 
release_notify() callback to cover all potential bos?
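
Rough sketch of what I mean (function name just for illustration, and
the locking requirements of ttm_bo_set_bulk_move() in the release path
would need checking):

	static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
	{
		/* Drop the bo from its VM's bulk LRU list on final release */
		ttm_bo_set_bulk_move(ttm_bo, NULL);
	}

hooked up via the driver's ttm_device_funcs::release_notify.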

/Thomas




* Re: [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response Matthew Brost
  2023-05-05 18:50   ` Rodrigo Vivi
@ 2023-05-09 12:49   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 12:49 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> The HXG fields are DW1 not DW0, fix this.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>   drivers/gpu/drm/xe/xe_guc_ct.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 9055ff133a7c..6abf1dee95af 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -782,13 +782,13 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>   	if (type == GUC_HXG_TYPE_RESPONSE_FAILURE) {
>   		g2h_fence->fail = true;
>   		g2h_fence->error =
> -			FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[0]);
> +			FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, msg[1]);
>   		g2h_fence->hint =
> -			FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[0]);
> +			FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, msg[1]);
>   	} else if (type == GUC_HXG_TYPE_NO_RESPONSE_RETRY) {
>   		g2h_fence->retry = true;
>   		g2h_fence->reason =
> -			FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[0]);
> +			FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, msg[1]);
>   	} else if (g2h_fence->response_buffer) {
>   		g2h_fence->response_len = response_len;
>   		memcpy(g2h_fence->response_buffer, msg + GUC_CTB_MSG_MIN_LEN,


* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible Matthew Brost
  2023-05-08 21:42   ` Rodrigo Vivi
@ 2023-05-09 13:00   ` Thomas Hellström
  2023-05-10  0:51     ` Matthew Brost
  2023-05-21 12:32   ` Oded Gabbay
  2 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 13:00 UTC (permalink / raw)
  To: Matthew Brost, intel-xe; +Cc: Faith Ekstrand


On 5/2/23 02:17, Matthew Brost wrote:
> We have 256 doorbells (on most platforms) that we can allocate to bypass
> using the H2G channel for submission. This will avoid contention on the
> CT mutex.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>

Could we describe in a DOC section how doorbells are distributed and if 
there are any suggestions on how to improve that moving forward?
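
Something like this maybe (just a sketch of the current first-come,
first-served scheme as I read the patch):

	/**
	 * DOC: GuC doorbells
	 *
	 * At engine creation an engine tries to grab one of the hardware
	 * doorbells (256 on most platforms) from an IDA. If it gets one,
	 * submission rings the doorbell (MMIO or memory based, depending
	 * on the platform) instead of sending an H2G over the CT channel,
	 * avoiding contention on the CT mutex. If no doorbell is
	 * available, the engine silently falls back to H2G submission.
	 */

plus a note on any ideas for distributing them more cleverly going
forward.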

/Thomas

> ---
>   drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
>   drivers/gpu/drm/xe/xe_guc.c              |   6 +
>   drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
>   drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
>   drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
>   drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
>   drivers/gpu/drm/xe/xe_trace.h            |   5 +
>   7 files changed, 315 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> index 37e0ac550931..11b117293a62 100644
> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> @@ -109,6 +109,7 @@ struct guc_doorbell_info {
>   
>   #define DIST_DBS_POPULATED			XE_REG(0xd08)
>   #define   DOORBELLS_PER_SQIDI_MASK		REG_GENMASK(23, 16)
> +#define	  DOORBELLS_PER_SQIDI_SHIFT		16
>   #define   SQIDIS_DOORBELL_EXIST_MASK		REG_GENMASK(15, 0)
>   
>   #define GUC_BCS_RCS_IER				XE_REG(0xC550)
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 89d20faced19..0c87f78a868b 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
>    */
>   int xe_guc_init_post_hwconfig(struct xe_guc *guc)
>   {
> +	int ret;
> +
> +	ret = xe_guc_submit_init_post_hwconfig(guc);
> +	if (ret)
> +		return ret;
> +
>   	return xe_guc_ads_init_post_hwconfig(&guc->ads);
>   }
>   
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> index 5d83132034a6..420b7f53e649 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> @@ -12,6 +12,7 @@
>   #include <drm/gpu_scheduler.h>
>   
>   struct dma_fence;
> +struct xe_bo;
>   struct xe_engine;
>   
>   /**
> @@ -37,6 +38,10 @@ struct xe_guc_engine {
>   	struct work_struct fini_async;
>   	/** @resume_time: time of last resume */
>   	u64 resume_time;
> +	/** @doorbell_bo: BO for memory doorbell */
> +	struct xe_bo *doorbell_bo;
> +	/** @doorbell_offset: MMIO doorbell offset */
> +	u32 doorbell_offset;
>   	/** @state: GuC specific state for this xe_engine */
>   	atomic_t state;
>   	/** @wqi_head: work queue item tail */
> @@ -45,6 +50,8 @@ struct xe_guc_engine {
>   	u32 wqi_tail;
>   	/** @id: GuC id for this xe_engine */
>   	u16 id;
> +	/** @doorbell_id: doorbell id */
> +	u16 doorbell_id;
>   	/** @suspend_wait: wait queue used to wait on pending suspends */
>   	wait_queue_head_t suspend_wait;
>   	/** @suspend_pending: a suspend of the engine is pending */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 0a41f5d04f6d..1b6f36b04cd1 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -13,7 +13,10 @@
>   
>   #include <drm/drm_managed.h>
>   
> +#include "regs/xe_guc_regs.h"
>   #include "regs/xe_lrc_layout.h"
> +
> +#include "xe_bo.h"
>   #include "xe_device.h"
>   #include "xe_engine.h"
>   #include "xe_force_wake.h"
> @@ -26,12 +29,22 @@
>   #include "xe_lrc.h"
>   #include "xe_macros.h"
>   #include "xe_map.h"
> +#include "xe_mmio.h"
>   #include "xe_mocs.h"
>   #include "xe_ring_ops_types.h"
>   #include "xe_sched_job.h"
>   #include "xe_trace.h"
>   #include "xe_vm.h"
>   
> +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> +#define HAS_GUC_DIST_DB(xe) \
> +	(GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> +
> +#define GUC_NUM_HW_DOORBELLS 256
> +
> +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> +
>   static struct xe_gt *
>   guc_to_gt(struct xe_guc *guc)
>   {
> @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
>   #define ENGINE_STATE_SUSPENDED		(1 << 5)
>   #define ENGINE_STATE_RESET		(1 << 6)
>   #define ENGINE_STATE_KILLED		(1 << 7)
> +#define ENGINE_STATE_DB_REGISTERED	(1 << 8)
>   
>   static bool engine_registered(struct xe_engine *e)
>   {
> @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
>   	atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
>   }
>   
> +static bool engine_doorbell_registered(struct xe_engine *e)
> +{
> +	return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> +}
> +
> +static void set_engine_doorbell_registered(struct xe_engine *e)
> +{
> +	atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> +}
> +
>   static bool engine_killed_or_banned(struct xe_engine *e)
>   {
>   	return engine_killed(e) || engine_banned(e);
> @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
>   
>   	xa_destroy(&guc->submission_state.engine_lookup);
>   	ida_destroy(&guc->submission_state.guc_ids);
> +	ida_destroy(&guc->submission_state.doorbell_ids);
>   	bitmap_free(guc->submission_state.guc_ids_bitmap);
>   }
>   
> @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
>   	mutex_init(&guc->submission_state.lock);
>   	xa_init(&guc->submission_state.engine_lookup);
>   	ida_init(&guc->submission_state.guc_ids);
> +	ida_init(&guc->submission_state.doorbell_ids);
>   
>   	spin_lock_init(&guc->submission_state.suspend.lock);
>   	guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
>   	return 0;
>   }
>   
> +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> +{
> +	if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> +		u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> +					       DIST_DBS_POPULATED.reg);
> +		u32 num_sqidi =
> +			hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> +		u32 doorbells_per_sqidi =
> +			((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> +			 DOORBELLS_PER_SQIDI_MASK) + 1;
> +
> +		guc->submission_state.num_doorbells =
> +			num_sqidi * doorbells_per_sqidi;
> +	} else {
> +		guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	int ret;
> +
> +	lockdep_assert_held(&guc->submission_state.lock);
> +
> +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> +	ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> +			     guc->submission_state.num_doorbells, GFP_NOWAIT);
> +	if (ret < 0)
> +		return false;
> +
> +	e->guc->doorbell_id = ret;
> +
> +	return true;
> +}
> +
> +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	mutex_lock(&guc->submission_state.lock);
> +	ida_simple_remove(&guc->submission_state.doorbell_ids,
> +			  e->guc->doorbell_id);
> +	mutex_unlock(&guc->submission_state.lock);
> +
> +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> +}
> +
> +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> +			     u64 gpa, u32 gtt_addr)
> +{
> +	u32 action[] = {
> +		XE_GUC_ACTION_ALLOCATE_DOORBELL,
> +		guc_id,
> +		doorbell_id,
> +		lower_32_bits(gpa),
> +		upper_32_bits(gpa),
> +		gtt_addr
> +	};
> +
> +	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> +}
> +
> +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> +{
> +	u32 action[] = {
> +		XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> +		guc_id
> +	};
> +
> +	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> +}
> +
> +static bool has_doorbell(struct xe_engine *e)
> +{
> +	return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> +}
> +
> +#define doorbell_read(guc_, e_, field_) ({			\
> +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> +	xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,		\
> +				  struct guc_doorbell_info, field_); \
> +	})
> +#define doorbell_write(guc_, e_, field_, val_) ({		\
> +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> +	xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,		\
> +				  struct guc_doorbell_info, field_, val_); \
> +	})
> +
> +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	struct xe_device *xe = guc_to_xe(guc);
> +
> +	/* GuC does the initialization with distributed and MMIO doorbells */
> +	if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> +		doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> +		doorbell_write(guc, e, cookie, 0);
> +	}
> +}
> +
> +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> +	    xe_device_mem_access_ongoing(guc_to_xe(guc)))
> +		doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> +}
> +
> +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	if (has_doorbell(e)) {
> +		release_doorbell_id(guc, e);
> +		xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> +	}
> +}
> +
> +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	u32 cookie;
> +
> +	cookie = doorbell_read(guc, e, cookie);
> +	doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> +
> +	XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> +}
> +
> +#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
> +#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF
> +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> +{
> +	u32 db_value;
> +
> +	db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> +				  doorbell_offset);
> +
> +	/*
> +	 * The read from the doorbell page will return ack/nack. We don't remove
> +	 * doorbells from active clients so we don't expect to ever get a nack.
> +	 * XXX: if doorbell is lost, re-acquire it?
> +	 */
> +	XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> +	XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> +}
> +
> +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +	XE_BUG_ON(!has_doorbell(e));
> +
> +	if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> +		ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> +	else
> +		ring_memory_doorbell(guc, e);
> +
> +	trace_xe_engine_ring_db(e);
> +}
> +
> +static void register_engine(struct xe_engine *e);
> +
> +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> +{
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	struct xe_device *xe = gt_to_xe(gt);
> +	u64 gpa;
> +	u32 gtt_addr;
> +	int ret;
> +
> +	XE_BUG_ON(!has_doorbell(e));
> +
> +	if (HAS_GUC_MMIO_DB(xe)) {
> +		e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
> +		gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> +		gtt_addr = 0;
> +	} else {
> +		struct xe_bo *bo;
> +
> +		if (!e->guc->doorbell_bo) {
> +			bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> +						  ttm_bo_type_kernel,
> +						  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> +						  XE_BO_CREATE_GGTT_BIT);
> +			if (IS_ERR(bo))
> +				return PTR_ERR(bo);
> +
> +			e->guc->doorbell_bo = bo;
> +		} else {
> +			bo = e->guc->doorbell_bo;
> +		}
> +
> +		init_doorbell(guc, e);
> +		gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> +		gtt_addr = xe_bo_ggtt_addr(bo);
> +	}
> +
> +	if (init && e->flags & ENGINE_FLAG_KERNEL)
> +		return 0;
> +
> +	register_engine(e);
> +	ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> +				gtt_addr);
> +	if (ret < 0) {
> +		fini_doorbell(guc, e);
> +		return ret;
> +	}
> +
> +	/*
> +	 * In distributed doorbells, guc is returning the cacheline selected
> +	 * by HW as part of the 7bit data from the allocate doorbell command:
> +	 *  bit [22]   - Cacheline allocated
> +	 *  bit [21:16] - Cacheline offset address
> +	 * (bit 21 must be zero, or our assumption of only using half a page is
> +	 * no longer correct).
> +	 */
> +	if (HAS_GUC_DIST_DB(xe)) {
> +		u32 dd_cacheline_info;
> +
> +		XE_WARN_ON(!(ret & BIT(22)));
> +		XE_WARN_ON(ret & BIT(21));
> +
> +		dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> +		e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
> +
> +		/* and verify db status was updated correctly by the guc fw */
> +		XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> +			   GUC_DOORBELL_ENABLED);
> +	}
> +
> +	set_engine_doorbell_registered(e);
> +
> +	return 0;
> +}
> +
>   static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
>   {
>   	int ret;
> @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
>   	u32 num_g2h = 0;
>   	int len = 0;
>   	bool extra_submit = false;
> +	bool enable = false;
>   
>   	XE_BUG_ON(!engine_registered(e));
>   
> @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
>   		num_g2h = 1;
>   		if (xe_engine_is_parallel(e))
>   			extra_submit = true;
> +		enable = true;
>   
>   		e->guc->resume_time = RESUME_PENDING;
>   		set_engine_pending_enable(e);
> @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
>   		trace_xe_engine_submit(e);
>   	}
>   
> -	xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> +	if (enable || !engine_doorbell_registered(e))
> +		xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> +	else
> +		ring_doorbell(guc, e);
>   
>   	if (extra_submit) {
>   		len = 0;
> @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>   	trace_xe_sched_job_run(job);
>   
>   	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> -		if (!engine_registered(e))
> -			register_engine(e);
> +		if (!engine_registered(e)) {
> +			if (has_doorbell(e)) {
> +				int err = create_doorbell(engine_to_guc(e), e,
> +							  false);
> +
> +				/* Not fatal, but let's warn */
> +				XE_WARN_ON(err);
> +			} else {
> +				register_engine(e);
> +			}
> +		}
>   		if (!lr)	/* Written in IOCTL */
>   			e->ring_ops->emit_job(job);
>   		submit_engine(e);
> @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>   	MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
>   	int ret;
>   
> +	if (has_doorbell(e)) {
> +		fini_doorbell(guc, e);
> +		deallocate_doorbell(guc, e->guc->id);
> +	}
> +
>   	set_min_preemption_timeout(guc, e);
>   	smp_rmb();
>   	ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
>   		cancel_work_sync(&ge->lr_tdr);
>   	if (e->flags & ENGINE_FLAG_PERSISTENT)
>   		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> +	destroy_doorbell(guc, e);
>   	release_guc_id(guc, e);
>   	drm_sched_entity_fini(&ge->entity);
>   	drm_sched_fini(&ge->sched);
> @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
>   	struct xe_guc_engine *ge;
>   	long timeout;
>   	int err;
> +	bool create_db = false;
>   
>   	XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
>   
> @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
>   	if (guc_read_stopped(guc))
>   		drm_sched_stop(sched, NULL);
>   
> +	create_db = alloc_doorbell_id(guc, e);
> +
>   	mutex_unlock(&guc->submission_state.lock);
>   
> +	if (create_db) {
> +		/* Error isn't fatal as we don't need a doorbell */
> +		err = create_doorbell(guc, e, true);
> +		if (err)
> +			release_doorbell_id(guc, e);
> +	}
> +
>   	switch (e->class) {
>   	case XE_ENGINE_CLASS_RENDER:
>   		sprintf(e->name, "rcs%d", e->guc->id);
> @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
>   {
>   	struct drm_gpu_scheduler *sched = &e->guc->sched;
>   
> -	XE_BUG_ON(engine_registered(e));
> +	XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
>   	XE_BUG_ON(engine_banned(e));
>   	XE_BUG_ON(engine_killed(e));
>   
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index 8002734d6f24..bada6c02d6aa 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -13,6 +13,7 @@ struct xe_engine;
>   struct xe_guc;
>   
>   int xe_guc_submit_init(struct xe_guc *guc);
> +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
>   void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
>   
>   int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> index ac7eec28934d..9ee4d572f4e0 100644
> --- a/drivers/gpu/drm/xe/xe_guc_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> @@ -36,10 +36,14 @@ struct xe_guc {
>   		struct xarray engine_lookup;
>   		/** @guc_ids: used to allocate new guc_ids, single-lrc */
>   		struct ida guc_ids;
> +		/** @doorbell_ids: used to allocate new doorbells */
> +		struct ida doorbell_ids;
>   		/** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
>   		unsigned long *guc_ids_bitmap;
>   		/** @stopped: submissions are stopped */
>   		atomic_t stopped;
> +		/** @num_doorbells: number of doorbells */
> +		int num_doorbells;
>   		/** @lock: protects submission state */
>   		struct mutex lock;
>   		/** @suspend: suspend fence state */
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 02861c26e145..38e9d7c6197b 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
>   	     TP_ARGS(e)
>   );
>   
> +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> +	     TP_PROTO(struct xe_engine *e),
> +	     TP_ARGS(e)
> +);
> +
>   DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
>   	     TP_PROTO(struct xe_engine *e),
>   	     TP_ARGS(e)


* Re: [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry
  2023-05-05 18:55   ` Rodrigo Vivi
@ 2023-05-09 13:01     ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 13:01 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Brost; +Cc: intel-xe


On 5/5/23 20:55, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:08PM -0700, Matthew Brost wrote:
>> This information is helpful, so print it.
>>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>

>
>> ---
>>   drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 1b6f36b04cd1..880f480c6d5f 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -2016,6 +2016,8 @@ static void guc_engine_print(struct xe_engine *e, struct drm_printer *p)
>>   	drm_printf(p, "\tTimeslice: %u (us)\n", e->sched_props.timeslice_us);
>>   	drm_printf(p, "\tPreempt timeout: %u (us)\n",
>>   		   e->sched_props.preempt_timeout_us);
>> +	drm_printf(p, "\tDoorbell ID: %u\n",
>> +		   e->guc->doorbell_id);
>>   	for (i = 0; i < e->width; ++i ) {
>>   		struct xe_lrc *lrc = e->lrc + i;
>>   
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro Matthew Brost
@ 2023-05-09 13:21   ` Thomas Hellström
  2023-05-10  0:29     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 13:21 UTC (permalink / raw)
  To: Matthew Brost, intel-xe; +Cc: Danilo Krummrich


On 5/2/23 02:17, Matthew Brost wrote:
> From: Danilo Krummrich <dakr@redhat.com>
>
> Split up the MA_STATE() macro such that components using the maple tree
> can easily inherit from struct ma_state and build custom tree walk
> macros to hide their internals from users.

I might misunderstand the patch, but isn't the real purpose to provide 
an MA_STATE initializer, and the way to achieve that is to split up the 
MA_STATE macro?

>
> Example:
>
> struct sample_iterator {
>          struct ma_state mas;
>          struct sample_mgr *mgr;
> };
>
> \#define SAMPLE_ITERATOR(name, __mgr, start)                    \
>          struct sample_iterator name = {                         \
>                  .mas = MA_STATE_INIT(&(__mgr)->mt, start, 0),   \
>                  .mgr = __mgr,                                   \
>          }
>
> \#define sample_iter_for_each_range(it__, entry__, end__) \
>          mas_for_each(&(it__).mas, entry__, end__)
>
> --
>
> struct sample *sample;
> SAMPLE_ITERATOR(si, min);
>
> sample_iter_for_each_range(&si, sample, max) {
>          frob(mgr, sample);
> }
>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>   include/linux/maple_tree.h | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index 1fadb5f5978b..87d55334f1c2 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -423,8 +423,8 @@ struct ma_wr_state {
>   #define MA_ERROR(err) \
>   		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
>   
> -#define MA_STATE(name, mt, first, end)					\
> -	struct ma_state name = {					\
> +#define MA_STATE_INIT(mt, first, end)					\
> +	{								\

Naming: following the convention in, for example, the mutex and ww mutex 
code this should've been called

__MA_STATE_INITIALIZER(),

whereas the decapitalized name ma_state_init() would've been a (possibly 
inline) init function if it existed.
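
For reference, a rough sketch of what that convention would look like here; 
these names are illustrative and not part of the patch:

/* Compile-time initializer, analogous to __MUTEX_INITIALIZER() */
#define __MA_STATE_INITIALIZER(mt, first, end)	MA_STATE_INIT(mt, first, end)

/* Runtime init helper, analogous to mutex_init(); hypothetical only */
static inline void ma_state_init(struct ma_state *mas, struct maple_tree *mt,
				 unsigned long first, unsigned long end)
{
	*mas = (struct ma_state)MA_STATE_INIT(mt, first, end);
}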

But this all should ofc be run by the maple tree maintainer(s).

FWIW, with these things addressed the change LGTM.

/Thomas


>   		.tree = mt,						\
>   		.index = first,						\
>   		.last = end,						\
> @@ -435,6 +435,9 @@ struct ma_wr_state {
>   		.mas_flags = 0,						\
>   	}
>   
> +#define MA_STATE(name, mt, first, end)					\
> +	struct ma_state name = MA_STATE_INIT(mt, first, end)
> +
>   #define MA_WR_STATE(name, ma_state, wr_entry)				\
>   	struct ma_wr_state name = {					\
>   		.mas = ma_state,					\

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate Matthew Brost
@ 2023-05-09 13:33   ` Thomas Hellström
  2023-05-10  0:31     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 13:33 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> The DRM GPUVA implementation needs this function.

A more thorough explanation as to why it's needed would help convince 
the maple tree maintainers that an export is needed.

Otherwise the change itself LGTM.

/Thomas
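
For what it's worth, a minimal sketch of the pattern that usually motivates 
the export: preallocate while sleeping and allocating are still allowed, then 
store in a context where they are not. The names below are placeholders, not 
the actual drm_gpuva_mgr code:

#include <linux/maple_tree.h>

static int example_insert(struct maple_tree *mt, unsigned long start,
			  unsigned long end, void *entry)
{
	MA_STATE(mas, mt, start, end);
	int err;

	err = mas_preallocate(&mas, GFP_KERNEL);	/* may sleep, may fail */
	if (err)
		return err;

	/* e.g. later, under a spinlock where allocation is not allowed */
	mas_store_prealloc(&mas, entry);		/* cannot fail */
	return 0;
}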


>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   lib/maple_tree.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 9e2735cbc2b4..ae37a167e25d 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -5726,6 +5726,7 @@ int mas_preallocate(struct ma_state *mas, gfp_t gfp)
>   	mas_reset(mas);
>   	return ret;
>   }
> +EXPORT_SYMBOL_GPL(mas_preallocate);
>   
>   /*
>    * mas_destroy() - destroy a maple state.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings Matthew Brost
@ 2023-05-09 13:49   ` Thomas Hellström
  2023-05-10  0:55     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 13:49 UTC (permalink / raw)
  To: Matthew Brost, intel-xe; +Cc: Dave Airlie, Danilo Krummrich


On 5/2/23 02:17, Matthew Brost wrote:
> From: Danilo Krummrich <dakr@redhat.com>
>
> Add infrastructure to keep track of GPU virtual address (VA) mappings
> with a dedicated VA space manager implementation.
>
> New UAPIs, motivated by the Vulkan sparse memory bindings that graphics
> drivers are starting to implement, allow userspace applications to request
> multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> intended to serve the following purposes in this context.
>
> 1) Provide infrastructure to track GPU VA allocations and mappings,
>     making use of the maple_tree.
>
> 2) Generically connect GPU VA mappings to their backing buffers, in
>     particular DRM GEM objects.
>
> 3) Provide a common implementation to perform more complex mapping
>     operations on the GPU VA space. In particular splitting and merging
>     of GPU VA mappings, e.g. for intersecting mapping requests or partial
>     unmap requests.
>
> Suggested-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Danilo, Matthew

Before embarking on a full review of this (saving this for last) I heard 
there might be plans to add userptr support, rebind_work interaction and 
such, and to resolve any driver differences using vfuncs.

Just wanted to raise a warning that helpers that attempt to "do it all" 
and depend on vfuncs are easy traps to start creating middle layers 
(like TTM) which are typically frowned upon. (See for example the 
discussion on the partly rejected patch series on the TTM shrinker).

So just as a recommendation to avoid redoing a lot of stuff, please be 
careful with additional helpers that require vfuncs and check if they 
can be implemented in another way by rethinking the layering.

Thanks,

Thomas



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA Matthew Brost
@ 2023-05-09 13:52   ` Thomas Hellström
  2023-05-11  2:41     ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 13:52 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi, Matthew,

On 5/2/23 02:17, Matthew Brost wrote:
> Rather than open coding VM binds and VMA tracking, use the GPUVA
> library. GPUVA provides a common infrastructure for VM binds to use mmap
> / munmap semantics and support for VK sparse bindings.
>
> The concepts are:
>
> 1) xe_vm inherits from drm_gpuva_manager
> 2) xe_vma inherits from drm_gpuva
> 3) xe_vma_op inherits from drm_gpuva_op
> 4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the
> GPUVA code to generate a VMA operations list which is parsed, committed,
> and executed.
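
In C terms the "inherits from" wording above means embedding, roughly as in
the sketch below; the member layout is illustrative, the real definitions
live in xe_vm_types.h:

struct xe_vm {
	/** Base GPUVA manager embedded first (member name assumed) */
	struct drm_gpuva_manager mgr;
	/* ... Xe-specific state ... */
};

struct xe_vma {
	/** Base GPUVA object */
	struct drm_gpuva gpuva;
	/* ... Xe-specific state ... */
};

/* Going back from the base object, as gpuva_to_vma() does: */
static inline struct xe_vma *to_xe_vma(struct drm_gpuva *gpuva)
{
	return container_of(gpuva, struct xe_vma, gpuva);
}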
>
> v2 (CI): Add break after default in case statement.
> v3: Rebase
> v4: Fix some error handling
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Before embarking on a second review of this code it would really be 
beneficial if you could address some comments from the first review. In 
particular splitting this huge patch up if possible (and I also think 
that removing the async worker *before* this patch if at all possible 
would really ease the review both for me and potential upcoming reviewers).

Thanks,

Thomas



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation Matthew Brost
@ 2023-05-09 14:34   ` Rodrigo Vivi
  2023-05-11  2:52     ` Matthew Brost
  2023-05-09 15:17   ` Thomas Hellström
  1 sibling, 1 reply; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-09 14:34 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:13PM -0700, Matthew Brost wrote:
> Add uAPI and implementation for NULL bindings. A NULL binding is defined
> as writes dropped and read zero. A single bit in the uAPI has been added
> which results in a single bit in the PTEs being set.

I have confirmed in the spec that this is the case for the BIT 9!

"If Null=1, the h/w will avoid the memory access and return all
zero's for the read access with a null completion, write accesses are dropped"

The code looks good, but just a few questions / comments below.

> 
> NULL bindings are intended to be used to implement VK sparse bindings.

is there any example available or any documentation that could explain
how this is used and why this is needed?

any IGT?
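
As a rough illustration of how userspace might request such a binding: the
op/flag names and the obj/obj_offset MBZ rule come from this patch, while the
addr/range field names, the example values, and the surrounding
DRM_IOCTL_XE_VM_BIND plumbing are assumptions:

/* Reserve a VA range that drops writes and reads back zero, e.g. the
 * unbound pages of a Vulkan sparse resource.
 */
struct drm_xe_vm_bind_op bind = {
	.op         = XE_VM_BIND_OP_MAP | XE_VM_BIND_FLAG_NULL,
	.obj        = 0,		/* BO handle must be zero */
	.obj_offset = 0,		/* likewise MBZ */
	.addr       = 0x100000000ull,	/* example GPU VA */
	.range      = 64ull << 20,	/* example 64 MiB */
};
/* ...passed to the kernel as one op of a DRM_IOCTL_XE_VM_BIND call... */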

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h           |  1 +
>  drivers/gpu/drm/xe/xe_exec.c         |  2 +
>  drivers/gpu/drm/xe/xe_gt_pagefault.c |  4 +-
>  drivers/gpu/drm/xe/xe_pt.c           | 77 ++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_vm.c           | 92 ++++++++++++++++++----------
>  drivers/gpu/drm/xe/xe_vm.h           | 10 +++
>  drivers/gpu/drm/xe/xe_vm_madvise.c   |  2 +-
>  drivers/gpu/drm/xe/xe_vm_types.h     |  3 +
>  include/uapi/drm/xe_drm.h            |  8 +++
>  9 files changed, 144 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 25457b3c757b..81051f456874 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -56,6 +56,7 @@
>  #define XE_PDE_IPS_64K			BIT_ULL(11)
>  
>  #define XE_GGTT_PTE_LM			BIT_ULL(1)
> +#define XE_PTE_NULL			BIT_ULL(9)
>  #define XE_USM_PPGTT_PTE_AE		BIT_ULL(10)
>  #define XE_PPGTT_PTE_LM			BIT_ULL(11)
>  #define XE_PDE_64K			BIT_ULL(6)
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 90c46d092737..68f876afd13c 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -116,6 +116,8 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
>  	 * to a location where the GPU can access it).
>  	 */
>  	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
> +		XE_BUG_ON(xe_vma_is_null(vma));

Can we avoid BUG here? Maybe a WARN?

> +
>  		if (xe_vma_is_userptr(vma))
>  			continue;
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index f7a066090a13..cfffe3398fe4 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -526,8 +526,8 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>  
>  	trace_xe_vma_acc(vma);
>  
> -	/* Userptr can't be migrated, nothing to do */
> -	if (xe_vma_is_userptr(vma))
> +	/* Userptr or null can't be migrated, nothing to do */
> +	if (xe_vma_has_no_bo(vma))
>  		goto unlock_vm;
>  
>  	/* Lock VM and BOs dma-resv */
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 2b5b05a8a084..b4edb751bfbb 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -82,7 +82,9 @@ u64 gen8_pde_encode(struct xe_bo *bo, u64 bo_offset,
>  static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
>  			   size_t page_size, bool *is_vram)
>  {
> -	if (xe_vma_is_userptr(vma)) {
> +	if (xe_vma_is_null(vma)) {
> +		return 0;
> +	} else if (xe_vma_is_userptr(vma)) {
>  		struct xe_res_cursor cur;
>  		u64 page;
>  
> @@ -563,6 +565,10 @@ static bool xe_pt_hugepte_possible(u64 addr, u64 next, unsigned int level,
>  	if (next - xe_walk->va_curs_start > xe_walk->curs->size)
>  		return false;
>  
> +	/* null VMA's do not have dma adresses */
> +	if (xe_walk->pte_flags & XE_PTE_NULL)
> +		return true;
> +
>  	/* Is the DMA address huge PTE size aligned? */
>  	size = next - addr;
>  	dma = addr - xe_walk->va_curs_start + xe_res_dma(xe_walk->curs);
> @@ -585,6 +591,10 @@ xe_pt_scan_64K(u64 addr, u64 next, struct xe_pt_stage_bind_walk *xe_walk)
>  	if (next > xe_walk->l0_end_addr)
>  		return false;
>  
> +	/* null VMA's do not have dma adresses */
> +	if (xe_walk->pte_flags & XE_PTE_NULL)
> +		return true;
> +
>  	xe_res_next(&curs, addr - xe_walk->va_curs_start);
>  	for (; addr < next; addr += SZ_64K) {
>  		if (!IS_ALIGNED(xe_res_dma(&curs), SZ_64K) || curs.size < SZ_64K)
> @@ -630,17 +640,34 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
>  	struct xe_pt *xe_child;
>  	bool covers;
>  	int ret = 0;
> -	u64 pte;
> +	u64 pte = 0;
>  
>  	/* Is this a leaf entry ?*/
>  	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
>  		struct xe_res_cursor *curs = xe_walk->curs;
> +		bool null = xe_walk->pte_flags & XE_PTE_NULL;
>  
>  		XE_WARN_ON(xe_walk->va_curs_start != addr);
>  
> -		pte = __gen8_pte_encode(xe_res_dma(curs) + xe_walk->dma_offset,
> -					xe_walk->cache, xe_walk->pte_flags,
> -					level);
> +		if (null) {
> +			pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +
> +			if (unlikely(xe_walk->pte_flags & XE_PTE_READ_ONLY))
> +				pte &= ~XE_PAGE_RW;
> +
> +			if (level == 1)
> +				pte |= XE_PDE_PS_2M;
> +			else if (level == 2)
> +				pte |= XE_PDPE_PS_1G;
> +
> +			pte |= XE_PTE_NULL;
> +		} else {
> +			pte = __gen8_pte_encode(xe_res_dma(curs) +
> +						xe_walk->dma_offset,
> +						xe_walk->cache,
> +						xe_walk->pte_flags,
> +						level);
> +		}
>  		pte |= xe_walk->default_pte;
>  
>  		/*
> @@ -658,7 +685,8 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
>  		if (unlikely(ret))
>  			return ret;
>  
> -		xe_res_next(curs, next - addr);
> +		if (!null)
> +			xe_res_next(curs, next - addr);
>  		xe_walk->va_curs_start = next;
>  		*action = ACTION_CONTINUE;
>  
> @@ -751,7 +779,8 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
>  		.gt = gt,
>  		.curs = &curs,
>  		.va_curs_start = xe_vma_start(vma),
> -		.pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0,
> +		.pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0 |
> +			xe_vma_is_null(vma) ? XE_PTE_NULL : 0,
>  		.wupd.entries = entries,
>  		.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAGS_64K) &&
>  			is_vram,
> @@ -769,23 +798,28 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
>  			gt_to_xe(gt)->mem.vram.io_start;
>  		xe_walk.cache = XE_CACHE_WB;
>  	} else {
> -		if (!xe_vma_is_userptr(vma) && bo->flags & XE_BO_SCANOUT_BIT)
> +		if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
>  			xe_walk.cache = XE_CACHE_WT;
>  		else
>  			xe_walk.cache = XE_CACHE_WB;
>  	}
> -	if (!xe_vma_is_userptr(vma) && xe_bo_is_stolen(bo))
> +	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
>  		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
>  
>  	xe_bo_assert_held(bo);
> -	if (xe_vma_is_userptr(vma))
> -		xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma), &curs);
> -	else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
> -		xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
> -			     xe_vma_size(vma), &curs);
> -	else
> -		xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
> -				xe_vma_size(vma), &curs);
> +	if (!xe_vma_is_null(vma)) {
> +		if (xe_vma_is_userptr(vma))
> +			xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
> +					&curs);
> +		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
> +			xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
> +				     xe_vma_size(vma), &curs);
> +		else
> +			xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
> +					xe_vma_size(vma), &curs);
> +	} else {
> +		curs.size = xe_vma_size(vma);
> +	}
>  
>  	ret = drm_pt_walk_range(&pt->drm, pt->level, xe_vma_start(vma),
>  				xe_vma_end(vma), &xe_walk.drm);
> @@ -979,7 +1013,7 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
>  
>  	if (xe_vma_is_userptr(vma))
>  		lockdep_assert_held_read(&vm->userptr.notifier_lock);
> -	else
> +	else if (!xe_vma_is_null(vma))
>  		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
>  
>  	dma_resv_assert_held(&vm->resv);
> @@ -1283,7 +1317,8 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
>  	struct xe_pt_migrate_pt_update bind_pt_update = {
>  		.base = {
> -			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops,
> +			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops :
> +				&bind_ops,
>  			.vma = vma,
>  		},
>  		.bind = true,
> @@ -1348,7 +1383,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  				   DMA_RESV_USAGE_KERNEL :
>  				   DMA_RESV_USAGE_BOOKKEEP);
>  
> -		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
> +		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
>  			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
>  					   DMA_RESV_USAGE_BOOKKEEP);
>  		xe_pt_commit_bind(vma, entries, num_entries, rebind,
> @@ -1667,7 +1702,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>  				   DMA_RESV_USAGE_BOOKKEEP);
>  
>  		/* This fence will be installed by caller when doing eviction */
> -		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
> +		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
>  			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
>  					   DMA_RESV_USAGE_BOOKKEEP);
>  		xe_pt_commit_unbind(vma, entries, num_entries,
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index f3608865e259..a46f44ab2546 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -60,6 +60,7 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>  
>  	lockdep_assert_held(&vm->lock);
>  	XE_BUG_ON(!xe_vma_is_userptr(vma));
> +	XE_BUG_ON(xe_vma_is_null(vma));
>  retry:
>  	if (vma->gpuva.flags & XE_VMA_DESTROYED)
>  		return 0;
> @@ -581,7 +582,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  		goto out_unlock;
>  
>  	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
> -		if (xe_vma_is_userptr(vma) ||
> +		if (xe_vma_has_no_bo(vma) ||
>  		    vma->gpuva.flags & XE_VMA_DESTROYED)
>  			continue;
>  
> @@ -813,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  				    struct xe_bo *bo,
>  				    u64 bo_offset_or_userptr,
>  				    u64 start, u64 end,
> -				    bool read_only,
> +				    bool read_only, bool null,
>  				    u64 gt_mask)
>  {
>  	struct xe_vma *vma;
> @@ -843,6 +844,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  	vma->gpuva.va.range = end - start + 1;
>  	if (read_only)
>  		vma->gpuva.flags |= XE_VMA_READ_ONLY;
> +	if (null)
> +		vma->gpuva.flags |= XE_VMA_NULL;
>  
>  	if (gt_mask) {
>  		vma->gt_mask = gt_mask;
> @@ -862,23 +865,26 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  		vma->gpuva.gem.obj = &bo->ttm.base;
>  		vma->gpuva.gem.offset = bo_offset_or_userptr;
>  		drm_gpuva_link(&vma->gpuva);
> -	} else /* userptr */ {
> -		u64 size = end - start + 1;
> -		int err;
> -
> -		vma->gpuva.gem.offset = bo_offset_or_userptr;
> +	} else /* userptr or null */ {
> +		if (!null) {
> +			u64 size = end - start + 1;
> +			int err;
> +
> +			vma->gpuva.gem.offset = bo_offset_or_userptr;
> +			err = mmu_interval_notifier_insert(&vma->userptr.notifier,
> +							   current->mm,
> +							   xe_vma_userptr(vma),
> +							   size,
> +							   &vma_userptr_notifier_ops);
> +			if (err) {
> +				kfree(vma);
> +				vma = ERR_PTR(err);
> +				return vma;
> +			}
>  
> -		err = mmu_interval_notifier_insert(&vma->userptr.notifier,
> -						   current->mm,
> -						   xe_vma_userptr(vma), size,
> -						   &vma_userptr_notifier_ops);
> -		if (err) {
> -			kfree(vma);
> -			vma = ERR_PTR(err);
> -			return vma;
> +			vma->userptr.notifier_seq = LONG_MAX;
>  		}
>  
> -		vma->userptr.notifier_seq = LONG_MAX;
>  		xe_vm_get(vm);
>  	}
>  
> @@ -916,6 +922,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
>  		 */
>  		mmu_interval_notifier_remove(&vma->userptr.notifier);
>  		xe_vm_put(vm);
> +	} else if (xe_vma_is_null(vma)) {
> +		xe_vm_put(vm);
>  	} else {
>  		xe_bo_put(xe_vma_bo(vma));
>  	}
> @@ -954,7 +962,7 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
>  		list_del_init(&vma->userptr.invalidate_link);
>  		spin_unlock(&vm->userptr.invalidated_lock);
>  		list_del(&vma->userptr_link);
> -	} else {
> +	} else if (!xe_vma_is_null(vma)) {
>  		xe_bo_assert_held(xe_vma_bo(vma));
>  		drm_gpuva_unlink(&vma->gpuva);
>  		if (!xe_vma_bo(vma)->vm)
> @@ -1305,7 +1313,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  	drm_gpuva_iter_for_each(gpuva, it) {
>  		vma = gpuva_to_vma(gpuva);
>  
> -		if (xe_vma_is_userptr(vma)) {
> +		if (xe_vma_has_no_bo(vma)) {
>  			down_read(&vm->userptr.notifier_lock);
>  			vma->gpuva.flags |= XE_VMA_DESTROYED;
>  			up_read(&vm->userptr.notifier_lock);
> @@ -1315,7 +1323,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  		drm_gpuva_iter_remove(&it);
>  
>  		/* easy case, remove from VMA? */
> -		if (xe_vma_is_userptr(vma) || xe_vma_bo(vma)->vm) {
> +		if (xe_vma_has_no_bo(vma) || xe_vma_bo(vma)->vm) {
>  			xe_vma_destroy(vma, NULL);
>  			continue;
>  		}
> @@ -1964,7 +1972,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
>  
>  	XE_BUG_ON(region > ARRAY_SIZE(region_to_mem_type));
>  
> -	if (!xe_vma_is_userptr(vma)) {
> +	if (!xe_vma_has_no_bo(vma)) {
>  		err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
>  		if (err)
>  			return err;
> @@ -2170,6 +2178,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  				operation & XE_VM_BIND_FLAG_IMMEDIATE;
>  			op->map.read_only =
>  				operation & XE_VM_BIND_FLAG_READONLY;
> +			op->map.null = operation & XE_VM_BIND_FLAG_NULL;
>  		}
>  		break;
>  	case XE_VM_BIND_OP_UNMAP:
> @@ -2226,7 +2235,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>  }
>  
>  static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> -			      u64 gt_mask, bool read_only)
> +			      u64 gt_mask, bool read_only, bool null)
>  {
>  	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>  	struct xe_vma *vma;
> @@ -2242,7 +2251,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  	}
>  	vma = xe_vma_create(vm, bo, op->gem.offset,
>  			    op->va.addr, op->va.addr +
> -			    op->va.range - 1, read_only,
> +			    op->va.range - 1, read_only, null,
>  			    gt_mask);
>  	if (bo)
>  		xe_bo_unlock(bo, &ww);
> @@ -2254,7 +2263,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  			xe_vma_destroy(vma, NULL);
>  			return ERR_PTR(err);
>  		}
> -	} else if(!bo->vm) {
> +	} else if(!xe_vma_has_no_bo(vma) && !bo->vm) {
>  		vm_insert_extobj(vm, vma);
>  		err = add_preempt_fences(vm, bo);
>  		if (err) {
> @@ -2332,7 +2341,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  				struct xe_vma *vma;
>  
>  				vma = new_vma(vm, &op->base.map,
> -					      op->gt_mask, op->map.read_only);
> +					      op->gt_mask, op->map.read_only,
> +					      op->map.null );
>  				if (IS_ERR(vma)) {
>  					err = PTR_ERR(vma);
>  					goto free_fence;
> @@ -2347,9 +2357,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  					bool read_only =
>  						op->base.remap.unmap->va->flags &
>  						XE_VMA_READ_ONLY;
> +					bool null =
> +						op->base.remap.unmap->va->flags &
> +						XE_VMA_NULL;
>  
>  					vma = new_vma(vm, op->base.remap.prev,
> -						      op->gt_mask, read_only);
> +						      op->gt_mask, read_only,
> +						      null);
>  					if (IS_ERR(vma)) {
>  						err = PTR_ERR(vma);
>  						goto free_fence;
> @@ -2364,8 +2378,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  						op->base.remap.unmap->va->flags &
>  						XE_VMA_READ_ONLY;
>  
> +					bool null =
> +						op->base.remap.unmap->va->flags &
> +						XE_VMA_NULL;
> +
>  					vma = new_vma(vm, op->base.remap.next,
> -						      op->gt_mask, read_only);
> +						      op->gt_mask, read_only,
> +						      null);
>  					if (IS_ERR(vma)) {
>  						err = PTR_ERR(vma);
>  						goto free_fence;
> @@ -2853,11 +2872,12 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
>  #ifdef TEST_VM_ASYNC_OPS_ERROR
>  #define SUPPORTED_FLAGS	\
>  	(FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \
> -	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
> +	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \
> +	 XE_VM_BIND_FLAG_NULL | 0xffff)
>  #else
>  #define SUPPORTED_FLAGS	\
>  	(XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \
> -	 XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
> +	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff)
>  #endif
>  #define XE_64K_PAGE_MASK 0xffffull
>  
> @@ -2903,6 +2923,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
>  		u32 obj = (*bind_ops)[i].obj;
>  		u64 obj_offset = (*bind_ops)[i].obj_offset;
>  		u32 region = (*bind_ops)[i].region;
> +		bool null = op &  XE_VM_BIND_FLAG_NULL;
>  
>  		if (i == 0) {
>  			*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
> @@ -2929,8 +2950,12 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
>  		if (XE_IOCTL_ERR(xe, VM_BIND_OP(op) >
>  				 XE_VM_BIND_OP_PREFETCH) ||
>  		    XE_IOCTL_ERR(xe, op & ~SUPPORTED_FLAGS) ||
> +		    XE_IOCTL_ERR(xe, obj && null) ||
> +		    XE_IOCTL_ERR(xe, obj_offset && null) ||
> +		    XE_IOCTL_ERR(xe, VM_BIND_OP(op) != XE_VM_BIND_OP_MAP &&
> +				 null) ||
>  		    XE_IOCTL_ERR(xe, !obj &&
> -				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP) ||
> +				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP && !null) ||
>  		    XE_IOCTL_ERR(xe, !obj &&
>  				 VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
>  		    XE_IOCTL_ERR(xe, addr &&
> @@ -3254,6 +3279,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
>  	int ret;
>  
>  	XE_BUG_ON(!xe_vm_in_fault_mode(xe_vma_vm(vma)));
> +	XE_BUG_ON(xe_vma_is_null(vma));
>  	trace_xe_vma_usm_invalidate(vma);
>  
>  	/* Check that we don't race with page-table updates */
> @@ -3313,8 +3339,11 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
>  	drm_gpuva_iter_for_each(gpuva, it) {
>  		struct xe_vma* vma = gpuva_to_vma(gpuva);
>  		bool is_userptr = xe_vma_is_userptr(vma);
> +		bool null = xe_vma_is_null(vma);
>  
> -		if (is_userptr) {
> +		if (null) {
> +			addr = 0;
> +		} else if (is_userptr) {
>  			struct xe_res_cursor cur;
>  
>  			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, &cur);
> @@ -3324,7 +3353,8 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
>  		}
>  		drm_printf(p, " [%016llx-%016llx] S:0x%016llx A:%016llx %s\n",
>  			   xe_vma_start(vma), xe_vma_end(vma), xe_vma_size(vma),
> -			   addr, is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
> +			   addr, null ? "NULL" :
> +			   is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
>  	}
>  	up_read(&vm->lock);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 21b1054949c4..96e2c6b07bf8 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -175,7 +175,17 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
>  	}
>  }
>  
> +static inline bool xe_vma_is_null(struct xe_vma *vma)
> +{
> +	return vma->gpuva.flags & XE_VMA_NULL;
> +}
> +
>  static inline bool xe_vma_is_userptr(struct xe_vma *vma)
> +{
> +	return !xe_vma_bo(vma) && !xe_vma_is_null(vma);
> +}
> +
> +static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
>  {
>  	return !xe_vma_bo(vma);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 02d27a354b36..03508645fa08 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -227,7 +227,7 @@ get_vmas(struct xe_vm *vm, int *num_vmas, u64 addr, u64 range)
>  	drm_gpuva_iter_for_each_range(gpuva, it, addr + range) {
>  		struct xe_vma *vma = gpuva_to_vma(gpuva);
>  
> -		if (xe_vma_is_userptr(vma))
> +		if (xe_vma_has_no_bo(vma))
>  			continue;
>  
>  		if (*num_vmas == max_vmas) {
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 243dc91a61b0..b61007b70502 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -29,6 +29,7 @@ struct xe_vm;
>  #define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
>  #define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
>  #define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
> +#define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
>  
>  struct xe_vma {
>  	/** @gpuva: Base GPUVA object */
> @@ -315,6 +316,8 @@ struct xe_vma_op_map {
>  	bool immediate;
>  	/** @read_only: Read only */
>  	bool read_only;
> +	/** @null: NULL (writes dropped, read zero) */
> +	bool null;
>  };
>  
>  /** struct xe_vma_op_unmap - VMA unmap operation */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index b0b80aae3ee8..27c51946fadd 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -447,6 +447,14 @@ struct drm_xe_vm_bind_op {
>  	 * than differing the MAP to the page fault handler.
>  	 */
>  #define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 18)
> +	/*
> +	 * When the NULL flag is set, the page tables are setup with a special
> +	 * bit which indicates writes are dropped and all reads return zero. The
> +	 * NULL flags is only valid for XE_VM_BIND_OP_MAP operations, the BO
> +	 * handle MBZ, and the BO offset MBZ. This flag is intended to implement
> +	 * VK sparse bindings.
> +	 */
> +#define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
>  
>  	/** @reserved: Reserved */
>  	__u64 reserved[2];
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec Matthew Brost
@ 2023-05-09 14:45   ` Rodrigo Vivi
  2023-05-10  0:37     ` Matthew Brost
  2023-05-10  0:38     ` Matthew Brost
  0 siblings, 2 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-09 14:45 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Danilo Krummrich, intel-xe

On Mon, May 01, 2023 at 05:17:22PM -0700, Matthew Brost wrote:
> We want some helpers for DRM exec in gpuva, so always compile this.
> 
> Suggested-by: Danilo Krummrich <dakr@redhat.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/Makefile | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index ab728632d8a2..40067970af04 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -23,6 +23,7 @@ drm-y := \
>  	drm_dumb_buffers.o \
>  	drm_edid.o \
>  	drm_encoder.o \
> +	drm_exec.o \
>  	drm_file.o \
>  	drm_fourcc.o \
>  	drm_framebuffer.o \
> @@ -81,8 +82,6 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>  # Memory-management helpers
>  #
>  #
> -obj-$(CONFIG_DRM_EXEC) += drm_exec.o

shouldn't this kill this kconfig entirely then?
Or should the helpers be split into some other common file?

> -
>  obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>  
>  drm_dma_helper-y := drm_gem_dma_helper.o
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 18/31] drm/xe: Avoid doing rebinds
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 18/31] drm/xe: Avoid doing rebinds Matthew Brost
@ 2023-05-09 14:48   ` Rodrigo Vivi
  0 siblings, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-09 14:48 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:14PM -0700, Matthew Brost wrote:
> If we dont change page sizes we can avoid doing rebinds rather just do a
> partial unbind. The algorithm to determine is page size is greedy as we

There's something off in this phrase...      ^ around here...
maybe s/is/its/ ?

But about the rebinds and remaps I was not able to follow the changes
below... probably if this patch was in a smaller series or if the code
for the remap that this is based on was already merged that could be
easier... or maybe someone with deeper knowledge in this area like
Thomas would be the best one to review this.

> assume all pages in the removed VMA are the largest page used in the
> VMA.
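
A worked example of the greedy assumption, with made-up numbers: suppose the
old VMA [0x000000, 0x800000) was bound while XE_VMA_PTE_2M got set, so
xe_vma_max_pte_size(old) == SZ_2M, and userspace now unmaps
[0x400000, 0x600000). The prev piece [0x000000, 0x400000) ends 2M-aligned and
the next piece [0x600000, 0x800000) starts 2M-aligned, so skip_prev and
skip_next are both true: the existing PTEs of prev/next stay valid,
op->remap.start/range shrink to the middle chunk, and only
[0x400000, 0x600000) is actually unbound, with no rebind jobs issued. Had the
old VMA possibly used 1G pages, the same checks against SZ_1G would fail and
the full unbind + rebind path would run instead.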
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c       |  4 ++
>  drivers/gpu/drm/xe/xe_vm.c       | 71 +++++++++++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_vm_types.h | 17 ++++----
>  3 files changed, 67 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index b4edb751bfbb..010f44260cda 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -412,6 +412,8 @@ struct xe_pt_stage_bind_walk {
>  	/* Input parameters for the walk */
>  	/** @vm: The vm we're building for. */
>  	struct xe_vm *vm;
> +	/** @vma: The vma we are binding for. */
> +	struct xe_vma *vma;
>  	/** @gt: The gt we're building for. */
>  	struct xe_gt *gt;
>  	/** @cache: Desired cache level for the ptes */
> @@ -688,6 +690,7 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
>  		if (!null)
>  			xe_res_next(curs, next - addr);
>  		xe_walk->va_curs_start = next;
> +		xe_walk->vma->gpuva.flags |= (XE_VMA_PTE_4K << level);
>  		*action = ACTION_CONTINUE;
>  
>  		return ret;
> @@ -776,6 +779,7 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
>  			.max_level = XE_PT_HIGHEST_LEVEL,
>  		},
>  		.vm = xe_vma_vm(vma),
> +		.vma = vma,
>  		.gt = gt,
>  		.curs = &curs,
>  		.va_curs_start = xe_vma_start(vma),
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index a46f44ab2546..e0ed7201aeb0 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2276,6 +2276,16 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  	return vma;
>  }
>  
> +static u64 xe_vma_max_pte_size(struct xe_vma *vma)
> +{
> +	if (vma->gpuva.flags & XE_VMA_PTE_1G)
> +		return SZ_1G;
> +	else if (vma->gpuva.flags & XE_VMA_PTE_2M)
> +		return SZ_2M;
> +
> +	return SZ_4K;
> +}
> +
>  /*
>   * Parse operations list and create any resources needed for the operations
>   * prior to fully commiting to the operations. This setp can fail.
> @@ -2352,6 +2362,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  				break;
>  			}
>  			case DRM_GPUVA_OP_REMAP:
> +			{
> +				struct xe_vma *old =
> +					gpuva_to_vma(op->base.remap.unmap->va);
> +
> +				op->remap.start = xe_vma_start(old);
> +				op->remap.range = xe_vma_size(old);
> +
>  				if (op->base.remap.prev) {
>  					struct xe_vma *vma;
>  					bool read_only =
> @@ -2370,6 +2387,20 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  					}
>  
>  					op->remap.prev = vma;
> +
> +					/*
> +					 * XXX: Not sure why userptr doesn't
> +					 * work but really shouldn't be a use
> +					 * case.
> +					 */
> +					op->remap.skip_prev = !xe_vma_is_userptr(old) &&
> +						IS_ALIGNED(xe_vma_end(vma), xe_vma_max_pte_size(old));
> +					if (op->remap.skip_prev) {
> +						op->remap.range -=
> +							xe_vma_end(vma) -
> +							xe_vma_start(old);
> +						op->remap.start = xe_vma_end(vma);
> +					}
>  				}
>  
>  				if (op->base.remap.next) {
> @@ -2391,20 +2422,16 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
>  					}
>  
>  					op->remap.next = vma;
> +					op->remap.skip_next = !xe_vma_is_userptr(old) &&
> +						IS_ALIGNED(xe_vma_start(vma), xe_vma_max_pte_size(old));
> +					if (op->remap.skip_next)
> +						op->remap.range -=
> +							xe_vma_end(old) -
> +							xe_vma_start(vma);
>  				}
> -
> -				/* XXX: Support no doing remaps */
> -				op->remap.start =
> -					xe_vma_start(gpuva_to_vma(op->base.remap.unmap->va));
> -				op->remap.range =
> -					xe_vma_size(gpuva_to_vma(op->base.remap.unmap->va));
>  				break;
> +			}
>  			case DRM_GPUVA_OP_UNMAP:
> -				op->unmap.start =
> -					xe_vma_start(gpuva_to_vma(op->base.unmap.va));
> -				op->unmap.range =
> -					xe_vma_size(gpuva_to_vma(op->base.unmap.va));
> -				break;
>  			case DRM_GPUVA_OP_PREFETCH:
>  				/* Nothing to do */
>  				break;
> @@ -2445,10 +2472,23 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
>  	case DRM_GPUVA_OP_REMAP:
>  		prep_vma_destroy(vm, gpuva_to_vma(op->base.remap.unmap->va),
>  				 true);
> -		if (op->remap.prev)
> +
> +		if (op->remap.prev) {
>  			err |= xe_vm_insert_vma(vm, op->remap.prev);
> -		if (op->remap.next)
> +			if (!err && op->remap.skip_prev)
> +				op->remap.prev = NULL;
> +		}
> +		if (op->remap.next) {
>  			err |= xe_vm_insert_vma(vm, op->remap.next);
> +			if (!err && op->remap.skip_next)
> +				op->remap.next = NULL;
> +		}
> +
> +		/* Adjust for partial unbind after removin VMA from VM */
> +		if (!err) {
> +			op->base.remap.unmap->va->va.addr = op->remap.start;
> +			op->base.remap.unmap->va->va.range = op->remap.range;
> +		}
>  		break;
>  	case DRM_GPUVA_OP_UNMAP:
>  		prep_vma_destroy(vm, gpuva_to_vma(op->base.unmap.va), true);
> @@ -2518,9 +2558,10 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>  		bool next = !!op->remap.next;
>  
>  		if (!op->remap.unmap_done) {
> -			vm->async_ops.munmap_rebind_inflight = true;
> -			if (prev || next)
> +			if (prev || next) {
> +				vm->async_ops.munmap_rebind_inflight = true;
>  				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
> +			}
>  			err = xe_vm_unbind(vm, vma, op->engine, op->syncs,
>  					   op->num_syncs,
>  					   !prev && !next ? op->fence : NULL,
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index b61007b70502..d55ec8156caa 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -30,6 +30,9 @@ struct xe_vm;
>  #define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
>  #define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
>  #define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
> +#define XE_VMA_PTE_4K		(DRM_GPUVA_USERBITS << 6)
> +#define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
> +#define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
>  
>  struct xe_vma {
>  	/** @gpuva: Base GPUVA object */
> @@ -320,14 +323,6 @@ struct xe_vma_op_map {
>  	bool null;
>  };
>  
> -/** struct xe_vma_op_unmap - VMA unmap operation */
> -struct xe_vma_op_unmap {
> -	/** @start: start of the VMA unmap */
> -	u64 start;
> -	/** @range: range of the VMA unmap */
> -	u64 range;
> -};
> -
>  /** struct xe_vma_op_remap - VMA remap operation */
>  struct xe_vma_op_remap {
>  	/** @prev: VMA preceding part of a split mapping */
> @@ -338,6 +333,10 @@ struct xe_vma_op_remap {
>  	u64 start;
>  	/** @range: range of the VMA unmap */
>  	u64 range;
> +	/** @skip_prev: skip prev rebind */
> +	bool skip_prev;
> +	/** @skip_next: skip next rebind */
> +	bool skip_next;
>  	/** @unmap_done: unmap operation in done */
>  	bool unmap_done;
>  };
> @@ -395,8 +394,6 @@ struct xe_vma_op {
>  	union {
>  		/** @map: VMA map operation specific data */
>  		struct xe_vma_op_map map;
> -		/** @unmap: VMA unmap operation specific data */
> -		struct xe_vma_op_unmap unmap;
>  		/** @remap: VMA remap operation specific data */
>  		struct xe_vma_op_remap remap;
>  		/** @prefetch: VMA prefetch operation specific data */
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds Matthew Brost
@ 2023-05-09 14:50   ` Rodrigo Vivi
  2023-05-11 10:04   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-09 14:50 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Mon, May 01, 2023 at 05:17:25PM -0700, Matthew Brost wrote:
> Binds are not long-running jobs, thus we can export dma-fences even if a
> VM is in compute mode.

is this true independent of the series? or did something change in the
series that made this true?

I wonder if this is not a good candidate for a standalone patch...

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 55cced8870e6..07023506ce6b 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -3047,7 +3047,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	for (num_syncs = 0; num_syncs < args->num_syncs; num_syncs++) {
>  		err = xe_sync_entry_parse(xe, xef, &syncs[num_syncs],
>  					  &syncs_user[num_syncs], false,
> -					  xe_vm_no_dma_fences(vm));
> +					  xe_vm_in_fault_mode(vm));
>  		if (err)
>  			goto free_syncs;
>  	}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-08 13:14   ` Thomas Hellström
@ 2023-05-09 14:56     ` Matthew Brost
  2023-05-09 15:21       ` Thomas Hellström
  2023-05-09 22:21     ` Matthew Brost
  1 sibling, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-09 14:56 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Mon, May 08, 2023 at 03:14:10PM +0200, Thomas Hellström wrote:
> Hi, Matthew
> 
> In addition to Rodrigo's comments:
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > Flow control + write ring in exec, return NULL in run_job, signal
> > xe_hw_fence immediately, and override TDR for LR jobs.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
> >   drivers/gpu/drm/xe/xe_engine.h           |  4 +
> >   drivers/gpu/drm/xe/xe_exec.c             |  8 ++
> >   drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
> >   drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
> >   drivers/gpu/drm/xe/xe_trace.h            |  5 ++
> >   6 files changed, 137 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> > index 094ec17d3004..d1e84d7adbd4 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.c
> > +++ b/drivers/gpu/drm/xe/xe_engine.c
> > @@ -18,6 +18,7 @@
> >   #include "xe_macros.h"
> >   #include "xe_migrate.h"
> >   #include "xe_pm.h"
> > +#include "xe_ring_ops_types.h"
> >   #include "xe_trace.h"
> >   #include "xe_vm.h"
> > @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
> >   	up_write(&e->vm->lock);
> >   }
> > +/**
> > + * xe_engine_is_lr() - Whether an engine is long-running
> > + * @e: The engine
> > + *
> > + * Return: True if the engine is long-running, false otherwise.
> > + */
> > +bool xe_engine_is_lr(struct xe_engine *e)
> > +{
> > +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
> > +		!(e->flags & ENGINE_FLAG_VM);
> > +}
> > +
> > +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
> > +{
> > +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
> > +}
> > +
> > +/**
> > + * xe_engine_ring_full() - Whether an engine's ring is full
> > + * @e: The engine
> > + *
> > + * Return: True if the engine's ring is full, false otherwise.
> > + */
> > +bool xe_engine_ring_full(struct xe_engine *e)
> > +{
> > +	struct xe_lrc *lrc = e->lrc;
> > +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
> > +
> > +	return xe_engine_num_job_inflight(e) >= max_job;
> > +}
> > +
> >   /**
> >    * xe_engine_is_idle() - Whether an engine is idle.
> >    * @engine: The engine
> > diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
> > index a49cf2ab405e..2e60f6d90226 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.h
> > +++ b/drivers/gpu/drm/xe/xe_engine.h
> > @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
> >   	return engine->width > 1;
> >   }
> > +bool xe_engine_is_lr(struct xe_engine *e);
> > +
> > +bool xe_engine_ring_full(struct xe_engine *e);
> > +
> >   bool xe_engine_is_idle(struct xe_engine *engine);
> >   void xe_engine_kill(struct xe_engine *e);
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index ea869f2452ef..44ea9bcd0066 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -13,6 +13,7 @@
> >   #include "xe_device.h"
> >   #include "xe_engine.h"
> >   #include "xe_macros.h"
> > +#include "xe_ring_ops_types.h"
> >   #include "xe_sched_job.h"
> >   #include "xe_sync.h"
> >   #include "xe_vm.h"
> > @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		goto err_engine_end;
> >   	}
> > +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
> > +		err = -EWOULDBLOCK;
> > +		goto err_engine_end;
> > +	}
> > +
> >   	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
> >   				  addresses : &args->address);
> >   	if (IS_ERR(job)) {
> > @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		xe_sync_entry_signal(&syncs[i], job,
> >   				     &job->drm.s_fence->finished);
> > +	if (xe_engine_is_lr(engine))
> > +		engine->ring_ops->emit_job(job);
> >   	xe_sched_job_push(job);
> >   	xe_vm_reactivate_rebind(vm);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > index cbfb13026ec1..5d83132034a6 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > @@ -31,6 +31,8 @@ struct xe_guc_engine {
> >   	 */
> >   #define MAX_STATIC_MSG_TYPE	3
> >   	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
> > +	/** @lr_tdr: long running TDR worker */
> > +	struct work_struct lr_tdr;
> >   	/** @fini_async: do final fini async from this worker */
> >   	struct work_struct fini_async;
> >   	/** @resume_time: time of last resume */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 68d09e7a4cc0..0a41f5d04f6d 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
> >   		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
> >   	}
> > +	/*
> > +	 * We must keep a reference for LR engines if engine is registered with
> > +	 * the GuC as jobs signal immediately and can't destroy an engine if the
> > +	 * GuC has a reference to it.
> > +	 */
> > +	if (xe_engine_is_lr(e))
> > +		xe_engine_get(e);
> > +
> >   	set_engine_registered(e);
> >   	trace_xe_engine_register(e);
> >   	if (xe_engine_is_parallel(e))
> > @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >   {
> >   	struct xe_sched_job *job = to_xe_sched_job(drm_job);
> >   	struct xe_engine *e = job->engine;
> > +	bool lr = xe_engine_is_lr(e);
> >   	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
> >   		  !engine_banned(e) && !engine_suspended(e));
> > @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >   	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> >   		if (!engine_registered(e))
> >   			register_engine(e);
> > -		e->ring_ops->emit_job(job);
> > +		if (!lr)	/* Written in IOCTL */
> 
> Hmm? What does "Written in IOCTL" mean? Could you rephrase to something more
> descriptive?
> 

"LR jos are emitted in the IOCTL"

Does that work?

Matt
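
In context, the reworded comment could read something like:

		if (!lr)	/* LR jobs emit their ring commands in the exec IOCTL */
			e->ring_ops->emit_job(job);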

> > +			e->ring_ops->emit_job(job);
> >   		submit_engine(e);
> >   	}
> > -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
> > +	if (lr) {
> > +		xe_sched_job_set_error(job, -ENOTSUPP);
> > +		return NULL;
> > +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
> >   		return job->fence;
> > -	else
> > +	} else {
> >   		return dma_fence_get(job->fence);
> > +	}
> >   }
> >   static void guc_engine_free_job(struct drm_sched_job *drm_job)
> > @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
> >   }
> >   #endif
> > +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
> > +{
> > +	struct xe_guc *guc = engine_to_guc(e);
> > +
> > +	if (xe_engine_is_lr(e))
> > +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
> > +	else
> > +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +}
> > +
> > +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
> > +{
> > +	struct xe_guc_engine *ge =
> > +		container_of(w, struct xe_guc_engine, lr_tdr);
> > +	struct xe_engine *e = ge->engine;
> > +	struct drm_gpu_scheduler *sched = &ge->sched;
> > +
> > +	XE_BUG_ON(!xe_engine_is_lr(e));
> > +	trace_xe_engine_lr_cleanup(e);
> > +
> > +	/* Kill the run_job / process_msg entry points */
> > +	drm_sched_run_wq_stop(sched);
> > +
> > +	/* Engine state now stable, disable scheduling / deregister if needed */
> > +	if (engine_registered(e)) {
> > +		struct xe_guc *guc = engine_to_guc(e);
> > +		int ret;
> > +
> > +		set_engine_banned(e);
> > +		xe_engine_get(e);
> > +		disable_scheduling_deregister(guc, e);
> > +
> > +		/*
> > +		 * Must wait for scheduling to be disabled before signalling
> > +		 * any fences, if GT broken the GT reset code should signal us.
> > +		 */
> > +		smp_rmb();
> 
> wait_event() paired with wake_up() family of functions typically set the
> necessary barriers to make sure anything written prior to wake_up() is seen
> in wait_event(). So that smp_rmb() is most likely not needed. If it still
> is, its pairing smp_wmb() should be documented and pointed to as well. See
> documentation of set_current_state() vs __set_current_state().
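
A generic sketch of that pairing, with placeholder names, for reference:

#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(example_wq);
static bool pending_disable = true;

/* Waker: the condition write is ordered before the wakeup; wake_up()
 * provides the necessary barrier when it actually wakes a task, so no
 * explicit smp_wmb() is needed here.
 */
static void example_waker(void)
{
	WRITE_ONCE(pending_disable, false);
	wake_up(&example_wq);
}

/* Waiter: wait_event_timeout() (re)checks the condition with the proper
 * ordering via prepare_to_wait()/set_current_state(), so a standalone
 * smp_rmb() before it has nothing to pair with.
 */
static long example_waiter(void)
{
	return wait_event_timeout(example_wq, !READ_ONCE(pending_disable),
				  5 * HZ);
}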
> 
> > +		ret = wait_event_timeout(guc->ct.wq,
> > +					 !engine_pending_disable(e) ||
> > +					 guc_read_stopped(guc), HZ * 5);
> > +		if (!ret) {
> > +			XE_WARN_ON("Schedule disable failed to respond");
> > +			drm_sched_run_wq_start(sched);
> > +			xe_gt_reset_async(e->gt);
> > +			return;
> > +		}
> > +	}
> > +
> > +	drm_sched_run_wq_start(sched);
> > +}
> > +
> >   static enum drm_gpu_sched_stat
> >   guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >   {
> > @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >   			err = -EIO;
> >   		set_engine_banned(e);
> >   		xe_engine_get(e);
> > -		disable_scheduling_deregister(engine_to_guc(e), e);
> > +		disable_scheduling_deregister(guc, e);
> >   		/*
> >   		 * Must wait for scheduling to be disabled before signalling
> > @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >   	 */
> >   	list_add(&drm_job->list, &sched->pending_list);
> >   	drm_sched_run_wq_start(sched);
> > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +	xe_guc_engine_trigger_cleanup(e);
> >   	/* Mark all outstanding jobs as bad, thus completing them */
> >   	spin_lock(&sched->job_list_lock);
> > @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
> >   	trace_xe_engine_destroy(e);
> > +	if (xe_engine_is_lr(e))
> > +		cancel_work_sync(&ge->lr_tdr);
> >   	if (e->flags & ENGINE_FLAG_PERSISTENT)
> >   		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> >   	release_guc_id(guc, e);
> > @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
> >   	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
> >   	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
> > -	queue_work(system_unbound_wq, &e->guc->fini_async);
> > +	queue_work(system_wq, &e->guc->fini_async);
> >   	/* We must block on kernel engines so slabs are empty on driver unload */
> >   	if (kernel) {
> > @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
> >   	if (err)
> >   		goto err_free;
> > +
> 
> Unrelated whitespace?
> 
> 
> >   	sched = &ge->sched;
> >   	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
> >   				    &sched, 1, NULL);
> >   	if (err)
> >   		goto err_sched;
> > +	if (xe_engine_is_lr(e))
> > +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
> > +
> >   	mutex_lock(&guc->submission_state.lock);
> >   	err = alloc_guc_id(guc, e);
> > @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
> >   {
> >   	trace_xe_engine_kill(e);
> >   	set_engine_killed(e);
> > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +	xe_guc_engine_trigger_cleanup(e);
> >   }
> >   static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
> > @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
> >   	/* Stop scheduling + flush any DRM scheduler operations */
> >   	drm_sched_run_wq_stop(sched);
> > +	if (engine_registered(e) && xe_engine_is_lr(e))
> > +		xe_engine_put(e);
> > +
> >   	/* Clean up lost G2H + reset engine state */
> >   	if (engine_destroyed(e) && engine_registered(e)) {
> >   		if (engine_banned(e))
> > @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >   	trace_xe_engine_deregister_done(e);
> >   	clear_engine_registered(e);
> > +	if (xe_engine_is_lr(e))
> > +		xe_engine_put(e);
> > +
> >   	if (engine_banned(e))
> >   		xe_engine_put(e);
> >   	else
> > @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >   	 */
> >   	set_engine_reset(e);
> >   	if (!engine_banned(e))
> > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +		xe_guc_engine_trigger_cleanup(e);
> >   	return 0;
> >   }
> > @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> >   	/* Treat the same as engine reset */
> >   	set_engine_reset(e);
> >   	if (!engine_banned(e))
> > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +		xe_guc_engine_trigger_cleanup(e);
> >   	return 0;
> >   }
> > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > index 2f8eb7ebe9a7..02861c26e145 100644
> > --- a/drivers/gpu/drm/xe/xe_trace.h
> > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
> >   	     TP_ARGS(e)
> >   );
> > +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
> > +	     TP_PROTO(struct xe_engine *e),
> > +	     TP_ARGS(e)
> > +);
> > +
> >   DECLARE_EVENT_CLASS(xe_sched_job,
> >   		    TP_PROTO(struct xe_sched_job *job),
> >   		    TP_ARGS(job),
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation Matthew Brost
  2023-05-09 14:34   ` Rodrigo Vivi
@ 2023-05-09 15:17   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 15:17 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

On Mon, 2023-05-01 at 17:17 -0700, Matthew Brost wrote:
> Add uAPI and implementation for NULL bindings. A NULL binding is
> defined
> as writes dropped and read zero. A single bit in the uAPI has been
> added
> which results in a single bit in the PTEs being set.
> 
> NULL bindings are intended to be used to implement VK sparse
> bindings.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h           |  1 +
>  drivers/gpu/drm/xe/xe_exec.c         |  2 +
>  drivers/gpu/drm/xe/xe_gt_pagefault.c |  4 +-
>  drivers/gpu/drm/xe/xe_pt.c           | 77 ++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_vm.c           | 92 ++++++++++++++++++--------
> --
>  drivers/gpu/drm/xe/xe_vm.h           | 10 +++
>  drivers/gpu/drm/xe/xe_vm_madvise.c   |  2 +-
>  drivers/gpu/drm/xe/xe_vm_types.h     |  3 +
>  include/uapi/drm/xe_drm.h            |  8 +++
>  9 files changed, 144 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 25457b3c757b..81051f456874 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -56,6 +56,7 @@
>  #define XE_PDE_IPS_64K                 BIT_ULL(11)
>  
>  #define XE_GGTT_PTE_LM                 BIT_ULL(1)
> +#define XE_PTE_NULL                    BIT_ULL(9)

This looks like something we want to encode in the PPGTT PTE (we should
really move those out to another header), rather than a flag we want to
pass around in the vm code. So perhaps XE_PPGTT_PTE_NULL to follow the
naming convention.


>  #define XE_USM_PPGTT_PTE_AE            BIT_ULL(10)
>  #define XE_PPGTT_PTE_LM                        BIT_ULL(11)
>  #define XE_PDE_64K                     BIT_ULL(6)
> diff --git a/drivers/gpu/drm/xe/xe_exec.c
> b/drivers/gpu/drm/xe/xe_exec.c
> index 90c46d092737..68f876afd13c 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -116,6 +116,8 @@ static int xe_exec_begin(struct xe_engine *e,
> struct ww_acquire_ctx *ww,
>          * to a location where the GPU can access it).
>          */
>         list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
> +               XE_BUG_ON(xe_vma_is_null(vma));
> +
>                 if (xe_vma_is_userptr(vma))
>                         continue;
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index f7a066090a13..cfffe3398fe4 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -526,8 +526,8 @@ static int handle_acc(struct xe_gt *gt, struct
> acc *acc)
>  
>         trace_xe_vma_acc(vma);
>  
> -       /* Userptr can't be migrated, nothing to do */
> -       if (xe_vma_is_userptr(vma))
> +       /* Userptr or null can't be migrated, nothing to do */
> +       if (xe_vma_has_no_bo(vma))
>                 goto unlock_vm;
>  
>         /* Lock VM and BOs dma-resv */
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 2b5b05a8a084..b4edb751bfbb 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -82,7 +82,9 @@ u64 gen8_pde_encode(struct xe_bo *bo, u64
> bo_offset,
>  static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
>                            size_t page_size, bool *is_vram)
>  {
> -       if (xe_vma_is_userptr(vma)) {
> +       if (xe_vma_is_null(vma)) {
> +               return 0;
> +       } else if (xe_vma_is_userptr(vma)) {
>                 struct xe_res_cursor cur;
>                 u64 page;
>  
> @@ -563,6 +565,10 @@ static bool xe_pt_hugepte_possible(u64 addr, u64
> next, unsigned int level,
>         if (next - xe_walk->va_curs_start > xe_walk->curs->size)
>                 return false;
>  
> +       /* NULL VMAs do not have dma addresses */
> +       if (xe_walk->pte_flags & XE_PTE_NULL)

But here it is used as a flag determining PTE setup. These are
different; see the documentation on pte_flags, and look at XE_PAGE_RW vs
XE_PTE_READ_ONLY for a similar split: one is a bit in the PTE which may
vary across hardware, the other is a flag handed to the page-table
walker (here, one indicating a NULL binding).
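
For illustration, the split I have in mind would look roughly like this
(just a sketch; XE_PTE_FLAG_NULL is a made-up name for the walker-level
flag, picked to mirror XE_PTE_READ_ONLY):

	/* hardware bit, next to the other PPGTT PTE bits: */
	#define XE_PPGTT_PTE_NULL	BIT_ULL(9)

	/* separate walker flag, any free flag bit would do: */
	#define XE_PTE_FLAG_NULL	BIT(1)

	/* the walker then tests the flag ... */
	if (xe_walk->pte_flags & XE_PTE_FLAG_NULL)
		return true;

	/* ... and only the PTE encode path ORs in the hardware bit */
	pte |= XE_PPGTT_PTE_NULL;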

> +               return true;
> +
>         /* Is the DMA address huge PTE size aligned? */
>         size = next - addr;
>         dma = addr - xe_walk->va_curs_start + xe_res_dma(xe_walk-
> >curs);
> @@ -585,6 +591,10 @@ xe_pt_scan_64K(u64 addr, u64 next, struct
> xe_pt_stage_bind_walk *xe_walk)
>         if (next > xe_walk->l0_end_addr)
>                 return false;
>  
> +       /* NULL VMAs do not have dma addresses */
> +       if (xe_walk->pte_flags & XE_PTE_NULL)
> +               return true;
> +
>         xe_res_next(&curs, addr - xe_walk->va_curs_start);
>         for (; addr < next; addr += SZ_64K) {
>                 if (!IS_ALIGNED(xe_res_dma(&curs), SZ_64K) ||
> curs.size < SZ_64K)
> @@ -630,17 +640,34 @@ xe_pt_stage_bind_entry(struct drm_pt *parent,
> pgoff_t offset,
>         struct xe_pt *xe_child;
>         bool covers;
>         int ret = 0;
> -       u64 pte;
> +       u64 pte = 0;
>  
>         /* Is this a leaf entry ?*/
>         if (level == 0 || xe_pt_hugepte_possible(addr, next, level,
> xe_walk)) {
>                 struct xe_res_cursor *curs = xe_walk->curs;
> +               bool null = xe_walk->pte_flags & XE_PTE_NULL;

Although lower-case, this might confuse code readers having had too
little coffee. Could we use another name instead of null? null_vma?

>  
>                 XE_WARN_ON(xe_walk->va_curs_start != addr);
>  
> -               pte = __gen8_pte_encode(xe_res_dma(curs) + xe_walk-
> >dma_offset,
> -                                       xe_walk->cache, xe_walk-
> >pte_flags,
> -                                       level);
> +               if (null) {
> +                       pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +
> +                       if (unlikely(xe_walk->pte_flags &
> XE_PTE_READ_ONLY))
> +                               pte &= ~XE_PAGE_RW;
> +
> +                       if (level == 1)
> +                               pte |= XE_PDE_PS_2M;
> +                       else if (level == 2)
> +                               pte |= XE_PDPE_PS_1G;
> +
> +                       pte |= XE_PTE_NULL;
> +               } else {
> +                       pte = __gen8_pte_encode(xe_res_dma(curs) +
> +                                               xe_walk->dma_offset,
> +                                               xe_walk->cache,
> +                                               xe_walk->pte_flags,
> +                                               level);
> +               }

Again, it looks like XE_PPGTT_PTE_NULL should just be or'ed to
@default_pte at the walk start, skipping this conditional?
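
For the record, that would be roughly (sketch; the no-dma-address
handling of the NULL case of course still needs its own path):

	/* in xe_pt_stage_bind(), once xe_walk is set up: */
	if (xe_vma_is_null(vma))
		xe_walk.default_pte |= XE_PPGTT_PTE_NULL;

and the leaf code then only needs the "pte |= xe_walk->default_pte;" it
already does.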

>                 pte |= xe_walk->default_pte;
>  
>                 /*
> @@ -658,7 +685,8 @@ xe_pt_stage_bind_entry(struct drm_pt *parent,
> pgoff_t offset,
>                 if (unlikely(ret))
>                         return ret;
>  
> -               xe_res_next(curs, next - addr);
> +               if (!null)
> +                       xe_res_next(curs, next - addr);


>                 xe_walk->va_curs_start = next;
>                 *action = ACTION_CONTINUE;
>  
> @@ -751,7 +779,8 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma
> *vma,
>                 .gt = gt,
>                 .curs = &curs,
>                 .va_curs_start = xe_vma_start(vma),
> -               .pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY
> : 0,
> +               .pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY
> : 0 |
> +                       xe_vma_is_null(vma) ? XE_PTE_NULL : 0,
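
One more thing worth noting in this initializer: since | binds tighter
than ?:, this parses as

	xe_vma_read_only(vma) ? XE_PTE_READ_ONLY :
		((0 | xe_vma_is_null(vma)) ? XE_PTE_NULL : 0)

so a VMA that is both read-only and NULL loses XE_PTE_NULL.
Parenthesizing both ternaries avoids that (sketch):

	.pte_flags = (xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0) |
		     (xe_vma_is_null(vma) ? XE_PTE_NULL : 0),
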
>                 .wupd.entries = entries,
>                 .needs_64K = (xe_vma_vm(vma)->flags &
> XE_VM_FLAGS_64K) &&
>                         is_vram,
> @@ -769,23 +798,28 @@ xe_pt_stage_bind(struct xe_gt *gt, struct
> xe_vma *vma,
>                         gt_to_xe(gt)->mem.vram.io_start;
>                 xe_walk.cache = XE_CACHE_WB;
>         } else {
> -               if (!xe_vma_is_userptr(vma) && bo->flags &
> XE_BO_SCANOUT_BIT)
> +               if (!xe_vma_has_no_bo(vma) && bo->flags &
> XE_BO_SCANOUT_BIT)
>                         xe_walk.cache = XE_CACHE_WT;
>                 else
>                         xe_walk.cache = XE_CACHE_WB;
>         }
> -       if (!xe_vma_is_userptr(vma) && xe_bo_is_stolen(bo))
> +       if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
>                 xe_walk.dma_offset =
> xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
>  
>         xe_bo_assert_held(bo);
> -       if (xe_vma_is_userptr(vma))
> -               xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
> &curs);
> -       else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
> -               xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
> -                            xe_vma_size(vma), &curs);
> -       else
> -               xe_res_first_sg(xe_bo_get_sg(bo),
> xe_vma_bo_offset(vma),
> -                               xe_vma_size(vma), &curs);
> +       if (!xe_vma_is_null(vma)) {
> +               if (xe_vma_is_userptr(vma))
> +                       xe_res_first_sg(vma->userptr.sg, 0,
> xe_vma_size(vma),
> +                                       &curs);
> +               else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
> +                       xe_res_first(bo->ttm.resource,
> xe_vma_bo_offset(vma),
> +                                    xe_vma_size(vma), &curs);
> +               else
> +                       xe_res_first_sg(xe_bo_get_sg(bo),
> xe_vma_bo_offset(vma),
> +                                       xe_vma_size(vma), &curs);
> +       } else {
> +               curs.size = xe_vma_size(vma);
> +       }
>  
>         ret = drm_pt_walk_range(&pt->drm, pt->level,
> xe_vma_start(vma),
>                                 xe_vma_end(vma), &xe_walk.drm);
> @@ -979,7 +1013,7 @@ static void xe_pt_commit_locks_assert(struct
> xe_vma *vma)
>  
>         if (xe_vma_is_userptr(vma))
>                 lockdep_assert_held_read(&vm->userptr.notifier_lock);
> -       else
> +       else if (!xe_vma_is_null(vma))
>                 dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);

Without looking at the code, IIRC there is a vma-range-specific resv
that needs to be held here. Could we assert that the vm resv is held for
NULL vmas, similar to userptrs?
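
Something like the following would make that explicit (a sketch, on top
of the vm resv assert the function already does below):

	if (xe_vma_is_userptr(vma))
		lockdep_assert_held_read(&vm->userptr.notifier_lock);
	else if (xe_vma_is_null(vma))
		dma_resv_assert_held(&vm->resv); /* only the vm resv covers these */
	else
		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);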

>  
>         dma_resv_assert_held(&vm->resv);
> @@ -1283,7 +1317,8 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct
> xe_vma *vma, struct xe_engine *e,
>         struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
>         struct xe_pt_migrate_pt_update bind_pt_update = {
>                 .base = {
> -                       .ops = xe_vma_is_userptr(vma) ?
> &userptr_bind_ops : &bind_ops,
> +                       .ops = xe_vma_is_userptr(vma) ?
> &userptr_bind_ops :
> +                               &bind_ops,

Unrelated change.

>                         .vma = vma,
>                 },
>                 .bind = true,
> @@ -1348,7 +1383,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct
> xe_vma *vma, struct xe_engine *e,
>                                    DMA_RESV_USAGE_KERNEL :
>                                    DMA_RESV_USAGE_BOOKKEEP);
>  
> -               if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
> +               if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
>                         dma_resv_add_fence(xe_vma_bo(vma)-
> >ttm.base.resv, fence,
>                                            DMA_RESV_USAGE_BOOKKEEP);
>                 xe_pt_commit_bind(vma, entries, num_entries, rebind,
> @@ -1667,7 +1702,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct
> xe_vma *vma, struct xe_engine *e,
>                                    DMA_RESV_USAGE_BOOKKEEP);
>  
>                 /* This fence will be installed by caller when doing
> eviction */
> -               if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
> +               if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
>                         dma_resv_add_fence(xe_vma_bo(vma)-
> >ttm.base.resv, fence,
>                                            DMA_RESV_USAGE_BOOKKEEP);
>                 xe_pt_commit_unbind(vma, entries, num_entries,
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index f3608865e259..a46f44ab2546 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -60,6 +60,7 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>  
>         lockdep_assert_held(&vm->lock);
>         XE_BUG_ON(!xe_vma_is_userptr(vma));
> +       XE_BUG_ON(xe_vma_is_null(vma));

Isn't this caught by the BUG_ON just above?

>  retry:
>         if (vma->gpuva.flags & XE_VMA_DESTROYED)
>                 return 0;
> @@ -581,7 +582,7 @@ static void preempt_rebind_work_func(struct
> work_struct *w)
>                 goto out_unlock;
>  
>         list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
> -               if (xe_vma_is_userptr(vma) ||
> +               if (xe_vma_has_no_bo(vma) ||
>                     vma->gpuva.flags & XE_VMA_DESTROYED)
>                         continue;
>  
> @@ -813,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
> *vm,
>                                     struct xe_bo *bo,
>                                     u64 bo_offset_or_userptr,
>                                     u64 start, u64 end,
> -                                   bool read_only,
> +                                   bool read_only, bool null,

Again, variable name.

>                                     u64 gt_mask)
>  {
>         struct xe_vma *vma;
> @@ -843,6 +844,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm
> *vm,
>         vma->gpuva.va.range = end - start + 1;
>         if (read_only)
>                 vma->gpuva.flags |= XE_VMA_READ_ONLY;
> +       if (null)
> +               vma->gpuva.flags |= XE_VMA_NULL;
>  
>         if (gt_mask) {
>                 vma->gt_mask = gt_mask;
> @@ -862,23 +865,26 @@ static struct xe_vma *xe_vma_create(struct
> xe_vm *vm,
>                 vma->gpuva.gem.obj = &bo->ttm.base;
>                 vma->gpuva.gem.offset = bo_offset_or_userptr;
>                 drm_gpuva_link(&vma->gpuva);
> -       } else /* userptr */ {
> -               u64 size = end - start + 1;
> -               int err;
> -
> -               vma->gpuva.gem.offset = bo_offset_or_userptr;
> +       } else /* userptr or null */ {
> +               if (!null) {
> +                       u64 size = end - start + 1;
> +                       int err;
> +
> +                       vma->gpuva.gem.offset = bo_offset_or_userptr;
> +                       err = mmu_interval_notifier_insert(&vma-
> >userptr.notifier,
> +                                                          current-
> >mm,
> +                                                         
> xe_vma_userptr(vma),
> +                                                          size,
> +                                                         
> &vma_userptr_notifier_ops);
> +                       if (err) {
> +                               kfree(vma);
> +                               vma = ERR_PTR(err);
> +                               return vma;
> +                       }
>  
> -               err = mmu_interval_notifier_insert(&vma-
> >userptr.notifier,
> -                                                  current->mm,
> -                                                 
> xe_vma_userptr(vma), size,
> -                                                 
> &vma_userptr_notifier_ops);
> -               if (err) {
> -                       kfree(vma);
> -                       vma = ERR_PTR(err);
> -                       return vma;
> +                       vma->userptr.notifier_seq = LONG_MAX;
>                 }
>  
> -               vma->userptr.notifier_seq = LONG_MAX;
>                 xe_vm_get(vm);
>         }
>  
> @@ -916,6 +922,8 @@ static void xe_vma_destroy_late(struct xe_vma
> *vma)
>                  */
>                 mmu_interval_notifier_remove(&vma->userptr.notifier);
>                 xe_vm_put(vm);
> +       } else if (xe_vma_is_null(vma)) {
> +               xe_vm_put(vm);
>         } else {
>                 xe_bo_put(xe_vma_bo(vma));
>         }
> @@ -954,7 +962,7 @@ static void xe_vma_destroy(struct xe_vma *vma,
> struct dma_fence *fence)
>                 list_del_init(&vma->userptr.invalidate_link);
>                 spin_unlock(&vm->userptr.invalidated_lock);
>                 list_del(&vma->userptr_link);
> -       } else {
> +       } else if (!xe_vma_is_null(vma)) {
>                 xe_bo_assert_held(xe_vma_bo(vma));
>                 drm_gpuva_unlink(&vma->gpuva);
>                 if (!xe_vma_bo(vma)->vm)
> @@ -1305,7 +1313,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>         drm_gpuva_iter_for_each(gpuva, it) {
>                 vma = gpuva_to_vma(gpuva);
>  
> -               if (xe_vma_is_userptr(vma)) {
> +               if (xe_vma_has_no_bo(vma)) {
>                         down_read(&vm->userptr.notifier_lock);
>                         vma->gpuva.flags |= XE_VMA_DESTROYED;
>                         up_read(&vm->userptr.notifier_lock);
> @@ -1315,7 +1323,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>                 drm_gpuva_iter_remove(&it);
>  
>                 /* easy case, remove from VMA? */
> -               if (xe_vma_is_userptr(vma) || xe_vma_bo(vma)->vm) {
> +               if (xe_vma_has_no_bo(vma) || xe_vma_bo(vma)->vm) {
>                         xe_vma_destroy(vma, NULL);
>                         continue;
>                 }
> @@ -1964,7 +1972,7 @@ static int xe_vm_prefetch(struct xe_vm *vm,
> struct xe_vma *vma,
>  
>         XE_BUG_ON(region > ARRAY_SIZE(region_to_mem_type));
>  
> -       if (!xe_vma_is_userptr(vma)) {
> +       if (!xe_vma_has_no_bo(vma)) {
>                 err = xe_bo_migrate(xe_vma_bo(vma),
> region_to_mem_type[region]);
>                 if (err)
>                         return err;
> @@ -2170,6 +2178,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
> struct xe_bo *bo,
>                                 operation &
> XE_VM_BIND_FLAG_IMMEDIATE;
>                         op->map.read_only =
>                                 operation & XE_VM_BIND_FLAG_READONLY;
> +                       op->map.null = operation &
> XE_VM_BIND_FLAG_NULL;
>                 }
>                 break;
>         case XE_VM_BIND_OP_UNMAP:
> @@ -2226,7 +2235,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
> struct xe_bo *bo,
>  }
>  
>  static struct xe_vma *new_vma(struct xe_vm *vm, struct
> drm_gpuva_op_map *op,
> -                             u64 gt_mask, bool read_only)
> +                             u64 gt_mask, bool read_only, bool null)
>  {
>         struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) :
> NULL;
>         struct xe_vma *vma;
> @@ -2242,7 +2251,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm,
> struct drm_gpuva_op_map *op,
>         }
>         vma = xe_vma_create(vm, bo, op->gem.offset,
>                             op->va.addr, op->va.addr +
> -                           op->va.range - 1, read_only,
> +                           op->va.range - 1, read_only, null,
>                             gt_mask);
>         if (bo)
>                 xe_bo_unlock(bo, &ww);
> @@ -2254,7 +2263,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm,
> struct drm_gpuva_op_map *op,
>                         xe_vma_destroy(vma, NULL);
>                         return ERR_PTR(err);
>                 }
> -       } else if(!bo->vm) {
> +       } else if(!xe_vma_has_no_bo(vma) && !bo->vm) {
>                 vm_insert_extobj(vm, vma);
>                 err = add_preempt_fences(vm, bo);
>                 if (err) {
> @@ -2332,7 +2341,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
> *vm, struct xe_engine *e,
>                                 struct xe_vma *vma;
>  
>                                 vma = new_vma(vm, &op->base.map,
> -                                             op->gt_mask, op-
> >map.read_only);
> +                                             op->gt_mask, op-
> >map.read_only,
> +                                             op->map.null );
>                                 if (IS_ERR(vma)) {
>                                         err = PTR_ERR(vma);
>                                         goto free_fence;
> @@ -2347,9 +2357,13 @@ static int vm_bind_ioctl_ops_parse(struct
> xe_vm *vm, struct xe_engine *e,
>                                         bool read_only =
>                                                 op->base.remap.unmap-
> >va->flags &
>                                                 XE_VMA_READ_ONLY;
> +                                       bool null =
> +                                               op->base.remap.unmap-
> >va->flags &
> +                                               XE_VMA_NULL;
>  
>                                         vma = new_vma(vm, op-
> >base.remap.prev,
> -                                                     op->gt_mask,
> read_only);
> +                                                     op->gt_mask,
> read_only,
> +                                                     null);
>                                         if (IS_ERR(vma)) {
>                                                 err = PTR_ERR(vma);
>                                                 goto free_fence;
> @@ -2364,8 +2378,13 @@ static int vm_bind_ioctl_ops_parse(struct
> xe_vm *vm, struct xe_engine *e,
>                                                 op->base.remap.unmap-
> >va->flags &
>                                                 XE_VMA_READ_ONLY;
>  
> +                                       bool null =
> +                                               op->base.remap.unmap-
> >va->flags &
> +                                               XE_VMA_NULL;
> +
>                                         vma = new_vma(vm, op-
> >base.remap.next,
> -                                                     op->gt_mask,
> read_only);
> +                                                     op->gt_mask,
> read_only,
> +                                                     null);
>                                         if (IS_ERR(vma)) {
>                                                 err = PTR_ERR(vma);
>                                                 goto free_fence;
> @@ -2853,11 +2872,12 @@ static void vm_bind_ioctl_ops_unwind(struct
> xe_vm *vm,
>  #ifdef TEST_VM_ASYNC_OPS_ERROR
>  #define SUPPORTED_FLAGS        \
>         (FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \
> -        XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE |
> 0xffff)
> +        XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \
> +        XE_VM_BIND_FLAG_NULL | 0xffff)
>  #else
>  #define SUPPORTED_FLAGS        \
>         (XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \
> -        XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
> +        XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff)
>  #endif
>  #define XE_64K_PAGE_MASK 0xffffull
>  
> @@ -2903,6 +2923,7 @@ static int vm_bind_ioctl_check_args(struct
> xe_device *xe,
>                 u32 obj = (*bind_ops)[i].obj;
>                 u64 obj_offset = (*bind_ops)[i].obj_offset;
>                 u32 region = (*bind_ops)[i].region;
> +               bool null = op &  XE_VM_BIND_FLAG_NULL;

And in multiple other places...

>  
>                 if (i == 0) {
>                         *async = !!(op & XE_VM_BIND_FLAG_ASYNC);
> @@ -2929,8 +2950,12 @@ static int vm_bind_ioctl_check_args(struct
> xe_device *xe,
>                 if (XE_IOCTL_ERR(xe, VM_BIND_OP(op) >
>                                  XE_VM_BIND_OP_PREFETCH) ||
>                     XE_IOCTL_ERR(xe, op & ~SUPPORTED_FLAGS) ||
> +                   XE_IOCTL_ERR(xe, obj && null) ||
> +                   XE_IOCTL_ERR(xe, obj_offset && null) ||
> +                   XE_IOCTL_ERR(xe, VM_BIND_OP(op) !=
> XE_VM_BIND_OP_MAP &&
> +                                null) ||
>                     XE_IOCTL_ERR(xe, !obj &&
> -                                VM_BIND_OP(op) == XE_VM_BIND_OP_MAP)
> ||
> +                                VM_BIND_OP(op) == XE_VM_BIND_OP_MAP
> && !null) ||
>                     XE_IOCTL_ERR(xe, !obj &&
>                                  VM_BIND_OP(op) ==
> XE_VM_BIND_OP_UNMAP_ALL) ||
>                     XE_IOCTL_ERR(xe, addr &&
> @@ -3254,6 +3279,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
>         int ret;
>  
>         XE_BUG_ON(!xe_vm_in_fault_mode(xe_vma_vm(vma)));
> +       XE_BUG_ON(xe_vma_is_null(vma));
>         trace_xe_vma_usm_invalidate(vma);
>  
>         /* Check that we don't race with page-table updates */
> @@ -3313,8 +3339,11 @@ int xe_analyze_vm(struct drm_printer *p,
> struct xe_vm *vm, int gt_id)
>         drm_gpuva_iter_for_each(gpuva, it) {
>                 struct xe_vma* vma = gpuva_to_vma(gpuva);
>                 bool is_userptr = xe_vma_is_userptr(vma);
> +               bool null = xe_vma_is_null(vma);
>  
> -               if (is_userptr) {
> +               if (null) {
> +                       addr = 0;
> +               } else if (is_userptr) {
>                         struct xe_res_cursor cur;
>  
>                         xe_res_first_sg(vma->userptr.sg, 0,
> XE_PAGE_SIZE, &cur);
> @@ -3324,7 +3353,8 @@ int xe_analyze_vm(struct drm_printer *p, struct
> xe_vm *vm, int gt_id)
>                 }
>                 drm_printf(p, " [%016llx-%016llx] S:0x%016llx
> A:%016llx %s\n",
>                            xe_vma_start(vma), xe_vma_end(vma),
> xe_vma_size(vma),
> -                          addr, is_userptr ? "USR" : is_vram ?
> "VRAM" : "SYS");
> +                          addr, null ? "NULL" :
> +                          is_userptr ? "USR" : is_vram ? "VRAM" :
> "SYS");
>         }
>         up_read(&vm->lock);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 21b1054949c4..96e2c6b07bf8 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -175,7 +175,17 @@ static inline void
> xe_vm_reactivate_rebind(struct xe_vm *vm)
>         }
>  }
>  
> +static inline bool xe_vma_is_null(struct xe_vma *vma)
> +{
> +       return vma->gpuva.flags & XE_VMA_NULL;
> +}
> +
>  static inline bool xe_vma_is_userptr(struct xe_vma *vma)
> +{
> +       return !xe_vma_bo(vma) && !xe_vma_is_null(vma);
> +}
> +
> +static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
>  {
>         return !xe_vma_bo(vma);
>  }



> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c
> b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 02d27a354b36..03508645fa08 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -227,7 +227,7 @@ get_vmas(struct xe_vm *vm, int *num_vmas, u64
> addr, u64 range)
>         drm_gpuva_iter_for_each_range(gpuva, it, addr + range) {
>                 struct xe_vma *vma = gpuva_to_vma(gpuva);
>  
> -               if (xe_vma_is_userptr(vma))
> +               if (xe_vma_has_no_bo(vma))
>                         continue;
>  
>                 if (*num_vmas == max_vmas) {
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> b/drivers/gpu/drm/xe/xe_vm_types.h
> index 243dc91a61b0..b61007b70502 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -29,6 +29,7 @@ struct xe_vm;
>  #define XE_VMA_ATOMIC_PTE_BIT  (DRM_GPUVA_USERBITS << 2)
>  #define XE_VMA_FIRST_REBIND    (DRM_GPUVA_USERBITS << 3)
>  #define XE_VMA_LAST_REBIND     (DRM_GPUVA_USERBITS << 4)
> +#define XE_VMA_NULL            (DRM_GPUVA_USERBITS << 5)
>  
>  struct xe_vma {
>         /** @gpuva: Base GPUVA object */
> @@ -315,6 +316,8 @@ struct xe_vma_op_map {
>         bool immediate;
>         /** @read_only: Read only */
>         bool read_only;
> +       /** @null: NULL (writes dropped, read zero) */
> +       bool null;
>  };
>  
>  /** struct xe_vma_op_unmap - VMA unmap operation */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index b0b80aae3ee8..27c51946fadd 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -447,6 +447,14 @@ struct drm_xe_vm_bind_op {
>          * than differing the MAP to the page fault handler.
>          */
>  #define XE_VM_BIND_FLAG_IMMEDIATE      (0x1 << 18)
> +       /*
> +        * When the NULL flag is set, the page tables are set up with
> a special
> +        * bit which indicates writes are dropped and all reads
> return zero. The
> +        * NULL flag is only valid for XE_VM_BIND_OP_MAP operations,
> the BO
> +        * handle MBZ, and the BO offset MBZ. This flag is intended
> to implement
> +        * VK sparse bindings.
> +        */
> +#define XE_VM_BIND_FLAG_NULL           (0x1 << 19)
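
For reference, the intended userspace usage would then look roughly like
this (a sketch; field names as referenced elsewhere in this series,
sparse_va is a placeholder chosen by the application):

	struct drm_xe_vm_bind_op bind = {
		.obj = 0,		/* MBZ for NULL bindings */
		.obj_offset = 0,	/* MBZ */
		.addr = sparse_va,	/* GPU VA to back with the NULL mapping */
		.range = 0x10000,
		.op = XE_VM_BIND_OP_MAP | XE_VM_BIND_FLAG_NULL,
	};
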
>  
>         /** @reserved: Reserved */
>         __u64 reserved[2];


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-09 14:56     ` Matthew Brost
@ 2023-05-09 15:21       ` Thomas Hellström
  2023-05-09 22:16         ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-09 15:21 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe


On 5/9/23 16:56, Matthew Brost wrote:
> On Mon, May 08, 2023 at 03:14:10PM +0200, Thomas Hellström wrote:
>> Hi, Matthew
>>
>> In addition to Rodrigo's comments:
>>
>> On 5/2/23 02:17, Matthew Brost wrote:
>>> Flow control + write ring in exec, return NULL in run_job, signal
>>> xe_hw_fence immediately, and override TDR for LR jobs.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
>>>    drivers/gpu/drm/xe/xe_engine.h           |  4 +
>>>    drivers/gpu/drm/xe/xe_exec.c             |  8 ++
>>>    drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
>>>    drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
>>>    drivers/gpu/drm/xe/xe_trace.h            |  5 ++
>>>    6 files changed, 137 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
>>> index 094ec17d3004..d1e84d7adbd4 100644
>>> --- a/drivers/gpu/drm/xe/xe_engine.c
>>> +++ b/drivers/gpu/drm/xe/xe_engine.c
>>> @@ -18,6 +18,7 @@
>>>    #include "xe_macros.h"
>>>    #include "xe_migrate.h"
>>>    #include "xe_pm.h"
>>> +#include "xe_ring_ops_types.h"
>>>    #include "xe_trace.h"
>>>    #include "xe_vm.h"
>>> @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
>>>    	up_write(&e->vm->lock);
>>>    }
>>> +/**
>>> + * xe_engine_is_lr() - Whether an engine is long-running
>>> + * @e: The engine
>>> + *
>>> + * Return: True if the engine is long-running, false otherwise.
>>> + */
>>> +bool xe_engine_is_lr(struct xe_engine *e)
>>> +{
>>> +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
>>> +		!(e->flags & ENGINE_FLAG_VM);
>>> +}
>>> +
>>> +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
>>> +{
>>> +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
>>> +}
>>> +
>>> +/**
>>> + * xe_engine_ring_full() - Whether an engine's ring is full
>>> + * @e: The engine
>>> + *
>>> + * Return: True if the engine's ring is full, false otherwise.
>>> + */
>>> +bool xe_engine_ring_full(struct xe_engine *e)
>>> +{
>>> +	struct xe_lrc *lrc = e->lrc;
>>> +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
>>> +
>>> +	return xe_engine_num_job_inflight(e) >= max_job;
>>> +}
>>> +
>>>    /**
>>>     * xe_engine_is_idle() - Whether an engine is idle.
>>>     * @engine: The engine
>>> diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
>>> index a49cf2ab405e..2e60f6d90226 100644
>>> --- a/drivers/gpu/drm/xe/xe_engine.h
>>> +++ b/drivers/gpu/drm/xe/xe_engine.h
>>> @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
>>>    	return engine->width > 1;
>>>    }
>>> +bool xe_engine_is_lr(struct xe_engine *e);
>>> +
>>> +bool xe_engine_ring_full(struct xe_engine *e);
>>> +
>>>    bool xe_engine_is_idle(struct xe_engine *engine);
>>>    void xe_engine_kill(struct xe_engine *e);
>>> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
>>> index ea869f2452ef..44ea9bcd0066 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec.c
>>> @@ -13,6 +13,7 @@
>>>    #include "xe_device.h"
>>>    #include "xe_engine.h"
>>>    #include "xe_macros.h"
>>> +#include "xe_ring_ops_types.h"
>>>    #include "xe_sched_job.h"
>>>    #include "xe_sync.h"
>>>    #include "xe_vm.h"
>>> @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>    		goto err_engine_end;
>>>    	}
>>> +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
>>> +		err = -EWOULDBLOCK;
>>> +		goto err_engine_end;
>>> +	}
>>> +
>>>    	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
>>>    				  addresses : &args->address);
>>>    	if (IS_ERR(job)) {
>>> @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>    		xe_sync_entry_signal(&syncs[i], job,
>>>    				     &job->drm.s_fence->finished);
>>> +	if (xe_engine_is_lr(engine))
>>> +		engine->ring_ops->emit_job(job);
>>>    	xe_sched_job_push(job);
>>>    	xe_vm_reactivate_rebind(vm);
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
>>> index cbfb13026ec1..5d83132034a6 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
>>> @@ -31,6 +31,8 @@ struct xe_guc_engine {
>>>    	 */
>>>    #define MAX_STATIC_MSG_TYPE	3
>>>    	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
>>> +	/** @lr_tdr: long running TDR worker */
>>> +	struct work_struct lr_tdr;
>>>    	/** @fini_async: do final fini async from this worker */
>>>    	struct work_struct fini_async;
>>>    	/** @resume_time: time of last resume */
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>>> index 68d09e7a4cc0..0a41f5d04f6d 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>>> @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
>>>    		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
>>>    	}
>>> +	/*
>>> +	 * We must keep a reference for LR engines if engine is registered with
>>> +	 * the GuC as jobs signal immediately and can't destroy an engine if the
>>> +	 * GuC has a reference to it.
>>> +	 */
>>> +	if (xe_engine_is_lr(e))
>>> +		xe_engine_get(e);
>>> +
>>>    	set_engine_registered(e);
>>>    	trace_xe_engine_register(e);
>>>    	if (xe_engine_is_parallel(e))
>>> @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>>>    {
>>>    	struct xe_sched_job *job = to_xe_sched_job(drm_job);
>>>    	struct xe_engine *e = job->engine;
>>> +	bool lr = xe_engine_is_lr(e);
>>>    	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
>>>    		  !engine_banned(e) && !engine_suspended(e));
>>> @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>>>    	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
>>>    		if (!engine_registered(e))
>>>    			register_engine(e);
>>> -		e->ring_ops->emit_job(job);
>>> +		if (!lr)	/* Written in IOCTL */
>> Hmm? What does "Written in IOCTL" mean? Could you rephrase to something more
>> descriptive?
>>
> "LR jos are emitted in the IOCTL"

Ah, I read it as "the lr variable was written in IOCTL."

Perhaps LR jobs are already emitted at execbuf time?

/Thomas


>
> Does that work?
>
> Matt
>
>>> +			e->ring_ops->emit_job(job);
>>>    		submit_engine(e);
>>>    	}
>>> -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
>>> +	if (lr) {
>>> +		xe_sched_job_set_error(job, -ENOTSUPP);
>>> +		return NULL;
>>> +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
>>>    		return job->fence;
>>> -	else
>>> +	} else {
>>>    		return dma_fence_get(job->fence);
>>> +	}
>>>    }
>>>    static void guc_engine_free_job(struct drm_sched_job *drm_job)
>>> @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
>>>    }
>>>    #endif
>>> +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
>>> +{
>>> +	struct xe_guc *guc = engine_to_guc(e);
>>> +
>>> +	if (xe_engine_is_lr(e))
>>> +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
>>> +	else
>>> +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>> +}
>>> +
>>> +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
>>> +{
>>> +	struct xe_guc_engine *ge =
>>> +		container_of(w, struct xe_guc_engine, lr_tdr);
>>> +	struct xe_engine *e = ge->engine;
>>> +	struct drm_gpu_scheduler *sched = &ge->sched;
>>> +
>>> +	XE_BUG_ON(!xe_engine_is_lr(e));
>>> +	trace_xe_engine_lr_cleanup(e);
>>> +
>>> +	/* Kill the run_job / process_msg entry points */
>>> +	drm_sched_run_wq_stop(sched);
>>> +
>>> +	/* Engine state now stable, disable scheduling / deregister if needed */
>>> +	if (engine_registered(e)) {
>>> +		struct xe_guc *guc = engine_to_guc(e);
>>> +		int ret;
>>> +
>>> +		set_engine_banned(e);
>>> +		xe_engine_get(e);
>>> +		disable_scheduling_deregister(guc, e);
>>> +
>>> +		/*
>>> +		 * Must wait for scheduling to be disabled before signalling
>>> +		 * any fences, if GT broken the GT reset code should signal us.
>>> +		 */
>>> +		smp_rmb();
>> wait_event() paired with wake_up() family of functions typically set the
>> necessary barriers to make sure anything written prior to wake_up() is seen
>> in wait_event(). So that smp_rmb() is most likely not needed. If it still
>> is, its pairing smp_wmb() should be documented and pointed to as well. See
>> documentation of set_current_state() vs __set_current_state().
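>>
>> For illustration, the pairing that matters is roughly (a sketch; the
>> name of the clearer on the G2H handler side is assumed):
>>
>> 	/* G2H handler side: */
>> 	clear_engine_pending_disable(e);
>> 	wake_up(&guc->ct.wq);	/* wait-queue locking provides the ordering */
>>
>> 	/* this side: */
>> 	ret = wait_event_timeout(guc->ct.wq,
>> 				 !engine_pending_disable(e) ||
>> 				 guc_read_stopped(guc), HZ * 5);
>>
>> which is why the explicit smp_rmb() is most likely not needed.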
>>
>>> +		ret = wait_event_timeout(guc->ct.wq,
>>> +					 !engine_pending_disable(e) ||
>>> +					 guc_read_stopped(guc), HZ * 5);
>>> +		if (!ret) {
>>> +			XE_WARN_ON("Schedule disable failed to respond");
>>> +			drm_sched_run_wq_start(sched);
>>> +			xe_gt_reset_async(e->gt);
>>> +			return;
>>> +		}
>>> +	}
>>> +
>>> +	drm_sched_run_wq_start(sched);
>>> +}
>>> +
>>>    static enum drm_gpu_sched_stat
>>>    guc_engine_timedout_job(struct drm_sched_job *drm_job)
>>>    {
>>> @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>>>    			err = -EIO;
>>>    		set_engine_banned(e);
>>>    		xe_engine_get(e);
>>> -		disable_scheduling_deregister(engine_to_guc(e), e);
>>> +		disable_scheduling_deregister(guc, e);
>>>    		/*
>>>    		 * Must wait for scheduling to be disabled before signalling
>>> @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>>>    	 */
>>>    	list_add(&drm_job->list, &sched->pending_list);
>>>    	drm_sched_run_wq_start(sched);
>>> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>> +	xe_guc_engine_trigger_cleanup(e);
>>>    	/* Mark all outstanding jobs as bad, thus completing them */
>>>    	spin_lock(&sched->job_list_lock);
>>> @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
>>>    	trace_xe_engine_destroy(e);
>>> +	if (xe_engine_is_lr(e))
>>> +		cancel_work_sync(&ge->lr_tdr);
>>>    	if (e->flags & ENGINE_FLAG_PERSISTENT)
>>>    		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
>>>    	release_guc_id(guc, e);
>>> @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
>>>    	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
>>>    	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
>>> -	queue_work(system_unbound_wq, &e->guc->fini_async);
>>> +	queue_work(system_wq, &e->guc->fini_async);
>>>    	/* We must block on kernel engines so slabs are empty on driver unload */
>>>    	if (kernel) {
>>> @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
>>>    	if (err)
>>>    		goto err_free;
>>> +
>> Unrelated whitespace?
>>
>>
>>>    	sched = &ge->sched;
>>>    	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
>>>    				    &sched, 1, NULL);
>>>    	if (err)
>>>    		goto err_sched;
>>> +	if (xe_engine_is_lr(e))
>>> +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
>>> +
>>>    	mutex_lock(&guc->submission_state.lock);
>>>    	err = alloc_guc_id(guc, e);
>>> @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
>>>    {
>>>    	trace_xe_engine_kill(e);
>>>    	set_engine_killed(e);
>>> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>> +	xe_guc_engine_trigger_cleanup(e);
>>>    }
>>>    static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
>>> @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
>>>    	/* Stop scheduling + flush any DRM scheduler operations */
>>>    	drm_sched_run_wq_stop(sched);
>>> +	if (engine_registered(e) && xe_engine_is_lr(e))
>>> +		xe_engine_put(e);
>>> +
>>>    	/* Clean up lost G2H + reset engine state */
>>>    	if (engine_destroyed(e) && engine_registered(e)) {
>>>    		if (engine_banned(e))
>>> @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>>>    	trace_xe_engine_deregister_done(e);
>>>    	clear_engine_registered(e);
>>> +	if (xe_engine_is_lr(e))
>>> +		xe_engine_put(e);
>>> +
>>>    	if (engine_banned(e))
>>>    		xe_engine_put(e);
>>>    	else
>>> @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
>>>    	 */
>>>    	set_engine_reset(e);
>>>    	if (!engine_banned(e))
>>> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>> +		xe_guc_engine_trigger_cleanup(e);
>>>    	return 0;
>>>    }
>>> @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>>>    	/* Treat the same as engine reset */
>>>    	set_engine_reset(e);
>>>    	if (!engine_banned(e))
>>> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>> +		xe_guc_engine_trigger_cleanup(e);
>>>    	return 0;
>>>    }
>>> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
>>> index 2f8eb7ebe9a7..02861c26e145 100644
>>> --- a/drivers/gpu/drm/xe/xe_trace.h
>>> +++ b/drivers/gpu/drm/xe/xe_trace.h
>>> @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
>>>    	     TP_ARGS(e)
>>>    );
>>> +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
>>> +	     TP_PROTO(struct xe_engine *e),
>>> +	     TP_ARGS(e)
>>> +);
>>> +
>>>    DECLARE_EVENT_CLASS(xe_sched_job,
>>>    		    TP_PROTO(struct xe_sched_job *job),
>>>    		    TP_ARGS(job),

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-09 12:47   ` Thomas Hellström
@ 2023-05-09 22:05     ` Matthew Brost
  2023-05-10  8:14       ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-09 22:05 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
> > LRU position on every exec.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
> >   drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
> >   drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
> >   drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
> >   drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
> >   5 files changed, 40 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 3ab404e33fae..da99ee53e7d7 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
> >   	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
> >   }
> > +static void xe_gem_object_close(struct drm_gem_object *obj,
> > +				struct drm_file *file_priv)
> > +{
> > +	struct xe_bo *bo = gem_to_xe_bo(obj);
> > +
> > +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
> doesn't make much sense when we support user-space command buffer chaining,
> but I think we should be doing it on exec at least, no?

Maybe you could make that argument for compute VMs; the preempt worker in
that case should probably do a bulk move. I can change this if desired.
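
For the compute case that would be something like this at the end of the
preempt rebind worker, mirroring the exec path in this patch (a sketch;
assumes the vm keeps its device backpointer as vm->xe):

	spin_lock(&vm->xe->ttm.lru_lock);
	ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
	spin_unlock(&vm->xe->ttm.lru_lock);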

For a fault VM it makes no sense as the fault handler updates the LRU
for individual BOs.

> > +		struct ww_acquire_ctx ww;
> > +
> > +		XE_BUG_ON(!xe_bo_is_user(bo));
> 
> Also why can't we use this for kernel objects as well? At some point we want
> to get to evictable page-table objects? Could we do this in the
> release_notify() callback to cover all potential bos?
> 

xe_gem_object_close is a user call, right? We can't call this on kernel
BOs. This also could be outside the if statement.

Matt

> /Thomas
> 
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-08 21:39   ` Rodrigo Vivi
@ 2023-05-09 22:09     ` Matthew Brost
  2023-05-10  1:37       ` Rodrigo Vivi
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-09 22:09 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Mon, May 08, 2023 at 05:39:12PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:04PM -0700, Matthew Brost wrote:
> > Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
> > LRU position on every exec.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
> >  drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
> >  drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
> >  drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
> >  drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
> >  5 files changed, 40 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 3ab404e33fae..da99ee53e7d7 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
> >  	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
> >  }
> >  
> > +static void xe_gem_object_close(struct drm_gem_object *obj,
> > +				struct drm_file *file_priv)
> > +{
> > +	struct xe_bo *bo = gem_to_xe_bo(obj);
> > +
> > +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> > +		struct ww_acquire_ctx ww;
> > +
> > +		XE_BUG_ON(!xe_bo_is_user(bo));
> 
> We really need to stop using BUG_ON and move towards using WARNs instead.
> 

If that is the direction, sure, I'll change this, but personally I use
BUG_ON for things that should be impossible with a correct KMD.
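
If we do go that way, the warn-and-bail flavor here would be something
like (a sketch; assumes XE_WARN_ON() propagates its condition the way
WARN_ON() does):

	if (XE_WARN_ON(!xe_bo_is_user(bo)))
		return;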

Matt 

> But the rest of the patch looks good to me... I just believe it would be
> good to get Thomas' review here.
> 
> > +
> > +		xe_bo_lock(bo, &ww, 0, false);
> > +		ttm_bo_set_bulk_move(&bo->ttm, NULL);
> > +		xe_bo_unlock(bo, &ww);
> > +	}
> > +}
> > +
> > +
> >  static bool should_migrate_to_system(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = xe_bo_device(bo);
> > @@ -1040,6 +1057,7 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
> >  
> >  static const struct drm_gem_object_funcs xe_gem_object_funcs = {
> >  	.free = xe_gem_object_free,
> > +	.close = xe_gem_object_close,
> >  	.mmap = drm_gem_ttm_mmap,
> >  	.export = xe_gem_prime_export,
> >  	.vm_ops = &xe_gem_vm_ops,
> > @@ -1081,8 +1099,8 @@ void xe_bo_free(struct xe_bo *bo)
> >  
> >  struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> >  				    struct xe_gt *gt, struct dma_resv *resv,
> > -				    size_t size, enum ttm_bo_type type,
> > -				    u32 flags)
> > +				    struct ttm_lru_bulk_move *bulk, size_t size,
> > +				    enum ttm_bo_type type, u32 flags)
> >  {
> >  	struct ttm_operation_ctx ctx = {
> >  		.interruptible = true,
> > @@ -1149,7 +1167,10 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> >  		return ERR_PTR(err);
> >  
> >  	bo->created = true;
> > -	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> > +	if (bulk)
> > +		ttm_bo_set_bulk_move(&bo->ttm, bulk);
> > +	else
> > +		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> >  
> >  	return bo;
> >  }
> > @@ -1219,7 +1240,10 @@ xe_bo_create_locked_range(struct xe_device *xe,
> >  		}
> >  	}
> >  
> > -	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL, size,
> > +	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL,
> > +				   vm && !xe_vm_no_dma_fences(vm) &&
> > +				   flags & XE_BO_CREATE_USER_BIT ?
> > +				   &vm->lru_bulk_move : NULL, size,
> >  				   type, flags);
> >  	if (IS_ERR(bo))
> >  		return bo;
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index 8354d05ccdf3..25457b3c757b 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -81,8 +81,8 @@ void xe_bo_free(struct xe_bo *bo);
> >  
> >  struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> >  				    struct xe_gt *gt, struct dma_resv *resv,
> > -				    size_t size, enum ttm_bo_type type,
> > -				    u32 flags);
> > +				    struct ttm_lru_bulk_move *bulk, size_t size,
> > +				    enum ttm_bo_type type, u32 flags);
> >  struct xe_bo *
> >  xe_bo_create_locked_range(struct xe_device *xe,
> >  			  struct xe_gt *gt, struct xe_vm *vm,
> > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > index 9b252cc782b7..975dee1f770f 100644
> > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > @@ -199,7 +199,7 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> >  	int ret;
> >  
> >  	dma_resv_lock(resv, NULL);
> > -	bo = __xe_bo_create_locked(xe, storage, NULL, resv, dma_buf->size,
> > +	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> >  				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> >  	if (IS_ERR(bo)) {
> >  		ret = PTR_ERR(bo);
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index 44ea9bcd0066..21a9c2fddf86 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -374,6 +374,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >  	xe_sched_job_push(job);
> >  	xe_vm_reactivate_rebind(vm);
> >  
> > +	if (!err && !xe_vm_no_dma_fences(vm)) {
> > +		spin_lock(&xe->ttm.lru_lock);
> > +		ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
> > +		spin_unlock(&xe->ttm.lru_lock);
> > +	}
> > +
> >  err_repin:
> >  	if (!xe_vm_no_dma_fences(vm))
> >  		up_read(&vm->userptr.notifier_lock);
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > index fada7896867f..d3e99f22510d 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -164,6 +164,9 @@ struct xe_vm {
> >  	/** Protects @rebind_list and the page-table structures */
> >  	struct dma_resv resv;
> >  
> > +	/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
> > +	struct ttm_lru_bulk_move lru_bulk_move;
> > +
> >  	u64 size;
> >  	struct rb_root vmas;
> >  
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-09 15:21       ` Thomas Hellström
@ 2023-05-09 22:16         ` Matthew Brost
  2023-05-10  8:15           ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-09 22:16 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Tue, May 09, 2023 at 05:21:39PM +0200, Thomas Hellström wrote:
> 
> On 5/9/23 16:56, Matthew Brost wrote:
> > On Mon, May 08, 2023 at 03:14:10PM +0200, Thomas Hellström wrote:
> > > Hi, Matthew
> > > 
> > > In addition to Rodrigo's comments:
> > > 
> > > On 5/2/23 02:17, Matthew Brost wrote:
> > > > Flow control + write ring in exec, return NULL in run_job, signal
> > > > xe_hw_fence immediately, and override TDR for LR jobs.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
> > > >    drivers/gpu/drm/xe/xe_engine.h           |  4 +
> > > >    drivers/gpu/drm/xe/xe_exec.c             |  8 ++
> > > >    drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
> > > >    drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
> > > >    drivers/gpu/drm/xe/xe_trace.h            |  5 ++
> > > >    6 files changed, 137 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> > > > index 094ec17d3004..d1e84d7adbd4 100644
> > > > --- a/drivers/gpu/drm/xe/xe_engine.c
> > > > +++ b/drivers/gpu/drm/xe/xe_engine.c
> > > > @@ -18,6 +18,7 @@
> > > >    #include "xe_macros.h"
> > > >    #include "xe_migrate.h"
> > > >    #include "xe_pm.h"
> > > > +#include "xe_ring_ops_types.h"
> > > >    #include "xe_trace.h"
> > > >    #include "xe_vm.h"
> > > > @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
> > > >    	up_write(&e->vm->lock);
> > > >    }
> > > > +/**
> > > > + * xe_engine_is_lr() - Whether an engine is long-running
> > > > + * @e: The engine
> > > > + *
> > > > + * Return: True if the engine is long-running, false otherwise.
> > > > + */
> > > > +bool xe_engine_is_lr(struct xe_engine *e)
> > > > +{
> > > > +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
> > > > +		!(e->flags & ENGINE_FLAG_VM);
> > > > +}
> > > > +
> > > > +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
> > > > +{
> > > > +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
> > > > +}
> > > > +
> > > > +/**
> > > > + * xe_engine_ring_full() - Whether an engine's ring is full
> > > > + * @e: The engine
> > > > + *
> > > > + * Return: True if the engine's ring is full, false otherwise.
> > > > + */
> > > > +bool xe_engine_ring_full(struct xe_engine *e)
> > > > +{
> > > > +	struct xe_lrc *lrc = e->lrc;
> > > > +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
> > > > +
> > > > +	return xe_engine_num_job_inflight(e) >= max_job;
> > > > +}
> > > > +
> > > >    /**
> > > >     * xe_engine_is_idle() - Whether an engine is idle.
> > > >     * @engine: The engine
> > > > diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
> > > > index a49cf2ab405e..2e60f6d90226 100644
> > > > --- a/drivers/gpu/drm/xe/xe_engine.h
> > > > +++ b/drivers/gpu/drm/xe/xe_engine.h
> > > > @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
> > > >    	return engine->width > 1;
> > > >    }
> > > > +bool xe_engine_is_lr(struct xe_engine *e);
> > > > +
> > > > +bool xe_engine_ring_full(struct xe_engine *e);
> > > > +
> > > >    bool xe_engine_is_idle(struct xe_engine *engine);
> > > >    void xe_engine_kill(struct xe_engine *e);
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > > index ea869f2452ef..44ea9bcd0066 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > > @@ -13,6 +13,7 @@
> > > >    #include "xe_device.h"
> > > >    #include "xe_engine.h"
> > > >    #include "xe_macros.h"
> > > > +#include "xe_ring_ops_types.h"
> > > >    #include "xe_sched_job.h"
> > > >    #include "xe_sync.h"
> > > >    #include "xe_vm.h"
> > > > @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > >    		goto err_engine_end;
> > > >    	}
> > > > +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
> > > > +		err = -EWOULDBLOCK;
> > > > +		goto err_engine_end;
> > > > +	}
> > > > +
> > > >    	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
> > > >    				  addresses : &args->address);
> > > >    	if (IS_ERR(job)) {
> > > > @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > >    		xe_sync_entry_signal(&syncs[i], job,
> > > >    				     &job->drm.s_fence->finished);
> > > > +	if (xe_engine_is_lr(engine))
> > > > +		engine->ring_ops->emit_job(job);
> > > >    	xe_sched_job_push(job);
> > > >    	xe_vm_reactivate_rebind(vm);
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > > > index cbfb13026ec1..5d83132034a6 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > > > @@ -31,6 +31,8 @@ struct xe_guc_engine {
> > > >    	 */
> > > >    #define MAX_STATIC_MSG_TYPE	3
> > > >    	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
> > > > +	/** @lr_tdr: long running TDR worker */
> > > > +	struct work_struct lr_tdr;
> > > >    	/** @fini_async: do final fini async from this worker */
> > > >    	struct work_struct fini_async;
> > > >    	/** @resume_time: time of last resume */
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > index 68d09e7a4cc0..0a41f5d04f6d 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
> > > >    		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
> > > >    	}
> > > > +	/*
> > > > +	 * We must keep a reference for LR engines if engine is registered with
> > > > +	 * the GuC as jobs signal immediately and can't destroy an engine if the
> > > > +	 * GuC has a reference to it.
> > > > +	 */
> > > > +	if (xe_engine_is_lr(e))
> > > > +		xe_engine_get(e);
> > > > +
> > > >    	set_engine_registered(e);
> > > >    	trace_xe_engine_register(e);
> > > >    	if (xe_engine_is_parallel(e))
> > > > @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> > > >    {
> > > >    	struct xe_sched_job *job = to_xe_sched_job(drm_job);
> > > >    	struct xe_engine *e = job->engine;
> > > > +	bool lr = xe_engine_is_lr(e);
> > > >    	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
> > > >    		  !engine_banned(e) && !engine_suspended(e));
> > > > @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> > > >    	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> > > >    		if (!engine_registered(e))
> > > >    			register_engine(e);
> > > > -		e->ring_ops->emit_job(job);
> > > > +		if (!lr)	/* Written in IOCTL */
> > > Hmm? What does "Written in IOCTL" mean? Could you rephrase to something more
> > > descriptive?
> > > 
> > "LR jos are emitted in the IOCTL"
> 
> Ah, I read it as "the lr variable was written in IOCTL."
> 
> Perhaps LR jobs are already emitted at execbuf time?
> 

I missed exec in my update.

s/LR jobs are emitted in the IOCTL/LR jobs are emitted in the exec IOCTL/
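
So the hunk in guc_engine_run_job() would end up reading roughly:

		if (!lr)	/* LR jobs are emitted in the exec IOCTL */
			e->ring_ops->emit_job(job);
		submit_engine(e);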

Matt

> /Thomas
> 
> 
> > 
> > Does that work?
> > 
> > Matt
> > 
> > > > +			e->ring_ops->emit_job(job);
> > > >    		submit_engine(e);
> > > >    	}
> > > > -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
> > > > +	if (lr) {
> > > > +		xe_sched_job_set_error(job, -ENOTSUPP);
> > > > +		return NULL;
> > > > +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
> > > >    		return job->fence;
> > > > -	else
> > > > +	} else {
> > > >    		return dma_fence_get(job->fence);
> > > > +	}
> > > >    }
> > > >    static void guc_engine_free_job(struct drm_sched_job *drm_job)
> > > > @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
> > > >    }
> > > >    #endif
> > > > +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
> > > > +{
> > > > +	struct xe_guc *guc = engine_to_guc(e);
> > > > +
> > > > +	if (xe_engine_is_lr(e))
> > > > +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
> > > > +	else
> > > > +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > > > +}
> > > > +
> > > > +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
> > > > +{
> > > > +	struct xe_guc_engine *ge =
> > > > +		container_of(w, struct xe_guc_engine, lr_tdr);
> > > > +	struct xe_engine *e = ge->engine;
> > > > +	struct drm_gpu_scheduler *sched = &ge->sched;
> > > > +
> > > > +	XE_BUG_ON(!xe_engine_is_lr(e));
> > > > +	trace_xe_engine_lr_cleanup(e);
> > > > +
> > > > +	/* Kill the run_job / process_msg entry points */
> > > > +	drm_sched_run_wq_stop(sched);
> > > > +
> > > > +	/* Engine state now stable, disable scheduling / deregister if needed */
> > > > +	if (engine_registered(e)) {
> > > > +		struct xe_guc *guc = engine_to_guc(e);
> > > > +		int ret;
> > > > +
> > > > +		set_engine_banned(e);
> > > > +		xe_engine_get(e);
> > > > +		disable_scheduling_deregister(guc, e);
> > > > +
> > > > +		/*
> > > > +		 * Must wait for scheduling to be disabled before signalling
> > > > +		 * any fences, if GT broken the GT reset code should signal us.
> > > > +		 */
> > > > +		smp_rmb();
> > > wait_event() paired with wake_up() family of functions typically set the
> > > necessary barriers to make sure anything written prior to wake_up() is seen
> > > in wait_event(). So that smp_rmb() is most likely not needed. If it still
> > > is, its pairing smp_wmb() should be documented and pointed to as well. See
> > > documentation of set_current_state() vs __set_current_state().
> > > 
> > > > +		ret = wait_event_timeout(guc->ct.wq,
> > > > +					 !engine_pending_disable(e) ||
> > > > +					 guc_read_stopped(guc), HZ * 5);
> > > > +		if (!ret) {
> > > > +			XE_WARN_ON("Schedule disable failed to respond");
> > > > +			drm_sched_run_wq_start(sched);
> > > > +			xe_gt_reset_async(e->gt);
> > > > +			return;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	drm_sched_run_wq_start(sched);
> > > > +}
> > > > +
> > > >    static enum drm_gpu_sched_stat
> > > >    guc_engine_timedout_job(struct drm_sched_job *drm_job)
> > > >    {
> > > > @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> > > >    			err = -EIO;
> > > >    		set_engine_banned(e);
> > > >    		xe_engine_get(e);
> > > > -		disable_scheduling_deregister(engine_to_guc(e), e);
> > > > +		disable_scheduling_deregister(guc, e);
> > > >    		/*
> > > >    		 * Must wait for scheduling to be disabled before signalling
> > > > @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> > > >    	 */
> > > >    	list_add(&drm_job->list, &sched->pending_list);
> > > >    	drm_sched_run_wq_start(sched);
> > > > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > > > +	xe_guc_engine_trigger_cleanup(e);
> > > >    	/* Mark all outstanding jobs as bad, thus completing them */
> > > >    	spin_lock(&sched->job_list_lock);
> > > > @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
> > > >    	trace_xe_engine_destroy(e);
> > > > +	if (xe_engine_is_lr(e))
> > > > +		cancel_work_sync(&ge->lr_tdr);
> > > >    	if (e->flags & ENGINE_FLAG_PERSISTENT)
> > > >    		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> > > >    	release_guc_id(guc, e);
> > > > @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
> > > >    	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
> > > >    	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
> > > > -	queue_work(system_unbound_wq, &e->guc->fini_async);
> > > > +	queue_work(system_wq, &e->guc->fini_async);
> > > >    	/* We must block on kernel engines so slabs are empty on driver unload */
> > > >    	if (kernel) {
> > > > @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
> > > >    	if (err)
> > > >    		goto err_free;
> > > > +
> > > Unrelated whitespace?
> > > 
> > > 
> > > >    	sched = &ge->sched;
> > > >    	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
> > > >    				    &sched, 1, NULL);
> > > >    	if (err)
> > > >    		goto err_sched;
> > > > +	if (xe_engine_is_lr(e))
> > > > +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
> > > > +
> > > >    	mutex_lock(&guc->submission_state.lock);
> > > >    	err = alloc_guc_id(guc, e);
> > > > @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
> > > >    {
> > > >    	trace_xe_engine_kill(e);
> > > >    	set_engine_killed(e);
> > > > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > > > +	xe_guc_engine_trigger_cleanup(e);
> > > >    }
> > > >    static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
> > > > @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
> > > >    	/* Stop scheduling + flush any DRM scheduler operations */
> > > >    	drm_sched_run_wq_stop(sched);
> > > > +	if (engine_registered(e) && xe_engine_is_lr(e))
> > > > +		xe_engine_put(e);
> > > > +
> > > >    	/* Clean up lost G2H + reset engine state */
> > > >    	if (engine_destroyed(e) && engine_registered(e)) {
> > > >    		if (engine_banned(e))
> > > > @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > > >    	trace_xe_engine_deregister_done(e);
> > > >    	clear_engine_registered(e);
> > > > +	if (xe_engine_is_lr(e))
> > > > +		xe_engine_put(e);
> > > > +
> > > >    	if (engine_banned(e))
> > > >    		xe_engine_put(e);
> > > >    	else
> > > > @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > > >    	 */
> > > >    	set_engine_reset(e);
> > > >    	if (!engine_banned(e))
> > > > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > > > +		xe_guc_engine_trigger_cleanup(e);
> > > >    	return 0;
> > > >    }
> > > > @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> > > >    	/* Treat the same as engine reset */
> > > >    	set_engine_reset(e);
> > > >    	if (!engine_banned(e))
> > > > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > > > +		xe_guc_engine_trigger_cleanup(e);
> > > >    	return 0;
> > > >    }
> > > > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > > > index 2f8eb7ebe9a7..02861c26e145 100644
> > > > --- a/drivers/gpu/drm/xe/xe_trace.h
> > > > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > > > @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
> > > >    	     TP_ARGS(e)
> > > >    );
> > > > +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
> > > > +	     TP_PROTO(struct xe_engine *e),
> > > > +	     TP_ARGS(e)
> > > > +);
> > > > +
> > > >    DECLARE_EVENT_CLASS(xe_sched_job,
> > > >    		    TP_PROTO(struct xe_sched_job *job),
> > > >    		    TP_ARGS(job),

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-08 13:14   ` Thomas Hellström
  2023-05-09 14:56     ` Matthew Brost
@ 2023-05-09 22:21     ` Matthew Brost
  1 sibling, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-09 22:21 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Mon, May 08, 2023 at 03:14:10PM +0200, Thomas Hellström wrote:
> Hi, Matthew
> 
> In addition to Rodrigo's comments:
> 

Oops, missed a few in the first reply... Addressing them below.

> On 5/2/23 02:17, Matthew Brost wrote:
> > Flow control + write ring in exec, return NULL in run_job, signal
> > xe_hw_fence immediately, and override TDR for LR jobs.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
> >   drivers/gpu/drm/xe/xe_engine.h           |  4 +
> >   drivers/gpu/drm/xe/xe_exec.c             |  8 ++
> >   drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
> >   drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
> >   drivers/gpu/drm/xe/xe_trace.h            |  5 ++
> >   6 files changed, 137 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> > index 094ec17d3004..d1e84d7adbd4 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.c
> > +++ b/drivers/gpu/drm/xe/xe_engine.c
> > @@ -18,6 +18,7 @@
> >   #include "xe_macros.h"
> >   #include "xe_migrate.h"
> >   #include "xe_pm.h"
> > +#include "xe_ring_ops_types.h"
> >   #include "xe_trace.h"
> >   #include "xe_vm.h"
> > @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
> >   	up_write(&e->vm->lock);
> >   }
> > +/**
> > + * xe_engine_is_lr() - Whether an engine is long-running
> > + * @e: The engine
> > + *
> > + * Return: True if the engine is long-running, false otherwise.
> > + */
> > +bool xe_engine_is_lr(struct xe_engine *e)
> > +{
> > +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
> > +		!(e->flags & ENGINE_FLAG_VM);
> > +}
> > +
> > +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
> > +{
> > +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
> > +}
> > +
> > +/**
> > + * xe_engine_ring_full() - Whether an engine's ring is full
> > + * @e: The engine
> > + *
> > + * Return: True if the engine's ring is full, false otherwise.
> > + */
> > +bool xe_engine_ring_full(struct xe_engine *e)
> > +{
> > +	struct xe_lrc *lrc = e->lrc;
> > +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
> > +
> > +	return xe_engine_num_job_inflight(e) >= max_job;
> > +}
> > +
> >   /**
> >    * xe_engine_is_idle() - Whether an engine is idle.
> >    * @engine: The engine
> > diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
> > index a49cf2ab405e..2e60f6d90226 100644
> > --- a/drivers/gpu/drm/xe/xe_engine.h
> > +++ b/drivers/gpu/drm/xe/xe_engine.h
> > @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
> >   	return engine->width > 1;
> >   }
> > +bool xe_engine_is_lr(struct xe_engine *e);
> > +
> > +bool xe_engine_ring_full(struct xe_engine *e);
> > +
> >   bool xe_engine_is_idle(struct xe_engine *engine);
> >   void xe_engine_kill(struct xe_engine *e);
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index ea869f2452ef..44ea9bcd0066 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -13,6 +13,7 @@
> >   #include "xe_device.h"
> >   #include "xe_engine.h"
> >   #include "xe_macros.h"
> > +#include "xe_ring_ops_types.h"
> >   #include "xe_sched_job.h"
> >   #include "xe_sync.h"
> >   #include "xe_vm.h"
> > @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		goto err_engine_end;
> >   	}
> > +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
> > +		err = -EWOULDBLOCK;
> > +		goto err_engine_end;
> > +	}
> > +
> >   	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
> >   				  addresses : &args->address);
> >   	if (IS_ERR(job)) {
> > @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		xe_sync_entry_signal(&syncs[i], job,
> >   				     &job->drm.s_fence->finished);
> > +	if (xe_engine_is_lr(engine))
> > +		engine->ring_ops->emit_job(job);
> >   	xe_sched_job_push(job);
> >   	xe_vm_reactivate_rebind(vm);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > index cbfb13026ec1..5d83132034a6 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > @@ -31,6 +31,8 @@ struct xe_guc_engine {
> >   	 */
> >   #define MAX_STATIC_MSG_TYPE	3
> >   	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
> > +	/** @lr_tdr: long running TDR worker */
> > +	struct work_struct lr_tdr;
> >   	/** @fini_async: do final fini async from this worker */
> >   	struct work_struct fini_async;
> >   	/** @resume_time: time of last resume */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 68d09e7a4cc0..0a41f5d04f6d 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
> >   		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
> >   	}
> > +	/*
> > +	 * We must keep a reference for LR engines if engine is registered with
> > +	 * the GuC as jobs signal immediately and can't destroy an engine if the
> > +	 * GuC has a reference to it.
> > +	 */
> > +	if (xe_engine_is_lr(e))
> > +		xe_engine_get(e);
> > +
> >   	set_engine_registered(e);
> >   	trace_xe_engine_register(e);
> >   	if (xe_engine_is_parallel(e))
> > @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >   {
> >   	struct xe_sched_job *job = to_xe_sched_job(drm_job);
> >   	struct xe_engine *e = job->engine;
> > +	bool lr = xe_engine_is_lr(e);
> >   	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
> >   		  !engine_banned(e) && !engine_suspended(e));
> > @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >   	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> >   		if (!engine_registered(e))
> >   			register_engine(e);
> > -		e->ring_ops->emit_job(job);
> > +		if (!lr)	/* Written in IOCTL */
> 
> Hmm? What does "Written in IOCTL" mean? Could you rephrase to something more
> descriptive?
> 
> > +			e->ring_ops->emit_job(job);
> >   		submit_engine(e);
> >   	}
> > -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
> > +	if (lr) {
> > +		xe_sched_job_set_error(job, -ENOTSUPP);
> > +		return NULL;
> > +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
> >   		return job->fence;
> > -	else
> > +	} else {
> >   		return dma_fence_get(job->fence);
> > +	}
> >   }
> >   static void guc_engine_free_job(struct drm_sched_job *drm_job)
> > @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
> >   }
> >   #endif
> > +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
> > +{
> > +	struct xe_guc *guc = engine_to_guc(e);
> > +
> > +	if (xe_engine_is_lr(e))
> > +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
> > +	else
> > +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +}
> > +
> > +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
> > +{
> > +	struct xe_guc_engine *ge =
> > +		container_of(w, struct xe_guc_engine, lr_tdr);
> > +	struct xe_engine *e = ge->engine;
> > +	struct drm_gpu_scheduler *sched = &ge->sched;
> > +
> > +	XE_BUG_ON(!xe_engine_is_lr(e));
> > +	trace_xe_engine_lr_cleanup(e);
> > +
> > +	/* Kill the run_job / process_msg entry points */
> > +	drm_sched_run_wq_stop(sched);
> > +
> > +	/* Engine state now stable, disable scheduling / deregister if needed */
> > +	if (engine_registered(e)) {
> > +		struct xe_guc *guc = engine_to_guc(e);
> > +		int ret;
> > +
> > +		set_engine_banned(e);
> > +		xe_engine_get(e);
> > +		disable_scheduling_deregister(guc, e);
> > +
> > +		/*
> > +		 * Must wait for scheduling to be disabled before signalling
> > +		 * any fences, if GT broken the GT reset code should signal us.
> > +		 */
> > +		smp_rmb();
> 
> wait_event() paired with wake_up() family of functions typically set the
> necessary barriers to make sure anything written prior to wake_up() is seen
> in wait_event(). So that smp_rmb() is most likely not needed. If it still
> is, its pairing smp_wmb() should be documented and pointed to as well. See
> documentation of set_current_state() vs __set_current_state().
>

Ok, thanks. This is copy-paste from other code. I've always been a
little shaky in my understanding of barriers, so I believe you that
this is wrong. Will remove it and study up on this.
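
For my own notes, a minimal sketch of the pattern as I understand it
(illustrative only, names made up; not the driver code):

#include <linux/wait.h>
#include <linux/jiffies.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static bool cond_done;

/* Writer: make the condition true, then wake. wake_up() provides the
 * ordering against the waiter re-reading the condition, so no explicit
 * smp_wmb() is needed here. */
static void writer(void)
{
	cond_done = true;
	wake_up(&wq);
}

/* Waiter: wait_event_timeout() re-evaluates the condition with the
 * matching ordering, so no open-coded smp_rmb() before it. Returns 0
 * only if it timed out with the condition still false. */
static long waiter(void)
{
	return wait_event_timeout(wq, cond_done, 5 * HZ);
}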
 
> > +		ret = wait_event_timeout(guc->ct.wq,
> > +					 !engine_pending_disable(e) ||
> > +					 guc_read_stopped(guc), HZ * 5);
> > +		if (!ret) {
> > +			XE_WARN_ON("Schedule disable failed to respond");
> > +			drm_sched_run_wq_start(sched);
> > +			xe_gt_reset_async(e->gt);
> > +			return;
> > +		}
> > +	}
> > +
> > +	drm_sched_run_wq_start(sched);
> > +}
> > +
> >   static enum drm_gpu_sched_stat
> >   guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >   {
> > @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >   			err = -EIO;
> >   		set_engine_banned(e);
> >   		xe_engine_get(e);
> > -		disable_scheduling_deregister(engine_to_guc(e), e);
> > +		disable_scheduling_deregister(guc, e);
> >   		/*
> >   		 * Must wait for scheduling to be disabled before signalling
> > @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
> >   	 */
> >   	list_add(&drm_job->list, &sched->pending_list);
> >   	drm_sched_run_wq_start(sched);
> > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +	xe_guc_engine_trigger_cleanup(e);
> >   	/* Mark all outstanding jobs as bad, thus completing them */
> >   	spin_lock(&sched->job_list_lock);
> > @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
> >   	trace_xe_engine_destroy(e);
> > +	if (xe_engine_is_lr(e))
> > +		cancel_work_sync(&ge->lr_tdr);
> >   	if (e->flags & ENGINE_FLAG_PERSISTENT)
> >   		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> >   	release_guc_id(guc, e);
> > @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
> >   	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
> >   	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
> > -	queue_work(system_unbound_wq, &e->guc->fini_async);
> > +	queue_work(system_wq, &e->guc->fini_async);
> >   	/* We must block on kernel engines so slabs are empty on driver unload */
> >   	if (kernel) {
> > @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
> >   	if (err)
> >   		goto err_free;
> > +
> 
> Unrelated whitespace?

Looks like it, will remove.

Matt

> 
> 
> >   	sched = &ge->sched;
> >   	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
> >   				    &sched, 1, NULL);
> >   	if (err)
> >   		goto err_sched;
> > +	if (xe_engine_is_lr(e))
> > +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
> > +
> >   	mutex_lock(&guc->submission_state.lock);
> >   	err = alloc_guc_id(guc, e);
> > @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
> >   {
> >   	trace_xe_engine_kill(e);
> >   	set_engine_killed(e);
> > -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +	xe_guc_engine_trigger_cleanup(e);
> >   }
> >   static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
> > @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
> >   	/* Stop scheduling + flush any DRM scheduler operations */
> >   	drm_sched_run_wq_stop(sched);
> > +	if (engine_registered(e) && xe_engine_is_lr(e))
> > +		xe_engine_put(e);
> > +
> >   	/* Clean up lost G2H + reset engine state */
> >   	if (engine_destroyed(e) && engine_registered(e)) {
> >   		if (engine_banned(e))
> > @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >   	trace_xe_engine_deregister_done(e);
> >   	clear_engine_registered(e);
> > +	if (xe_engine_is_lr(e))
> > +		xe_engine_put(e);
> > +
> >   	if (engine_banned(e))
> >   		xe_engine_put(e);
> >   	else
> > @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >   	 */
> >   	set_engine_reset(e);
> >   	if (!engine_banned(e))
> > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +		xe_guc_engine_trigger_cleanup(e);
> >   	return 0;
> >   }
> > @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> >   	/* Treat the same as engine reset */
> >   	set_engine_reset(e);
> >   	if (!engine_banned(e))
> > -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
> > +		xe_guc_engine_trigger_cleanup(e);
> >   	return 0;
> >   }
> > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > index 2f8eb7ebe9a7..02861c26e145 100644
> > --- a/drivers/gpu/drm/xe/xe_trace.h
> > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
> >   	     TP_ARGS(e)
> >   );
> > +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
> > +	     TP_PROTO(struct xe_engine *e),
> > +	     TP_ARGS(e)
> > +);
> > +
> >   DECLARE_EVENT_CLASS(xe_sched_job,
> >   		    TP_PROTO(struct xe_sched_job *job),
> >   		    TP_ARGS(job),
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro
  2023-05-09 13:21   ` Thomas Hellström
@ 2023-05-10  0:29     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:29 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Danilo Krummrich, intel-xe

On Tue, May 09, 2023 at 03:21:56PM +0200, Thomas Hellström wrote:
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > From: Danilo Krummrich <dakr@redhat.com>
> > 
> > Split up the MA_STATE() macro such that components using the maple tree
> > can easily inherit from struct ma_state and build custom tree walk
> > macros to hide their internals from users.
> 
> I might misunderstand the patch, but isn't the real purpose to provide an
> MA_STATE initializer, and the way to achieve that is to split up the MA_STATE
> macro?
> 

This patch isn't mine and is on dri-devel. Can you put your comments there?
https://patchwork.freedesktop.org/patch/530673/?series=112994&rev=4

Matt

> > 
> > Example:
> > 
> > struct sample_iterator {
> >          struct ma_state mas;
> >          struct sample_mgr *mgr;
> > };
> > 
> > \#define SAMPLE_ITERATOR(name, __mgr, start)                    \
> >          struct sample_iterator name = {                         \
> >                  .mas = MA_STATE_INIT(&(__mgr)->mt, start, 0),   \
> >                  .mgr = __mgr,                                   \
> >          }
> > 
> > \#define sample_iter_for_each_range(it__, entry__, end__) \
> >          mas_for_each(&(it__).mas, entry__, end__)
> > 
> > --
> > 
> > struct sample *sample;
> > SAMPLE_ITERATOR(si, min);
> > 
> > sample_iter_for_each_range(&si, sample, max) {
> >          frob(mgr, sample);
> > }
> > 
> > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > ---
> >   include/linux/maple_tree.h | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > index 1fadb5f5978b..87d55334f1c2 100644
> > --- a/include/linux/maple_tree.h
> > +++ b/include/linux/maple_tree.h
> > @@ -423,8 +423,8 @@ struct ma_wr_state {
> >   #define MA_ERROR(err) \
> >   		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
> > -#define MA_STATE(name, mt, first, end)					\
> > -	struct ma_state name = {					\
> > +#define MA_STATE_INIT(mt, first, end)					\
> > +	{								\
> 
> Naming: following the convention in, for example, the mutex and ww mutex
> code this should've been called
> 
> __MA_STATE_INITIALIZER(),
> 
> whereas the decapitalized name ma_state_init() would've been a (possibly
> inline) init function if it existed.
> 
> But this all should ofc be run by the maple tree maintainer(s).
> 
> FWIW, with these things addressed the change LGTM.
> 
> /Thomas
> 
> 
> >   		.tree = mt,						\
> >   		.index = first,						\
> >   		.last = end,						\
> > @@ -435,6 +435,9 @@ struct ma_wr_state {
> >   		.mas_flags = 0,						\
> >   	}
> > +#define MA_STATE(name, mt, first, end)					\
> > +	struct ma_state name = MA_STATE_INIT(mt, first, end)
> > +
> >   #define MA_WR_STATE(name, ma_state, wr_entry)				\
> >   	struct ma_wr_state name = {					\
> >   		.mas = ma_state,					\

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate
  2023-05-09 13:33   ` Thomas Hellström
@ 2023-05-10  0:31     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:31 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Tue, May 09, 2023 at 03:33:48PM +0200, Thomas Hellström wrote:
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > The DRM GPUVA implementation needs this function.
> 
> A more thorough explanation as to why  it's needed would help convince maple
> tree maintainers an export is needed.
> 
> Otherwise the change itself LGTM.
> 
> /Thomas
> 

I think a version of this is in Linus' tree. I think we'll just get this
after a rebase.
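
For context on why the export is needed at all: maple_tree is always
built in, but the GPUVA code lives in drm.ko, which can be a module, and
it calls mas_preallocate() directly. A minimal sketch of the shape of
the call site (names made up, not the actual GPUVA code):

#include <linux/maple_tree.h>
#include <linux/gfp.h>

static int prealloc_range(struct maple_tree *mt, unsigned long start,
			  unsigned long end)
{
	MA_STATE(mas, mt, start, end);

	/* Links only if mas_preallocate() is exported when this code is
	 * built as a module. */
	return mas_preallocate(&mas, GFP_KERNEL);
}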

Matt

> 
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   lib/maple_tree.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > index 9e2735cbc2b4..ae37a167e25d 100644
> > --- a/lib/maple_tree.c
> > +++ b/lib/maple_tree.c
> > @@ -5726,6 +5726,7 @@ int mas_preallocate(struct ma_state *mas, gfp_t gfp)
> >   	mas_reset(mas);
> >   	return ret;
> >   }
> > +EXPORT_SYMBOL_GPL(mas_preallocate);
> >   /*
> >    * mas_destroy() - destroy a maple state.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec
  2023-05-09 14:45   ` Rodrigo Vivi
@ 2023-05-10  0:37     ` Matthew Brost
  2023-05-10  0:38     ` Matthew Brost
  1 sibling, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:37 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Danilo Krummrich, intel-xe

On Tue, May 09, 2023 at 10:45:21AM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:22PM -0700, Matthew Brost wrote:
> > We want some helpers for DRM exec in gpuva, so always compile this.
> > 
> > Suggested-by: Danilo Krummrich <dakr@redhat.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/Makefile | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index ab728632d8a2..40067970af04 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -23,6 +23,7 @@ drm-y := \
> >  	drm_dumb_buffers.o \
> >  	drm_edid.o \
> >  	drm_encoder.o \
> > +	drm_exec.o \
> >  	drm_file.o \
> >  	drm_fourcc.o \
> >  	drm_framebuffer.o \
> > @@ -81,8 +82,6 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
> >  # Memory-management helpers
> >  #
> >  #
> > -obj-$(CONFIG_DRM_EXEC) += drm_exec.o
> 
> shouldn't this kill this kconfig entirely then?
> Or should the helpers be split into some other common file?
> 

Yes, we should kill this one entirely. Will fix.

Matt

> > -
> >  obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
> >  
> >  drm_dma_helper-y := drm_gem_dma_helper.o
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec
  2023-05-09 14:45   ` Rodrigo Vivi
  2023-05-10  0:37     ` Matthew Brost
@ 2023-05-10  0:38     ` Matthew Brost
  1 sibling, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:38 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Danilo Krummrich, intel-xe

On Tue, May 09, 2023 at 10:45:21AM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:22PM -0700, Matthew Brost wrote:
> > We want some helpers for DRM exec in gpuva, so always compile this.
> > 
> > Suggested-by: Danilo Krummrich <dakr@redhat.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/Makefile | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index ab728632d8a2..40067970af04 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -23,6 +23,7 @@ drm-y := \
> >  	drm_dumb_buffers.o \
> >  	drm_edid.o \
> >  	drm_encoder.o \
> > +	drm_exec.o \
> >  	drm_file.o \
> >  	drm_fourcc.o \
> >  	drm_framebuffer.o \
> > @@ -81,8 +82,6 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
> >  # Memory-management helpers
> >  #
> >  #
> > -obj-$(CONFIG_DRM_EXEC) += drm_exec.o
> 
> shouldn't this kill this kconfig entirely then?
> Or should the helpers be split into some other common file?
> 

I can probably post 28 & 29 in an independent series.

Matt

> > -
> >  obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
> >  
> >  drm_dma_helper-y := drm_gem_dma_helper.o
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-08 21:42   ` Rodrigo Vivi
@ 2023-05-10  0:49     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:49 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe, Faith Ekstrand

On Mon, May 08, 2023 at 05:42:05PM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:07PM -0700, Matthew Brost wrote:
> > We have 256 doorbells (on most platforms) that we can allocate to bypass
> > using the H2G channel for submission. This will avoid contention on the
> > CT mutex.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> > ---
> >  drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
> >  drivers/gpu/drm/xe/xe_guc.c              |   6 +
> >  drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
> >  drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
> >  drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
> >  drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
> >  drivers/gpu/drm/xe/xe_trace.h            |   5 +
> >  7 files changed, 315 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > index 37e0ac550931..11b117293a62 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > @@ -109,6 +109,7 @@ struct guc_doorbell_info {
> >  
> >  #define DIST_DBS_POPULATED			XE_REG(0xd08)
> >  #define   DOORBELLS_PER_SQIDI_MASK		REG_GENMASK(23, 16)
> > +#define	  DOORBELLS_PER_SQIDI_SHIFT		16
> >  #define   SQIDIS_DOORBELL_EXIST_MASK		REG_GENMASK(15, 0)
> >  
> >  #define GUC_BCS_RCS_IER				XE_REG(0xC550)
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index 89d20faced19..0c87f78a868b 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
> >   */
> >  int xe_guc_init_post_hwconfig(struct xe_guc *guc)
> >  {
> > +	int ret;
> > +
> > +	ret = xe_guc_submit_init_post_hwconfig(guc);
> > +	if (ret)
> > +		return ret;
> > +
> >  	return xe_guc_ads_init_post_hwconfig(&guc->ads);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > index 5d83132034a6..420b7f53e649 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > @@ -12,6 +12,7 @@
> >  #include <drm/gpu_scheduler.h>
> >  
> >  struct dma_fence;
> > +struct xe_bo;
> >  struct xe_engine;
> >  
> >  /**
> > @@ -37,6 +38,10 @@ struct xe_guc_engine {
> >  	struct work_struct fini_async;
> >  	/** @resume_time: time of last resume */
> >  	u64 resume_time;
> > +	/** @doorbell_bo: BO for memory doorbell */
> > +	struct xe_bo *doorbell_bo;
> > +	/** @doorbell_offset: MMIO doorbell offset */
> > +	u32 doorbell_offset;
> >  	/** @state: GuC specific state for this xe_engine */
> >  	atomic_t state;
> >  	/** @wqi_head: work queue item tail */
> > @@ -45,6 +50,8 @@ struct xe_guc_engine {
> >  	u32 wqi_tail;
> >  	/** @id: GuC id for this xe_engine */
> >  	u16 id;
> > +	/** @doorbell_id: doorbell id */
> > +	u16 doorbell_id;
> >  	/** @suspend_wait: wait queue used to wait on pending suspends */
> >  	wait_queue_head_t suspend_wait;
> >  	/** @suspend_pending: a suspend of the engine is pending */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 0a41f5d04f6d..1b6f36b04cd1 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -13,7 +13,10 @@
> >  
> >  #include <drm/drm_managed.h>
> >  
> > +#include "regs/xe_guc_regs.h"
> >  #include "regs/xe_lrc_layout.h"
> > +
> > +#include "xe_bo.h"
> >  #include "xe_device.h"
> >  #include "xe_engine.h"
> >  #include "xe_force_wake.h"
> > @@ -26,12 +29,22 @@
> >  #include "xe_lrc.h"
> >  #include "xe_macros.h"
> >  #include "xe_map.h"
> > +#include "xe_mmio.h"
> >  #include "xe_mocs.h"
> >  #include "xe_ring_ops_types.h"
> >  #include "xe_sched_job.h"
> >  #include "xe_trace.h"
> >  #include "xe_vm.h"
> >  
> > +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> > +#define HAS_GUC_DIST_DB(xe) \
> > +	(GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> > +
> > +#define GUC_NUM_HW_DOORBELLS 256
> > +
> > +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> > +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> > +
> >  static struct xe_gt *
> >  guc_to_gt(struct xe_guc *guc)
> >  {
> > @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
> >  #define ENGINE_STATE_SUSPENDED		(1 << 5)
> >  #define ENGINE_STATE_RESET		(1 << 6)
> >  #define ENGINE_STATE_KILLED		(1 << 7)
> > +#define ENGINE_STATE_DB_REGISTERED	(1 << 8)
> >  
> >  static bool engine_registered(struct xe_engine *e)
> >  {
> > @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
> >  	atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
> >  }
> >  
> > +static bool engine_doorbell_registered(struct xe_engine *e)
> > +{
> > +	return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> > +}
> > +
> > +static void set_engine_doorbell_registered(struct xe_engine *e)
> > +{
> > +	atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> > +}
> > +
> >  static bool engine_killed_or_banned(struct xe_engine *e)
> >  {
> >  	return engine_killed(e) || engine_banned(e);
> > @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
> >  
> >  	xa_destroy(&guc->submission_state.engine_lookup);
> >  	ida_destroy(&guc->submission_state.guc_ids);
> > +	ida_destroy(&guc->submission_state.doorbell_ids);
> >  	bitmap_free(guc->submission_state.guc_ids_bitmap);
> >  }
> >  
> > @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
> >  	mutex_init(&guc->submission_state.lock);
> >  	xa_init(&guc->submission_state.engine_lookup);
> >  	ida_init(&guc->submission_state.guc_ids);
> > +	ida_init(&guc->submission_state.doorbell_ids);
> >  
> >  	spin_lock_init(&guc->submission_state.suspend.lock);
> >  	guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> > @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
> >  	return 0;
> >  }
> >  
> > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> > +{
> > +	if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> > +		u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> > +					       DIST_DBS_POPULATED.reg);
> > +		u32 num_sqidi =
> > +			hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> > +		u32 doorbells_per_sqidi =
> > +			((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> > +			 DOORBELLS_PER_SQIDI_MASK) + 1;
> > +
> > +		guc->submission_state.num_doorbells =
> > +			num_sqidi * doorbells_per_sqidi;
> > +	} else {
> > +		guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	int ret;
> > +
> > +	lockdep_assert_held(&guc->submission_state.lock);
> > +
> > +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > +	ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> > +			     guc->submission_state.num_doorbells, GFP_NOWAIT);
> > +	if (ret < 0)
> > +		return false;
> > +
> > +	e->guc->doorbell_id = ret;
> > +
> > +	return true;
> > +}
> > +
> > +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	mutex_lock(&guc->submission_state.lock);
> > +	ida_simple_remove(&guc->submission_state.doorbell_ids,
> > +			  e->guc->doorbell_id);
> > +	mutex_unlock(&guc->submission_state.lock);
> > +
> > +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > +}
> > +
> > +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> > +			     u64 gpa, u32 gtt_addr)
> > +{
> > +	u32 action[] = {
> > +		XE_GUC_ACTION_ALLOCATE_DOORBELL,
> > +		guc_id,
> > +		doorbell_id,
> > +		lower_32_bits(gpa),
> > +		upper_32_bits(gpa),
> > +		gtt_addr
> > +	};
> > +
> > +	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> > +}
> > +
> > +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> > +{
> > +	u32 action[] = {
> > +		XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> > +		guc_id
> > +	};
> > +
> > +	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> > +}
> > +
> > +static bool has_doorbell(struct xe_engine *e)
> > +{
> > +	return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> > +}
> > +
> > +#define doorbell_read(guc_, e_, field_) ({			\
> > +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> > +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> > +	xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,		\
> > +				  struct guc_doorbell_info, field_); \
> > +	})
> > +#define doorbell_write(guc_, e_, field_, val_) ({		\
> > +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> > +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> > +	xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,		\
> > +				  struct guc_doorbell_info, field_, val_); \
> > +	})
> > +
> > +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	struct xe_device *xe = guc_to_xe(guc);
> > +
> > +	/* GuC does the initialization with distributed and MMIO doorbells */
> > +	if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> > +		doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> > +		doorbell_write(guc, e, cookie, 0);
> > +	}
> > +}
> > +
> > +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> > +	    xe_device_mem_access_ongoing(guc_to_xe(guc)))
> > +		doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> > +}
> > +
> > +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	if (has_doorbell(e)) {
> > +		release_doorbell_id(guc, e);
> > +		xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> > +	}
> > +}
> > +
> > +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	u32 cookie;
> > +
> > +	cookie = doorbell_read(guc, e, cookie);
> > +	doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> > +
> > +	XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> > +}
> > +
> > +#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
> > +#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF
> 
> Is this a guc abi? should it be in the guc abi files?
> 

Probably? I can move this.
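
A minimal sketch of the move (header name/location assumed, values are
the ones from this patch):

/* e.g. drivers/gpu/drm/xe/abi/guc_doorbell_abi.h */
#ifndef _ABI_GUC_DOORBELL_ABI_H
#define _ABI_GUC_DOORBELL_ABI_H

/* Value read back from an MMIO doorbell page after ringing it */
#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF

#endif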

> I feel that we need someone with deeper guc knowledge on this review
> although based on what I followed on the discussion with Faith and others
> it looks like a good move in general.
>

We probably need Daniele or John H. to review this one.

Matt

> > +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> > +{
> > +	u32 db_value;
> > +
> > +	db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> > +				  doorbell_offset);
> > +
> > +	/*
> > +	 * The read from the doorbell page will return ack/nack. We don't remove
> > +	 * doorbells from active clients so we don't expect to ever get a nack.
> > +	 * XXX: if doorbell is lost, re-acquire it?
> > +	 */
> > +	XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> > +	XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> > +}
> > +
> > +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	XE_BUG_ON(!has_doorbell(e));
> > +
> > +	if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> > +		ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> > +	else
> > +		ring_memory_doorbell(guc, e);
> > +
> > +	trace_xe_engine_ring_db(e);
> > +}
> > +
> > +static void register_engine(struct xe_engine *e);
> > +
> > +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> > +{
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +	struct xe_device *xe = gt_to_xe(gt);
> > +	u64 gpa;
> > +	u32 gtt_addr;
> > +	int ret;
> > +
> > +	XE_BUG_ON(!has_doorbell(e));
> > +
> > +	if (HAS_GUC_MMIO_DB(xe)) {
> > +		e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
> > +		gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> > +		gtt_addr = 0;
> > +	} else {
> > +		struct xe_bo *bo;
> > +
> > +		if (!e->guc->doorbell_bo) {
> > +			bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> > +						  ttm_bo_type_kernel,
> > +						  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> > +						  XE_BO_CREATE_GGTT_BIT);
> > +			if (IS_ERR(bo))
> > +				return PTR_ERR(bo);
> > +
> > +			e->guc->doorbell_bo = bo;
> > +		} else {
> > +			bo = e->guc->doorbell_bo;
> > +		}
> > +
> > +		init_doorbell(guc, e);
> > +		gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> > +		gtt_addr = xe_bo_ggtt_addr(bo);
> > +	}
> > +
> > +	if (init && e->flags & ENGINE_FLAG_KERNEL)
> > +		return 0;
> > +
> > +	register_engine(e);
> > +	ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> > +				gtt_addr);
> > +	if (ret < 0) {
> > +		fini_doorbell(guc, e);
> > +		return ret;
> > +	}
> > +
> > +	/*
> > +	 * In distributed doorbells, guc is returning the cacheline selected
> > +	 * by HW as part of the 7bit data from the allocate doorbell command:
> > +	 *  bit [22]   - Cacheline allocated
> > +	 *  bit [21:16] - Cacheline offset address
> > +	 * (bit 21 must be zero, or our assumption of only using half a page is
> > +	 * no longer correct).
> > +	 */
> > +	if (HAS_GUC_DIST_DB(xe)) {
> > +		u32 dd_cacheline_info;
> > +
> > +		XE_WARN_ON(!(ret & BIT(22)));
> > +		XE_WARN_ON(ret & BIT(21));
> > +
> > +		dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> > +		e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
> > +
> > +		/* and verify db status was updated correctly by the guc fw */
> > +		XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> > +			   GUC_DOORBELL_ENABLED);
> > +	}
> > +
> > +	set_engine_doorbell_registered(e);
> > +
> > +	return 0;
> > +}
> > +
> >  static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
> >  {
> >  	int ret;
> > @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
> >  	u32 num_g2h = 0;
> >  	int len = 0;
> >  	bool extra_submit = false;
> > +	bool enable = false;
> >  
> >  	XE_BUG_ON(!engine_registered(e));
> >  
> > @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
> >  		num_g2h = 1;
> >  		if (xe_engine_is_parallel(e))
> >  			extra_submit = true;
> > +		enable = true;
> >  
> >  		e->guc->resume_time = RESUME_PENDING;
> >  		set_engine_pending_enable(e);
> > @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
> >  		trace_xe_engine_submit(e);
> >  	}
> >  
> > -	xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > +	if (enable || !engine_doorbell_registered(e))
> > +		xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > +	else
> > +		ring_doorbell(guc, e);
> >  
> >  	if (extra_submit) {
> >  		len = 0;
> > @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >  	trace_xe_sched_job_run(job);
> >  
> >  	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> > -		if (!engine_registered(e))
> > -			register_engine(e);
> > +		if (!engine_registered(e)) {
> > +			if (has_doorbell(e)) {
> > +				int err = create_doorbell(engine_to_guc(e), e,
> > +							  false);
> > +
> > +				/* Not fatal, but let's warn */
> > +				XE_WARN_ON(err);
> > +			} else {
> > +				register_engine(e);
> > +			}
> > +		}
> >  		if (!lr)	/* Written in IOCTL */
> >  			e->ring_ops->emit_job(job);
> >  		submit_engine(e);
> > @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> >  	MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
> >  	int ret;
> >  
> > +	if (has_doorbell(e)) {
> > +		fini_doorbell(guc, e);
> > +		deallocate_doorbell(guc, e->guc->id);
> > +	}
> > +
> >  	set_min_preemption_timeout(guc, e);
> >  	smp_rmb();
> >  	ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> > @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
> >  		cancel_work_sync(&ge->lr_tdr);
> >  	if (e->flags & ENGINE_FLAG_PERSISTENT)
> >  		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> > +	destroy_doorbell(guc, e);
> >  	release_guc_id(guc, e);
> >  	drm_sched_entity_fini(&ge->entity);
> >  	drm_sched_fini(&ge->sched);
> > @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
> >  	struct xe_guc_engine *ge;
> >  	long timeout;
> >  	int err;
> > +	bool create_db = false;
> >  
> >  	XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
> >  
> > @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
> >  	if (guc_read_stopped(guc))
> >  		drm_sched_stop(sched, NULL);
> >  
> > +	create_db = alloc_doorbell_id(guc, e);
> > +
> >  	mutex_unlock(&guc->submission_state.lock);
> >  
> > +	if (create_db) {
> > +		/* Error isn't fatal as we don't need a doorbell */
> > +		err = create_doorbell(guc, e, true);
> > +		if (err)
> > +			release_doorbell_id(guc, e);
> > +	}
> > +
> >  	switch (e->class) {
> >  	case XE_ENGINE_CLASS_RENDER:
> >  		sprintf(e->name, "rcs%d", e->guc->id);
> > @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
> >  {
> >  	struct drm_gpu_scheduler *sched = &e->guc->sched;
> >  
> > -	XE_BUG_ON(engine_registered(e));
> > +	XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
> >  	XE_BUG_ON(engine_banned(e));
> >  	XE_BUG_ON(engine_killed(e));
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > index 8002734d6f24..bada6c02d6aa 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > @@ -13,6 +13,7 @@ struct xe_engine;
> >  struct xe_guc;
> >  
> >  int xe_guc_submit_init(struct xe_guc *guc);
> > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
> >  void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
> >  
> >  int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> > index ac7eec28934d..9ee4d572f4e0 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> > @@ -36,10 +36,14 @@ struct xe_guc {
> >  		struct xarray engine_lookup;
> >  		/** @guc_ids: used to allocate new guc_ids, single-lrc */
> >  		struct ida guc_ids;
> > +		/** @doorbell_ids: use to allocate new doorbells */
> > +		struct ida doorbell_ids;
> >  		/** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
> >  		unsigned long *guc_ids_bitmap;
> >  		/** @stopped: submissions are stopped */
> >  		atomic_t stopped;
> > +		/** @num_doorbells: number of doorbels */
> > +		int num_doorbells;
> >  		/** @lock: protects submission state */
> >  		struct mutex lock;
> >  		/** @suspend: suspend fence state */
> > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > index 02861c26e145..38e9d7c6197b 100644
> > --- a/drivers/gpu/drm/xe/xe_trace.h
> > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
> >  	     TP_ARGS(e)
> >  );
> >  
> > +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> > +	     TP_PROTO(struct xe_engine *e),
> > +	     TP_ARGS(e)
> > +);
> > +
> >  DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
> >  	     TP_PROTO(struct xe_engine *e),
> >  	     TP_ARGS(e)
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-09 13:00   ` Thomas Hellström
@ 2023-05-10  0:51     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:51 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, Faith Ekstrand

On Tue, May 09, 2023 at 03:00:17PM +0200, Thomas Hellström wrote:
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > We have 256 doorbells (on most platforms) that we can allocate to bypass
> > using the H2G channel for submission. This will avoid contention on the
> > CT mutex.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> 
> Could we describe in a DOC section how doorbells are distributed and if
> there are any suggestions on how to improve that moving forward?
> 

When I do the GuC documentation (not yet done) I can include some
doorbell documentation.
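
As a rough placeholder for what that DOC section could say (illustrative
numbers, derived from this patch):

/*
 * DOC: GuC doorbells (sketch)
 *
 * - MMIO doorbells (DGFX or graphics 12.50+): a fixed pool of
 *   GUC_NUM_HW_DOORBELLS (256) doorbell pages behind
 *   GUC_MMIO_DB_BAR_OFFSET.
 * - Distributed doorbells (12.00+ otherwise): DIST_DBS_POPULATED gives
 *   the number of populated SQIDIs (set bits in 15:0) and the doorbells
 *   per SQIDI (field in 23:16, plus one), e.g. 4 SQIDIs * 64 = 256.
 * - Doorbell ids are handed out per engine from an ida at engine init;
 *   engines without a doorbell simply fall back to H2G submission.
 */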

Matt

> /Thomas
> 
> > ---
> >   drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
> >   drivers/gpu/drm/xe/xe_guc.c              |   6 +
> >   drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
> >   drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
> >   drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
> >   drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
> >   drivers/gpu/drm/xe/xe_trace.h            |   5 +
> >   7 files changed, 315 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > index 37e0ac550931..11b117293a62 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > @@ -109,6 +109,7 @@ struct guc_doorbell_info {
> >   #define DIST_DBS_POPULATED			XE_REG(0xd08)
> >   #define   DOORBELLS_PER_SQIDI_MASK		REG_GENMASK(23, 16)
> > +#define	  DOORBELLS_PER_SQIDI_SHIFT		16
> >   #define   SQIDIS_DOORBELL_EXIST_MASK		REG_GENMASK(15, 0)
> >   #define GUC_BCS_RCS_IER				XE_REG(0xC550)
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index 89d20faced19..0c87f78a868b 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
> >    */
> >   int xe_guc_init_post_hwconfig(struct xe_guc *guc)
> >   {
> > +	int ret;
> > +
> > +	ret = xe_guc_submit_init_post_hwconfig(guc);
> > +	if (ret)
> > +		return ret;
> > +
> >   	return xe_guc_ads_init_post_hwconfig(&guc->ads);
> >   }
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > index 5d83132034a6..420b7f53e649 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > @@ -12,6 +12,7 @@
> >   #include <drm/gpu_scheduler.h>
> >   struct dma_fence;
> > +struct xe_bo;
> >   struct xe_engine;
> >   /**
> > @@ -37,6 +38,10 @@ struct xe_guc_engine {
> >   	struct work_struct fini_async;
> >   	/** @resume_time: time of last resume */
> >   	u64 resume_time;
> > +	/** @doorbell_bo: BO for memory doorbell */
> > +	struct xe_bo *doorbell_bo;
> > +	/** @doorbell_offset: MMIO doorbell offset */
> > +	u32 doorbell_offset;
> >   	/** @state: GuC specific state for this xe_engine */
> >   	atomic_t state;
> >   	/** @wqi_head: work queue item tail */
> > @@ -45,6 +50,8 @@ struct xe_guc_engine {
> >   	u32 wqi_tail;
> >   	/** @id: GuC id for this xe_engine */
> >   	u16 id;
> > +	/** @doorbell_id: doorbell id */
> > +	u16 doorbell_id;
> >   	/** @suspend_wait: wait queue used to wait on pending suspends */
> >   	wait_queue_head_t suspend_wait;
> >   	/** @suspend_pending: a suspend of the engine is pending */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 0a41f5d04f6d..1b6f36b04cd1 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -13,7 +13,10 @@
> >   #include <drm/drm_managed.h>
> > +#include "regs/xe_guc_regs.h"
> >   #include "regs/xe_lrc_layout.h"
> > +
> > +#include "xe_bo.h"
> >   #include "xe_device.h"
> >   #include "xe_engine.h"
> >   #include "xe_force_wake.h"
> > @@ -26,12 +29,22 @@
> >   #include "xe_lrc.h"
> >   #include "xe_macros.h"
> >   #include "xe_map.h"
> > +#include "xe_mmio.h"
> >   #include "xe_mocs.h"
> >   #include "xe_ring_ops_types.h"
> >   #include "xe_sched_job.h"
> >   #include "xe_trace.h"
> >   #include "xe_vm.h"
> > +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> > +#define HAS_GUC_DIST_DB(xe) \
> > +	(GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> > +
> > +#define GUC_NUM_HW_DOORBELLS 256
> > +
> > +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> > +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> > +
> >   static struct xe_gt *
> >   guc_to_gt(struct xe_guc *guc)
> >   {
> > @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
> >   #define ENGINE_STATE_SUSPENDED		(1 << 5)
> >   #define ENGINE_STATE_RESET		(1 << 6)
> >   #define ENGINE_STATE_KILLED		(1 << 7)
> > +#define ENGINE_STATE_DB_REGISTERED	(1 << 8)
> >   static bool engine_registered(struct xe_engine *e)
> >   {
> > @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
> >   	atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
> >   }
> > +static bool engine_doorbell_registered(struct xe_engine *e)
> > +{
> > +	return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> > +}
> > +
> > +static void set_engine_doorbell_registered(struct xe_engine *e)
> > +{
> > +	atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> > +}
> > +
> >   static bool engine_killed_or_banned(struct xe_engine *e)
> >   {
> >   	return engine_killed(e) || engine_banned(e);
> > @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
> >   	xa_destroy(&guc->submission_state.engine_lookup);
> >   	ida_destroy(&guc->submission_state.guc_ids);
> > +	ida_destroy(&guc->submission_state.doorbell_ids);
> >   	bitmap_free(guc->submission_state.guc_ids_bitmap);
> >   }
> > @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
> >   	mutex_init(&guc->submission_state.lock);
> >   	xa_init(&guc->submission_state.engine_lookup);
> >   	ida_init(&guc->submission_state.guc_ids);
> > +	ida_init(&guc->submission_state.doorbell_ids);
> >   	spin_lock_init(&guc->submission_state.suspend.lock);
> >   	guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> > @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
> >   	return 0;
> >   }
> > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> > +{
> > +	if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> > +		u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> > +					       DIST_DBS_POPULATED.reg);
> > +		u32 num_sqidi =
> > +			hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> > +		u32 doorbells_per_sqidi =
> > +			((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> > +			 DOORBELLS_PER_SQIDI_MASK) + 1;
> > +
> > +		guc->submission_state.num_doorbells =
> > +			num_sqidi * doorbells_per_sqidi;
> > +	} else {
> > +		guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	int ret;
> > +
> > +	lockdep_assert_held(&guc->submission_state.lock);
> > +
> > +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > +	ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> > +			     guc->submission_state.num_doorbells, GFP_NOWAIT);
> > +	if (ret < 0)
> > +		return false;
> > +
> > +	e->guc->doorbell_id = ret;
> > +
> > +	return true;
> > +}
> > +
> > +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	mutex_lock(&guc->submission_state.lock);
> > +	ida_simple_remove(&guc->submission_state.doorbell_ids,
> > +			  e->guc->doorbell_id);
> > +	mutex_unlock(&guc->submission_state.lock);
> > +
> > +	e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > +}
> > +
> > +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> > +			     u64 gpa, u32 gtt_addr)
> > +{
> > +	u32 action[] = {
> > +		XE_GUC_ACTION_ALLOCATE_DOORBELL,
> > +		guc_id,
> > +		doorbell_id,
> > +		lower_32_bits(gpa),
> > +		upper_32_bits(gpa),
> > +		gtt_addr
> > +	};
> > +
> > +	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> > +}
> > +
> > +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> > +{
> > +	u32 action[] = {
> > +		XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> > +		guc_id
> > +	};
> > +
> > +	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> > +}
> > +
> > +static bool has_doorbell(struct xe_engine *e)
> > +{
> > +	return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> > +}
> > +
> > +#define doorbell_read(guc_, e_, field_) ({			\
> > +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> > +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> > +	xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,		\
> > +				  struct guc_doorbell_info, field_); \
> > +	})
> > +#define doorbell_write(guc_, e_, field_, val_) ({		\
> > +	struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;	\
> > +	iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);	\
> > +	xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,		\
> > +				  struct guc_doorbell_info, field_, val_); \
> > +	})
> > +
> > +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	struct xe_device *xe = guc_to_xe(guc);
> > +
> > +	/* GuC does the initialization with distributed and MMIO doorbells */
> > +	if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> > +		doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> > +		doorbell_write(guc, e, cookie, 0);
> > +	}
> > +}
> > +
> > +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> > +	    xe_device_mem_access_ongoing(guc_to_xe(guc)))
> > +		doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> > +}
> > +
> > +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	if (has_doorbell(e)) {
> > +		release_doorbell_id(guc, e);
> > +		xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> > +	}
> > +}
> > +
> > +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	u32 cookie;
> > +
> > +	cookie = doorbell_read(guc, e, cookie);
> > +	doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> > +
> > +	XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> > +}
> > +
> > +#define GUC_MMIO_DOORBELL_RING_ACK	0xACEDBEEF
> > +#define GUC_MMIO_DOORBELL_RING_NACK	0xDEADBEEF
> > +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> > +{
> > +	u32 db_value;
> > +
> > +	db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> > +				  doorbell_offset);
> > +
> > +	/*
> > +	 * The read from the doorbell page will return ack/nack. We don't remove
> > +	 * doorbells from active clients so we don't expect to ever get a nack.
> > +	 * XXX: if doorbell is lost, re-acquire it?
> > +	 */
> > +	XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> > +	XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> > +}
> > +
> > +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +	XE_BUG_ON(!has_doorbell(e));
> > +
> > +	if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> > +		ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> > +	else
> > +		ring_memory_doorbell(guc, e);
> > +
> > +	trace_xe_engine_ring_db(e);
> > +}
> > +
> > +static void register_engine(struct xe_engine *e);
> > +
> > +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> > +{
> > +	struct xe_gt *gt = guc_to_gt(guc);
> > +	struct xe_device *xe = gt_to_xe(gt);
> > +	u64 gpa;
> > +	u32 gtt_addr;
> > +	int ret;
> > +
> > +	XE_BUG_ON(!has_doorbell(e));
> > +
> > +	if (HAS_GUC_MMIO_DB(xe)) {
> > +		e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
> > +		gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> > +		gtt_addr = 0;
> > +	} else {
> > +		struct xe_bo *bo;
> > +
> > +		if (!e->guc->doorbell_bo) {
> > +			bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> > +						  ttm_bo_type_kernel,
> > +						  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> > +						  XE_BO_CREATE_GGTT_BIT);
> > +			if (IS_ERR(bo))
> > +				return PTR_ERR(bo);
> > +
> > +			e->guc->doorbell_bo = bo;
> > +		} else {
> > +			bo = e->guc->doorbell_bo;
> > +		}
> > +
> > +		init_doorbell(guc, e);
> > +		gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> > +		gtt_addr = xe_bo_ggtt_addr(bo);
> > +	}
> > +
> > +	if (init && e->flags & ENGINE_FLAG_KERNEL)
> > +		return 0;
> > +
> > +	register_engine(e);
> > +	ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> > +				gtt_addr);
> > +	if (ret < 0) {
> > +		fini_doorbell(guc, e);
> > +		return ret;
> > +	}
> > +
> > +	/*
> > +	 * In distributed doorbells, guc is returning the cacheline selected
> > +	 * by HW as part of the 7bit data from the allocate doorbell command:
> > +	 *  bit [22]   - Cacheline allocated
> > +	 *  bit [21:16] - Cacheline offset address
> > +	 * (bit 21 must be zero, or our assumption of only using half a page is
> > +	 * no longer correct).
> > +	 */
> > +	if (HAS_GUC_DIST_DB(xe)) {
> > +		u32 dd_cacheline_info;
> > +
> > +		XE_WARN_ON(!(ret & BIT(22)));
> > +		XE_WARN_ON(ret & BIT(21));
> > +
> > +		dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> > +		e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
> > +
> > +		/* and verify db status was updated correctly by the guc fw */
> > +		XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> > +			   GUC_DOORBELL_ENABLED);
> > +	}
> > +
> > +	set_engine_doorbell_registered(e);
> > +
> > +	return 0;
> > +}
> > +
> >   static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
> >   {
> >   	int ret;
> > @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
> >   	u32 num_g2h = 0;
> >   	int len = 0;
> >   	bool extra_submit = false;
> > +	bool enable = false;
> >   	XE_BUG_ON(!engine_registered(e));
> > @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
> >   		num_g2h = 1;
> >   		if (xe_engine_is_parallel(e))
> >   			extra_submit = true;
> > +		enable = true;
> >   		e->guc->resume_time = RESUME_PENDING;
> >   		set_engine_pending_enable(e);
> > @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
> >   		trace_xe_engine_submit(e);
> >   	}
> > -	xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > +	if (enable || !engine_doorbell_registered(e))
> > +		xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > +	else
> > +		ring_doorbell(guc, e);
> >   	if (extra_submit) {
> >   		len = 0;
> > @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >   	trace_xe_sched_job_run(job);
> >   	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> > -		if (!engine_registered(e))
> > -			register_engine(e);
> > +		if (!engine_registered(e)) {
> > +			if (has_doorbell(e)) {
> > +				int err = create_doorbell(engine_to_guc(e), e,
> > +							  false);
> > +
> > +				/* Not fatal, but let's warn */
> > +				XE_WARN_ON(err);
> > +			} else {
> > +				register_engine(e);
> > +			}
> > +		}
> >   		if (!lr)	/* Written in IOCTL */
> >   			e->ring_ops->emit_job(job);
> >   		submit_engine(e);
> > @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> >   	MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
> >   	int ret;
> > +	if (has_doorbell(e)) {
> > +		fini_doorbell(guc, e);
> > +		deallocate_doorbell(guc, e->guc->id);
> > +	}
> > +
> >   	set_min_preemption_timeout(guc, e);
> >   	smp_rmb();
> >   	ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> > @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
> >   		cancel_work_sync(&ge->lr_tdr);
> >   	if (e->flags & ENGINE_FLAG_PERSISTENT)
> >   		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> > +	destroy_doorbell(guc, e);
> >   	release_guc_id(guc, e);
> >   	drm_sched_entity_fini(&ge->entity);
> >   	drm_sched_fini(&ge->sched);
> > @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
> >   	struct xe_guc_engine *ge;
> >   	long timeout;
> >   	int err;
> > +	bool create_db = false;
> >   	XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
> > @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
> >   	if (guc_read_stopped(guc))
> >   		drm_sched_stop(sched, NULL);
> > +	create_db = alloc_doorbell_id(guc, e);
> > +
> >   	mutex_unlock(&guc->submission_state.lock);
> > +	if (create_db) {
> > +		/* Error isn't fatal as we don't need a doorbell */
> > +		err = create_doorbell(guc, e, true);
> > +		if (err)
> > +			release_doorbell_id(guc, e);
> > +	}
> > +
> >   	switch (e->class) {
> >   	case XE_ENGINE_CLASS_RENDER:
> >   		sprintf(e->name, "rcs%d", e->guc->id);
> > @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
> >   {
> >   	struct drm_gpu_scheduler *sched = &e->guc->sched;
> > -	XE_BUG_ON(engine_registered(e));
> > +	XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
> >   	XE_BUG_ON(engine_banned(e));
> >   	XE_BUG_ON(engine_killed(e));
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > index 8002734d6f24..bada6c02d6aa 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > @@ -13,6 +13,7 @@ struct xe_engine;
> >   struct xe_guc;
> >   int xe_guc_submit_init(struct xe_guc *guc);
> > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
> >   void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
> >   int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> > index ac7eec28934d..9ee4d572f4e0 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> > @@ -36,10 +36,14 @@ struct xe_guc {
> >   		struct xarray engine_lookup;
> >   		/** @guc_ids: used to allocate new guc_ids, single-lrc */
> >   		struct ida guc_ids;
> > +		/** @doorbell_ids: used to allocate new doorbells */
> > +		struct ida doorbell_ids;
> >   		/** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
> >   		unsigned long *guc_ids_bitmap;
> >   		/** @stopped: submissions are stopped */
> >   		atomic_t stopped;
> > +		/** @num_doorbells: number of doorbells */
> > +		int num_doorbells;
> >   		/** @lock: protects submission state */
> >   		struct mutex lock;
> >   		/** @suspend: suspend fence state */
> > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > index 02861c26e145..38e9d7c6197b 100644
> > --- a/drivers/gpu/drm/xe/xe_trace.h
> > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
> >   	     TP_ARGS(e)
> >   );
> > +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> > +	     TP_PROTO(struct xe_engine *e),
> > +	     TP_ARGS(e)
> > +);
> > +
> >   DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
> >   	     TP_PROTO(struct xe_engine *e),
> >   	     TP_ARGS(e)

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings
  2023-05-09 13:49   ` Thomas Hellström
@ 2023-05-10  0:55     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-10  0:55 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Dave Airlie, Danilo Krummrich, intel-xe

On Tue, May 09, 2023 at 03:49:12PM +0200, Thomas Hellström wrote:
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > From: Danilo Krummrich <dakr@redhat.com>
> > 
> > Add infrastructure to keep track of GPU virtual address (VA) mappings
> > with a dedicated VA space manager implementation.
> > 
> > New UAPIs, motivated by the Vulkan sparse memory bindings that graphics
> > drivers are starting to implement, allow userspace applications to request
> > multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> > intended to serve the following purposes in this context.
> > 
> > 1) Provide infrastructure to track GPU VA allocations and mappings,
> >     making use of the maple_tree.
> > 
> > 2) Generically connect GPU VA mappings to their backing buffers, in
> >     particular DRM GEM objects.
> > 
> > 3) Provide a common implementation to perform more complex mapping
> >     operations on the GPU VA space. In particular splitting and merging
> >     of GPU VA mappings, e.g. for intersecting mapping requests or partial
> >     unmap requests.
> > 
> > Suggested-by: Dave Airlie <airlied@redhat.com>
> > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> Danilo, Matthew
> 
> Before embarking on a full review of this (saving this for last) I heard
> there might be plans to add userptr support, rebind_work interaction and
> such and resolve any driver differences using vfuncs.
> 
> Just wanted to raise a warning that helpers that attempt to "do it all" and
> depend on vfuncs are easy traps to start creating middle layers (like TTM)
> which are typically frowned upon. (See for example the discussion on the
> partly rejected patch series on the TTM shrinker).
> 
> So just as a recommendation to avoid redoing a lot of stuff, please be
> careful with additional helpers that require vfuncs and check if they can be
> implemented in another way by rethinking the layering.
> 

Noted. This series goes as far as supporting userptr as a GPUVA in a
standard way, an extobj list per manager (VM in Xe), plus common ways to
lock a manager. Agreed that much more than that would be difficult to do
in a standard way.
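
To make the layering concrete: the Xe port is plain struct embedding,
roughly (sketch only; the gpuva/manager types are from Danilo's patch, the
xe-side members are illustrative, not the exact code):

struct xe_vm {
	struct drm_gpuva_manager mgr;	/* base "class" from GPUVA */
	/* ... xe specific state: lock, rebind_list, resv, ... */
};

struct xe_vma {
	struct drm_gpuva gpuva;		/* base "class", e.g. vma->gpuva.va.range */
	/* ... xe specific state: userptr, gt_mask, ... */
};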

Matt

> Thanks,
> 
> Thomas
> 
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-09 22:09     ` Matthew Brost
@ 2023-05-10  1:37       ` Rodrigo Vivi
  0 siblings, 0 replies; 126+ messages in thread
From: Rodrigo Vivi @ 2023-05-10  1:37 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Rodrigo Vivi

On Tue, May 09, 2023 at 10:09:57PM +0000, Matthew Brost wrote:
> On Mon, May 08, 2023 at 05:39:12PM -0400, Rodrigo Vivi wrote:
> > On Mon, May 01, 2023 at 05:17:04PM -0700, Matthew Brost wrote:
> > > Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
> > > LRU position on every exec.
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
> > >  drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
> > >  drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
> > >  drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
> > >  drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
> > >  5 files changed, 40 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index 3ab404e33fae..da99ee53e7d7 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
> > >  	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
> > >  }
> > >
> > > +static void xe_gem_object_close(struct drm_gem_object *obj,
> > > +				struct drm_file *file_priv)
> > > +{
> > > +	struct xe_bo *bo = gem_to_xe_bo(obj);
> > > +
> > > +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> > > +		struct ww_acquire_ctx ww;
> > > +
> > > +		XE_BUG_ON(!xe_bo_is_user(bo));
> >
> > We need to really stop using BUG_ON and move towards the usage of more WARNs.
> >
>
> If that is the direction, sure I'll change this but personally I BUG_ON
> for things that should be impossible with a correct KMD.

This is the current trend and official Kernel recommendation. Part of
checkpatch:

# do not use BUG() or variants
                if ($line =~ /\b(?!AA_|BUILD_|DCCP_|IDA_|KVM_|RWLOCK_|snd_|SPIN_)(?:[a-zA-Z_]*_)?BUG(?:_ON)?(?:_[A-Z_]+)?\s*\(/) {
                        my $msg_level = \&WARN;
                        $msg_level = \&CHK if ($file);
                        &{$msg_level}("AVOID_BUG",
                                      "Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants\n" . $herecurr);
                }
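
To make that concrete, the hunk above rewritten along those lines would be
roughly (illustrative sketch only, not a patch):

static void xe_gem_object_close(struct drm_gem_object *obj,
				struct drm_file *file_priv)
{
	struct xe_bo *bo = gem_to_xe_bo(obj);

	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
		struct ww_acquire_ctx ww;

		/* warn and recover instead of crashing the kernel */
		if (WARN_ON_ONCE(!xe_bo_is_user(bo)))
			return;

		xe_bo_lock(bo, &ww, 0, false);
		ttm_bo_set_bulk_move(&bo->ttm, NULL);
		xe_bo_unlock(bo, &ww);
	}
}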

>
> Matt
>
> > But the rest of the patch looks good to me... I just believe it would be
> > good to get Thomas' review here.
> >
> > > +
> > > +		xe_bo_lock(bo, &ww, 0, false);
> > > +		ttm_bo_set_bulk_move(&bo->ttm, NULL);
> > > +		xe_bo_unlock(bo, &ww);
> > > +	}
> > > +}
> > > +
> > > +
> > >  static bool should_migrate_to_system(struct xe_bo *bo)
> > >  {
> > >  	struct xe_device *xe = xe_bo_device(bo);
> > > @@ -1040,6 +1057,7 @@ static const struct vm_operations_struct xe_gem_vm_ops = {
> > >
> > >  static const struct drm_gem_object_funcs xe_gem_object_funcs = {
> > >  	.free = xe_gem_object_free,
> > > +	.close = xe_gem_object_close,
> > >  	.mmap = drm_gem_ttm_mmap,
> > >  	.export = xe_gem_prime_export,
> > >  	.vm_ops = &xe_gem_vm_ops,
> > > @@ -1081,8 +1099,8 @@ void xe_bo_free(struct xe_bo *bo)
> > >
> > >  struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > >  				    struct xe_gt *gt, struct dma_resv *resv,
> > > -				    size_t size, enum ttm_bo_type type,
> > > -				    u32 flags)
> > > +				    struct ttm_lru_bulk_move *bulk, size_t size,
> > > +				    enum ttm_bo_type type, u32 flags)
> > >  {
> > >  	struct ttm_operation_ctx ctx = {
> > >  		.interruptible = true,
> > > @@ -1149,7 +1167,10 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > >  		return ERR_PTR(err);
> > >
> > >  	bo->created = true;
> > > -	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> > > +	if (bulk)
> > > +		ttm_bo_set_bulk_move(&bo->ttm, bulk);
> > > +	else
> > > +		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> > >
> > >  	return bo;
> > >  }
> > > @@ -1219,7 +1240,10 @@ xe_bo_create_locked_range(struct xe_device *xe,
> > >  		}
> > >  	}
> > >
> > > -	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL, size,
> > > +	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL,
> > > +				   vm && !xe_vm_no_dma_fences(vm) &&
> > > +				   flags & XE_BO_CREATE_USER_BIT ?
> > > +				   &vm->lru_bulk_move : NULL, size,
> > >  				   type, flags);
> > >  	if (IS_ERR(bo))
> > >  		return bo;
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > index 8354d05ccdf3..25457b3c757b 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > @@ -81,8 +81,8 @@ void xe_bo_free(struct xe_bo *bo);
> > >
> > >  struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > >  				    struct xe_gt *gt, struct dma_resv *resv,
> > > -				    size_t size, enum ttm_bo_type type,
> > > -				    u32 flags);
> > > +				    struct ttm_lru_bulk_move *bulk, size_t size,
> > > +				    enum ttm_bo_type type, u32 flags);
> > >  struct xe_bo *
> > >  xe_bo_create_locked_range(struct xe_device *xe,
> > >  			  struct xe_gt *gt, struct xe_vm *vm,
> > > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > index 9b252cc782b7..975dee1f770f 100644
> > > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > @@ -199,7 +199,7 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> > >  	int ret;
> > >
> > >  	dma_resv_lock(resv, NULL);
> > > -	bo = __xe_bo_create_locked(xe, storage, NULL, resv, dma_buf->size,
> > > +	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > >  				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> > >  	if (IS_ERR(bo)) {
> > >  		ret = PTR_ERR(bo);
> > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > index 44ea9bcd0066..21a9c2fddf86 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > @@ -374,6 +374,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > >  	xe_sched_job_push(job);
> > >  	xe_vm_reactivate_rebind(vm);
> > >
> > > +	if (!err && !xe_vm_no_dma_fences(vm)) {
> > > +		spin_lock(&xe->ttm.lru_lock);
> > > +		ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
> > > +		spin_unlock(&xe->ttm.lru_lock);
> > > +	}
> > > +
> > >  err_repin:
> > >  	if (!xe_vm_no_dma_fences(vm))
> > >  		up_read(&vm->userptr.notifier_lock);
> > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > > index fada7896867f..d3e99f22510d 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > @@ -164,6 +164,9 @@ struct xe_vm {
> > >  	/** Protects @rebind_list and the page-table structures */
> > >  	struct dma_resv resv;
> > >
> > > +	/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
> > > +	struct ttm_lru_bulk_move lru_bulk_move;
> > > +
> > >  	u64 size;
> > >  	struct rb_root vmas;
> > >
> > > --
> > > 2.34.1
> > >

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-09 22:05     ` Matthew Brost
@ 2023-05-10  8:14       ` Thomas Hellström
  2023-05-10 18:40         ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-10  8:14 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe


On 5/10/23 00:05, Matthew Brost wrote:
> On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
>> On 5/2/23 02:17, Matthew Brost wrote:
>>> Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
>>> LRU position on every exec.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
>>>    drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
>>>    drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
>>>    drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
>>>    drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
>>>    5 files changed, 40 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index 3ab404e33fae..da99ee53e7d7 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
>>>    	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
>>>    }
>>> +static void xe_gem_object_close(struct drm_gem_object *obj,
>>> +				struct drm_file *file_priv)
>>> +{
>>> +	struct xe_bo *bo = gem_to_xe_bo(obj);
>>> +
>>> +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
>> Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
>> doesn't make much sense when we support user-space command buffer chaining,
>> but I think we should be doing it on exec at least, no?
> Maybe you could make the argument for compute VMs, the preempt worker in
> that case should probably do a bulk move. I can change this if desired.
Yes, please.
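
Something along the lines of what the exec path in this patch already does,
i.e. roughly (sketch only; vm->xe pointing at the xe_device is an
assumption, the other names are taken from the diff):

	/* in the compute preempt / rebind worker, after revalidation */
	spin_lock(&vm->xe->ttm.lru_lock);
	ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
	spin_unlock(&vm->xe->ttm.lru_lock);
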
>
> For a fault VM it makes no sense as the fault handler updates the LRU
> for individual BOs.
Yes that makes sense.
>
>>> +		struct ww_acquire_ctx ww;
>>> +
>>> +		XE_BUG_ON(!xe_bo_is_user(bo));
>> Also why can't we use this for kernel objects as well? At some point we want
>> to get to evictable page-table objects? Could we do this in the
>> release_notify() callback to cover all potential bos?
>>
> xe_gem_object_close is a user call, right? We can't call this on kernel
> BOs. This also could be outside the if statement.

Hmm, yes the question was can we stop doing this in 
xe_gem_object_close() and instead do it in release_notify() to cover 
also kernel objects. Since release_notify() is called just after 
individualizing dma_resv, it makes sense to individualize also LRU at 
that point?
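
Sketch of what I mean below; the hook follows TTM's
ttm_device_funcs::release_notify, the xe helper names are assumptions and
the locking is glossed over:

static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
{
	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
	struct ww_acquire_ctx ww;

	/* detach from any bulk move so kernel BOs are covered too */
	xe_bo_lock(bo, &ww, 0, false);
	ttm_bo_set_bulk_move(ttm_bo, NULL);
	xe_bo_unlock(bo, &ww);
}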

/Thomas


>
> Matt
>
>> /Thomas
>>
>>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update
  2023-05-09 22:16         ` Matthew Brost
@ 2023-05-10  8:15           ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-10  8:15 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe


On 5/10/23 00:16, Matthew Brost wrote:
> On Tue, May 09, 2023 at 05:21:39PM +0200, Thomas Hellström wrote:
>> On 5/9/23 16:56, Matthew Brost wrote:
>>> On Mon, May 08, 2023 at 03:14:10PM +0200, Thomas Hellström wrote:
>>>> Hi, Matthew
>>>>
>>>> In addition to Rodrigo's comments:
>>>>
>>>> On 5/2/23 02:17, Matthew Brost wrote:
>>>>> Flow control + write ring in exec, return NULL in run_job, signal
>>>>> xe_hw_fence immediately, and override TDR for LR jobs.
>>>>>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/xe/xe_engine.c           | 32 ++++++++
>>>>>     drivers/gpu/drm/xe/xe_engine.h           |  4 +
>>>>>     drivers/gpu/drm/xe/xe_exec.c             |  8 ++
>>>>>     drivers/gpu/drm/xe/xe_guc_engine_types.h |  2 +
>>>>>     drivers/gpu/drm/xe/xe_guc_submit.c       | 95 +++++++++++++++++++++---
>>>>>     drivers/gpu/drm/xe/xe_trace.h            |  5 ++
>>>>>     6 files changed, 137 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
>>>>> index 094ec17d3004..d1e84d7adbd4 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_engine.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_engine.c
>>>>> @@ -18,6 +18,7 @@
>>>>>     #include "xe_macros.h"
>>>>>     #include "xe_migrate.h"
>>>>>     #include "xe_pm.h"
>>>>> +#include "xe_ring_ops_types.h"
>>>>>     #include "xe_trace.h"
>>>>>     #include "xe_vm.h"
>>>>> @@ -673,6 +674,37 @@ static void engine_kill_compute(struct xe_engine *e)
>>>>>     	up_write(&e->vm->lock);
>>>>>     }
>>>>> +/**
>>>>> + * xe_engine_is_lr() - Whether an engine is long-running
>>>>> + * @e: The engine
>>>>> + *
>>>>> + * Return: True if the engine is long-running, false otherwise.
>>>>> + */
>>>>> +bool xe_engine_is_lr(struct xe_engine *e)
>>>>> +{
>>>>> +	return e->vm && xe_vm_no_dma_fences(e->vm) &&
>>>>> +		!(e->flags & ENGINE_FLAG_VM);
>>>>> +}
>>>>> +
>>>>> +static s32 xe_engine_num_job_inflight(struct xe_engine *e)
>>>>> +{
>>>>> +	return e->lrc->fence_ctx.next_seqno - xe_lrc_seqno(e->lrc) - 1;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * xe_engine_ring_full() - Whether an engine's ring is full
>>>>> + * @e: The engine
>>>>> + *
>>>>> + * Return: True if the engine's ring is full, false otherwise.
>>>>> + */
>>>>> +bool xe_engine_ring_full(struct xe_engine *e)
>>>>> +{
>>>>> +	struct xe_lrc *lrc = e->lrc;
>>>>> +	s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES;
>>>>> +
>>>>> +	return xe_engine_num_job_inflight(e) >= max_job;
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * xe_engine_is_idle() - Whether an engine is idle.
>>>>>      * @engine: The engine
>>>>> diff --git a/drivers/gpu/drm/xe/xe_engine.h b/drivers/gpu/drm/xe/xe_engine.h
>>>>> index a49cf2ab405e..2e60f6d90226 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_engine.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_engine.h
>>>>> @@ -42,6 +42,10 @@ static inline bool xe_engine_is_parallel(struct xe_engine *engine)
>>>>>     	return engine->width > 1;
>>>>>     }
>>>>> +bool xe_engine_is_lr(struct xe_engine *e);
>>>>> +
>>>>> +bool xe_engine_ring_full(struct xe_engine *e);
>>>>> +
>>>>>     bool xe_engine_is_idle(struct xe_engine *engine);
>>>>>     void xe_engine_kill(struct xe_engine *e);
>>>>> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
>>>>> index ea869f2452ef..44ea9bcd0066 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_exec.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_exec.c
>>>>> @@ -13,6 +13,7 @@
>>>>>     #include "xe_device.h"
>>>>>     #include "xe_engine.h"
>>>>>     #include "xe_macros.h"
>>>>> +#include "xe_ring_ops_types.h"
>>>>>     #include "xe_sched_job.h"
>>>>>     #include "xe_sync.h"
>>>>>     #include "xe_vm.h"
>>>>> @@ -277,6 +278,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>>>     		goto err_engine_end;
>>>>>     	}
>>>>> +	if (xe_engine_is_lr(engine) && xe_engine_ring_full(engine)) {
>>>>> +		err = -EWOULDBLOCK;
>>>>> +		goto err_engine_end;
>>>>> +	}
>>>>> +
>>>>>     	job = xe_sched_job_create(engine, xe_engine_is_parallel(engine) ?
>>>>>     				  addresses : &args->address);
>>>>>     	if (IS_ERR(job)) {
>>>>> @@ -363,6 +369,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>>>     		xe_sync_entry_signal(&syncs[i], job,
>>>>>     				     &job->drm.s_fence->finished);
>>>>> +	if (xe_engine_is_lr(engine))
>>>>> +		engine->ring_ops->emit_job(job);
>>>>>     	xe_sched_job_push(job);
>>>>>     	xe_vm_reactivate_rebind(vm);
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
>>>>> index cbfb13026ec1..5d83132034a6 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
>>>>> @@ -31,6 +31,8 @@ struct xe_guc_engine {
>>>>>     	 */
>>>>>     #define MAX_STATIC_MSG_TYPE	3
>>>>>     	struct drm_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
>>>>> +	/** @lr_tdr: long running TDR worker */
>>>>> +	struct work_struct lr_tdr;
>>>>>     	/** @fini_async: do final fini async from this worker */
>>>>>     	struct work_struct fini_async;
>>>>>     	/** @resume_time: time of last resume */
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> index 68d09e7a4cc0..0a41f5d04f6d 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> @@ -500,6 +500,14 @@ static void register_engine(struct xe_engine *e)
>>>>>     		parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
>>>>>     	}
>>>>> +	/*
>>>>> +	 * We must keep a reference for LR engines if engine is registered with
>>>>> +	 * the GuC as jobs signal immediately and can't destroy an engine if the
>>>>> +	 * GuC has a reference to it.
>>>>> +	 */
>>>>> +	if (xe_engine_is_lr(e))
>>>>> +		xe_engine_get(e);
>>>>> +
>>>>>     	set_engine_registered(e);
>>>>>     	trace_xe_engine_register(e);
>>>>>     	if (xe_engine_is_parallel(e))
>>>>> @@ -662,6 +670,7 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>>>>>     {
>>>>>     	struct xe_sched_job *job = to_xe_sched_job(drm_job);
>>>>>     	struct xe_engine *e = job->engine;
>>>>> +	bool lr = xe_engine_is_lr(e);
>>>>>     	XE_BUG_ON((engine_destroyed(e) || engine_pending_disable(e)) &&
>>>>>     		  !engine_banned(e) && !engine_suspended(e));
>>>>> @@ -671,14 +680,19 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>>>>>     	if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
>>>>>     		if (!engine_registered(e))
>>>>>     			register_engine(e);
>>>>> -		e->ring_ops->emit_job(job);
>>>>> +		if (!lr)	/* Written in IOCTL */
>>>> Hmm? What does "Written in IOCTL mean?" Could you rephrase to something more
>>>> descriptive?
>>>>
>>> "LR jos are emitted in the IOCTL"
>> Ah, I read it as "the lr variable was written in IOCTL."
>>
>> Perhaps LR jobs are already emitted at execbuf time?
>>
> I missed exec in my update.
>
> s/LR jos are emitted in the IOCTL/LR jos are emitted in the exec IOCTL/

Sounds good, with also s/jos/jobs/

/Thomas



>
> Matt
>
>> /Thomas
>>
>>
>>> Does that work?
>>>
>>> Matt
>>>
>>>>> +			e->ring_ops->emit_job(job);
>>>>>     		submit_engine(e);
>>>>>     	}
>>>>> -	if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags))
>>>>> +	if (lr) {
>>>>> +		xe_sched_job_set_error(job, -ENOTSUPP);
>>>>> +		return NULL;
>>>>> +	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
>>>>>     		return job->fence;
>>>>> -	else
>>>>> +	} else {
>>>>>     		return dma_fence_get(job->fence);
>>>>> +	}
>>>>>     }
>>>>>     static void guc_engine_free_job(struct drm_sched_job *drm_job)
>>>>> @@ -782,6 +796,57 @@ static void simple_error_capture(struct xe_engine *e)
>>>>>     }
>>>>>     #endif
>>>>> +static void xe_guc_engine_trigger_cleanup(struct xe_engine *e)
>>>>> +{
>>>>> +	struct xe_guc *guc = engine_to_guc(e);
>>>>> +
>>>>> +	if (xe_engine_is_lr(e))
>>>>> +		queue_work(guc_to_gt(guc)->ordered_wq, &e->guc->lr_tdr);
>>>>> +	else
>>>>> +		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>>>> +}
>>>>> +
>>>>> +static void xe_guc_engine_lr_cleanup(struct work_struct *w)
>>>>> +{
>>>>> +	struct xe_guc_engine *ge =
>>>>> +		container_of(w, struct xe_guc_engine, lr_tdr);
>>>>> +	struct xe_engine *e = ge->engine;
>>>>> +	struct drm_gpu_scheduler *sched = &ge->sched;
>>>>> +
>>>>> +	XE_BUG_ON(!xe_engine_is_lr(e));
>>>>> +	trace_xe_engine_lr_cleanup(e);
>>>>> +
>>>>> +	/* Kill the run_job / process_msg entry points */
>>>>> +	drm_sched_run_wq_stop(sched);
>>>>> +
>>>>> +	/* Engine state now stable, disable scheduling / deregister if needed */
>>>>> +	if (engine_registered(e)) {
>>>>> +		struct xe_guc *guc = engine_to_guc(e);
>>>>> +		int ret;
>>>>> +
>>>>> +		set_engine_banned(e);
>>>>> +		xe_engine_get(e);
>>>>> +		disable_scheduling_deregister(guc, e);
>>>>> +
>>>>> +		/*
>>>>> +		 * Must wait for scheduling to be disabled before signalling
>>>>> +		 * any fences, if GT broken the GT reset code should signal us.
>>>>> +		 */
>>>>> +		smp_rmb();
>>>> wait_event() paired with wake_up() family of functions typically set the
>>>> necessary barriers to make sure anything written prior to wake_up() is seen
>>>> in wait_event(). So that smp_rmb() is most likely not needed. If it still
>>>> is, its pairing smp_wmb() should be documented and pointed to as well. See
>>>> documentation of set_current_state() vs __set_current_state().
>>>>
>>>>> +		ret = wait_event_timeout(guc->ct.wq,
>>>>> +					 !engine_pending_disable(e) ||
>>>>> +					 guc_read_stopped(guc), HZ * 5);
>>>>> +		if (!ret) {
>>>>> +			XE_WARN_ON("Schedule disable failed to respond");
>>>>> +			drm_sched_run_wq_start(sched);
>>>>> +			xe_gt_reset_async(e->gt);
>>>>> +			return;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	drm_sched_run_wq_start(sched);
>>>>> +}
>>>>> +
>>>>>     static enum drm_gpu_sched_stat
>>>>>     guc_engine_timedout_job(struct drm_sched_job *drm_job)
>>>>>     {
>>>>> @@ -832,7 +897,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>>>>>     			err = -EIO;
>>>>>     		set_engine_banned(e);
>>>>>     		xe_engine_get(e);
>>>>> -		disable_scheduling_deregister(engine_to_guc(e), e);
>>>>> +		disable_scheduling_deregister(guc, e);
>>>>>     		/*
>>>>>     		 * Must wait for scheduling to be disabled before signalling
>>>>> @@ -865,7 +930,7 @@ guc_engine_timedout_job(struct drm_sched_job *drm_job)
>>>>>     	 */
>>>>>     	list_add(&drm_job->list, &sched->pending_list);
>>>>>     	drm_sched_run_wq_start(sched);
>>>>> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>>>> +	xe_guc_engine_trigger_cleanup(e);
>>>>>     	/* Mark all outstanding jobs as bad, thus completing them */
>>>>>     	spin_lock(&sched->job_list_lock);
>>>>> @@ -889,6 +954,8 @@ static void __guc_engine_fini_async(struct work_struct *w)
>>>>>     	trace_xe_engine_destroy(e);
>>>>> +	if (xe_engine_is_lr(e))
>>>>> +		cancel_work_sync(&ge->lr_tdr);
>>>>>     	if (e->flags & ENGINE_FLAG_PERSISTENT)
>>>>>     		xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
>>>>>     	release_guc_id(guc, e);
>>>>> @@ -906,7 +973,7 @@ static void guc_engine_fini_async(struct xe_engine *e)
>>>>>     	bool kernel = e->flags & ENGINE_FLAG_KERNEL;
>>>>>     	INIT_WORK(&e->guc->fini_async, __guc_engine_fini_async);
>>>>> -	queue_work(system_unbound_wq, &e->guc->fini_async);
>>>>> +	queue_work(system_wq, &e->guc->fini_async);
>>>>>     	/* We must block on kernel engines so slabs are empty on driver unload */
>>>>>     	if (kernel) {
>>>>> @@ -1089,12 +1156,16 @@ static int guc_engine_init(struct xe_engine *e)
>>>>>     	if (err)
>>>>>     		goto err_free;
>>>>> +
>>>> Unrelated whitespace?
>>>>
>>>>
>>>>>     	sched = &ge->sched;
>>>>>     	err = drm_sched_entity_init(&ge->entity, DRM_SCHED_PRIORITY_NORMAL,
>>>>>     				    &sched, 1, NULL);
>>>>>     	if (err)
>>>>>     		goto err_sched;
>>>>> +	if (xe_engine_is_lr(e))
>>>>> +		INIT_WORK(&e->guc->lr_tdr, xe_guc_engine_lr_cleanup);
>>>>> +
>>>>>     	mutex_lock(&guc->submission_state.lock);
>>>>>     	err = alloc_guc_id(guc, e);
>>>>> @@ -1146,7 +1217,7 @@ static void guc_engine_kill(struct xe_engine *e)
>>>>>     {
>>>>>     	trace_xe_engine_kill(e);
>>>>>     	set_engine_killed(e);
>>>>> -	drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>>>> +	xe_guc_engine_trigger_cleanup(e);
>>>>>     }
>>>>>     static void guc_engine_add_msg(struct xe_engine *e, struct drm_sched_msg *msg,
>>>>> @@ -1296,6 +1367,9 @@ static void guc_engine_stop(struct xe_guc *guc, struct xe_engine *e)
>>>>>     	/* Stop scheduling + flush any DRM scheduler operations */
>>>>>     	drm_sched_run_wq_stop(sched);
>>>>> +	if (engine_registered(e) && xe_engine_is_lr(e))
>>>>> +		xe_engine_put(e);
>>>>> +
>>>>>     	/* Clean up lost G2H + reset engine state */
>>>>>     	if (engine_destroyed(e) && engine_registered(e)) {
>>>>>     		if (engine_banned(e))
>>>>> @@ -1520,6 +1594,9 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>>>>>     	trace_xe_engine_deregister_done(e);
>>>>>     	clear_engine_registered(e);
>>>>> +	if (xe_engine_is_lr(e))
>>>>> +		xe_engine_put(e);
>>>>> +
>>>>>     	if (engine_banned(e))
>>>>>     		xe_engine_put(e);
>>>>>     	else
>>>>> @@ -1557,7 +1634,7 @@ int xe_guc_engine_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
>>>>>     	 */
>>>>>     	set_engine_reset(e);
>>>>>     	if (!engine_banned(e))
>>>>> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>>>> +		xe_guc_engine_trigger_cleanup(e);
>>>>>     	return 0;
>>>>>     }
>>>>> @@ -1584,7 +1661,7 @@ int xe_guc_engine_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>>>>>     	/* Treat the same as engine reset */
>>>>>     	set_engine_reset(e);
>>>>>     	if (!engine_banned(e))
>>>>> -		drm_sched_set_timeout(&e->guc->sched, MIN_SCHED_TIMEOUT);
>>>>> +		xe_guc_engine_trigger_cleanup(e);
>>>>>     	return 0;
>>>>>     }
>>>>> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
>>>>> index 2f8eb7ebe9a7..02861c26e145 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_trace.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_trace.h
>>>>> @@ -219,6 +219,11 @@ DEFINE_EVENT(xe_engine, xe_engine_resubmit,
>>>>>     	     TP_ARGS(e)
>>>>>     );
>>>>> +DEFINE_EVENT(xe_engine, xe_engine_lr_cleanup,
>>>>> +	     TP_PROTO(struct xe_engine *e),
>>>>> +	     TP_ARGS(e)
>>>>> +);
>>>>> +
>>>>>     DECLARE_EVENT_CLASS(xe_sched_job,
>>>>>     		    TP_PROTO(struct xe_sched_job *job),
>>>>>     		    TP_ARGS(job),

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-10  8:14       ` Thomas Hellström
@ 2023-05-10 18:40         ` Matthew Brost
  2023-05-11  7:24           ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-10 18:40 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Wed, May 10, 2023 at 10:14:12AM +0200, Thomas Hellström wrote:
> 
> On 5/10/23 00:05, Matthew Brost wrote:
> > On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
> > > On 5/2/23 02:17, Matthew Brost wrote:
> > > > Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
> > > > LRU position on every exec.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
> > > >    drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
> > > >    drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
> > > >    drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
> > > >    drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
> > > >    5 files changed, 40 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index 3ab404e33fae..da99ee53e7d7 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
> > > >    	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
> > > >    }
> > > > +static void xe_gem_object_close(struct drm_gem_object *obj,
> > > > +				struct drm_file *file_priv)
> > > > +{
> > > > +	struct xe_bo *bo = gem_to_xe_bo(obj);
> > > > +
> > > > +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> > > Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
> > > doesn't make much sense when we support user-space command buffer chaining,
> > > but I think we should be doing it on exec at least, no?
> > Maybe you could make the argument for compute VMs, the preempt worker in
> > that case should probably do a bulk move. I can change this if desired.
> Yes, please.
> > 
> > For a fault VM it makes no sense as the fault handler updates the LRU
> > for individual BOs.
> Yes that makes sense.
> > 
> > > > +		struct ww_acquire_ctx ww;
> > > > +
> > > > +		XE_BUG_ON(!xe_bo_is_user(bo));
> > > Also why can't we use this for kernel objects as well? At some point we want
> > > to get to evictable page-table objects? Could we do this in the
> > > release_notify() callback to cover all potential bos?
> > > 
> > xe_gem_object_close is a user call, right? We can't call this on kernel
> > BOs. This also could be outside the if statement.
> 
> Hmm, yes the question was can we stop doing this in xe_gem_object_close()
> and instead do it in release_notify() to cover also kernel objects. Since
> release_notify() is called just after individualizing dma_resv, it makes
> sense to individualize also LRU at that point?
> 

If we ever support moving kernel BOs, then yes. We need to do a lot of
work to get there, so I'd rather leave this where it is, but I'll add a
comment indicating that if we want to support kernel BO eviction, this
should be updated.

Sound good?

Matt

> /Thomas
> 
> 
> > 
> > Matt
> > 
> > > /Thomas
> > > 
> > > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-09 12:29         ` Thomas Hellström
@ 2023-05-10 23:25           ` Matthew Brost
  2023-05-11  7:43             ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-10 23:25 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Rodrigo Vivi, Matthew Brost, intel-xe, Rodrigo Vivi

On Tue, May 09, 2023 at 02:29:23PM +0200, Thomas Hellström wrote:
> 
> On 5/8/23 23:34, Rodrigo Vivi wrote:
> > On Mon, May 08, 2023 at 01:08:10AM +0000, Matthew Brost wrote:
> > > On Fri, May 05, 2023 at 02:40:40PM -0400, Rodrigo Vivi wrote:
> > > > On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
> > > > > Not needed and causes some issues with bulk LRU moves.
> > > > I'm confused with this explanation and the code below.
> > > > could you please provide a bit more wording here?
> > > > 
> > > We only need to try to lock a BO if it external as non-external BOs
> > > share the dma-resv with the already locked VM. Trying to lock
> > > non-external BOs caused an issue (list corruption) in an uncoming patch
> 
> s/uncoming/upcoming/
> 
> Also it's not clear to me how this could fix a list corruption in the bulk
> LRU moves? I mean, if it's a duplicate lock then it gets removed from the tv
> list and not touched again? Could you explain the mechanism of the fix?
> 

I had my head wrapped around this at one point but I have since forgotten,
as I coded this one a while ago. This code changes later in the series
(drm_exec locking), so IMO it is not that big of a deal to merge this along
with the following patch without further explanation, but if you think it
is a huge deal I can try to figure out what the issue is again; another
option is to just stage the LRU patch after drm_exec.

Open to whatever but prefer to leave as is.

Matt

> Thanks,
> 
> Thomas
> 
> 
> > > which adds bulk LRU move. Since this code isn't needed, remove it.
> > it makes more sense now. with this in commit msg (but with Christopher fix)
> > 
> > 
> > Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > 
> > 
> > > ^^^ How about this.
> > > 
> > > > > Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
> > > > > ---
> > > > >   drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
> > > > >   1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > > index 272f0f7f24fe..6c427ff92c44 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
> > > > >   		 */
> > > > >   		xe_bo_get(vbo);
> > > > > -		tv_bo.bo = &vbo->ttm;
> > > > > -		tv_bo.num_shared = 1;
> > > > > -		list_add(&tv_bo.head, &objs);
> > > > > +		if (!vbo->vm) {
> > > > > +			tv_bo.bo = &vbo->ttm;
> > > > > +			tv_bo.num_shared = 1;
> > > > > +			list_add(&tv_bo.head, &objs);
> > > > > +		}
> > > > >   	}
> > > > >   again:
> > > > > -- 
> > > > > 2.34.1
> > > > > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA
  2023-05-09 13:52   ` Thomas Hellström
@ 2023-05-11  2:41     ` Matthew Brost
  2023-05-11  7:39       ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-11  2:41 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Tue, May 09, 2023 at 03:52:24PM +0200, Thomas Hellström wrote:
> Hi, Matthew,
> 
> On 5/2/23 02:17, Matthew Brost wrote:
> > Rather than open coding VM binds and VMA tracking, use the GPUVA
> > library. GPUVA provides a common infrastructure for VM binds to use mmap
> > / munmap semantics and support for VK sparse bindings.
> > 
> > The concepts are:
> > 
> > 1) xe_vm inherits from drm_gpuva_manager
> > 2) xe_vma inherits from drm_gpuva
> > 3) xe_vma_op inherits from drm_gpuva_op
> > 4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the
> > GPUVA code to generate an VMA operations list which is parsed, commited,
> > and executed.
> > 
> > v2 (CI): Add break after default in case statement.
> > v3: Rebase
> > v4: Fix some error handling
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> Before embarking on a second review of this code it would really be
> beneficial if you could address some comments from the first review. In
> particular splitting this huge patch up if possible (and I also think that
> removing the async worker *before* this patch if at all possible would
> really ease the review both for me and potential upcoming reviewers).
> 

My bad that I missed your comments on the list; yes, I will address your
comments in the respin. Expect it by Monday-ish.

Removing the async worker first doesn't make a ton of sense, as GPUVA
makes the error handling a lot easier, plus that would basically mean a
complete rewrite.

Matt

> Thanks,
> 
> Thomas
> 
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation
  2023-05-09 14:34   ` Rodrigo Vivi
@ 2023-05-11  2:52     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-11  2:52 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Tue, May 09, 2023 at 10:34:52AM -0400, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:13PM -0700, Matthew Brost wrote:
> > Add uAPI and implementation for NULL bindings. A NULL binding is defined
> > as writes dropped and read zero. A single bit in the uAPI has been added
> > which results in a single bit in the PTEs being set.
> 
> I have confirmed in the spec that this is the case for the BIT 9!
> 
> "If Null=1, the h/w will avoid the memory access and return all
> zero's for the read access with a null completion, write accesses are dropped"
> 
> The code looks good, but just a few questions / comments below.
> 
> > 
> > NULL bindings are intended to be used to implement VK sparse bindings.
> 
> is there any example available or any documentation that could explain
> how this is used and why this is needed?
> 

VK sparse binding needs this:
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkQueueBindSparse.html

> any IGT?
> 

https://patchwork.freedesktop.org/patch/534957/?series=117177&rev=2
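
For reference, the PTE side really is just the one extra bit; condensing
the hunk further down in this patch, a NULL mapping boils down to
something like:

	/*
	 * No backing address at all: present / RW, an optional huge page
	 * size bit, plus bit 9 (XE_PTE_NULL) so reads return zero and
	 * writes are dropped by HW.
	 */
	u64 pte = XE_PAGE_PRESENT | XE_PAGE_RW | XE_PTE_NULL;

	if (level == 1)
		pte |= XE_PDE_PS_2M;	/* 2M null mapping */
	else if (level == 2)
		pte |= XE_PDPE_PS_1G;	/* 1G null mapping */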

> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.h           |  1 +
> >  drivers/gpu/drm/xe/xe_exec.c         |  2 +
> >  drivers/gpu/drm/xe/xe_gt_pagefault.c |  4 +-
> >  drivers/gpu/drm/xe/xe_pt.c           | 77 ++++++++++++++++-------
> >  drivers/gpu/drm/xe/xe_vm.c           | 92 ++++++++++++++++++----------
> >  drivers/gpu/drm/xe/xe_vm.h           | 10 +++
> >  drivers/gpu/drm/xe/xe_vm_madvise.c   |  2 +-
> >  drivers/gpu/drm/xe/xe_vm_types.h     |  3 +
> >  include/uapi/drm/xe_drm.h            |  8 +++
> >  9 files changed, 144 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index 25457b3c757b..81051f456874 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -56,6 +56,7 @@
> >  #define XE_PDE_IPS_64K			BIT_ULL(11)
> >  
> >  #define XE_GGTT_PTE_LM			BIT_ULL(1)
> > +#define XE_PTE_NULL			BIT_ULL(9)
> >  #define XE_USM_PPGTT_PTE_AE		BIT_ULL(10)
> >  #define XE_PPGTT_PTE_LM			BIT_ULL(11)
> >  #define XE_PDE_64K			BIT_ULL(6)
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index 90c46d092737..68f876afd13c 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -116,6 +116,8 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
> >  	 * to a location where the GPU can access it).
> >  	 */
> >  	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
> > +		XE_BUG_ON(xe_vma_is_null(vma));
> 
> Can we avoid BUG here? Maybe a WARN?
> 

Sure.

Matt

> > +
> >  		if (xe_vma_is_userptr(vma))
> >  			continue;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > index f7a066090a13..cfffe3398fe4 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > @@ -526,8 +526,8 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
> >  
> >  	trace_xe_vma_acc(vma);
> >  
> > -	/* Userptr can't be migrated, nothing to do */
> > -	if (xe_vma_is_userptr(vma))
> > +	/* Userptr or null can't be migrated, nothing to do */
> > +	if (xe_vma_has_no_bo(vma))
> >  		goto unlock_vm;
> >  
> >  	/* Lock VM and BOs dma-resv */
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index 2b5b05a8a084..b4edb751bfbb 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -82,7 +82,9 @@ u64 gen8_pde_encode(struct xe_bo *bo, u64 bo_offset,
> >  static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
> >  			   size_t page_size, bool *is_vram)
> >  {
> > -	if (xe_vma_is_userptr(vma)) {
> > +	if (xe_vma_is_null(vma)) {
> > +		return 0;
> > +	} else if (xe_vma_is_userptr(vma)) {
> >  		struct xe_res_cursor cur;
> >  		u64 page;
> >  
> > @@ -563,6 +565,10 @@ static bool xe_pt_hugepte_possible(u64 addr, u64 next, unsigned int level,
> >  	if (next - xe_walk->va_curs_start > xe_walk->curs->size)
> >  		return false;
> >  
> > +	/* null VMAs do not have dma addresses */
> > +	if (xe_walk->pte_flags & XE_PTE_NULL)
> > +		return true;
> > +
> >  	/* Is the DMA address huge PTE size aligned? */
> >  	size = next - addr;
> >  	dma = addr - xe_walk->va_curs_start + xe_res_dma(xe_walk->curs);
> > @@ -585,6 +591,10 @@ xe_pt_scan_64K(u64 addr, u64 next, struct xe_pt_stage_bind_walk *xe_walk)
> >  	if (next > xe_walk->l0_end_addr)
> >  		return false;
> >  
> > +	/* null VMAs do not have dma addresses */
> > +	if (xe_walk->pte_flags & XE_PTE_NULL)
> > +		return true;
> > +
> >  	xe_res_next(&curs, addr - xe_walk->va_curs_start);
> >  	for (; addr < next; addr += SZ_64K) {
> >  		if (!IS_ALIGNED(xe_res_dma(&curs), SZ_64K) || curs.size < SZ_64K)
> > @@ -630,17 +640,34 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
> >  	struct xe_pt *xe_child;
> >  	bool covers;
> >  	int ret = 0;
> > -	u64 pte;
> > +	u64 pte = 0;
> >  
> >  	/* Is this a leaf entry ?*/
> >  	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
> >  		struct xe_res_cursor *curs = xe_walk->curs;
> > +		bool null = xe_walk->pte_flags & XE_PTE_NULL;
> >  
> >  		XE_WARN_ON(xe_walk->va_curs_start != addr);
> >  
> > -		pte = __gen8_pte_encode(xe_res_dma(curs) + xe_walk->dma_offset,
> > -					xe_walk->cache, xe_walk->pte_flags,
> > -					level);
> > +		if (null) {
> > +			pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> > +
> > +			if (unlikely(xe_walk->pte_flags & XE_PTE_READ_ONLY))
> > +				pte &= ~XE_PAGE_RW;
> > +
> > +			if (level == 1)
> > +				pte |= XE_PDE_PS_2M;
> > +			else if (level == 2)
> > +				pte |= XE_PDPE_PS_1G;
> > +
> > +			pte |= XE_PTE_NULL;
> > +		} else {
> > +			pte = __gen8_pte_encode(xe_res_dma(curs) +
> > +						xe_walk->dma_offset,
> > +						xe_walk->cache,
> > +						xe_walk->pte_flags,
> > +						level);
> > +		}
> >  		pte |= xe_walk->default_pte;
> >  
> >  		/*
> > @@ -658,7 +685,8 @@ xe_pt_stage_bind_entry(struct drm_pt *parent, pgoff_t offset,
> >  		if (unlikely(ret))
> >  			return ret;
> >  
> > -		xe_res_next(curs, next - addr);
> > +		if (!null)
> > +			xe_res_next(curs, next - addr);
> >  		xe_walk->va_curs_start = next;
> >  		*action = ACTION_CONTINUE;
> >  
> > @@ -751,7 +779,8 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
> >  		.gt = gt,
> >  		.curs = &curs,
> >  		.va_curs_start = xe_vma_start(vma),
> > -		.pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0,
> > +		.pte_flags = xe_vma_read_only(vma) ? XE_PTE_READ_ONLY : 0 |
> > +			xe_vma_is_null(vma) ? XE_PTE_NULL : 0,
> >  		.wupd.entries = entries,
> >  		.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAGS_64K) &&
> >  			is_vram,
> > @@ -769,23 +798,28 @@ xe_pt_stage_bind(struct xe_gt *gt, struct xe_vma *vma,
> >  			gt_to_xe(gt)->mem.vram.io_start;
> >  		xe_walk.cache = XE_CACHE_WB;
> >  	} else {
> > -		if (!xe_vma_is_userptr(vma) && bo->flags & XE_BO_SCANOUT_BIT)
> > +		if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
> >  			xe_walk.cache = XE_CACHE_WT;
> >  		else
> >  			xe_walk.cache = XE_CACHE_WB;
> >  	}
> > -	if (!xe_vma_is_userptr(vma) && xe_bo_is_stolen(bo))
> > +	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
> >  		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
> >  
> >  	xe_bo_assert_held(bo);
> > -	if (xe_vma_is_userptr(vma))
> > -		xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma), &curs);
> > -	else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
> > -		xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
> > -			     xe_vma_size(vma), &curs);
> > -	else
> > -		xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
> > -				xe_vma_size(vma), &curs);
> > +	if (!xe_vma_is_null(vma)) {
> > +		if (xe_vma_is_userptr(vma))
> > +			xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
> > +					&curs);
> > +		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
> > +			xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
> > +				     xe_vma_size(vma), &curs);
> > +		else
> > +			xe_res_first_sg(xe_bo_get_sg(bo), xe_vma_bo_offset(vma),
> > +					xe_vma_size(vma), &curs);
> > +	} else {
> > +		curs.size = xe_vma_size(vma);
> > +	}
> >  
> >  	ret = drm_pt_walk_range(&pt->drm, pt->level, xe_vma_start(vma),
> >  				xe_vma_end(vma), &xe_walk.drm);
> > @@ -979,7 +1013,7 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
> >  
> >  	if (xe_vma_is_userptr(vma))
> >  		lockdep_assert_held_read(&vm->userptr.notifier_lock);
> > -	else
> > +	else if (!xe_vma_is_null(vma))
> >  		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
> >  
> >  	dma_resv_assert_held(&vm->resv);
> > @@ -1283,7 +1317,8 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
> >  	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
> >  	struct xe_pt_migrate_pt_update bind_pt_update = {
> >  		.base = {
> > -			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops,
> > +			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops :
> > +				&bind_ops,
> >  			.vma = vma,
> >  		},
> >  		.bind = true,
> > @@ -1348,7 +1383,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
> >  				   DMA_RESV_USAGE_KERNEL :
> >  				   DMA_RESV_USAGE_BOOKKEEP);
> >  
> > -		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
> > +		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
> >  			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
> >  					   DMA_RESV_USAGE_BOOKKEEP);
> >  		xe_pt_commit_bind(vma, entries, num_entries, rebind,
> > @@ -1667,7 +1702,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
> >  				   DMA_RESV_USAGE_BOOKKEEP);
> >  
> >  		/* This fence will be installed by caller when doing eviction */
> > -		if (!xe_vma_is_userptr(vma) && !xe_vma_bo(vma)->vm)
> > +		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
> >  			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
> >  					   DMA_RESV_USAGE_BOOKKEEP);
> >  		xe_pt_commit_unbind(vma, entries, num_entries,
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index f3608865e259..a46f44ab2546 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -60,6 +60,7 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
> >  
> >  	lockdep_assert_held(&vm->lock);
> >  	XE_BUG_ON(!xe_vma_is_userptr(vma));
> > +	XE_BUG_ON(xe_vma_is_null(vma));
> >  retry:
> >  	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> >  		return 0;
> > @@ -581,7 +582,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
> >  		goto out_unlock;
> >  
> >  	list_for_each_entry(vma, &vm->rebind_list, rebind_link) {
> > -		if (xe_vma_is_userptr(vma) ||
> > +		if (xe_vma_has_no_bo(vma) ||
> >  		    vma->gpuva.flags & XE_VMA_DESTROYED)
> >  			continue;
> >  
> > @@ -813,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> >  				    struct xe_bo *bo,
> >  				    u64 bo_offset_or_userptr,
> >  				    u64 start, u64 end,
> > -				    bool read_only,
> > +				    bool read_only, bool null,
> >  				    u64 gt_mask)
> >  {
> >  	struct xe_vma *vma;
> > @@ -843,6 +844,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> >  	vma->gpuva.va.range = end - start + 1;
> >  	if (read_only)
> >  		vma->gpuva.flags |= XE_VMA_READ_ONLY;
> > +	if (null)
> > +		vma->gpuva.flags |= XE_VMA_NULL;
> >  
> >  	if (gt_mask) {
> >  		vma->gt_mask = gt_mask;
> > @@ -862,23 +865,26 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> >  		vma->gpuva.gem.obj = &bo->ttm.base;
> >  		vma->gpuva.gem.offset = bo_offset_or_userptr;
> >  		drm_gpuva_link(&vma->gpuva);
> > -	} else /* userptr */ {
> > -		u64 size = end - start + 1;
> > -		int err;
> > -
> > -		vma->gpuva.gem.offset = bo_offset_or_userptr;
> > +	} else /* userptr or null */ {
> > +		if (!null) {
> > +			u64 size = end - start + 1;
> > +			int err;
> > +
> > +			vma->gpuva.gem.offset = bo_offset_or_userptr;
> > +			err = mmu_interval_notifier_insert(&vma->userptr.notifier,
> > +							   current->mm,
> > +							   xe_vma_userptr(vma),
> > +							   size,
> > +							   &vma_userptr_notifier_ops);
> > +			if (err) {
> > +				kfree(vma);
> > +				vma = ERR_PTR(err);
> > +				return vma;
> > +			}
> >  
> > -		err = mmu_interval_notifier_insert(&vma->userptr.notifier,
> > -						   current->mm,
> > -						   xe_vma_userptr(vma), size,
> > -						   &vma_userptr_notifier_ops);
> > -		if (err) {
> > -			kfree(vma);
> > -			vma = ERR_PTR(err);
> > -			return vma;
> > +			vma->userptr.notifier_seq = LONG_MAX;
> >  		}
> >  
> > -		vma->userptr.notifier_seq = LONG_MAX;
> >  		xe_vm_get(vm);
> >  	}
> >  
> > @@ -916,6 +922,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
> >  		 */
> >  		mmu_interval_notifier_remove(&vma->userptr.notifier);
> >  		xe_vm_put(vm);
> > +	} else if (xe_vma_is_null(vma)) {
> > +		xe_vm_put(vm);
> >  	} else {
> >  		xe_bo_put(xe_vma_bo(vma));
> >  	}
> > @@ -954,7 +962,7 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
> >  		list_del_init(&vma->userptr.invalidate_link);
> >  		spin_unlock(&vm->userptr.invalidated_lock);
> >  		list_del(&vma->userptr_link);
> > -	} else {
> > +	} else if (!xe_vma_is_null(vma)) {
> >  		xe_bo_assert_held(xe_vma_bo(vma));
> >  		drm_gpuva_unlink(&vma->gpuva);
> >  		if (!xe_vma_bo(vma)->vm)
> > @@ -1305,7 +1313,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> >  	drm_gpuva_iter_for_each(gpuva, it) {
> >  		vma = gpuva_to_vma(gpuva);
> >  
> > -		if (xe_vma_is_userptr(vma)) {
> > +		if (xe_vma_has_no_bo(vma)) {
> >  			down_read(&vm->userptr.notifier_lock);
> >  			vma->gpuva.flags |= XE_VMA_DESTROYED;
> >  			up_read(&vm->userptr.notifier_lock);
> > @@ -1315,7 +1323,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> >  		drm_gpuva_iter_remove(&it);
> >  
> >  		/* easy case, remove from VMA? */
> > -		if (xe_vma_is_userptr(vma) || xe_vma_bo(vma)->vm) {
> > +		if (xe_vma_has_no_bo(vma) || xe_vma_bo(vma)->vm) {
> >  			xe_vma_destroy(vma, NULL);
> >  			continue;
> >  		}
> > @@ -1964,7 +1972,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
> >  
> >  	XE_BUG_ON(region > ARRAY_SIZE(region_to_mem_type));
> >  
> > -	if (!xe_vma_is_userptr(vma)) {
> > +	if (!xe_vma_has_no_bo(vma)) {
> >  		err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
> >  		if (err)
> >  			return err;
> > @@ -2170,6 +2178,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> >  				operation & XE_VM_BIND_FLAG_IMMEDIATE;
> >  			op->map.read_only =
> >  				operation & XE_VM_BIND_FLAG_READONLY;
> > +			op->map.null = operation & XE_VM_BIND_FLAG_NULL;
> >  		}
> >  		break;
> >  	case XE_VM_BIND_OP_UNMAP:
> > @@ -2226,7 +2235,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> >  }
> >  
> >  static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> > -			      u64 gt_mask, bool read_only)
> > +			      u64 gt_mask, bool read_only, bool null)
> >  {
> >  	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
> >  	struct xe_vma *vma;
> > @@ -2242,7 +2251,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> >  	}
> >  	vma = xe_vma_create(vm, bo, op->gem.offset,
> >  			    op->va.addr, op->va.addr +
> > -			    op->va.range - 1, read_only,
> > +			    op->va.range - 1, read_only, null,
> >  			    gt_mask);
> >  	if (bo)
> >  		xe_bo_unlock(bo, &ww);
> > @@ -2254,7 +2263,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> >  			xe_vma_destroy(vma, NULL);
> >  			return ERR_PTR(err);
> >  		}
> > -	} else if(!bo->vm) {
> > +	} else if(!xe_vma_has_no_bo(vma) && !bo->vm) {
> >  		vm_insert_extobj(vm, vma);
> >  		err = add_preempt_fences(vm, bo);
> >  		if (err) {
> > @@ -2332,7 +2341,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
> >  				struct xe_vma *vma;
> >  
> >  				vma = new_vma(vm, &op->base.map,
> > -					      op->gt_mask, op->map.read_only);
> > +					      op->gt_mask, op->map.read_only,
> > +					      op->map.null);
> >  				if (IS_ERR(vma)) {
> >  					err = PTR_ERR(vma);
> >  					goto free_fence;
> > @@ -2347,9 +2357,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
> >  					bool read_only =
> >  						op->base.remap.unmap->va->flags &
> >  						XE_VMA_READ_ONLY;
> > +					bool null =
> > +						op->base.remap.unmap->va->flags &
> > +						XE_VMA_NULL;
> >  
> >  					vma = new_vma(vm, op->base.remap.prev,
> > -						      op->gt_mask, read_only);
> > +						      op->gt_mask, read_only,
> > +						      null);
> >  					if (IS_ERR(vma)) {
> >  						err = PTR_ERR(vma);
> >  						goto free_fence;
> > @@ -2364,8 +2378,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_engine *e,
> >  						op->base.remap.unmap->va->flags &
> >  						XE_VMA_READ_ONLY;
> >  
> > +					bool null =
> > +						op->base.remap.unmap->va->flags &
> > +						XE_VMA_NULL;
> > +
> >  					vma = new_vma(vm, op->base.remap.next,
> > -						      op->gt_mask, read_only);
> > +						      op->gt_mask, read_only,
> > +						      null);
> >  					if (IS_ERR(vma)) {
> >  						err = PTR_ERR(vma);
> >  						goto free_fence;
> > @@ -2853,11 +2872,12 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
> >  #ifdef TEST_VM_ASYNC_OPS_ERROR
> >  #define SUPPORTED_FLAGS	\
> >  	(FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \
> > -	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
> > +	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \
> > +	 XE_VM_BIND_FLAG_NULL | 0xffff)
> >  #else
> >  #define SUPPORTED_FLAGS	\
> >  	(XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \
> > -	 XE_VM_BIND_FLAG_IMMEDIATE | 0xffff)
> > +	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff)
> >  #endif
> >  #define XE_64K_PAGE_MASK 0xffffull
> >  
> > @@ -2903,6 +2923,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> >  		u32 obj = (*bind_ops)[i].obj;
> >  		u64 obj_offset = (*bind_ops)[i].obj_offset;
> >  		u32 region = (*bind_ops)[i].region;
> > +		bool null = op & XE_VM_BIND_FLAG_NULL;
> >  
> >  		if (i == 0) {
> >  			*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
> > @@ -2929,8 +2950,12 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> >  		if (XE_IOCTL_ERR(xe, VM_BIND_OP(op) >
> >  				 XE_VM_BIND_OP_PREFETCH) ||
> >  		    XE_IOCTL_ERR(xe, op & ~SUPPORTED_FLAGS) ||
> > +		    XE_IOCTL_ERR(xe, obj && null) ||
> > +		    XE_IOCTL_ERR(xe, obj_offset && null) ||
> > +		    XE_IOCTL_ERR(xe, VM_BIND_OP(op) != XE_VM_BIND_OP_MAP &&
> > +				 null) ||
> >  		    XE_IOCTL_ERR(xe, !obj &&
> > -				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP) ||
> > +				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP && !null) ||
> >  		    XE_IOCTL_ERR(xe, !obj &&
> >  				 VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
> >  		    XE_IOCTL_ERR(xe, addr &&
> > @@ -3254,6 +3279,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
> >  	int ret;
> >  
> >  	XE_BUG_ON(!xe_vm_in_fault_mode(xe_vma_vm(vma)));
> > +	XE_BUG_ON(xe_vma_is_null(vma));
> >  	trace_xe_vma_usm_invalidate(vma);
> >  
> >  	/* Check that we don't race with page-table updates */
> > @@ -3313,8 +3339,11 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
> >  	drm_gpuva_iter_for_each(gpuva, it) {
> >  		struct xe_vma* vma = gpuva_to_vma(gpuva);
> >  		bool is_userptr = xe_vma_is_userptr(vma);
> > +		bool null = xe_vma_is_null(vma);
> >  
> > -		if (is_userptr) {
> > +		if (null) {
> > +			addr = 0;
> > +		} else if (is_userptr) {
> >  			struct xe_res_cursor cur;
> >  
> >  			xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE, &cur);
> > @@ -3324,7 +3353,8 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
> >  		}
> >  		drm_printf(p, " [%016llx-%016llx] S:0x%016llx A:%016llx %s\n",
> >  			   xe_vma_start(vma), xe_vma_end(vma), xe_vma_size(vma),
> > -			   addr, is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
> > +			   addr, null ? "NULL" :
> > +			   is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
> >  	}
> >  	up_read(&vm->lock);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > index 21b1054949c4..96e2c6b07bf8 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -175,7 +175,17 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
> >  	}
> >  }
> >  
> > +static inline bool xe_vma_is_null(struct xe_vma *vma)
> > +{
> > +	return vma->gpuva.flags & XE_VMA_NULL;
> > +}
> > +
> >  static inline bool xe_vma_is_userptr(struct xe_vma *vma)
> > +{
> > +	return !xe_vma_bo(vma) && !xe_vma_is_null(vma);
> > +}
> > +
> > +static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
> >  {
> >  	return !xe_vma_bo(vma);
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > index 02d27a354b36..03508645fa08 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> > +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > @@ -227,7 +227,7 @@ get_vmas(struct xe_vm *vm, int *num_vmas, u64 addr, u64 range)
> >  	drm_gpuva_iter_for_each_range(gpuva, it, addr + range) {
> >  		struct xe_vma *vma = gpuva_to_vma(gpuva);
> >  
> > -		if (xe_vma_is_userptr(vma))
> > +		if (xe_vma_has_no_bo(vma))
> >  			continue;
> >  
> >  		if (*num_vmas == max_vmas) {
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 243dc91a61b0..b61007b70502 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -29,6 +29,7 @@ struct xe_vm;
> >  #define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
> >  #define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
> >  #define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
> > +#define XE_VMA_NULL		(DRM_GPUVA_USERBITS << 5)
> >  
> >  struct xe_vma {
> >  	/** @gpuva: Base GPUVA object */
> > @@ -315,6 +316,8 @@ struct xe_vma_op_map {
> >  	bool immediate;
> >  	/** @read_only: Read only */
> >  	bool read_only;
> > +	/** @null: NULL (writes dropped, read zero) */
> > +	bool null;
> >  };
> >  
> >  /** struct xe_vma_op_unmap - VMA unmap operation */
> > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > index b0b80aae3ee8..27c51946fadd 100644
> > --- a/include/uapi/drm/xe_drm.h
> > +++ b/include/uapi/drm/xe_drm.h
> > @@ -447,6 +447,14 @@ struct drm_xe_vm_bind_op {
> >  	 * than differing the MAP to the page fault handler.
> >  	 */
> >  #define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 18)
> > +	/*
> > +	 * When the NULL flag is set, the page tables are set up with a special
> > +	 * bit which indicates writes are dropped and all reads return zero. The
> > +	 * NULL flag is only valid for XE_VM_BIND_OP_MAP operations, the BO
> > +	 * handle MBZ, and the BO offset MBZ. This flag is intended to implement
> > +	 * VK sparse bindings.
> > +	 */
> > +#define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
> >  
> >  	/** @reserved: Reserved */
> >  	__u64 reserved[2];
> > -- 
> > 2.34.1
> > 
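For reference, the rule spelled out in the comment above (NULL is only
valid on a MAP with no BO handle and no BO offset) can be read as the
minimal sketch below; the helper name is hypothetical, and the posted
patch open-codes these checks in vm_bind_ioctl_check_args():

static bool vm_bind_null_args_ok(u32 op, u32 obj, u64 obj_offset)
{
	bool null = op & XE_VM_BIND_FLAG_NULL;

	if (!null)
		return true;

	/* NULL bindings carry no backing object and are only legal on MAP */
	return !obj && !obj_offset && VM_BIND_OP(op) == XE_VM_BIND_OP_MAP;
}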

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-10 18:40         ` Matthew Brost
@ 2023-05-11  7:24           ` Thomas Hellström
  2023-05-11 14:11             ` Matthew Brost
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  7:24 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe


On 5/10/23 20:40, Matthew Brost wrote:
> On Wed, May 10, 2023 at 10:14:12AM +0200, Thomas Hellström wrote:
>> On 5/10/23 00:05, Matthew Brost wrote:
>>> On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
>>>> On 5/2/23 02:17, Matthew Brost wrote:
>>>>> Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk move's
>>>>> LRU position on every exec.
>>>>>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
>>>>>     drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
>>>>>     drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
>>>>>     drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
>>>>>     drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
>>>>>     5 files changed, 40 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>>>> index 3ab404e33fae..da99ee53e7d7 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>>> @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
>>>>>     	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
>>>>>     }
>>>>> +static void xe_gem_object_close(struct drm_gem_object *obj,
>>>>> +				struct drm_file *file_priv)
>>>>> +{
>>>>> +	struct xe_bo *bo = gem_to_xe_bo(obj);
>>>>> +
>>>>> +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
>>>> Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
>>>> doesn't make much sense when we support user-space command buffer chaining,
>>>> but I think we should be doing it on exec at least, no?
>>> Maybe you could make the argument for compute VMs; the preempt worker in
>>> that case should probably do a bulk move. I can change this if desired.
>> Yes, please.
>>> For a fault VM it makes no sense as the fault handler updates the LRU
>>> for individual BOs.
>> Yes that makes sense.
>>>>> +		struct ww_acquire_ctx ww;
>>>>> +
>>>>> +		XE_BUG_ON(!xe_bo_is_user(bo));
>>>> Also why can't we use this for kernel objects as well? At some point we want
>>>> to get to evictable page-table objects? Could we do this in the
>>>> release_notify() callback to cover all potential bos?
>>>>
>>> xe_gem_object_close is a user call, right? We can't call this on kernel
>>> BOs. This also could be outside the if statement.
>> Hmm, yes, the question was whether we can stop doing this in
>> xe_gem_object_close() and instead do it in release_notify() to also
>> cover kernel objects. Since release_notify() is called just after
>> individualizing the dma_resv, it makes sense to also individualize the
>> LRU at that point?
>>
> If we ever support moving kernel BOs, then yes. We need to do a lot of
> work to get there, so I'd rather leave this where it is, but I'll add a
> comment indicating that if we want to support kernel BO eviction, this
> should be updated.
>
> Sound good?

Well, I can't see the motivation to have it in gem close? Are other 
drivers doing that? Whether the object should be bulk moved or not is 
tied to whether it's a vm private object or not and that is closely tied 
to whether the reservation object is the vm resv or the object resv?

/Thomas

>
> Matt
>
>> /Thomas
>>
>>
>>> Matt
>>>
>>>> /Thomas
>>>>
>>>>
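A rough sketch of the release_notify() alternative discussed above,
keyed on whether the BO shares the VM's reservation object (i.e. is
VM-private). The hook placement and the ttm_bo_set_bulk_move() call are
assumptions for illustration, not code from this series:

static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
{
	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);

	/* Only VM-private BOs (sharing the VM's resv) are bulk-tracked */
	if (bo->vm && ttm_bo->base.resv == &bo->vm->resv)
		ttm_bo_set_bulk_move(ttm_bo, NULL);
}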

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA
  2023-05-11  2:41     ` Matthew Brost
@ 2023-05-11  7:39       ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  7:39 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe


On 5/11/23 04:41, Matthew Brost wrote:
> On Tue, May 09, 2023 at 03:52:24PM +0200, Thomas Hellström wrote:
>> Hi, Matthew,
>>
>> On 5/2/23 02:17, Matthew Brost wrote:
>>> Rather than open coding VM binds and VMA tracking, use the GPUVA
>>> library. GPUVA provides a common infrastructure for VM binds to use mmap
>>> / munmap semantics and support for VK sparse bindings.
>>>
>>> The concepts are:
>>>
>>> 1) xe_vm inherits from drm_gpuva_manager
>>> 2) xe_vma inherits from drm_gpuva
>>> 3) xe_vma_op inherits from drm_gpuva_op
>>> 4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the
>>> GPUVA code to generate a VMA operations list which is parsed, committed,
>>> and executed.
>>>
>>> v2 (CI): Add break after default in case statement.
>>> v3: Rebase
>>> v4: Fix some error handling
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Before embarking on a second review of this code it would really be
>> beneficial if you could address some comments from the first review. In
>> particular splitting this huge patch up if possible (and I also think that
>> removing the async worker *before* this patch if at all possible would
>> really ease the review both for me and potential upcoming reviewers).
>>
> My bad that I missed your comments on the list, yes I will address your
> comments in the respin, expect it by Mondayish.

NP, basically this was a comment saying I'd rather wait for a respin 
before tackling this again :).

>
> Removing the async worker first doesn't make a ton of sense, as GPUVA
> makes the error handling a lot easier, plus doing it first basically
> means a complete rewrite.

OK, yes I guess from a reviewer's point of view the other way around 
would be easier, but if you keep the ordering please split into separate 
patch series for GPUVA and async removal, because invasive changes in 
code that is later removed in the same series are typically not well 
received.

Thanks,

Thomas


>
> Matt
>
>> Thanks,
>>
>> Thomas
>>
>>
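The "inherits from" relationships listed in the quoted commit message
are plain struct embedding plus container_of(); a minimal sketch using
the field names from the quoted patches (full definitions omitted):

struct xe_vm {
	struct drm_gpuva_manager mgr;	/* 1) xe_vm inherits from drm_gpuva_manager */
	/* ... */
};

struct xe_vma {
	struct drm_gpuva gpuva;		/* 2) xe_vma inherits from drm_gpuva */
	/* ... */
};

static inline struct xe_vm *gpuva_to_vm(struct drm_gpuva *gpuva)
{
	return container_of(gpuva->mgr, struct xe_vm, mgr);
}

static inline struct xe_vma *gpuva_to_vma(struct drm_gpuva *gpuva)
{
	return container_of(gpuva, struct xe_vma, gpuva);
}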

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind
  2023-05-10 23:25           ` Matthew Brost
@ 2023-05-11  7:43             ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  7:43 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Rodrigo Vivi, Matthew Brost, intel-xe, Rodrigo Vivi


On 5/11/23 01:25, Matthew Brost wrote:
> On Tue, May 09, 2023 at 02:29:23PM +0200, Thomas Hellström wrote:
>> On 5/8/23 23:34, Rodrigo Vivi wrote:
>>> On Mon, May 08, 2023 at 01:08:10AM +0000, Matthew Brost wrote:
>>>> On Fri, May 05, 2023 at 02:40:40PM -0400, Rodrigo Vivi wrote:
>>>>> On Mon, May 01, 2023 at 05:17:03PM -0700, Matthew Brost wrote:
>>>>>> Not needed and causes some issues with bulk LRU moves.
>>>>> I'm confused with this explanation and the code below.
>>>>> could you please provide a bit more wording here?
>>>>>
>>>> We only need to try to lock a BO if it is external as non-external BOs
>>>> share the dma-resv with the already locked VM. Trying to lock
>>>> non-external BOs caused an issue (list corruption) in an uncoming patch
>> s/uncoming/upcoming/
>>
>> Also it's not clear to me how this could fix a list corruption in the bulk
>> LRU moves? I mean, if it's a duplicate lock then it gets removed from the tv
>> list and not touched again? Could you explain the mechanism of the fix?
>>
> I had my head wrapped around this at one point but now I forget, as I
> coded this one a while ago. This changes later in the series (drm_exec
> locking), so IMO it's not that big of a deal to merge this along with
> the following patch without further explanation, but if you think it is
> a huge deal I can try to figure out what the issue is again, or another
> option is to just stage the LRU patch after drm_exec.
>
> Open to whatever but prefer to leave as is.

Since this will be part of a bulk LRU series, separate from drm_exec, 
then yes, please add an explanation as to why this is needed, as we can't 
make something dependent on a future patch that may or may not go in 
depending on the review outcome.

Could also consider reordering so that drm_exec goes in before bulk LRU?

/Thomas

>
> Matt
>
>> Thanks,
>>
>> Thomas
>>
>>
>>>> which adds bulk LRU move. Since this code isn't needed, remove it.
>>> It makes more sense now. With this in the commit msg (but with Christopher's fix):
>>>
>>>
>>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>
>>>
>>>> ^^^ How about this.
>>>>
>>>>>> Signed-off-by: Matthew Brost <mattthew.brost@intel.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
>>>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>>>> index 272f0f7f24fe..6c427ff92c44 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>>>> @@ -2064,9 +2064,11 @@ static int vm_bind_ioctl(struct xe_vm *vm, struct xe_vma *vma,
>>>>>>    		 */
>>>>>>    		xe_bo_get(vbo);
>>>>>> -		tv_bo.bo = &vbo->ttm;
>>>>>> -		tv_bo.num_shared = 1;
>>>>>> -		list_add(&tv_bo.head, &objs);
>>>>>> +		if (!vbo->vm) {
>>>>>> +			tv_bo.bo = &vbo->ttm;
>>>>>> +			tv_bo.num_shared = 1;
>>>>>> +			list_add(&tv_bo.head, &objs);
>>>>>> +		}
>>>>>>    	}
>>>>>>    again:
>>>>>> -- 
>>>>>> 2.34.1
>>>>>>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma Matthew Brost
  2023-05-08 21:43   ` Rodrigo Vivi
@ 2023-05-11  8:38   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  8:38 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> 5 list links can be squashed into a union in xe_vma as being on the
> various lists is mutually exclusive.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_gt_pagefault.c |  2 +-
>   drivers/gpu/drm/xe/xe_pt.c           |  5 +-
>   drivers/gpu/drm/xe/xe_vm.c           | 29 ++++++------
>   drivers/gpu/drm/xe/xe_vm_types.h     | 71 +++++++++++++++-------------
>   4 files changed, 55 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index cfffe3398fe4..d7bf6b0a0697 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -157,7 +157,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>   
>   	if (xe_vma_is_userptr(vma) && write_locked) {
>   		spin_lock(&vm->userptr.invalidated_lock);
> -		list_del_init(&vma->userptr.invalidate_link);
> +		list_del_init(&vma->invalidate_link);
>   		spin_unlock(&vm->userptr.invalidated_lock);
>   
>   		ret = xe_vma_userptr_pin_pages(vma);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 010f44260cda..8eab8e1bbaf0 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1116,8 +1116,7 @@ static int xe_pt_userptr_inject_eagain(struct xe_vma *vma)
>   
>   		vma->userptr.divisor = divisor << 1;
>   		spin_lock(&vm->userptr.invalidated_lock);
> -		list_move_tail(&vma->userptr.invalidate_link,
> -			       &vm->userptr.invalidated);
> +		list_move_tail(&vma->invalidate_link, &vm->userptr.invalidated);
>   		spin_unlock(&vm->userptr.invalidated_lock);
>   		return true;
>   	}
> @@ -1724,7 +1723,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>   
>   		if (!vma->gt_present) {
>   			spin_lock(&vm->userptr.invalidated_lock);
> -			list_del_init(&vma->userptr.invalidate_link);
> +			list_del_init(&vma->invalidate_link);
>   			spin_unlock(&vm->userptr.invalidated_lock);
>   		}
>   		up_read(&vm->userptr.notifier_lock);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index e0ed7201aeb0..e5f2fffb2aec 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -677,8 +677,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
>   	if (!xe_vm_in_fault_mode(vm) &&
>   	    !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->gt_present) {
>   		spin_lock(&vm->userptr.invalidated_lock);
> -		list_move_tail(&vma->userptr.invalidate_link,
> -			       &vm->userptr.invalidated);
> +		list_move_tail(&vma->invalidate_link, &vm->userptr.invalidated);
>   		spin_unlock(&vm->userptr.invalidated_lock);
>   	}
>   
> @@ -726,8 +725,8 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
>   	/* Collect invalidated userptrs */
>   	spin_lock(&vm->userptr.invalidated_lock);
>   	list_for_each_entry_safe(vma, next, &vm->userptr.invalidated,
> -				 userptr.invalidate_link) {
> -		list_del_init(&vma->userptr.invalidate_link);
> +				 invalidate_link) {
> +		list_del_init(&vma->invalidate_link);
>   		list_move_tail(&vma->userptr_link, &vm->userptr.repin_list);
>   	}
>   	spin_unlock(&vm->userptr.invalidated_lock);
> @@ -830,12 +829,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>   		return vma;
>   	}
>   
> -	/* FIXME: Way to many lists, should be able to reduce this */
> +	/*
> +	 * userptr_link, destroy_link, notifier.rebind_link,
> +	 * invalidate_link
> +	 */
>   	INIT_LIST_HEAD(&vma->rebind_link);
> -	INIT_LIST_HEAD(&vma->unbind_link);
> -	INIT_LIST_HEAD(&vma->userptr_link);
> -	INIT_LIST_HEAD(&vma->userptr.invalidate_link);
> -	INIT_LIST_HEAD(&vma->notifier.rebind_link);
>   	INIT_LIST_HEAD(&vma->extobj.link);
>   
>   	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
> @@ -953,15 +951,14 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
>   	struct xe_vm *vm = xe_vma_vm(vma);
>   
>   	lockdep_assert_held_write(&vm->lock);
> -	XE_BUG_ON(!list_empty(&vma->unbind_link));
>   
>   	if (xe_vma_is_userptr(vma)) {
>   		XE_WARN_ON(!(vma->gpuva.flags & XE_VMA_DESTROYED));
>   
>   		spin_lock(&vm->userptr.invalidated_lock);
> -		list_del_init(&vma->userptr.invalidate_link);
> +		if (!list_empty(&vma->invalidate_link))
> +			list_del_init(&vma->invalidate_link);
>   		spin_unlock(&vm->userptr.invalidated_lock);
> -		list_del(&vma->userptr_link);
>   	} else if (!xe_vma_is_null(vma)) {
>   		xe_bo_assert_held(xe_vma_bo(vma));
>   		drm_gpuva_unlink(&vma->gpuva);
> @@ -1328,7 +1325,9 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>   			continue;
>   		}
>   
> -		list_add_tail(&vma->unbind_link, &contested);
> +		if (!list_empty(&vma->destroy_link))
> +			list_del_init(&vma->destroy_link);
> +		list_add_tail(&vma->destroy_link, &contested);
>   	}
>   
>   	/*
> @@ -1356,8 +1355,8 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>   	 * Since we hold a refcount to the bo, we can remove and free
>   	 * the members safely without locking.
>   	 */
> -	list_for_each_entry_safe(vma, next_vma, &contested, unbind_link) {
> -		list_del_init(&vma->unbind_link);
> +	list_for_each_entry_safe(vma, next_vma, &contested, destroy_link) {
> +		list_del_init(&vma->destroy_link);
>   		xe_vma_destroy_unlocked(vma);
>   	}
>   
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index d55ec8156caa..22def5483c12 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -50,21 +50,32 @@ struct xe_vma {
>   	 */
>   	u64 gt_present;
>   
> -	/** @userptr_link: link into VM repin list if userptr */
> -	struct list_head userptr_link;
> +	union {
> +		/** @userptr_link: link into VM repin list if userptr */
> +		struct list_head userptr_link;
>   
> -	/**
> -	 * @rebind_link: link into VM if this VMA needs rebinding, and
> -	 * if it's a bo (not userptr) needs validation after a possible
> -	 * eviction. Protected by the vm's resv lock.
> -	 */
> -	struct list_head rebind_link;
> +		/**
> +		 * @rebind_link: link into VM if this VMA needs rebinding, and
> +		 * if it's a bo (not userptr) needs validation after a possible
> +		 * eviction. Protected by the vm's resv lock.
> +		 */
> +		struct list_head rebind_link;

Since the different lists have very different locking protection, I'm 
pretty sure you can come up with a scenario where this is invalid: for 
example, a userptr vma being on the vm->userptr.repin_list and the 
vm->rebind_list simultaneously, if we do a repin, then hit an -EINTR 
during xe_vm_lock_dma_resv(), then immediately get a new userptr 
invalidation (now the rebind_link and userptr.invalidate_link are used 
on different lists), and then a new repin (now the rebind_link and 
userptr_link are used on different lists, simultaneously).

If you want to do a union like this, we either need to have the exact 
same locking rules for the links or they should be used for completely 
different purposes that can never happen together, like the 
userptr.invalidate_link and the notifier.rebind_link, one used for 
userptr only and the other for bo only.

I'm also pretty sure that a vma can be on the rebind_list and the 
notifier.rebind_list simultaneously, due to the different locking, with 
a similar reasoning as above.

So without the rule of exact same locking rules, it becomes practically 
impossible to reason that using a union is safe.

/Thomas
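Reduced illustration of the constraint: a union of links is only safe if
every member follows identical locking rules and list membership is
mutually exclusive at all times. Field names follow the quoted patch;
the lock annotations are approximate:

struct xe_vma {				/* reduced for illustration */
	union {
		/* link into vm->userptr.repin_list (repin path) */
		struct list_head userptr_link;
		/* link into vm->rebind_list, protected by the vm's resv */
		struct list_head rebind_link;
		/* link into vm->userptr.invalidated, under userptr.invalidated_lock */
		struct list_head invalidate_link;
	};
	/* ... */
};

/*
 * The repin -> -EINTR -> invalidate interleaving described above can
 * legitimately put a VMA on two of these lists at once, so the links
 * cannot share storage unless the locking rules are unified first.
 */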



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation Matthew Brost
  2023-05-05 19:37   ` Rodrigo Vivi
@ 2023-05-11  9:05   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  9:05 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> Reduce gt_mask to a u8 from a u64, only allocate userptr state if the VMA
> is a userptr, and make the destroy callback and worker a union.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_vm.c       | 14 +++--
>   drivers/gpu/drm/xe/xe_vm_types.h | 88 +++++++++++++++++---------------
>   2 files changed, 57 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index e5f2fffb2aec..e8d9939ee535 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -814,7 +814,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>   				    u64 bo_offset_or_userptr,
>   				    u64 start, u64 end,
>   				    bool read_only, bool null,
> -				    u64 gt_mask)
> +				    u8 gt_mask)
>   {
>   	struct xe_vma *vma;
>   	struct xe_gt *gt;
> @@ -823,7 +823,11 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>   	XE_BUG_ON(start >= end);
>   	XE_BUG_ON(end >= vm->size);
>   
> -	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> +	if (!bo && !null)	/* userptr */
> +		vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> +	else
> +		vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
> +			      GFP_KERNEL);
>   	if (!vma) {
>   		vma = ERR_PTR(-ENOMEM);
>   		return vma;
> @@ -2149,7 +2153,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
>   static struct drm_gpuva_ops *
>   vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>   			 u64 bo_offset_or_userptr, u64 addr, u64 range,
> -			 u32 operation, u64 gt_mask, u32 region)
> +			 u32 operation, u8 gt_mask, u32 region)
>   {
>   	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
>   	struct ww_acquire_ctx ww;
> @@ -2234,7 +2238,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>   }
>   
>   static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> -			      u64 gt_mask, bool read_only, bool null)
> +			      u8 gt_mask, bool read_only, bool null)
>   {
>   	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>   	struct xe_vma *vma;
> @@ -3217,8 +3221,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		u64 addr = bind_ops[i].addr;
>   		u32 op = bind_ops[i].op;
>   		u64 obj_offset = bind_ops[i].obj_offset;
> -		u64 gt_mask = bind_ops[i].gt_mask;
>   		u32 region = bind_ops[i].region;
> +		u8 gt_mask = bind_ops[i].gt_mask;
>   
>   		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
>   						  addr, range, op, gt_mask,
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 22def5483c12..df4797ec4d7f 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -34,22 +34,34 @@ struct xe_vm;
>   #define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 7)
>   #define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 8)
>   
> +/** struct xe_userptr - User pointer */
> +struct xe_userptr {
> +	/**
> +	 * @notifier: MMU notifier for user pointer (invalidation call back)
> +	 */
> +	struct mmu_interval_notifier notifier;
> +	/** @sgt: storage for a scatter gather table */
> +	struct sg_table sgt;
> +	/** @sg: allocated scatter gather table */
> +	struct sg_table *sg;
> +	/** @notifier_seq: notifier sequence number */
> +	unsigned long notifier_seq;
> +	/**
> +	 * @initial_bind: user pointer has been bound at least once.
> +	 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
> +	 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
> +	 */
> +	bool initial_bind;
> +#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> +	u32 divisor;
> +#endif
> +};
> +
> +/** xe_vma - Virtual memory address */
>   struct xe_vma {
>   	/** @gpuva: Base GPUVA object */
>   	struct drm_gpuva gpuva;
>   
> -	/** @gt_mask: GT mask of where to create binding for this VMA */
> -	u64 gt_mask;
> -
> -	/**
> -	 * @gt_present: GT mask of binding are present for this VMA.
> -	 * protected by vm->lock, vm->resv and for userptrs,
> -	 * vm->userptr.notifier_lock for writing. Needs either for reading,
> -	 * but if reading is done under the vm->lock only, it needs to be held
> -	 * in write mode.
> -	 */
> -	u64 gt_present;
> -
>   	union {
>   		/** @userptr_link: link into VM repin list if userptr */
>   		struct list_head userptr_link;
> @@ -77,16 +89,29 @@ struct xe_vma {
>   		} notifier;
>   	};
>   
> -	/** @destroy_cb: callback to destroy VMA when unbind job is done */
> -	struct dma_fence_cb destroy_cb;
> +	union {
> +		/** @destroy_cb: callback to destroy VMA when unbind job is done */
> +		struct dma_fence_cb destroy_cb;
> +		/** @destroy_work: worker to destroy this BO */
> +		struct work_struct destroy_work;
> +	};
>   
> -	/** @destroy_work: worker to destroy this BO */
> -	struct work_struct destroy_work;
> +	/** @gt_mask: GT mask of where to create binding for this VMA */
> +	u8 gt_mask;
> +
> +	/**
> +	 * @gt_present: GT mask of binding are present for this VMA.
> +	 * protected by vm->lock, vm->resv and for userptrs,
> +	 * vm->userptr.notifier_lock for writing. Needs either for reading,
> +	 * but if reading is done under the vm->lock only, it needs to be held
> +	 * in write mode.
> +	 */
> +	u8 gt_present;
>   
>   	/** @usm: unified shared memory state */
>   	struct {
>   		/** @gt_invalidated: VMA has been invalidated */
> -		u64 gt_invalidated;
> +		u8 gt_invalidated;
>   	} usm;
>   
>   	struct {
> @@ -97,28 +122,11 @@ struct xe_vma {
>   		struct list_head link;
>   	} extobj;
>   
> -	/** @userptr: user pointer state */
> -	struct {
> -		/**
> -		 * @notifier: MMU notifier for user pointer (invalidation call back)
> -		 */
> -		struct mmu_interval_notifier notifier;
> -		/** @sgt: storage for a scatter gather table */
> -		struct sg_table sgt;
> -		/** @sg: allocated scatter gather table */
> -		struct sg_table *sg;
> -		/** @notifier_seq: notifier sequence number */
> -		unsigned long notifier_seq;
> -		/**
> -		 * @initial_bind: user pointer has been bound at least once.
> -		 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
> -		 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
> -		 */
> -		bool initial_bind;
> -#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> -		u32 divisor;
> -#endif
> -	} userptr;
> +	/**
> +	 * @userptr: user pointer state, only allocated for VMAs that are
> +	 * user pointers
> +	 */
> +	struct xe_userptr userptr;

I think this is very fragile. What happens when someone doesn't read the 
code and simply adds a member after @userptr, or generic code accidentally 
dereferences a field in @userptr?

Wouldn't the proper way to do this, if at all, be to subclass xe_vma into 
an xe_vma_userptr to guard against such things happening?

For the u8 space optimizations, also a pahole layout before and after 
the change would be beneficial in the commit message.

/Thomas


>   };
>   
>   struct xe_device;
> @@ -387,7 +395,7 @@ struct xe_vma_op {
>   	 */
>   	struct async_op_fence *fence;
>   	/** @gt_mask: gt mask for this operation */
> -	u64 gt_mask;
> +	u8 gt_mask;
>   	/** @flags: operation flags */
>   	enum xe_vma_op_flags flags;
>   
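A minimal sketch of the subclassing alternative suggested above; the
type and helper names are hypothetical, not from the posted series:

/* Userptr-only state lives in a subclass instead of a trailing member. */
struct xe_vma_userptr {
	/** @vma: base VMA */
	struct xe_vma vma;
	/** @userptr: userptr state, only exists for userptr VMAs */
	struct xe_userptr userptr;
};

static inline struct xe_vma_userptr *to_userptr_vma(struct xe_vma *vma)
{
	return container_of(vma, struct xe_vma_userptr, vma);
}

/* Allocation then naturally picks the right size for each kind of VMA. */
static struct xe_vma *xe_vma_alloc(bool is_userptr)
{
	size_t size = is_userptr ? sizeof(struct xe_vma_userptr) :
		sizeof(struct xe_vma);

	return kzalloc(size, GFP_KERNEL);	/* caller maps NULL to -ENOMEM */
}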

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager
  2023-05-05 19:39   ` Rodrigo Vivi
@ 2023-05-11  9:06     ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  9:06 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Brost; +Cc: intel-xe


On 5/5/23 21:39, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:17PM -0700, Matthew Brost wrote:
>> This is the logical place for this, will help with upcoming changes too.
> Please split the xe and drm stuff into different patches, and
> a few more words on why would be better.

+1

/Thomas



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 22/31] drm/gpuva: Move dma-resv to GPUVA manager
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 22/31] drm/gpuva: Move dma-resv " Matthew Brost
@ 2023-05-11  9:10   ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  9:10 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> Logical place for this, will help with upcoming patches.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

This makes sense to me, but like the previous patch please add a more 
elaborate commit message according to the Linux patch submission 
guidelines, and split the drm and xe changes when possible.

/Thomas


> ---
>   drivers/gpu/drm/drm_gpuva_mgr.c  |  2 ++
>   drivers/gpu/drm/xe/xe_bo.c       | 10 +++++-----
>   drivers/gpu/drm/xe/xe_bo.h       |  2 +-
>   drivers/gpu/drm/xe/xe_exec.c     |  4 ++--
>   drivers/gpu/drm/xe/xe_migrate.c  |  4 ++--
>   drivers/gpu/drm/xe/xe_pt.c       |  6 +++---
>   drivers/gpu/drm/xe/xe_vm.c       | 34 ++++++++++++++++----------------
>   drivers/gpu/drm/xe/xe_vm.h       | 12 ++++++++++-
>   drivers/gpu/drm/xe/xe_vm_types.h |  6 +-----
>   include/drm/drm_gpuva_mgr.h      |  6 ++++++
>   10 files changed, 50 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> index 137322945e91..6d2d0f4d5018 100644
> --- a/drivers/gpu/drm/drm_gpuva_mgr.c
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -443,6 +443,8 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>   	mgr->name = name ? name : "unknown";
>   	mgr->ops = ops;
>   
> +	dma_resv_init(&mgr->resv);
> +
>   	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
>   
>   	if (reserve_range) {
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index a475d0584916..e0422ffb6327 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -441,9 +441,9 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
>   			 * that we indeed have it locked, put the vma an the
>   			 * vm's notifier.rebind_list instead and scoop later.
>   			 */
> -			if (dma_resv_trylock(&vm->resv))
> +			if (dma_resv_trylock(xe_vm_resv(vm)))
>   				vm_resv_locked = true;
> -			else if (ctx->resv != &vm->resv) {
> +			else if (ctx->resv != xe_vm_resv(vm)) {
>   				spin_lock(&vm->notifier.list_lock);
>   				list_move_tail(&vma->notifier.rebind_link,
>   					       &vm->notifier.rebind_list);
> @@ -456,7 +456,7 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
>   				list_add_tail(&vma->rebind_link, &vm->rebind_list);
>   
>   			if (vm_resv_locked)
> -				dma_resv_unlock(&vm->resv);
> +				dma_resv_unlock(xe_vm_resv(vm));
>   		}
>   	}
>   
> @@ -1240,7 +1240,7 @@ xe_bo_create_locked_range(struct xe_device *xe,
>   		}
>   	}
>   
> -	bo = __xe_bo_create_locked(xe, bo, gt, vm ? &vm->resv : NULL,
> +	bo = __xe_bo_create_locked(xe, bo, gt, vm ? xe_vm_resv(vm) : NULL,
>   				   vm && !xe_vm_no_dma_fences(vm) &&
>   				   flags & XE_BO_CREATE_USER_BIT ?
>   				   &vm->lru_bulk_move : NULL, size,
> @@ -1555,7 +1555,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
>   		xe_vm_assert_held(vm);
>   
>   		ctx.allow_res_evict = allow_res_evict;
> -		ctx.resv = &vm->resv;
> +		ctx.resv = xe_vm_resv(vm);
>   	}
>   
>   	return ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 81051f456874..9b401d30a130 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -150,7 +150,7 @@ void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww);
>   static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
>   {
>   	if (bo) {
> -		XE_BUG_ON(bo->vm && bo->ttm.base.resv != &bo->vm->resv);
> +		XE_BUG_ON(bo->vm && bo->ttm.base.resv != &bo->vm->mgr.resv);
>   		if (bo->vm)
>   			xe_vm_assert_held(bo->vm);
>   		else
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 68f876afd13c..b352fd6e1f4d 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -327,7 +327,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	/* Wait behind munmap style rebinds */
>   	if (!xe_vm_no_dma_fences(vm)) {
>   		err = drm_sched_job_add_resv_dependencies(&job->drm,
> -							  &vm->resv,
> +							  xe_vm_resv(vm),
>   							  DMA_RESV_USAGE_KERNEL);
>   		if (err)
>   			goto err_put_job;
> @@ -355,7 +355,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	xe_sched_job_arm(job);
>   	if (!xe_vm_no_dma_fences(vm)) {
>   		/* Block userptr invalidations / BO eviction */
> -		dma_resv_add_fence(&vm->resv,
> +		dma_resv_add_fence(xe_vm_resv(vm),
>   				   &job->drm.s_fence->finished,
>   				   DMA_RESV_USAGE_BOOKKEEP);
>   
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 0a393c5772e5..91a06c925a1e 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1049,7 +1049,7 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
>   					  DMA_RESV_USAGE_KERNEL))
>   		return ERR_PTR(-ETIME);
>   
> -	if (wait_vm && !dma_resv_test_signaled(&vm->resv,
> +	if (wait_vm && !dma_resv_test_signaled(xe_vm_resv(vm),
>   					       DMA_RESV_USAGE_BOOKKEEP)) {
>   		vm_dbg(&xe_vm_device(vm)->drm, "wait on VM for munmap");
>   		return ERR_PTR(-ETIME);
> @@ -1264,7 +1264,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>   	 */
>   	if (first_munmap_rebind) {
>   		vm_dbg(&xe_vm_device(vm)->drm, "wait on first_munmap_rebind");
> -		err = job_add_deps(job, &vm->resv,
> +		err = job_add_deps(job, xe_vm_resv(vm),
>   				   DMA_RESV_USAGE_BOOKKEEP);
>   		if (err)
>   			goto err_job;
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 4167f666d98d..0f40f1950686 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1020,7 +1020,7 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
>   	else if (!xe_vma_is_null(vma))
>   		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
>   
> -	dma_resv_assert_held(&vm->resv);
> +	dma_resv_assert_held(xe_vm_resv(vm));
>   }
>   
>   static void xe_pt_commit_bind(struct xe_vma *vma,
> @@ -1381,7 +1381,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>   		}
>   
>   		/* add shared fence now for pagetable delayed destroy */
> -		dma_resv_add_fence(&vm->resv, fence, !rebind &&
> +		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
>   				   last_munmap_rebind ?
>   				   DMA_RESV_USAGE_KERNEL :
>   				   DMA_RESV_USAGE_BOOKKEEP);
> @@ -1701,7 +1701,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
>   		fence = &ifence->base.base;
>   
>   		/* add shared fence now for pagetable delayed destroy */
> -		dma_resv_add_fence(&vm->resv, fence,
> +		dma_resv_add_fence(xe_vm_resv(vm), fence,
>   				   DMA_RESV_USAGE_BOOKKEEP);
>   
>   		/* This fence will be installed by caller when doing eviction */
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 688130c509a4..8f7140501ff2 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -307,7 +307,7 @@ static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
>   	list_for_each_entry(e, &vm->preempt.engines, compute.link) {
>   		e->ops->resume(e);
>   
> -		dma_resv_add_fence(&vm->resv, e->compute.pfence,
> +		dma_resv_add_fence(xe_vm_resv(vm), e->compute.pfence,
>   				   DMA_RESV_USAGE_BOOKKEEP);
>   		xe_vm_fence_all_extobjs(vm, e->compute.pfence,
>   					DMA_RESV_USAGE_BOOKKEEP);
> @@ -345,7 +345,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>   
>   	down_read(&vm->userptr.notifier_lock);
>   
> -	dma_resv_add_fence(&vm->resv, pfence,
> +	dma_resv_add_fence(xe_vm_resv(vm), pfence,
>   			   DMA_RESV_USAGE_BOOKKEEP);
>   
>   	xe_vm_fence_all_extobjs(vm, pfence, DMA_RESV_USAGE_BOOKKEEP);
> @@ -603,7 +603,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>   	}
>   
>   	/* Wait on munmap style VM unbinds */
> -	wait = dma_resv_wait_timeout(&vm->resv,
> +	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
>   				     DMA_RESV_USAGE_KERNEL,
>   				     false, MAX_SCHEDULE_TIMEOUT);
>   	if (wait <= 0) {
> @@ -689,13 +689,13 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
>   	 * unbinds to complete, and those are attached as BOOKMARK fences
>   	 * to the vm.
>   	 */
> -	dma_resv_iter_begin(&cursor, &vm->resv,
> +	dma_resv_iter_begin(&cursor, xe_vm_resv(vm),
>   			    DMA_RESV_USAGE_BOOKKEEP);
>   	dma_resv_for_each_fence_unlocked(&cursor, fence)
>   		dma_fence_enable_sw_signaling(fence);
>   	dma_resv_iter_end(&cursor);
>   
> -	err = dma_resv_wait_timeout(&vm->resv,
> +	err = dma_resv_wait_timeout(xe_vm_resv(vm),
>   				    DMA_RESV_USAGE_BOOKKEEP,
>   				    false, MAX_SCHEDULE_TIMEOUT);
>   	XE_WARN_ON(err <= 0);
> @@ -742,12 +742,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
>   	}
>   
>   	/* Take lock and move to rebind_list for rebinding. */
> -	err = dma_resv_lock_interruptible(&vm->resv, NULL);
> +	err = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
>   	if (err)
>   		goto out_err;
>   
>   	list_splice_tail(&tmp_evict, &vm->rebind_list);
> -	dma_resv_unlock(&vm->resv);
> +	dma_resv_unlock(xe_vm_resv(vm));
>   
>   	return 0;
>   
> @@ -1085,7 +1085,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>   		return ERR_PTR(-ENOMEM);
>   
>   	kref_init(&vm->refcount);
> -	dma_resv_init(&vm->resv);
>   
>   	vm->size = 1ull << xe_pt_shift(xe->info.vm_max_level + 1);
>   
> @@ -1120,12 +1119,13 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>   		xe_device_mem_access_get(xe);
>   	}
>   
> -	err = dma_resv_lock_interruptible(&vm->resv, NULL);
> +	drm_gpuva_manager_init(&vm->mgr, &xe->drm, "Xe VM", 0, vm->size, 0, 0,
> +			       &gpuva_ops);
> +
> +	err = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
>   	if (err)
>   		goto err_put;
>   
> -	drm_gpuva_manager_init(&vm->mgr, &xe->drm, "Xe VM", 0, vm->size, 0, 0,
> -			       &gpuva_ops);
>   	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
>   		vm->flags |= XE_VM_FLAGS_64K;
>   
> @@ -1173,7 +1173,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>   
>   		xe_pt_populate_empty(gt, vm, vm->pt_root[id]);
>   	}
> -	dma_resv_unlock(&vm->resv);
> +	dma_resv_unlock(xe_vm_resv(vm));
>   
>   	/* Kernel migration VM shouldn't have a circular loop.. */
>   	if (!(flags & XE_VM_FLAG_MIGRATION)) {
> @@ -1230,10 +1230,10 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>   		if (vm->pt_root[id])
>   			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
>   	}
> -	dma_resv_unlock(&vm->resv);
> +	dma_resv_unlock(xe_vm_resv(vm));
>   	drm_gpuva_manager_destroy(&vm->mgr);
>   err_put:
> -	dma_resv_fini(&vm->resv);
> +	dma_resv_fini(xe_vm_resv(vm));
>   	kfree(vm);
>   	if (!(flags & XE_VM_FLAG_MIGRATION)) {
>   		xe_device_mem_access_put(xe);
> @@ -1422,7 +1422,7 @@ static void vm_destroy_work_func(struct work_struct *w)
>   
>   	trace_xe_vm_free(vm);
>   	dma_fence_put(vm->rebind_fence);
> -	dma_resv_fini(&vm->resv);
> +	dma_resv_fini(xe_vm_resv(vm));
>   	kfree(vm);
>   }
>   
> @@ -3298,7 +3298,7 @@ int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
>   
>   void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
>   {
> -	dma_resv_unlock(&vm->resv);
> +	dma_resv_unlock(xe_vm_resv(vm));
>   	ww_acquire_fini(ww);
>   }
>   
> @@ -3331,7 +3331,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
>   			WARN_ON_ONCE(!mmu_interval_check_retry
>   				     (&vma->userptr.notifier,
>   				      vma->userptr.notifier_seq));
> -			WARN_ON_ONCE(!dma_resv_test_signaled(&xe_vma_vm(vma)->resv,
> +			WARN_ON_ONCE(!dma_resv_test_signaled(xe_vma_resv(vma),
>   							     DMA_RESV_USAGE_BOOKKEEP));
>   
>   		} else {
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index cbbe95d6291f..81a9271be728 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -57,6 +57,11 @@ static inline struct xe_device *xe_vm_device(struct xe_vm *vm)
>   	return container_of(vm->mgr.drm, struct xe_device, drm);
>   }
>   
> +static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
> +{
> +	return &vm->mgr.resv;
> +}
> +
>   static inline struct xe_vm *gpuva_to_vm(struct drm_gpuva *gpuva)
>   {
>   	return container_of(gpuva->mgr, struct xe_vm, mgr);
> @@ -112,6 +117,11 @@ static inline struct xe_device *xe_vma_device(struct xe_vma *vma)
>   	return xe_vm_device(xe_vma_vm(vma));
>   }
>   
> +static inline struct dma_resv *xe_vma_resv(struct xe_vma *vma)
> +{
> +	return xe_vm_resv(xe_vma_vm(vma));
> +}
> +
>   static inline bool xe_vma_read_only(struct xe_vma *vma)
>   {
>   	return vma->gpuva.flags & XE_VMA_READ_ONLY;
> @@ -122,7 +132,7 @@ static inline u64 xe_vma_userptr(struct xe_vma *vma)
>   	return vma->gpuva.gem.offset;
>   }
>   
> -#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->resv)
> +#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->mgr.resv)
>   
>   u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_gt *full_gt);
>   
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index fca42910dcae..26571d171a43 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -8,7 +8,6 @@
>   
>   #include <drm/drm_gpuva_mgr.h>
>   
> -#include <linux/dma-resv.h>
>   #include <linux/kref.h>
>   #include <linux/mmu_notifier.h>
>   #include <linux/scatterlist.h>
> @@ -131,7 +130,7 @@ struct xe_vma {
>   
>   struct xe_device;
>   
> -#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->resv)
> +#define xe_vm_assert_held(vm) dma_resv_assert_held(&(vm)->mgr.resv)
>   
>   struct xe_vm {
>   	/** @mgr: base GPUVA used to track VMAs */
> @@ -142,9 +141,6 @@ struct xe_vm {
>   	/* engine used for (un)binding vma's */
>   	struct xe_engine *eng[XE_MAX_GT];
>   
> -	/** Protects @rebind_list and the page-table structures */
> -	struct dma_resv resv;
> -
>   	/** @lru_bulk_move: Bulk LRU move list for this VM's BOs */
>   	struct ttm_lru_bulk_move lru_bulk_move;
>   
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> index 55b0acfdcc44..010b649e363f 100644
> --- a/include/drm/drm_gpuva_mgr.h
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -25,6 +25,7 @@
>    * OTHER DEALINGS IN THE SOFTWARE.
>    */
>   
> +#include <linux/dma-resv.h>
>   #include <linux/maple_tree.h>
>   #include <linux/mm.h>
>   #include <linux/rbtree.h>
> @@ -177,6 +178,11 @@ struct drm_gpuva_manager {
>   	 */
>   	const char *name;
>   
> +	/**
> +	 * @resv: dma-resv for all private GEMs mapped in this address space
> +	 */
> +	struct dma_resv resv;
> +
>   	/**
>   	 * @mm_start: start of the VA space
>   	 */

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 23/31] drm/gpuva: Add support for extobj
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 23/31] drm/gpuva: Add support for extobj Matthew Brost
@ 2023-05-11  9:35   ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  9:35 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> Manager maintains lists of GPUVA with extobjs.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/drm_gpuva_mgr.c  | 45 +++++++++++++--
>   drivers/gpu/drm/xe/xe_exec.c     | 24 ++++----
>   drivers/gpu/drm/xe/xe_vm.c       | 99 +++++---------------------------
>   drivers/gpu/drm/xe/xe_vm.h       |  3 -
>   drivers/gpu/drm/xe/xe_vm_types.h | 16 ------
>   include/drm/drm_gpuva_mgr.h      | 39 ++++++++++++-
>   6 files changed, 105 insertions(+), 121 deletions(-)

Please make a separate series with an elaborate commit message and, if 
possible, split the drm and xe parts.

Also, it seems the handling of multiple vmas in a single vm pointing to 
the same extobj is missing.

There is a bug in xe that I introduced that also forgets about that, but 
IIRC only on vma removal.

/Thomas
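One way to handle the multiple-VMAs-per-extobj case mentioned above is
to make the tracking idempotent per BO. A sketch, where the
vm->extobj.list head name and the helper shape are assumptions for
illustration (removal would need the symmetric check, re-linking another
VMA of the same BO if one exists):

static void vm_insert_extobj(struct xe_vm *vm, struct xe_vma *vma)
{
	struct xe_vma *tmp;

	lockdep_assert_held(&vm->lock);

	/* Another VMA in this VM may already track this external BO */
	list_for_each_entry(tmp, &vm->extobj.list, extobj.link)
		if (xe_vma_bo(tmp) == xe_vma_bo(vma))
			return;

	list_add(&vma->extobj.link, &vm->extobj.list);
}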



^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor Matthew Brost
  2023-05-05 19:41   ` Rodrigo Vivi
@ 2023-05-11  9:46   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11  9:46 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi, Matthew

In addition to Rodrigo's comments:

On 5/2/23 02:17, Matthew Brost wrote:
> Add GPUVA userptr flag, add GPUVA userptr sub-struct,

> and drop sg
> pointer.

These are unrelated, right? If so, separate patch?
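
If it helps the split-up, roughly the shape I'd picture for that (purely 
illustrative -- the flag bit and field names below are guesses on my 
part, not necessarily what the patch adds):

/* Illustrative only: a userptr sub-struct hanging off struct drm_gpuva,
 * replacing the driver-side sg pointer.
 */
#define DRM_GPUVA_SKETCH_USERPTR	(1 << 2)	/* made-up flag bit */

struct drm_gpuva_sketch_userptr {
	unsigned long ptr;			/* CPU VA backing the range */
	struct mmu_interval_notifier notifier;	/* invalidation tracking */
	unsigned long notifier_seq;		/* last seen notifier seqno */
};

i.e. the driver would test the flag and only then look at the sub-struct, 
with the sg_table kept entirely on the driver side.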

Thanks,

Thomas



* Re: [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers Matthew Brost
  2023-05-05 19:42   ` Rodrigo Vivi
@ 2023-05-11 10:01   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11 10:01 UTC (permalink / raw)
  To: Matthew Brost, intel-xe; +Cc: Danilo Krummrich


On 5/2/23 02:17, Matthew Brost wrote:
> drm_exec is intended to replace the TTM exec helpers, so use drm_exec. Also
> combine parts of drm_exec with gpuva where it makes sense (locking,
> fence installation).
>
> Suggested-by: Danilo Krummrich <dakr@redhat.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
>   drivers/gpu/drm/drm_gpuva_mgr.c              |  67 ++++-
>   drivers/gpu/drm/i915/display/intel_display.c |   6 +-
>   drivers/gpu/drm/xe/Kconfig                   |   1 +
>   drivers/gpu/drm/xe/tests/xe_bo.c             |  26 +-
>   drivers/gpu/drm/xe/tests/xe_migrate.c        |   6 +-
>   drivers/gpu/drm/xe/xe_bo.c                   |  56 ++--
>   drivers/gpu/drm/xe/xe_bo.h                   |   6 +-
>   drivers/gpu/drm/xe/xe_bo_evict.c             |  24 +-
>   drivers/gpu/drm/xe/xe_bo_types.h             |   1 -
>   drivers/gpu/drm/xe/xe_engine.c               |   7 +-
>   drivers/gpu/drm/xe/xe_exec.c                 |  37 +--
>   drivers/gpu/drm/xe/xe_gt_pagefault.c         |  55 +---
>   drivers/gpu/drm/xe/xe_lrc.c                  |   8 +-
>   drivers/gpu/drm/xe/xe_migrate.c              |  13 +-
>   drivers/gpu/drm/xe/xe_vm.c                   | 283 ++++++++-----------
>   drivers/gpu/drm/xe/xe_vm.h                   |  27 +-
>   drivers/gpu/drm/xe/xe_vm_madvise.c           |  37 +--
>   include/drm/drm_gpuva_mgr.h                  |  16 +-
>   18 files changed, 315 insertions(+), 361 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> index e8cd6e154336..93c912c34211 100644
> --- a/drivers/gpu/drm/drm_gpuva_mgr.c
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -483,6 +483,50 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
>   }
>   EXPORT_SYMBOL(drm_gpuva_manager_destroy);
>   
> +/**
> + * TODO
> + */
> +int drm_gpuva_manager_lock(struct drm_gpuva_manager *mgr, struct drm_exec *exec,
> +			   struct drm_gem_object *mgr_obj, bool intr,
> +			   unsigned int num_fences)
> +{
> +	struct drm_gpuva *gpuva;
> +	int ret;
> +
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		ret = drm_exec_prepare_obj(exec, mgr_obj, num_fences);
> +		drm_exec_continue_on_contention(exec);
> +		if (ret && ret != -EALREADY)
> +			goto err_exec;
> +
> +		drm_gpuva_for_each_extobj(gpuva, mgr) {
> +			ret = drm_exec_prepare_obj(exec, gpuva->gem.obj,
> +						   num_fences);
> +			drm_exec_break_on_contention(exec);
> +			if (ret && ret != -EALREADY)
> +				goto err_exec;
> +		}
> +	}
> +

I think that in the not too distant future we want to include 
bo_validate() in the drm_exec_while_not_all_locked() loop (the WW 
transaction), in order to be able to use sleeping WW locks for eviction. 
This helper then wouldn't be flexible enough, since we'd want to avoid 
vfuncs and would probably want to open-code the loop in the driver.
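
Roughly what I have in mind (only a sketch on top of the drm_exec calls 
already used in this series; the function name is made up and the restart 
handling for eviction contention is only hinted at in a comment):

/* Sketch only: open-code the locking loop in the driver and pull
 * validation inside the WW transaction.
 */
static int xe_vm_lock_and_validate_sketch(struct xe_vm *vm,
					  struct drm_exec *exec,
					  bool intr,
					  unsigned int num_fences)
{
	struct drm_gpuva *gpuva;
	int err;

	drm_exec_init(exec, intr);
	drm_exec_while_not_all_locked(exec) {
		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm), num_fences);
		drm_exec_continue_on_contention(exec);
		if (err && err != -EALREADY)
			goto err_exec;

		drm_gpuva_for_each_extobj(gpuva, &vm->mgr) {
			err = drm_exec_prepare_obj(exec, gpuva->gem.obj,
						   num_fences);
			drm_exec_break_on_contention(exec);
			if (err && err != -EALREADY)
				goto err_exec;

			/*
			 * Validation inside the transaction. A real
			 * implementation would turn ww_mutex contention
			 * reported by eviction (-EDEADLK) into a restart
			 * of the outer loop rather than a plain failure.
			 */
			err = xe_bo_validate(gem_to_xe_bo(gpuva->gem.obj),
					     vm, false);
			if (err)
				goto err_exec;
		}
	}

	return 0;

err_exec:
	drm_exec_fini(exec);
	return err;
}

That way the whole lock + validate sequence lives inside one WW 
transaction that the driver controls and can restart as a unit, without 
any vfunc indirection.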

Since Rodrigo already commented on splitting things up, could we do the 
drm_exec conversion as a separate part before the drm_exec / GPUVA 
integration? Given the above, the latter might not lend itself very well 
to the needed flexibility.

/Thomas


> +	return 0;
> +
> +err_exec:
> +	drm_exec_fini(exec);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_lock);
> +
> +/**
> + * TODO
> + */
> +void drm_gpuva_manager_unlock(struct drm_gpuva_manager *mgr,
> +			      struct drm_exec *exec)
> +{
> +	drm_exec_fini(exec);
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_unlock);
> +
>   static inline bool
>   drm_gpuva_in_mm_range(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>   {
> @@ -888,7 +932,7 @@ drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>   EXPORT_SYMBOL(drm_gpuva_interval_empty);
>   
>   /**
> - * drm_gpuva_add_fence - add fence to private and all extobj dma-resv
> + * drm_gpuva_manager_add_fence - add fence to private and all extobj dma-resv
>    * @mgr: the &drm_gpuva_manager to add a fence to
>    * @fence: fence to add
>    * @private_usage: private dma-resv usage
> @@ -896,17 +940,24 @@ EXPORT_SYMBOL(drm_gpuva_interval_empty);
>    *
>    * Returns: true if the interval is empty, false otherwise
>    */
> -void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
> -			 enum dma_resv_usage private_usage,
> -			 enum dma_resv_usage extobj_usage)
> +void drm_gpuva_manager_add_fence(struct drm_gpuva_manager *mgr,
> +				 struct drm_exec *exec,
> +				 struct dma_fence *fence,
> +				 enum dma_resv_usage private_usage,
> +				 enum dma_resv_usage extobj_usage)
>   {
> -	struct drm_gpuva *gpuva;
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	dma_resv_assert_held(&mgr->resv);
>   
>   	dma_resv_add_fence(&mgr->resv, fence, private_usage);
> -	drm_gpuva_for_each_extobj(gpuva, mgr)
> -		dma_resv_add_fence(gpuva->gem.obj->resv, fence, extobj_usage);
> +	drm_exec_for_each_locked_object(exec, index, obj)
> +		if (likely(&mgr->resv != obj->resv))
> +			dma_resv_add_fence(obj->resv, fence, extobj_usage);
>   }
> -EXPORT_SYMBOL(drm_gpuva_add_fence);
> +EXPORT_SYMBOL(drm_gpuva_manager_add_fence);
> +
>   
>   /**
>    * drm_gpuva_map - helper to insert a &drm_gpuva from &drm_gpuva_fn_ops
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 28a227450329..aab1a3a0f06d 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -7340,11 +7340,11 @@ static int i915_gem_object_read_from_page(struct xe_bo *bo,
>   	void *virtual;
>   	bool is_iomem;
>   	int ret;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   
>   	XE_BUG_ON(size != 8);
>   
> -	ret = xe_bo_lock(bo, &ww, 0, true);
> +	ret = xe_bo_lock(bo, &exec, 0, true);
>   	if (ret)
>   		return ret;
>   
> @@ -7361,7 +7361,7 @@ static int i915_gem_object_read_from_page(struct xe_bo *bo,
>   
>   	ttm_bo_kunmap(&map);
>   out_unlock:
> -	xe_bo_unlock(bo, &ww);
> +	xe_bo_unlock(bo, &exec);
>   	return ret;
>   }
>   #endif
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index f6f3b491d162..bbcc9b64b776 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -8,6 +8,7 @@ config DRM_XE
>   	select SHMEM
>   	select TMPFS
>   	select DRM_BUDDY
> +	select DRM_EXEC
>   	select DRM_KMS_HELPER
>   	select DRM_PANEL
>   	select DRM_SUBALLOC_HELPER
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index 9bd381e5b7a6..316c6cf2bb86 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -175,17 +175,17 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>   	unsigned int bo_flags = XE_BO_CREATE_USER_BIT |
>   		XE_BO_CREATE_VRAM_IF_DGFX(gt);
>   	struct xe_vm *vm = xe_migrate_get_vm(xe->gt[0].migrate);
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	int err, i;
>   
>   	kunit_info(test, "Testing device %s gt id %u vram id %u\n",
>   		   dev_name(xe->drm.dev), gt->info.id, gt->info.vram_id);
>   
>   	for (i = 0; i < 2; ++i) {
> -		xe_vm_lock(vm, &ww, 0, false);
> +		xe_vm_lock(vm, &exec, 0, false);
>   		bo = xe_bo_create(xe, NULL, vm, 0x10000, ttm_bo_type_device,
>   				  bo_flags);
> -		xe_vm_unlock(vm, &ww);
> +		xe_vm_unlock(vm, &exec);
>   		if (IS_ERR(bo)) {
>   			KUNIT_FAIL(test, "bo create err=%pe\n", bo);
>   			break;
> @@ -198,9 +198,9 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>   			goto cleanup_bo;
>   		}
>   
> -		xe_bo_lock(external, &ww, 0, false);
> +		xe_bo_lock(external, &exec, 0, false);
>   		err = xe_bo_pin_external(external);
> -		xe_bo_unlock(external, &ww);
> +		xe_bo_unlock(external, &exec);
>   		if (err) {
>   			KUNIT_FAIL(test, "external bo pin err=%pe\n",
>   				   ERR_PTR(err));
> @@ -240,18 +240,18 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>   
>   		if (i) {
>   			down_read(&vm->lock);
> -			xe_vm_lock(vm, &ww, 0, false);
> +			xe_vm_lock(vm, &exec, 0, false);
>   			err = xe_bo_validate(bo, bo->vm, false);
> -			xe_vm_unlock(vm, &ww);
> +			xe_vm_unlock(vm, &exec);
>   			up_read(&vm->lock);
>   			if (err) {
>   				KUNIT_FAIL(test, "bo valid err=%pe\n",
>   					   ERR_PTR(err));
>   				goto cleanup_all;
>   			}
> -			xe_bo_lock(external, &ww, 0, false);
> +			xe_bo_lock(external, &exec, 0, false);
>   			err = xe_bo_validate(external, NULL, false);
> -			xe_bo_unlock(external, &ww);
> +			xe_bo_unlock(external, &exec);
>   			if (err) {
>   				KUNIT_FAIL(test, "external bo valid err=%pe\n",
>   					   ERR_PTR(err));
> @@ -259,18 +259,18 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
>   			}
>   		}
>   
> -		xe_bo_lock(external, &ww, 0, false);
> +		xe_bo_lock(external, &exec, 0, false);
>   		xe_bo_unpin_external(external);
> -		xe_bo_unlock(external, &ww);
> +		xe_bo_unlock(external, &exec);
>   
>   		xe_bo_put(external);
>   		xe_bo_put(bo);
>   		continue;
>   
>   cleanup_all:
> -		xe_bo_lock(external, &ww, 0, false);
> +		xe_bo_lock(external, &exec, 0, false);
>   		xe_bo_unpin_external(external);
> -		xe_bo_unlock(external, &ww);
> +		xe_bo_unlock(external, &exec);
>   cleanup_external:
>   		xe_bo_put(external);
>   cleanup_bo:
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index 0f4371ad1fd9..e1482b4491b1 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -394,14 +394,14 @@ static int migrate_test_run_device(struct xe_device *xe)
>   
>   	for_each_gt(gt, xe, id) {
>   		struct xe_migrate *m = gt->migrate;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		kunit_info(test, "Testing gt id %d.\n", id);
> -		xe_vm_lock(m->eng->vm, &ww, 0, true);
> +		xe_vm_lock(m->eng->vm, &exec, 0, true);
>   		xe_device_mem_access_get(xe);
>   		xe_migrate_sanity_test(m, test);
>   		xe_device_mem_access_put(xe);
> -		xe_vm_unlock(m->eng->vm, &ww);
> +		xe_vm_unlock(m->eng->vm, &exec);
>   	}
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index e0422ffb6327..a427edbf486b 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -8,6 +8,7 @@
>   #include <linux/dma-buf.h>
>   
>   #include <drm/drm_drv.h>
> +#include <drm/drm_exec.h>
>   #include <drm/drm_gem_ttm_helper.h>
>   #include <drm/ttm/ttm_device.h>
>   #include <drm/ttm/ttm_placement.h>
> @@ -991,13 +992,13 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
>   	struct xe_bo *bo = gem_to_xe_bo(obj);
>   
>   	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		XE_BUG_ON(!xe_bo_is_user(bo));
>   
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>   		ttm_bo_set_bulk_move(&bo->ttm, NULL);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   }
>   
> @@ -1402,11 +1403,6 @@ int xe_bo_pin_external(struct xe_bo *bo)
>   	}
>   
>   	ttm_bo_pin(&bo->ttm);
> -
> -	/*
> -	 * FIXME: If we always use the reserve / unreserve functions for locking
> -	 * we do not need this.
> -	 */
>   	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>   
>   	return 0;
> @@ -1461,11 +1457,6 @@ int xe_bo_pin(struct xe_bo *bo)
>   	}
>   
>   	ttm_bo_pin(&bo->ttm);
> -
> -	/*
> -	 * FIXME: If we always use the reserve / unreserve functions for locking
> -	 * we do not need this.
> -	 */
>   	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>   
>   	return 0;
> @@ -1496,11 +1487,6 @@ void xe_bo_unpin_external(struct xe_bo *bo)
>   	}
>   
>   	ttm_bo_unpin(&bo->ttm);
> -
> -	/*
> -	 * FIXME: If we always use the reserve / unreserve functions for locking
> -	 * we do not need this.
> -	 */
>   	ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>   }
>   
> @@ -1650,7 +1636,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>   	struct xe_device *xe = to_xe_device(dev);
>   	struct xe_file *xef = to_xe_file(file);
>   	struct drm_xe_gem_create *args = data;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_vm *vm = NULL;
>   	struct xe_bo *bo;
>   	unsigned bo_flags = XE_BO_CREATE_USER_BIT;
> @@ -1686,7 +1672,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>   		vm = xe_vm_lookup(xef, args->vm_id);
>   		if (XE_IOCTL_ERR(xe, !vm))
>   			return -ENOENT;
> -		err = xe_vm_lock(vm, &ww, 0, true);
> +		err = xe_vm_lock(vm, &exec, 0, true);
>   		if (err) {
>   			xe_vm_put(vm);
>   			return err;
> @@ -1703,7 +1689,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>   	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
>   			  bo_flags);
>   	if (vm) {
> -		xe_vm_unlock(vm, &ww);
> +		xe_vm_unlock(vm, &exec);
>   		xe_vm_put(vm);
>   	}
>   
> @@ -1744,26 +1730,30 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
>   	return 0;
>   }
>   
> -int xe_bo_lock(struct xe_bo *bo, struct ww_acquire_ctx *ww,
> +int xe_bo_lock(struct xe_bo *bo, struct drm_exec *exec,
>   	       int num_resv, bool intr)
>   {
> -	struct ttm_validate_buffer tv_bo;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> +	int err;
>   
> -	XE_BUG_ON(!ww);
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		err = drm_exec_prepare_obj(exec, &bo->ttm.base,
> +					   num_resv);
> +		drm_exec_continue_on_contention(exec);
> +		if (err && err != -EALREADY)
> +			goto out_err;
> +	}
>   
> -	tv_bo.num_shared = num_resv;
> -	tv_bo.bo = &bo->ttm;;
> -	list_add_tail(&tv_bo.head, &objs);
> +	return 0;
>   
> -	return ttm_eu_reserve_buffers(ww, &objs, intr, &dups);
> +out_err:
> +	drm_exec_fini(exec);
> +	return err;
>   }
>   
> -void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww)
> +void xe_bo_unlock(struct xe_bo *bo, struct drm_exec *exec)
>   {
> -	dma_resv_unlock(bo->ttm.base.resv);
> -	ww_acquire_fini(ww);
> +	drm_exec_fini(exec);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 9b401d30a130..5a80ebf72d10 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -75,6 +75,7 @@
>   
>   #define XE_BO_PROPS_INVALID	(-1)
>   
> +struct drm_exec;
>   struct sg_table;
>   
>   struct xe_bo *xe_bo_alloc(void);
> @@ -142,10 +143,9 @@ static inline void xe_bo_assert_held(struct xe_bo *bo)
>   		dma_resv_assert_held((bo)->ttm.base.resv);
>   }
>   
> -int xe_bo_lock(struct xe_bo *bo, struct ww_acquire_ctx *ww,
> +int xe_bo_lock(struct xe_bo *bo, struct drm_exec *exec,
>   	       int num_resv, bool intr);
> -
> -void xe_bo_unlock(struct xe_bo *bo, struct ww_acquire_ctx *ww);
> +void xe_bo_unlock(struct xe_bo *bo, struct drm_exec *exec);
>   
>   static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
>   {
> diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
> index 6642c5f52009..46d9d9eb110c 100644
> --- a/drivers/gpu/drm/xe/xe_bo_evict.c
> +++ b/drivers/gpu/drm/xe/xe_bo_evict.c
> @@ -3,6 +3,8 @@
>    * Copyright © 2022 Intel Corporation
>    */
>   
> +#include <drm/drm_exec.h>
> +
>   #include "xe_bo_evict.h"
>   
>   #include "xe_bo.h"
> @@ -27,7 +29,7 @@
>   int xe_bo_evict_all(struct xe_device *xe)
>   {
>   	struct ttm_device *bdev = &xe->ttm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_bo *bo;
>   	struct xe_gt *gt;
>   	struct list_head still_in_list;
> @@ -62,9 +64,9 @@ int xe_bo_evict_all(struct xe_device *xe)
>   		list_move_tail(&bo->pinned_link, &still_in_list);
>   		spin_unlock(&xe->pinned.lock);
>   
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>   		ret = xe_bo_evict_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   		xe_bo_put(bo);
>   		if (ret) {
>   			spin_lock(&xe->pinned.lock);
> @@ -96,9 +98,9 @@ int xe_bo_evict_all(struct xe_device *xe)
>   		list_move_tail(&bo->pinned_link, &xe->pinned.evicted);
>   		spin_unlock(&xe->pinned.lock);
>   
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>   		ret = xe_bo_evict_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   		xe_bo_put(bo);
>   		if (ret)
>   			return ret;
> @@ -123,7 +125,7 @@ int xe_bo_evict_all(struct xe_device *xe)
>    */
>   int xe_bo_restore_kernel(struct xe_device *xe)
>   {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_bo *bo;
>   	int ret;
>   
> @@ -140,9 +142,9 @@ int xe_bo_restore_kernel(struct xe_device *xe)
>   		list_move_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
>   		spin_unlock(&xe->pinned.lock);
>   
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>   		ret = xe_bo_restore_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   		if (ret) {
>   			xe_bo_put(bo);
>   			return ret;
> @@ -182,7 +184,7 @@ int xe_bo_restore_kernel(struct xe_device *xe)
>    */
>   int xe_bo_restore_user(struct xe_device *xe)
>   {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_bo *bo;
>   	struct xe_gt *gt;
>   	struct list_head still_in_list;
> @@ -204,9 +206,9 @@ int xe_bo_restore_user(struct xe_device *xe)
>   		xe_bo_get(bo);
>   		spin_unlock(&xe->pinned.lock);
>   
> -		xe_bo_lock(bo, &ww, 0, false);
> +		xe_bo_lock(bo, &exec, 0, false);
>   		ret = xe_bo_restore_pinned(bo);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   		xe_bo_put(bo);
>   		if (ret) {
>   			spin_lock(&xe->pinned.lock);
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index 06de3330211d..2ba34a8c9b66 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -11,7 +11,6 @@
>   #include <drm/drm_mm.h>
>   #include <drm/ttm/ttm_bo.h>
>   #include <drm/ttm/ttm_device.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
>   #include <drm/ttm/ttm_placement.h>
>   
>   struct xe_device;
> diff --git a/drivers/gpu/drm/xe/xe_engine.c b/drivers/gpu/drm/xe/xe_engine.c
> index 91600b1e8249..8b425b777259 100644
> --- a/drivers/gpu/drm/xe/xe_engine.c
> +++ b/drivers/gpu/drm/xe/xe_engine.c
> @@ -8,6 +8,7 @@
>   #include <linux/nospec.h>
>   
>   #include <drm/drm_device.h>
> +#include <drm/drm_exec.h>
>   #include <drm/drm_file.h>
>   #include <drm/xe_drm.h>
>   
> @@ -89,18 +90,18 @@ struct xe_engine *xe_engine_create(struct xe_device *xe, struct xe_vm *vm,
>   				   u32 logical_mask, u16 width,
>   				   struct xe_hw_engine *hwe, u32 flags)
>   {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_engine *e;
>   	int err;
>   
>   	if (vm) {
> -		err = xe_vm_lock(vm, &ww, 0, true);
> +		err = xe_vm_lock(vm, &exec, 0, true);
>   		if (err)
>   			return ERR_PTR(err);
>   	}
>   	e = __xe_engine_create(xe, vm, logical_mask, width, hwe, flags);
>   	if (vm)
> -		xe_vm_unlock(vm, &ww);
> +		xe_vm_unlock(vm, &exec);
>   
>   	return e;
>   }
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 2ae02f1500d5..9f7f1088c403 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -6,6 +6,7 @@
>   #include "xe_exec.h"
>   
>   #include <drm/drm_device.h>
> +#include <drm/drm_exec.h>
>   #include <drm/drm_file.h>
>   #include <drm/xe_drm.h>
>   
> @@ -92,21 +93,16 @@
>    *	Unlock all
>    */
>   
> -static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
> -			 struct ttm_validate_buffer tv_onstack[],
> -			 struct ttm_validate_buffer **tv,
> -			 struct list_head *objs)
> +static int xe_exec_begin(struct xe_engine *e, struct drm_exec *exec)
>   {
>   	struct xe_vm *vm = e->vm;
>   	struct xe_vma *vma;
> -	LIST_HEAD(dups);
>   	int err;
>   
> -	*tv = NULL;
>   	if (xe_vm_no_dma_fences(e->vm))
>   		return 0;
>   
> -	err = xe_vm_lock_dma_resv(vm, ww, tv_onstack, tv, objs, true, 1);
> +	err = xe_vm_lock_dma_resv(vm, exec, true, 1);
>   	if (err)
>   		return err;
>   
> @@ -123,8 +119,7 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
>   
>   		err = xe_bo_validate(xe_vma_bo(vma), vm, false);
>   		if (err) {
> -			xe_vm_unlock_dma_resv(vm, tv_onstack, *tv, ww, objs);
> -			*tv = NULL;
> +			xe_vm_unlock_dma_resv(vm, exec);
>   			return err;
>   		}
>   	}
> @@ -132,14 +127,10 @@ static int xe_exec_begin(struct xe_engine *e, struct ww_acquire_ctx *ww,
>   	return 0;
>   }
>   
> -static void xe_exec_end(struct xe_engine *e,
> -			struct ttm_validate_buffer *tv_onstack,
> -			struct ttm_validate_buffer *tv,
> -			struct ww_acquire_ctx *ww,
> -			struct list_head *objs)
> +static void xe_exec_end(struct xe_engine *e, struct drm_exec *exec)
>   {
>   	if (!xe_vm_no_dma_fences(e->vm))
> -		xe_vm_unlock_dma_resv(e->vm, tv_onstack, tv, ww, objs);
> +		xe_vm_unlock_dma_resv(e->vm, exec);
>   }
>   
>   int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> @@ -149,17 +140,14 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	struct drm_xe_exec *args = data;
>   	struct drm_xe_sync __user *syncs_user = u64_to_user_ptr(args->syncs);
>   	u64 __user *addresses_user = u64_to_user_ptr(args->address);
> +	struct drm_exec exec;
>   	struct xe_engine *engine;
>   	struct xe_sync_entry *syncs = NULL;
>   	u64 addresses[XE_HW_ENGINE_MAX_INSTANCE];
> -	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
> -	struct ttm_validate_buffer *tv = NULL;
>   	u32 i, num_syncs = 0;
>   	struct xe_sched_job *job;
>   	struct dma_fence *rebind_fence;
>   	struct xe_vm *vm;
> -	struct ww_acquire_ctx ww;
> -	struct list_head objs;
>   	bool write_locked;
>   	int err = 0;
>   
> @@ -270,7 +258,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   			goto err_unlock_list;
>   	}
>   
> -	err = xe_exec_begin(engine, &ww, tv_onstack, &tv, &objs);
> +	err = xe_exec_begin(engine, &exec);
>   	if (err)
>   		goto err_unlock_list;
>   
> @@ -361,9 +349,10 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	 * are written as we don't pass in a read / write list.
>   	 */
>   	if (!xe_vm_no_dma_fences(vm))
> -		drm_gpuva_add_fence(&vm->mgr, &job->drm.s_fence->finished,
> -				    DMA_RESV_USAGE_BOOKKEEP,
> -				    DMA_RESV_USAGE_WRITE);
> +		drm_gpuva_manager_add_fence(&vm->mgr, &exec,
> +					    &job->drm.s_fence->finished,
> +					    DMA_RESV_USAGE_BOOKKEEP,
> +					    DMA_RESV_USAGE_WRITE);
>   
>   	for (i = 0; i < num_syncs; i++)
>   		xe_sync_entry_signal(&syncs[i], job,
> @@ -387,7 +376,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	if (err)
>   		xe_sched_job_put(job);
>   err_engine_end:
> -	xe_exec_end(engine, tv_onstack, tv, &ww, &objs);
> +	xe_exec_end(engine, &exec);
>   err_unlock_list:
>   	if (write_locked)
>   		up_write(&vm->lock);
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index d7bf6b0a0697..1145c6eaa17d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -9,7 +9,7 @@
>   #include <linux/circ_buf.h>
>   
>   #include <drm/drm_managed.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
> +#include <drm/drm_exec.h>
>   
>   #include "xe_bo.h"
>   #include "xe_gt.h"
> @@ -84,11 +84,6 @@ static bool vma_matches(struct xe_vma *vma, u64 page_addr)
>   	return true;
>   }
>   
> -static bool only_needs_bo_lock(struct xe_bo *bo)
> -{
> -	return bo && bo->vm;
> -}
> -
>   static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
>   {
>   	struct xe_vma *vma = NULL;
> @@ -109,10 +104,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>   	struct xe_vm *vm;
>   	struct xe_vma *vma = NULL;
>   	struct xe_bo *bo;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> -	struct ttm_validate_buffer tv_bo, tv_vm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct dma_fence *fence;
>   	bool write_locked;
>   	int ret = 0;
> @@ -170,20 +162,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>   
>   	/* Lock VM and BOs dma-resv */
>   	bo = xe_vma_bo(vma);
> -	if (only_needs_bo_lock(bo)) {
> -		/* This path ensures the BO's LRU is updated */
> -		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
> -	} else {
> -		tv_vm.num_shared = xe->info.tile_count;
> -		tv_vm.bo = xe_vm_ttm_bo(vm);
> -		list_add(&tv_vm.head, &objs);
> -		if (bo) {
> -			tv_bo.bo = &bo->ttm;
> -			tv_bo.num_shared = xe->info.tile_count;
> -			list_add(&tv_bo.head, &objs);
> -		}
> -		ret = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
> -	}
> +	ret = xe_vm_bo_lock(vm, bo, &exec, xe->info.tile_count, false);
>   	if (ret)
>   		goto unlock_vm;
>   
> @@ -226,10 +205,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>   	vma->usm.gt_invalidated &= ~BIT(gt->info.id);
>   
>   unlock_dma_resv:
> -	if (only_needs_bo_lock(bo))
> -		xe_bo_unlock(bo, &ww);
> -	else
> -		ttm_eu_backoff_reservation(&ww, &objs);
> +	xe_vm_bo_unlock(vm, bo, &exec, true);
>   unlock_vm:
>   	if (!ret)
>   		vm->usm.last_fault_vma = vma;
> @@ -496,10 +472,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>   	struct xe_vm *vm;
>   	struct xe_vma *vma;
>   	struct xe_bo *bo;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> -	struct ttm_validate_buffer tv_bo, tv_vm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	int ret = 0;
>   
>   	/* We only support ACC_TRIGGER at the moment */
> @@ -532,28 +505,14 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>   
>   	/* Lock VM and BOs dma-resv */
>   	bo = xe_vma_bo(vma);
> -	if (only_needs_bo_lock(bo)) {
> -		/* This path ensures the BO's LRU is updated */
> -		ret = xe_bo_lock(bo, &ww, xe->info.tile_count, false);
> -	} else {
> -		tv_vm.num_shared = xe->info.tile_count;
> -		tv_vm.bo = xe_vm_ttm_bo(vm);
> -		list_add(&tv_vm.head, &objs);
> -		tv_bo.bo = &bo->ttm;
> -		tv_bo.num_shared = xe->info.tile_count;
> -		list_add(&tv_bo.head, &objs);
> -		ret = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
> -	}
> +	ret = xe_vm_bo_lock(vm, bo, &exec, xe->info.tile_count, false);
>   	if (ret)
>   		goto unlock_vm;
>   
>   	/* Migrate to VRAM, move should invalidate the VMA first */
>   	ret = xe_bo_migrate(bo, XE_PL_VRAM0 + gt->info.vram_id);
>   
> -	if (only_needs_bo_lock(bo))
> -		xe_bo_unlock(bo, &ww);
> -	else
> -		ttm_eu_backoff_reservation(&ww, &objs);
> +	xe_vm_bo_unlock(vm, bo, &exec, true);
>   unlock_vm:
>   	up_read(&vm->lock);
>   	xe_vm_put(vm);
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index ae605e7805de..3cc34efe8dd8 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -3,6 +3,8 @@
>    * Copyright © 2021 Intel Corporation
>    */
>   
> +#include <drm/drm_exec.h>
> +
>   #include "xe_lrc.h"
>   
>   #include "regs/xe_engine_regs.h"
> @@ -712,16 +714,16 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>   
>   void xe_lrc_finish(struct xe_lrc *lrc)
>   {
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   
>   	xe_hw_fence_ctx_finish(&lrc->fence_ctx);
>   	if (lrc->bo->vm)
> -		xe_vm_lock(lrc->bo->vm, &ww, 0, false);
> +		xe_vm_lock(lrc->bo->vm, &exec, 0, false);
>   	else
>   		xe_bo_lock_no_vm(lrc->bo, NULL);
>   	xe_bo_unpin(lrc->bo);
>   	if (lrc->bo->vm)
> -		xe_vm_unlock(lrc->bo->vm, &ww);
> +		xe_vm_unlock(lrc->bo->vm, &exec);
>   	else
>   		xe_bo_unlock_no_vm(lrc->bo);
>   	xe_bo_put(lrc->bo);
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 91a06c925a1e..1dd497252640 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -9,6 +9,7 @@
>   #include <linux/sizes.h>
>   
>   #include <drm/drm_managed.h>
> +#include <drm/drm_exec.h>
>   #include <drm/ttm/ttm_tt.h>
>   #include <drm/xe_drm.h>
>   
> @@ -86,13 +87,13 @@ struct xe_engine *xe_gt_migrate_engine(struct xe_gt *gt)
>   static void xe_migrate_fini(struct drm_device *dev, void *arg)
>   {
>   	struct xe_migrate *m = arg;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   
> -	xe_vm_lock(m->eng->vm, &ww, 0, false);
> +	xe_vm_lock(m->eng->vm, &exec, 0, false);
>   	xe_bo_unpin(m->pt_bo);
>   	if (m->cleared_bo)
>   		xe_bo_unpin(m->cleared_bo);
> -	xe_vm_unlock(m->eng->vm, &ww);
> +	xe_vm_unlock(m->eng->vm, &exec);
>   
>   	dma_fence_put(m->fence);
>   	if (m->cleared_bo)
> @@ -315,7 +316,7 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
>   	struct xe_device *xe = gt_to_xe(gt);
>   	struct xe_migrate *m;
>   	struct xe_vm *vm;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	int err;
>   
>   	XE_BUG_ON(xe_gt_is_media_type(gt));
> @@ -332,9 +333,9 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
>   	if (IS_ERR(vm))
>   		return ERR_CAST(vm);
>   
> -	xe_vm_lock(vm, &ww, 0, false);
> +	xe_vm_lock(vm, &exec, 0, false);
>   	err = xe_migrate_prepare_vm(gt, m, vm);
> -	xe_vm_unlock(vm, &ww);
> +	xe_vm_unlock(vm, &exec);
>   	if (err) {
>   		xe_vm_close_and_put(vm);
>   		return ERR_PTR(err);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 4d734ec4d6ab..55cced8870e6 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -7,7 +7,7 @@
>   
>   #include <linux/dma-fence-array.h>
>   
> -#include <drm/ttm/ttm_execbuf_util.h>
> +#include <drm/drm_exec.h>
>   #include <drm/ttm/ttm_tt.h>
>   #include <drm/xe_drm.h>
>   #include <linux/kthread.h>
> @@ -260,10 +260,10 @@ static void arm_preempt_fences(struct xe_vm *vm, struct list_head *list)
>   static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
>   {
>   	struct xe_engine *e;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	int err;
>   
> -	err = xe_bo_lock(bo, &ww, vm->preempt.num_engines, true);
> +	err = xe_bo_lock(bo, &exec, vm->preempt.num_engines, true);
>   	if (err)
>   		return err;
>   
> @@ -274,11 +274,12 @@ static int add_preempt_fences(struct xe_vm *vm, struct xe_bo *bo)
>   					   DMA_RESV_USAGE_BOOKKEEP);
>   		}
>   
> -	xe_bo_unlock(bo, &ww);
> +	xe_bo_unlock(bo, &exec);
>   	return 0;
>   }
>   
> -static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
> +static void resume_and_reinstall_preempt_fences(struct xe_vm *vm,
> +						struct drm_exec *exec)
>   {
>   	struct xe_engine *e;
>   
> @@ -288,18 +289,15 @@ static void resume_and_reinstall_preempt_fences(struct xe_vm *vm)
>   	list_for_each_entry(e, &vm->preempt.engines, compute.link) {
>   		e->ops->resume(e);
>   
> -		drm_gpuva_add_fence(&vm->mgr, e->compute.pfence,
> -				    DMA_RESV_USAGE_BOOKKEEP,
> -				    DMA_RESV_USAGE_BOOKKEEP);
> +		drm_gpuva_manager_add_fence(&vm->mgr, exec, e->compute.pfence,
> +					    DMA_RESV_USAGE_BOOKKEEP,
> +					    DMA_RESV_USAGE_BOOKKEEP);
>   	}
>   }
>   
>   int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>   {
> -	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
> -	struct ttm_validate_buffer *tv;
> -	struct ww_acquire_ctx ww;
> -	struct list_head objs;
> +	struct drm_exec exec;
>   	struct dma_fence *pfence;
>   	int err;
>   	bool wait;
> @@ -308,7 +306,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>   
>   	down_write(&vm->lock);
>   
> -	err = xe_vm_lock_dma_resv(vm, &ww, tv_onstack, &tv, &objs, true, 1);
> +	err = xe_vm_lock_dma_resv(vm, &exec, true, 1);
>   	if (err)
>   		goto out_unlock_outer;
>   
> @@ -325,9 +323,9 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>   
>   	down_read(&vm->userptr.notifier_lock);
>   
> -	drm_gpuva_add_fence(&vm->mgr, pfence,
> -			    DMA_RESV_USAGE_BOOKKEEP,
> -			    DMA_RESV_USAGE_BOOKKEEP);
> +	drm_gpuva_manager_add_fence(&vm->mgr, &exec, pfence,
> +				    DMA_RESV_USAGE_BOOKKEEP,
> +				    DMA_RESV_USAGE_BOOKKEEP);
>   
>   	/*
>   	 * Check to see if a preemption on VM is in flight or userptr
> @@ -341,7 +339,7 @@ int xe_vm_add_compute_engine(struct xe_vm *vm, struct xe_engine *e)
>   	up_read(&vm->userptr.notifier_lock);
>   
>   out_unlock:
> -	xe_vm_unlock_dma_resv(vm, tv_onstack, tv, &ww, &objs);
> +	xe_vm_unlock_dma_resv(vm, &exec);
>   out_unlock_outer:
>   	up_write(&vm->lock);
>   
> @@ -367,25 +365,24 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
>   		list_empty(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
>   }
>   
> +static struct drm_gem_object *xe_vm_gem(struct xe_vm *vm)
> +{
> +	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
> +		XE_VM_FLAG_GT_ID(vm->flags) : 0;
> +
> +	/* Safe to use index 0 as all BO in the VM share a single dma-resv lock */
> +	return &vm->pt_root[idx]->bo->ttm.base;
> +}
> +
>   /**
>    * xe_vm_lock_dma_resv() - Lock the vm dma_resv object and the dma_resv
>    * objects of the vm's external buffer objects.
> - * @vm: The vm.
> - * @ww: Pointer to a struct ww_acquire_ctx locking context.
> - * @tv_onstack: Array size XE_ONSTACK_TV of storage for the struct
> - * ttm_validate_buffers used for locking.
> - * @tv: Pointer to a pointer that on output contains the actual storage used.
> - * @objs: List head for the buffer objects locked.
> + * @vm: The vm
>    * @intr: Whether to lock interruptible.
>    * @num_shared: Number of dma-fence slots to reserve in the locked objects.
>    *
>    * Locks the vm dma-resv objects and all the dma-resv objects of the
> - * buffer objects on the vm external object list. The TTM utilities require
> - * a list of struct ttm_validate_buffers pointing to the actual buffer
> - * objects to lock. Storage for those struct ttm_validate_buffers should
> - * be provided in @tv_onstack, and is typically reserved on the stack
> - * of the caller. If the size of @tv_onstack isn't sufficient, then
> - * storage will be allocated internally using kvmalloc().
> + * buffer objects on the vm external object list.
>    *
>    * The function performs deadlock handling internally, and after a
>    * successful return the ww locking transaction should be considered
> @@ -395,46 +392,18 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
>    * @intr is set to true, -EINTR or -ERESTARTSYS may be returned. In case
>    * of error, any locking performed has been reverted.
>    */
> -int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> -			struct ttm_validate_buffer *tv_onstack,
> -			struct ttm_validate_buffer **tv,
> -			struct list_head *objs,
> -			bool intr,
> +int xe_vm_lock_dma_resv(struct xe_vm *vm, struct drm_exec *exec, bool intr,
>   			unsigned int num_shared)
>   {
> -	struct ttm_validate_buffer *tv_vm, *tv_bo;
>   	struct xe_vma *vma, *next;
> -	struct drm_gpuva *gpuva;
> -	LIST_HEAD(dups);
>   	int err;
>   
>   	lockdep_assert_held(&vm->lock);
>   
> -	if (vm->mgr.extobj.entries < XE_ONSTACK_TV) {
> -		tv_vm = tv_onstack;
> -	} else {
> -		tv_vm = kvmalloc_array(vm->mgr.extobj.entries + 1,
> -				       sizeof(*tv_vm),
> -				       GFP_KERNEL);
> -		if (!tv_vm)
> -			return -ENOMEM;
> -	}
> -	tv_bo = tv_vm + 1;
> -
> -	INIT_LIST_HEAD(objs);
> -	drm_gpuva_for_each_extobj(gpuva, &vm->mgr) {
> -		tv_bo->num_shared = num_shared;
> -		tv_bo->bo = &gem_to_xe_bo(gpuva->gem.obj)->ttm;
> -
> -		list_add_tail(&tv_bo->head, objs);
> -		tv_bo++;
> -	}
> -	tv_vm->num_shared = num_shared;
> -	tv_vm->bo = xe_vm_ttm_bo(vm);
> -	list_add_tail(&tv_vm->head, objs);
> -	err = ttm_eu_reserve_buffers(ww, objs, intr, &dups);
> +	err = drm_gpuva_manager_lock(&vm->mgr, exec, xe_vm_gem(vm), intr,
> +				     num_shared);
>   	if (err)
> -		goto out_err;
> +		return err;
>   
>   	spin_lock(&vm->notifier.list_lock);
>   	list_for_each_entry_safe(vma, next, &vm->notifier.rebind_list,
> @@ -447,34 +416,22 @@ int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
>   	}
>   	spin_unlock(&vm->notifier.list_lock);
>   
> -	*tv = tv_vm;
>   	return 0;
> -
> -out_err:
> -	if (tv_vm != tv_onstack)
> -		kvfree(tv_vm);
> -
> -	return err;
>   }
>   
>   /**
>    * xe_vm_unlock_dma_resv() - Unlock reservation objects locked by
>    * xe_vm_lock_dma_resv()
>    * @vm: The vm.
> - * @tv_onstack: The @tv_onstack array given to xe_vm_lock_dma_resv().
> - * @tv: The value of *@tv given by xe_vm_lock_dma_resv().
> - * @ww: The ww_acquire_context used for locking.
> - * @objs: The list returned from xe_vm_lock_dma_resv().
>    *
>    * Unlocks the reservation objects and frees any memory allocated by
>    * xe_vm_lock_dma_resv().
>    */
> -void xe_vm_unlock_dma_resv(struct xe_vm *vm,
> -			   struct ttm_validate_buffer *tv_onstack,
> -			   struct ttm_validate_buffer *tv,
> -			   struct ww_acquire_ctx *ww,
> -			   struct list_head *objs)
> +void xe_vm_unlock_dma_resv(struct xe_vm *vm, struct drm_exec *exec)
>   {
> +	struct drm_gem_object *obj, *skip = xe_vm_gem(vm);
> +	unsigned long index;
> +
>   	/*
>   	 * Nothing should've been able to enter the list while we were locked,
>   	 * since we've held the dma-resvs of all the vm's external objects,
> @@ -483,19 +440,20 @@ void xe_vm_unlock_dma_resv(struct xe_vm *vm,
>   	 */
>   	XE_WARN_ON(!list_empty(&vm->notifier.rebind_list));
>   
> -	ttm_eu_backoff_reservation(ww, objs);
> -	if (tv && tv != tv_onstack)
> -		kvfree(tv);
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		struct xe_bo *bo = gem_to_xe_bo(obj);
> +
> +		if (obj != skip)
> +			ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> +	}
> +	drm_gpuva_manager_unlock(&vm->mgr, exec);
>   }
>   
>   static void preempt_rebind_work_func(struct work_struct *w)
>   {
>   	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
> +	struct drm_exec exec;
>   	struct xe_vma *vma;
> -	struct ttm_validate_buffer tv_onstack[XE_ONSTACK_TV];
> -	struct ttm_validate_buffer *tv;
> -	struct ww_acquire_ctx ww;
> -	struct list_head objs;
>   	struct dma_fence *rebind_fence;
>   	unsigned int fence_count = 0;
>   	LIST_HEAD(preempt_fences);
> @@ -536,8 +494,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>   			goto out_unlock_outer;
>   	}
>   
> -	err = xe_vm_lock_dma_resv(vm, &ww, tv_onstack, &tv, &objs,
> -				  false, vm->preempt.num_engines);
> +	err = xe_vm_lock_dma_resv(vm, &exec, false, vm->preempt.num_engines);
>   	if (err)
>   		goto out_unlock_outer;
>   
> @@ -608,11 +565,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
>   
>   	/* Point of no return. */
>   	arm_preempt_fences(vm, &preempt_fences);
> -	resume_and_reinstall_preempt_fences(vm);
> +	resume_and_reinstall_preempt_fences(vm, &exec);
>   	up_read(&vm->userptr.notifier_lock);
>   
>   out_unlock:
> -	xe_vm_unlock_dma_resv(vm, tv_onstack, tv, &ww, &objs);
> +	xe_vm_unlock_dma_resv(vm, &exec);
>   out_unlock_outer:
>   	if (err == -EAGAIN) {
>   		trace_xe_vm_rebind_worker_retry(vm);
> @@ -963,27 +920,16 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
>   
>   static void xe_vma_destroy_unlocked(struct xe_vma *vma)
>   {
> -	struct ttm_validate_buffer tv[2];
> -	struct ww_acquire_ctx ww;
> +	struct xe_vm *vm = xe_vma_vm(vma);
>   	struct xe_bo *bo = xe_vma_bo(vma);
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> +	struct drm_exec exec;
>   	int err;
>   
> -	memset(tv, 0, sizeof(tv));
> -	tv[0].bo = xe_vm_ttm_bo(xe_vma_vm(vma));
> -	list_add(&tv[0].head, &objs);
> -
> -	if (bo) {
> -		tv[1].bo = &xe_bo_get(bo)->ttm;
> -		list_add(&tv[1].head, &objs);
> -	}
> -	err = ttm_eu_reserve_buffers(&ww, &objs, false, &dups);
> +	err = xe_vm_bo_lock(vm, xe_bo_get(bo), &exec, 0, false);
>   	XE_WARN_ON(err);
> -
>   	xe_vma_destroy(vma, NULL);
> +	xe_vm_bo_unlock(vm, bo, &exec, false);
>   
> -	ttm_eu_backoff_reservation(&ww, &objs);
>   	if (bo)
>   		xe_bo_put(bo);
>   }
> @@ -1254,7 +1200,7 @@ static void vm_error_capture(struct xe_vm *vm, int err,
>   void xe_vm_close_and_put(struct xe_vm *vm)
>   {
>   	struct list_head contested;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_device *xe = xe_vm_device(vm);
>   	struct xe_gt *gt;
>   	struct xe_vma *vma, *next_vma;
> @@ -1281,7 +1227,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>   	}
>   
>   	down_write(&vm->lock);
> -	xe_vm_lock(vm, &ww, 0, false);
> +	xe_vm_lock(vm, &exec, 0, false);
>   	drm_gpuva_iter_for_each(gpuva, it) {
>   		vma = gpuva_to_vma(gpuva);
>   
> @@ -1323,7 +1269,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>   					      NULL);
>   		}
>   	}
> -	xe_vm_unlock(vm, &ww);
> +	xe_vm_unlock(vm, &exec);
>   
>   	/*
>   	 * VM is now dead, cannot re-add nodes to vm->vmas if it's NULL
> @@ -1356,7 +1302,7 @@ static void vm_destroy_work_func(struct work_struct *w)
>   {
>   	struct xe_vm *vm =
>   		container_of(w, struct xe_vm, destroy_work);
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct xe_device *xe = xe_vm_device(vm);
>   	struct xe_gt *gt;
>   	u8 id;
> @@ -1382,14 +1328,14 @@ static void vm_destroy_work_func(struct work_struct *w)
>   	 * is needed for xe_vm_lock to work. If we remove that dependency this
>   	 * can be moved to xe_vm_close_and_put.
>   	 */
> -	xe_vm_lock(vm, &ww, 0, false);
> +	xe_vm_lock(vm, &exec, 0, false);
>   	for_each_gt(gt, xe, id) {
>   		if (vm->pt_root[id]) {
>   			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
>   			vm->pt_root[id] = NULL;
>   		}
>   	}
> -	xe_vm_unlock(vm, &ww);
> +	xe_vm_unlock(vm, &exec);
>   
>   	trace_xe_vm_free(vm);
>   	dma_fence_put(vm->rebind_fence);
> @@ -1969,21 +1915,6 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
>   
>   #define VM_BIND_OP(op)	(op & 0xffff)
>   
> -struct ttm_buffer_object *xe_vm_ttm_bo(struct xe_vm *vm)
> -{
> -	int idx = vm->flags & XE_VM_FLAG_MIGRATION ?
> -		XE_VM_FLAG_GT_ID(vm->flags) : 0;
> -
> -	/* Safe to use index 0 as all BO in the VM share a single dma-resv lock */
> -	return &vm->pt_root[idx]->bo->ttm;
> -}
> -
> -static void xe_vm_tv_populate(struct xe_vm *vm, struct ttm_validate_buffer *tv)
> -{
> -	tv->num_shared = 1;
> -	tv->bo = xe_vm_ttm_bo(vm);
> -}
> -
>   static void vm_set_async_error(struct xe_vm *vm, int err)
>   {
>   	lockdep_assert_held(&vm->lock);
> @@ -2088,7 +2019,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>   			 u32 operation, u8 gt_mask, u32 region)
>   {
>   	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	struct drm_gpuva_ops *ops;
>   	struct drm_gpuva_op *__op;
>   	struct xe_vma_op *op;
> @@ -2136,11 +2067,11 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>   	case XE_VM_BIND_OP_UNMAP_ALL:
>   		XE_BUG_ON(!bo);
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return ERR_PTR(err);
>   		ops = drm_gpuva_gem_unmap_ops_create(&vm->mgr, obj);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   
>   		drm_gpuva_for_each_op(__op, ops) {
>   			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> @@ -2174,13 +2105,13 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>   {
>   	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>   	struct xe_vma *vma;
> -	struct ww_acquire_ctx ww;
> +	struct drm_exec exec;
>   	int err;
>   
>   	lockdep_assert_held_write(&vm->lock);
>   
>   	if (bo) {
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return ERR_PTR(err);
>   	}
> @@ -2189,7 +2120,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>   			    op->va.range - 1, read_only, null,
>   			    gt_mask);
>   	if (bo)
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   
>   	if (xe_vma_is_userptr(vma)) {
>   		err = xe_vma_userptr_pin_pages(vma);
> @@ -2441,19 +2372,15 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
>   static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>   			       struct xe_vma_op *op)
>   {
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> -	struct ttm_validate_buffer tv_bo, tv_vm;
> -	struct ww_acquire_ctx ww;
>   	struct xe_bo *vbo;
> +	struct drm_exec exec;
>   	int err;
> +	bool lru_update = op->base.op != DRM_GPUVA_OP_UNMAP;
>   
>   	lockdep_assert_held_write(&vm->lock);
>   
> -	xe_vm_tv_populate(vm, &tv_vm);
> -	list_add_tail(&tv_vm.head, &objs);
>   	vbo = xe_vma_bo(vma);
> -	if (vbo) {
> +	if (vbo)
>   		/*
>   		 * An unbind can drop the last reference to the BO and
>   		 * the BO is needed for ttm_eu_backoff_reservation so
> @@ -2461,22 +2388,15 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>   		 */
>   		xe_bo_get(vbo);
>   
> -		if (!vbo->vm) {
> -			tv_bo.bo = &vbo->ttm;
> -			tv_bo.num_shared = 1;
> -			list_add(&tv_bo.head, &objs);
> -		}
> -	}
> -
>   again:
> -	err = ttm_eu_reserve_buffers(&ww, &objs, true, &dups);
> +	err = xe_vm_bo_lock(vm, vbo, &exec, 1, false);
>   	if (err) {
>   		xe_bo_put(vbo);
>   		return err;
>   	}
>   
>   	xe_vm_assert_held(vm);
> -	xe_bo_assert_held(xe_vma_bo(vma));
> +	xe_bo_assert_held(vbo);
>   
>   	switch (op->base.op) {
>   	case DRM_GPUVA_OP_MAP:
> @@ -2552,7 +2472,7 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>   		XE_BUG_ON("NOT POSSIBLE");
>   	}
>   
> -	ttm_eu_backoff_reservation(&ww, &objs);
> +	xe_vm_bo_unlock(vm, vbo, &exec, lru_update);
>   	if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
>   		lockdep_assert_held_write(&vm->lock);
>   		err = xe_vma_userptr_pin_pages(vma);
> @@ -3208,30 +3128,67 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	return err == -ENODATA ? 0 : err;
>   }
>   
> -/*
> - * XXX: Using the TTM wrappers for now, likely can call into dma-resv code
> - * directly to optimize. Also this likely should be an inline function.
> - */
> -int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> +int xe_vm_lock(struct xe_vm *vm, struct drm_exec *exec,
>   	       int num_resv, bool intr)
>   {
> -	struct ttm_validate_buffer tv_vm;
> -	LIST_HEAD(objs);
> -	LIST_HEAD(dups);
> +	int err;
>   
> -	XE_BUG_ON(!ww);
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm),
> +					   num_resv);
> +		drm_exec_continue_on_contention(exec);
> +		if (err && err != -EALREADY)
> +			goto out_err;
> +	}
>   
> -	tv_vm.num_shared = num_resv;
> -	tv_vm.bo = xe_vm_ttm_bo(vm);;
> -	list_add_tail(&tv_vm.head, &objs);
> +	return 0;
>   
> -	return ttm_eu_reserve_buffers(ww, &objs, intr, &dups);
> +out_err:
> +	drm_exec_fini(exec);
> +	return err;
>   }
>   
> -void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww)
> +void xe_vm_unlock(struct xe_vm *vm, struct drm_exec *exec)
>   {
> -	dma_resv_unlock(xe_vm_resv(vm));
> -	ww_acquire_fini(ww);
> +	drm_exec_fini(exec);
> +}
> +
> +int xe_vm_bo_lock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		  int num_resv, bool intr)
> +{
> +	int err;
> +
> +	drm_exec_init(exec, intr);
> +	drm_exec_while_not_all_locked(exec) {
> +		err = drm_exec_prepare_obj(exec, xe_vm_gem(vm),
> +					   num_resv);
> +		drm_exec_continue_on_contention(exec);
> +		if (err && err != -EALREADY)
> +			goto out_err;
> +
> +		if (bo && !bo->vm) {
> +			err = drm_exec_prepare_obj(exec, &bo->ttm.base,
> +						   num_resv);
> +			drm_exec_continue_on_contention(exec);
> +			if (err && err != -EALREADY)
> +				goto out_err;
> +		}
> +	}
> +
> +	return 0;
> +
> +out_err:
> +	drm_exec_fini(exec);
> +	return err;
> +}
> +
> +void xe_vm_bo_unlock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		     bool lru_update)
> +{
> +	if (lru_update && bo && (!bo->vm || xe_vm_no_dma_fences(vm)))
> +		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
> +	drm_exec_fini(exec);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index f279fa622260..47b981d9fc04 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -12,6 +12,7 @@
>   #include "xe_vm_types.h"
>   
>   struct drm_device;
> +struct drm_exec;
>   struct drm_printer;
>   struct drm_file;
>   
> @@ -38,10 +39,14 @@ static inline void xe_vm_put(struct xe_vm *vm)
>   	kref_put(&vm->refcount, xe_vm_free);
>   }
>   
> -int xe_vm_lock(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> +int xe_vm_lock(struct xe_vm *vm, struct drm_exec *exec,
>   	       int num_resv, bool intr);
> +void xe_vm_unlock(struct xe_vm *vm, struct drm_exec *exec);
>   
> -void xe_vm_unlock(struct xe_vm *vm, struct ww_acquire_ctx *ww);
> +int xe_vm_bo_lock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		  int num_resv, bool intr);
> +void xe_vm_bo_unlock(struct xe_vm *vm, struct xe_bo *bo, struct drm_exec *exec,
> +		     bool lru_update);
>   
>   static inline bool xe_vm_is_closed(struct xe_vm *vm)
>   {
> @@ -219,23 +224,9 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma);
>   
>   int xe_vma_userptr_check_repin(struct xe_vma *vma);
>   
> -/*
> - * XE_ONSTACK_TV is used to size the tv_onstack array that is input
> - * to xe_vm_lock_dma_resv() and xe_vm_unlock_dma_resv().
> - */
> -#define XE_ONSTACK_TV 20
> -int xe_vm_lock_dma_resv(struct xe_vm *vm, struct ww_acquire_ctx *ww,
> -			struct ttm_validate_buffer *tv_onstack,
> -			struct ttm_validate_buffer **tv,
> -			struct list_head *objs,
> -			bool intr,
> +int xe_vm_lock_dma_resv(struct xe_vm *vm, struct drm_exec *exec, bool intr,
>   			unsigned int num_shared);
> -
> -void xe_vm_unlock_dma_resv(struct xe_vm *vm,
> -			   struct ttm_validate_buffer *tv_onstack,
> -			   struct ttm_validate_buffer *tv,
> -			   struct ww_acquire_ctx *ww,
> -			   struct list_head *objs);
> +void xe_vm_unlock_dma_resv(struct xe_vm *vm, struct drm_exec *exec);
>   
>   int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
>   
> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index 03508645fa08..a68bc6fec1de 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> @@ -7,6 +7,7 @@
>   
>   #include <linux/nospec.h>
>   
> +#include <drm/drm_exec.h>
>   #include <drm/ttm/ttm_tt.h>
>   #include <drm/xe_drm.h>
>   
> @@ -28,16 +29,16 @@ static int madvise_preferred_mem_class(struct xe_device *xe, struct xe_vm *vm,
>   
>   	for (i = 0; i < num_vmas; ++i) {
>   		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		bo = xe_vma_bo(vmas[i]);
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return err;
>   		bo->props.preferred_mem_class = value;
>   		xe_bo_placement_for_flags(xe, bo, bo->flags);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   
>   	return 0;
> @@ -53,16 +54,16 @@ static int madvise_preferred_gt(struct xe_device *xe, struct xe_vm *vm,
>   
>   	for (i = 0; i < num_vmas; ++i) {
>   		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		bo = xe_vma_bo(vmas[i]);
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return err;
>   		bo->props.preferred_gt = value;
>   		xe_bo_placement_for_flags(xe, bo, bo->flags);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   
>   	return 0;
> @@ -89,17 +90,17 @@ static int madvise_preferred_mem_class_gt(struct xe_device *xe,
>   
>   	for (i = 0; i < num_vmas; ++i) {
>   		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		bo = xe_vma_bo(vmas[i]);
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return err;
>   		bo->props.preferred_mem_class = mem_class;
>   		bo->props.preferred_gt = gt_id;
>   		xe_bo_placement_for_flags(xe, bo, bo->flags);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   
>   	return 0;
> @@ -112,13 +113,13 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
>   
>   	for (i = 0; i < num_vmas; ++i) {
>   		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		bo = xe_vma_bo(vmas[i]);
>   		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_SYSTEM_BIT)))
>   			return -EINVAL;
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return err;
>   		bo->props.cpu_atomic = !!value;
> @@ -130,7 +131,7 @@ static int madvise_cpu_atomic(struct xe_device *xe, struct xe_vm *vm,
>   		 */
>   		if (bo->props.cpu_atomic)
>   			ttm_bo_unmap_virtual(&bo->ttm);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   
>   	return 0;
> @@ -143,18 +144,18 @@ static int madvise_device_atomic(struct xe_device *xe, struct xe_vm *vm,
>   
>   	for (i = 0; i < num_vmas; ++i) {
>   		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		bo = xe_vma_bo(vmas[i]);
>   		if (XE_IOCTL_ERR(xe, !(bo->flags & XE_BO_CREATE_VRAM0_BIT) &&
>   				 !(bo->flags & XE_BO_CREATE_VRAM1_BIT)))
>   			return -EINVAL;
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return err;
>   		bo->props.device_atomic = !!value;
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   
>   	return 0;
> @@ -174,16 +175,16 @@ static int madvise_priority(struct xe_device *xe, struct xe_vm *vm,
>   
>   	for (i = 0; i < num_vmas; ++i) {
>   		struct xe_bo *bo;
> -		struct ww_acquire_ctx ww;
> +		struct drm_exec exec;
>   
>   		bo = xe_vma_bo(vmas[i]);
>   
> -		err = xe_bo_lock(bo, &ww, 0, true);
> +		err = xe_bo_lock(bo, &exec, 0, true);
>   		if (err)
>   			return err;
>   		bo->ttm.priority = value;
>   		ttm_bo_move_to_lru_tail(&bo->ttm);
> -		xe_bo_unlock(bo, &ww);
> +		xe_bo_unlock(bo, &exec);
>   	}
>   
>   	return 0;
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> index 943c8fcda533..a2f6d90ac899 100644
> --- a/include/drm/drm_gpuva_mgr.h
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -32,6 +32,8 @@
>   #include <linux/spinlock.h>
>   #include <linux/types.h>
>   
> +#include <drm/drm_exec.h>
> +
>   struct drm_gpuva_manager;
>   struct drm_gpuva_fn_ops;
>   struct drm_gpuva_prealloc;
> @@ -169,9 +171,17 @@ struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
>   
>   bool drm_gpuva_interval_empty(struct drm_gpuva_manager *mgr, u64 addr, u64 range);
>   
> -void drm_gpuva_add_fence(struct drm_gpuva_manager *mgr, struct dma_fence *fence,
> -			 enum dma_resv_usage private_usage,
> -			 enum dma_resv_usage extobj_usage);
> +int drm_gpuva_manager_lock(struct drm_gpuva_manager *mgr, struct drm_exec *exec,
> +			   struct drm_gem_object *mgr_obj, bool intr,
> +			   unsigned int num_fences);
> +void drm_gpuva_manager_unlock(struct drm_gpuva_manager *mgr,
> +			      struct drm_exec *exec);
> +
> +void drm_gpuva_manager_add_fence(struct drm_gpuva_manager *mgr,
> +				 struct drm_exec *exec,
> +				 struct dma_fence *fence,
> +				 enum dma_resv_usage private_usage,
> +				 enum dma_resv_usage extobj_usage);
>   
>   /**
>    * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted


* Re: [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM Matthew Brost
  2023-05-05 19:43   ` Rodrigo Vivi
@ 2023-05-11 10:03   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11 10:03 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> This is allowed per the dma-fencing rules.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

LGTM.

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



> ---
>   drivers/gpu/drm/xe/xe_sync.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> index 99f1ed87196d..1e4e4acb2c4a 100644
> --- a/drivers/gpu/drm/xe/xe_sync.c
> +++ b/drivers/gpu/drm/xe/xe_sync.c
> @@ -105,6 +105,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>   {
>   	struct drm_xe_sync sync_in;
>   	int err;
> +	bool signal;
>   
>   	if (copy_from_user(&sync_in, sync_user, sizeof(*sync_user)))
>   		return -EFAULT;
> @@ -113,9 +114,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>   			 ~(SYNC_FLAGS_TYPE_MASK | DRM_XE_SYNC_SIGNAL)))
>   		return -EINVAL;
>   
> +	signal = sync_in.flags & DRM_XE_SYNC_SIGNAL;
>   	switch (sync_in.flags & SYNC_FLAGS_TYPE_MASK) {
>   	case DRM_XE_SYNC_SYNCOBJ:
> -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
>   			return -ENOTSUPP;
>   
>   		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> @@ -125,7 +127,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>   		if (XE_IOCTL_ERR(xe, !sync->syncobj))
>   			return -ENOENT;
>   
> -		if (!(sync_in.flags & DRM_XE_SYNC_SIGNAL)) {
> +		if (!signal) {
>   			sync->fence = drm_syncobj_fence_get(sync->syncobj);
>   			if (XE_IOCTL_ERR(xe, !sync->fence))
>   				return -EINVAL;
> @@ -133,7 +135,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>   		break;
>   
>   	case DRM_XE_SYNC_TIMELINE_SYNCOBJ:
> -		if (XE_IOCTL_ERR(xe, no_dma_fences))
> +		if (XE_IOCTL_ERR(xe, no_dma_fences && signal))
>   			return -ENOTSUPP;
>   
>   		if (XE_IOCTL_ERR(xe, upper_32_bits(sync_in.addr)))
> @@ -146,7 +148,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>   		if (XE_IOCTL_ERR(xe, !sync->syncobj))
>   			return -ENOENT;
>   
> -		if (sync_in.flags & DRM_XE_SYNC_SIGNAL) {
> +		if (signal) {
>   			sync->chain_fence = dma_fence_chain_alloc();
>   			if (!sync->chain_fence)
>   				return -ENOMEM;
> @@ -168,7 +170,7 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
>   		break;
>   
>   	case DRM_XE_SYNC_USER_FENCE:
> -		if (XE_IOCTL_ERR(xe, !(sync_in.flags & DRM_XE_SYNC_SIGNAL)))
> +		if (XE_IOCTL_ERR(xe, !signal))
>   			return -ENOTSUPP;
>   
>   		if (XE_IOCTL_ERR(xe, sync_in.addr & 0x7))

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds Matthew Brost
  2023-05-09 14:50   ` Rodrigo Vivi
@ 2023-05-11 10:04   ` Thomas Hellström
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11 10:04 UTC (permalink / raw)
  To: Matthew Brost, intel-xe


On 5/2/23 02:17, Matthew Brost wrote:
> Binds are not long running jobs thus we can export dma-fences even if a
> VM is in compute mode.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>   drivers/gpu/drm/xe/xe_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 55cced8870e6..07023506ce6b 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -3047,7 +3047,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	for (num_syncs = 0; num_syncs < args->num_syncs; num_syncs++) {
>   		err = xe_sync_entry_parse(xe, xef, &syncs[num_syncs],
>   					  &syncs_user[num_syncs], false,
> -					  xe_vm_no_dma_fences(vm));
> +					  xe_vm_in_fault_mode(vm));
>   		if (err)
>   			goto free_syncs;
>   	}

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc
  2023-05-05 19:45   ` Rodrigo Vivi
@ 2023-05-11 10:14     ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-11 10:14 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Brost; +Cc: intel-xe


On 5/5/23 21:45, Rodrigo Vivi wrote:
> On Mon, May 01, 2023 at 05:17:27PM -0700, Matthew Brost wrote:
>> Try to explain how VM bind works in Xe.
> We will need more doc and likely with examples and all...
> but this is already something we need.
>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   include/uapi/drm/xe_drm.h | 45 ++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 42 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index cb4debe4ebda..c7137db2cbe8 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -148,7 +148,16 @@ struct drm_xe_engine_class_instance {
>>   	 * Kernel only classes (not actual hardware engine class). Used for
>>   	 * creating ordered queues of VM bind operations.
>>   	 */
>> +	/**
>> +	 * @DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC: VM bind engines which are allowed
>> +	 * to use in / out syncs. The out sync indicates bind op(s) completion.
>> +	 */
>>   #define DRM_XE_ENGINE_CLASS_VM_BIND_ASYNC	5
>> +	/**
>> +	 * @DRM_XE_ENGINE_CLASS_VM_BIND_SYNC: VM bind engines which are not
>> +	 * allowed to use in / out syncs. The IOCTL return indicates bind op(s)
>> +	 * completion.
>> +	 */
>>   #define DRM_XE_ENGINE_CLASS_VM_BIND_SYNC	6
>>   
>>   	__u16 engine_instance;
>> @@ -322,6 +331,7 @@ struct drm_xe_vm_create {
>>   
>>   #define DRM_XE_VM_CREATE_SCRATCH_PAGE	(0x1 << 0)
>>   #define DRM_XE_VM_CREATE_COMPUTE_MODE	(0x1 << 1)
>> +	/** @DRM_XE_VM_CREATE_ASYNC_DEFAULT: Default VM bind engine is async */
>>   #define DRM_XE_VM_CREATE_ASYNC_DEFAULT	(0x1 << 2)
>>   #define DRM_XE_VM_CREATE_FAULT_MODE	(0x1 << 3)
>>   
>> @@ -379,21 +389,44 @@ struct drm_xe_vm_bind_op {
>>   	/** @mem_region: Memory region to prefetch VMA to, instance not a mask */
>>   	__u32 region;
>>   
>> +	/** @XE_VM_BIND_OP_MAP: Map a buffer object */
>>   #define XE_VM_BIND_OP_MAP		0x0
>> +	/** @XE_VM_BIND_OP_UNMAP: Unmap a buffer object or userptr */
>>   #define XE_VM_BIND_OP_UNMAP		0x1
>> +	/** @XE_VM_BIND_OP_MAP_USERPTR: Map a userptr */
>>   #define XE_VM_BIND_OP_MAP_USERPTR	0x2
>> +	/**
>> +	 * @XE_VM_BIND_OP_RESTART: Restart last bind operation that failed with
>> +	 * -ENOSPC
>> +	 */
>>   #define XE_VM_BIND_OP_RESTART		0x3
>> +	/**
>> +	 * @XE_VM_BIND_OP_UNMAP_ALL: Unmap all mappings associated with a
>> +	 * buffer object
>> +	 */
>>   #define XE_VM_BIND_OP_UNMAP_ALL		0x4
>> +	/**
>> +	 * @XE_VM_BIND_OP_PREFETCH: For a deferred bind (faulting VM)
>> +	 * validate buffer object and (re)bind
>> +	 */
>>   #define XE_VM_BIND_OP_PREFETCH		0x5
>> -
>> +	/** @XE_VM_BIND_FLAG_READONLY: Set mapping to read only */
>>   #define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)
>> +	/**
>> +	 * @XE_VM_BIND_FLAG_ASYNC: Sanity check: if using an async bind engine
>> +	 * (in / out syncs) this flag needs to be set.
>> +	 */
>>   #define XE_VM_BIND_FLAG_ASYNC		(0x1 << 17)
>> -	/*
>> +	/**
>> +	 * @XE_VM_BIND_FLAG_IMMEDIATE:
>> +	 *
>>   	 * Valid on a faulting VM only, do the MAP operation immediately rather
>>   	 * than deferring the MAP to the page fault handler.
>>   	 */
>>   #define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 18)
>> -	/*
>> +	/**
>> +	 * @XE_VM_BIND_FLAG_NULL:
>> +	 *
>>   	 * When the NULL flag is set, the page tables are setup with a special
>>   	 * bit which indicates writes are dropped and all reads return zero. The
>>   	 * NULL flags is only valid for XE_VM_BIND_OP_MAP operations, the BO
>> @@ -401,6 +434,12 @@ struct drm_xe_vm_bind_op {
>>   	 * VK sparse bindings.
>>   	 */
>>   #define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
>> +	/**
>> +	 * @XE_VM_BIND_FLAG_RECLAIM: Should be set when a VM is in an error
>> +	 * state (bind op returns -ENOSPC), used with sync bind engines to issue
>> +	 * UNMAP operations which hopefully free enough memory so when VM is
>> +	 * UNMAP operations which hopefully free enough memory so that when the VM is
>> +	 */
>>   #define XE_VM_BIND_FLAG_RECLAIM		(0x1 << 20)
>>   
>>   	/** @reserved: Reserved */
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-11  7:24           ` Thomas Hellström
@ 2023-05-11 14:11             ` Matthew Brost
  2023-05-12  9:03               ` Thomas Hellström
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-05-11 14:11 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Thu, May 11, 2023 at 09:24:05AM +0200, Thomas Hellström wrote:
> 
> On 5/10/23 20:40, Matthew Brost wrote:
> > On Wed, May 10, 2023 at 10:14:12AM +0200, Thomas Hellström wrote:
> > > On 5/10/23 00:05, Matthew Brost wrote:
> > > > On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
> > > > > On 5/2/23 02:17, Matthew Brost wrote:
> > > > > > Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
> > > > > > LRU position on every exec.
> > > > > > 
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > >     drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
> > > > > >     drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
> > > > > >     drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
> > > > > >     drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
> > > > > >     drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
> > > > > >     5 files changed, 40 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > index 3ab404e33fae..da99ee53e7d7 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
> > > > > >     	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
> > > > > >     }
> > > > > > +static void xe_gem_object_close(struct drm_gem_object *obj,
> > > > > > +				struct drm_file *file_priv)
> > > > > > +{
> > > > > > +	struct xe_bo *bo = gem_to_xe_bo(obj);
> > > > > > +
> > > > > > +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
> > > > > Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
> > > > > doesn't make much sense when we support user-space command buffer chaining,
> > > > > but I think we should be doing it on exec at least, no?
> > > > Maybe you could make the argument for compute VMs, the preempt worker in
> > > > that case should probably do a bulk move. I can change this if desired.
> > > Yes, please.
> > > > For a fault VM it makes no sense as the fault handler updates the LRU
> > > > for individual BOs.
> > > Yes that makes sense.
> > > > > > +		struct ww_acquire_ctx ww;
> > > > > > +
> > > > > > +		XE_BUG_ON(!xe_bo_is_user(bo));
> > > > > Also why can't we use this for kernel objects as well? At some point we want
> > > > > to get to evictable page-table objects? Could we do this in the
> > > > > release_notify() callback to cover all potential bos?
> > > > > 
> > > > xe_gem_object_close is a user call, right? We can't call this on kernel
> > > > BOs. This also could be outside the if statement.
> > > Hmm, yes the question was can we stop doing this in xe_gem_object_close()
> > > and instead do it in release_notify() to cover also kernel objects. Since
> > > release_notify() is called just after individualizing dma_resv, it makes
> > > sense to individualize also LRU at that point?
> > > 
> > If we ever support moving kernel BOs, then yes. We need to do a lot of
> > work to get there, so I'd rather leave this where it is, but I'll add a
> > comment indicating that if we want to support kernel BO eviction, this
> > should be updated.
> > 
> > Sound good?
> 
> Well, I can't see the motivation to have it in gem close? Are other drivers
> doing that? Whether the object should be bulk moved or not is tied to
> whether it's a vm private object or not and that is closely tied to whether
> the reservation object is the vm resv or the object resv?
> 

AMDGPU does this via amdgpu_gem_object_close -> amdgpu_vm_bo_del, so yes.

I also think I moved it here because, before release_notify(), there is
an assert in TTM for the bulk move being NULL; let me find that:

 319 static void ttm_bo_release(struct kref *kref)
 320 {
 321         struct ttm_buffer_object *bo =
 322             container_of(kref, struct ttm_buffer_object, kref);
 323         struct ttm_device *bdev = bo->bdev;
 324         int ret;
 325
 326         WARN_ON_ONCE(bo->pin_count);
 327         WARN_ON_ONCE(bo->bulk_move);

Matt

> /Thomas
> 
> > 
> > Matt
> > 
> > > /Thomas
> > > 
> > > 
> > > > Matt
> > > > 
> > > > > /Thomas
> > > > > 
> > > > > 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
  2023-05-11 14:11             ` Matthew Brost
@ 2023-05-12  9:03               ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-12  9:03 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe


On 5/11/23 16:11, Matthew Brost wrote:
> On Thu, May 11, 2023 at 09:24:05AM +0200, Thomas Hellström wrote:
>> On 5/10/23 20:40, Matthew Brost wrote:
>>> On Wed, May 10, 2023 at 10:14:12AM +0200, Thomas Hellström wrote:
>>>> On 5/10/23 00:05, Matthew Brost wrote:
>>>>> On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
>>>>>> On 5/2/23 02:17, Matthew Brost wrote:
>>>>>>> Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
>>>>>>> LRU position on every exec.
>>>>>>>
>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/xe/xe_bo.c       | 32 ++++++++++++++++++++++++++++----
>>>>>>>      drivers/gpu/drm/xe/xe_bo.h       |  4 ++--
>>>>>>>      drivers/gpu/drm/xe/xe_dma_buf.c  |  2 +-
>>>>>>>      drivers/gpu/drm/xe/xe_exec.c     |  6 ++++++
>>>>>>>      drivers/gpu/drm/xe/xe_vm_types.h |  3 +++
>>>>>>>      5 files changed, 40 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>> index 3ab404e33fae..da99ee53e7d7 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>> @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
>>>>>>>      	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
>>>>>>>      }
>>>>>>> +static void xe_gem_object_close(struct drm_gem_object *obj,
>>>>>>> +				struct drm_file *file_priv)
>>>>>>> +{
>>>>>>> +	struct xe_bo *bo = gem_to_xe_bo(obj);
>>>>>>> +
>>>>>>> +	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
>>>>>> Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
>>>>>> doesn't make much sense when we support user-space command buffer chaining,
>>>>>> but I think we should be doing it on exec at least, no?
>>>>> Maybe you could make the argument for compute VMs, the preempt worker in
>>>>> that case should probably do a bulk move. I can change this if desired.
>>>> Yes, please.
>>>>> For a fault VM it makes no sense as the fault handler updates the LRU
>>>>> for individual BOs.
>>>> Yes that makes sense.
>>>>>>> +		struct ww_acquire_ctx ww;
>>>>>>> +
>>>>>>> +		XE_BUG_ON(!xe_bo_is_user(bo));
>>>>>> Also why can't we use this for kernel objects as well? At some point we want
>>>>>> to get to evictable page-table objects? Could we do this in the
>>>>>> release_notify() callback to cover all potential bos?
>>>>>>
>>>>> xe_gem_object_close is a user call, right? We can't call this on kernel
>>>>> BOs. This also could be outside the if statement.
>>>> Hmm, yes the question was can we stop doing this in xe_gem_object_close()
>>>> and instead do it in release_notify() to cover also kernel objects. Since
>>>> release_notify() is called just after individualizing dma_resv, it makes
>>>> sense to individualize also LRU at that point?
>>>>
>>> If we ever support moving kernel BOs, then yes. We need to do a lot of
>>> work to get there, so I'd rather leave this where it is, but I'll add a
>>> comment indicating that if we want to support kernel BO eviction, this
>>> should be updated.
>>>
>>> Sound good?
>> Well, I can't see the motivation to have it in gem close? Are other drivers
>> doing that? Whether the object should be bulk moved or not is tied to
>> whether it's a vm private object or not and that is closely tied to whether
>> the reservation object is the vm resv or the object resv?
>>
> AMDGPU does this via amdgpu_gem_object_close -> amdgpu_vm_bo_del, so yes.
>
> I also think I moved it here because, before release_notify(), there is
> an assert in TTM for the bulk move being NULL; let me find that:
>
>   319 static void ttm_bo_release(struct kref *kref)
>   320 {
>   321         struct ttm_buffer_object *bo =
>   322             container_of(kref, struct ttm_buffer_object, kref);
>   323         struct ttm_device *bdev = bo->bdev;
>   324         int ret;
>   325
>   326         WARN_ON_ONCE(bo->pin_count);
>   327         WARN_ON_ONCE(bo->bulk_move);

Ugh, that's unfortunate.

In any case, it looks like if a client has multiple handles to the 
object, the close() callback will be called multiple times, and the bulk 
object released on the first, right?

The second best option would then be to have it in xe_gem_object_free(),
I suppose, but the problem with that is the potentially sleeping
uninterruptible object lock :(.

If we could have it in release_notify() we already have the object lock.

So can we have it in xe_gem_object_free() for now and later perhaps ping 
Christian about moving that WARN_ON_ONCE?
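
Something like this minimal, untested sketch is what I have in mind
(assuming ttm_bo_set_bulk_move() can be called at this point and that the
uninterruptible dma_resv_lock() is acceptable):

static void xe_gem_object_free(struct drm_gem_object *obj)
{
	struct xe_bo *bo = gem_to_xe_bo(obj);

	if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
		/* Uninterruptible lock, the downside mentioned above */
		dma_resv_lock(bo->ttm.base.resv, NULL);
		ttm_bo_set_bulk_move(&bo->ttm, NULL);
		dma_resv_unlock(bo->ttm.base.resv);
	}

	ttm_bo_put(&bo->ttm);
}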

/Thomas


> Matt
>
>> /Thomas
>>
>>> Matt
>>>
>>>> /Thomas
>>>>
>>>>
>>>>> Matt
>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 30/31] drm/xe: remove async worker, sync binds, new error handling
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 30/31] drm/xe: remove async worker, sync binds, new error handling Matthew Brost
@ 2023-05-17 16:53   ` Thomas Hellström
  0 siblings, 0 replies; 126+ messages in thread
From: Thomas Hellström @ 2023-05-17 16:53 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi, Matthew

Some quick comments below. I really need to apply this patch to see the
resulting code so I'll do a full review in the next version where I
have something that applies cleanly.

/Thomas

On Mon, 2023-05-01 at 17:17 -0700, Matthew Brost wrote:
> Async worker is gone, all jobs and memory allocations done in IOCTL.

Done synchronously?

> 
> Async vs. sync now means when bind operations complete relative to the
> IOCTL. Async completes when out-syncs signal while sync completes when
> the IOCTL returns. In-syncs and out-syncs are only allowed in async
> mode.
> 
> The error handling is similar to before: on memory allocation errors
> binds are paused, the VM is put in an error state, and the bind IOCTL
> returns -ENOSPC. The user is allowed to issue sync unbinds, with the
> reclaim bit set, while in an error state. Bind operations without the
> reclaim bit set are rejected with -EALREADY until the VM exits the error
> state. To exit the error state, issue a restart bind operation which
> will pick up where the original failure left off.

What about -EINTR?

> 
> TODO: Update kernel doc

This is done in the next patch in the series, right? so we can drop
this TODO?

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_engine.c          |   7 +-
>  drivers/gpu/drm/xe/xe_engine_types.h    |   1 +
>  drivers/gpu/drm/xe/xe_exec.c            |  42 --
>  drivers/gpu/drm/xe/xe_sync.c            |  14 +-
>  drivers/gpu/drm/xe/xe_sync.h            |   2 +-
>  drivers/gpu/drm/xe/xe_vm.c              | 712 ++++++------------------
>  drivers/gpu/drm/xe/xe_vm.h              |   2 -
>  drivers/gpu/drm/xe/xe_vm_types.h        |  37 +-
>  drivers/gpu/drm/xe/xe_wait_user_fence.c |  43 +-
>  include/uapi/drm/xe_drm.h               |  79 +--
>  10 files changed, 213 insertions(+), 726 deletions(-)
> 

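As a side note, the recovery flow described in the commit message roughly
looks like this from userspace (sketch only; xe_vm_bind(),
xe_vm_unmap_reclaim() and xe_vm_restart() are made-up wrappers around the
bind IOCTL, not real API):

	int err = xe_vm_bind(vm, ops, n_ops);		/* hypothetical wrapper */

	while (err == -ENOSPC) {
		/*
		 * The VM is now in the error state: only sync unbinds with
		 * the reclaim bit set are accepted, anything else returns
		 * -EALREADY.
		 */
		xe_vm_unmap_reclaim(vm, range);		/* XE_VM_BIND_FLAG_RECLAIM */

		/* Pick up where the failed bind left off. */
		err = xe_vm_restart(vm);		/* XE_VM_BIND_OP_RESTART */
	}
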

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-02  0:17 ` [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible Matthew Brost
  2023-05-08 21:42   ` Rodrigo Vivi
  2023-05-09 13:00   ` Thomas Hellström
@ 2023-05-21 12:32   ` Oded Gabbay
  2023-06-08 19:30     ` Matthew Brost
  2 siblings, 1 reply; 126+ messages in thread
From: Oded Gabbay @ 2023-05-21 12:32 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Faith Ekstrand

On Sun, May 21, 2023 at 3:18 PM Matthew Brost <matthew.brost@intel.com> wrote:
>
> We have 256 doorbells (on most platforms) that we can allocate to bypass
> using the H2G channel for submission. This will avoid contention on the
> CT mutex.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> ---
>  drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
>  drivers/gpu/drm/xe/xe_guc.c              |   6 +
>  drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
>  drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
>  drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
>  drivers/gpu/drm/xe/xe_trace.h            |   5 +
>  7 files changed, 315 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> index 37e0ac550931..11b117293a62 100644
> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> @@ -109,6 +109,7 @@ struct guc_doorbell_info {
>
>  #define DIST_DBS_POPULATED                     XE_REG(0xd08)
>  #define   DOORBELLS_PER_SQIDI_MASK             REG_GENMASK(23, 16)
> +#define          DOORBELLS_PER_SQIDI_SHIFT             16
>  #define   SQIDIS_DOORBELL_EXIST_MASK           REG_GENMASK(15, 0)
>
>  #define GUC_BCS_RCS_IER                                XE_REG(0xC550)
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 89d20faced19..0c87f78a868b 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
>   */
>  int xe_guc_init_post_hwconfig(struct xe_guc *guc)
>  {
> +       int ret;
> +
> +       ret = xe_guc_submit_init_post_hwconfig(guc);
> +       if (ret)
> +               return ret;
> +
>         return xe_guc_ads_init_post_hwconfig(&guc->ads);
>  }
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> index 5d83132034a6..420b7f53e649 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> @@ -12,6 +12,7 @@
>  #include <drm/gpu_scheduler.h>
>
>  struct dma_fence;
> +struct xe_bo;
>  struct xe_engine;
>
>  /**
> @@ -37,6 +38,10 @@ struct xe_guc_engine {
>         struct work_struct fini_async;
>         /** @resume_time: time of last resume */
>         u64 resume_time;
> +       /** @doorbell_bo: BO for memory doorbell */
> +       struct xe_bo *doorbell_bo;
> +       /** @doorbell_offset: MMIO doorbell offset */
> +       u32 doorbell_offset;
>         /** @state: GuC specific state for this xe_engine */
>         atomic_t state;
>         /** @wqi_head: work queue item tail */
> @@ -45,6 +50,8 @@ struct xe_guc_engine {
>         u32 wqi_tail;
>         /** @id: GuC id for this xe_engine */
>         u16 id;
> +       /** @doorbell_id: doorbell id */
> +       u16 doorbell_id;
>         /** @suspend_wait: wait queue used to wait on pending suspends */
>         wait_queue_head_t suspend_wait;
>         /** @suspend_pending: a suspend of the engine is pending */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 0a41f5d04f6d..1b6f36b04cd1 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -13,7 +13,10 @@
>
>  #include <drm/drm_managed.h>
>
> +#include "regs/xe_guc_regs.h"
>  #include "regs/xe_lrc_layout.h"
> +
> +#include "xe_bo.h"
>  #include "xe_device.h"
>  #include "xe_engine.h"
>  #include "xe_force_wake.h"
> @@ -26,12 +29,22 @@
>  #include "xe_lrc.h"
>  #include "xe_macros.h"
>  #include "xe_map.h"
> +#include "xe_mmio.h"
>  #include "xe_mocs.h"
>  #include "xe_ring_ops_types.h"
>  #include "xe_sched_job.h"
>  #include "xe_trace.h"
>  #include "xe_vm.h"
>
> +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> +#define HAS_GUC_DIST_DB(xe) \
> +       (GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> +
> +#define GUC_NUM_HW_DOORBELLS 256
> +
> +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> +
>  static struct xe_gt *
>  guc_to_gt(struct xe_guc *guc)
>  {
> @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
>  #define ENGINE_STATE_SUSPENDED         (1 << 5)
>  #define ENGINE_STATE_RESET             (1 << 6)
>  #define ENGINE_STATE_KILLED            (1 << 7)
> +#define ENGINE_STATE_DB_REGISTERED     (1 << 8)
>
>  static bool engine_registered(struct xe_engine *e)
>  {
> @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
>         atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
>  }
>
> +static bool engine_doorbell_registered(struct xe_engine *e)
> +{
> +       return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> +}
> +
> +static void set_engine_doorbell_registered(struct xe_engine *e)
> +{
> +       atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> +}
> +
>  static bool engine_killed_or_banned(struct xe_engine *e)
>  {
>         return engine_killed(e) || engine_banned(e);
> @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
>
>         xa_destroy(&guc->submission_state.engine_lookup);
>         ida_destroy(&guc->submission_state.guc_ids);
> +       ida_destroy(&guc->submission_state.doorbell_ids);
>         bitmap_free(guc->submission_state.guc_ids_bitmap);
>  }
>
> @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
>         mutex_init(&guc->submission_state.lock);
>         xa_init(&guc->submission_state.engine_lookup);
>         ida_init(&guc->submission_state.guc_ids);
> +       ida_init(&guc->submission_state.doorbell_ids);
>
>         spin_lock_init(&guc->submission_state.suspend.lock);
>         guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
>         return 0;
>  }
>
> +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> +{
> +       if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> +               u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> +                                              DIST_DBS_POPULATED.reg);
> +               u32 num_sqidi =
> +                       hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> +               u32 doorbells_per_sqidi =
> +                       ((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> +                        DOORBELLS_PER_SQIDI_MASK) + 1;
> +
> +               guc->submission_state.num_doorbells =
> +                       num_sqidi * doorbells_per_sqidi;
> +       } else {
> +               guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> +       }
> +
> +       return 0;
> +}
> +
> +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       int ret;
> +
> +       lockdep_assert_held(&guc->submission_state.lock);
> +
> +       e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> +       ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> +                            guc->submission_state.num_doorbells, GFP_NOWAIT);
> +       if (ret < 0)
> +               return false;
> +
> +       e->guc->doorbell_id = ret;
> +
> +       return true;
> +}
> +
> +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       mutex_lock(&guc->submission_state.lock);
> +       ida_simple_remove(&guc->submission_state.doorbell_ids,
> +                         e->guc->doorbell_id);
> +       mutex_unlock(&guc->submission_state.lock);
> +
> +       e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> +}
> +
> +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> +                            u64 gpa, u32 gtt_addr)
> +{
> +       u32 action[] = {
> +               XE_GUC_ACTION_ALLOCATE_DOORBELL,
> +               guc_id,
> +               doorbell_id,
> +               lower_32_bits(gpa),
> +               upper_32_bits(gpa),
> +               gtt_addr
> +       };
> +
> +       return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> +}
> +
> +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> +{
> +       u32 action[] = {
> +               XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> +               guc_id
> +       };
> +
> +       xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> +}
> +
> +static bool has_doorbell(struct xe_engine *e)
> +{
> +       return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> +}
> +
> +#define doorbell_read(guc_, e_, field_) ({                     \
> +       struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;  \
> +       iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);     \
> +       xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,           \
> +                                 struct guc_doorbell_info, field_); \
> +       })
> +#define doorbell_write(guc_, e_, field_, val_) ({              \
> +       struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;  \
> +       iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);     \
> +       xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,           \
> +                                 struct guc_doorbell_info, field_, val_); \
> +       })
> +
> +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       struct xe_device *xe = guc_to_xe(guc);
> +
> +       /* GuC does the initialization with distributed and MMIO doorbells */
> +       if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> +               doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> +               doorbell_write(guc, e, cookie, 0);
> +       }
> +}
> +
> +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> +           xe_device_mem_access_ongoing(guc_to_xe(guc)))
> +               doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> +}
> +
> +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       if (has_doorbell(e)) {
> +               release_doorbell_id(guc, e);
> +               xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> +       }
> +}
> +
> +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       u32 cookie;
> +
> +       cookie = doorbell_read(guc, e, cookie);
> +       doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> +
> +       XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> +}
> +
> +#define GUC_MMIO_DOORBELL_RING_ACK     0xACEDBEEF
> +#define GUC_MMIO_DOORBELL_RING_NACK    0xDEADBEEF
> +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> +{
> +       u32 db_value;
> +
> +       db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> +                                 doorbell_offset);
> +
> +       /*
> +        * The read from the doorbell page will return ack/nack. We don't remove
> +        * doorbells from active clients so we don't expect to ever get a nack.
> +        * XXX: if doorbell is lost, re-acquire it?
> +        */
> +       XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> +       XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> +}
> +
> +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> +{
> +       XE_BUG_ON(!has_doorbell(e));
> +
> +       if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> +               ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> +       else
> +               ring_memory_doorbell(guc, e);
> +
> +       trace_xe_engine_ring_db(e);
> +}
> +
> +static void register_engine(struct xe_engine *e);
> +
> +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> +{
> +       struct xe_gt *gt = guc_to_gt(guc);
> +       struct xe_device *xe = gt_to_xe(gt);
> +       u64 gpa;
> +       u32 gtt_addr;
> +       int ret;
> +
> +       XE_BUG_ON(!has_doorbell(e));
> +
> +       if (HAS_GUC_MMIO_DB(xe)) {
> +               e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
I think there is an implied assumption here (and in the code below)
that PAGE_SIZE is always 4KB, which is problematic on non-x86
architectures.
Unless xe is limited to x86_64, I think it would be better to change
this to SZ_4K.
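i.e. something like this (untested), assuming the MMIO doorbell pages in
the BAR are always 4K regardless of the CPU page size:

	e->guc->doorbell_offset = SZ_4K * e->guc->doorbell_id;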

> +               gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> +               gtt_addr = 0;
> +       } else {
> +               struct xe_bo *bo;
> +
> +               if (!e->guc->doorbell_bo) {
> +                       bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> +                                                 ttm_bo_type_kernel,
> +                                                 XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> +                                                 XE_BO_CREATE_GGTT_BIT);
> +                       if (IS_ERR(bo))
> +                               return PTR_ERR(bo);
> +
> +                       e->guc->doorbell_bo = bo;
> +               } else {
> +                       bo = e->guc->doorbell_bo;
> +               }
> +
> +               init_doorbell(guc, e);
> +               gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> +               gtt_addr = xe_bo_ggtt_addr(bo);
> +       }
> +
> +       if (init && e->flags & ENGINE_FLAG_KERNEL)
> +               return 0;
> +
> +       register_engine(e);
> +       ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> +                               gtt_addr);
> +       if (ret < 0) {
> +               fini_doorbell(guc, e);
> +               return ret;
> +       }
> +
> +       /*
> +        * In distributed doorbells, guc is returning the cacheline selected
> +        * by HW as part of the 7bit data from the allocate doorbell command:
> +        *  bit [22]   - Cacheline allocated
> +        *  bit [21:16] - Cacheline offset address
> +        * (bit 21 must be zero, or our assumption of only using half a page is
> +        * no longer correct).
> +        */
> +       if (HAS_GUC_DIST_DB(xe)) {
> +               u32 dd_cacheline_info;
> +
> +               XE_WARN_ON(!(ret & BIT(22)));
> +               XE_WARN_ON(ret & BIT(21));
> +
> +               dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> +               e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
I don't understand something. We have 256 doorbells, but here you
overwrite the doorbell_offset, and dd_cacheline_info can only be 0-31
(because bit 21 is always 0).
This has overlap... some doorbells will get the same address.
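Just to spell out the arithmetic (assuming a 64-byte cache line):

	dd_cacheline_info in [0, 31]  =>  doorbell_offset in {0, 64, ..., 31 * 64 = 1984}

i.e. only 32 distinct offsets, all within the first half of a 4K page
(which is what the bit 21 comment above refers to).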
> +
> +               /* and verify db status was updated correctly by the guc fw */
> +               XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> +                          GUC_DOORBELL_ENABLED);
> +       }
> +
> +       set_engine_doorbell_registered(e);
> +
> +       return 0;
> +}
> +
>  static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
>  {
>         int ret;
> @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
>         u32 num_g2h = 0;
>         int len = 0;
>         bool extra_submit = false;
> +       bool enable = false;
>
>         XE_BUG_ON(!engine_registered(e));
>
> @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
>                 num_g2h = 1;
>                 if (xe_engine_is_parallel(e))
>                         extra_submit = true;
> +               enable = true;
>
>                 e->guc->resume_time = RESUME_PENDING;
>                 set_engine_pending_enable(e);
> @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
>                 trace_xe_engine_submit(e);
>         }
>
> -       xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> +       if (enable || !engine_doorbell_registered(e))
> +               xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> +       else
> +               ring_doorbell(guc, e);
>
>         if (extra_submit) {
>                 len = 0;
> @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
>         trace_xe_sched_job_run(job);
>
>         if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> -               if (!engine_registered(e))
> -                       register_engine(e);
> +               if (!engine_registered(e)) {
> +                       if (has_doorbell(e)) {
> +                               int err = create_doorbell(engine_to_guc(e), e,
> +                                                         false);
> +
> +                               /* Not fatal, but let's warn */
> +                               XE_WARN_ON(err);
> +                       } else {
> +                               register_engine(e);
> +                       }
> +               }
>                 if (!lr)        /* Written in IOCTL */
>                         e->ring_ops->emit_job(job);
>                 submit_engine(e);
> @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>         MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
>         int ret;
>
> +       if (has_doorbell(e)) {
> +               fini_doorbell(guc, e);
> +               deallocate_doorbell(guc, e->guc->id);
> +       }
> +
>         set_min_preemption_timeout(guc, e);
>         smp_rmb();
>         ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
>                 cancel_work_sync(&ge->lr_tdr);
>         if (e->flags & ENGINE_FLAG_PERSISTENT)
>                 xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> +       destroy_doorbell(guc, e);
>         release_guc_id(guc, e);
>         drm_sched_entity_fini(&ge->entity);
>         drm_sched_fini(&ge->sched);
> @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
>         struct xe_guc_engine *ge;
>         long timeout;
>         int err;
> +       bool create_db = false;
>
>         XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
>
> @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
>         if (guc_read_stopped(guc))
>                 drm_sched_stop(sched, NULL);
>
> +       create_db = alloc_doorbell_id(guc, e);
> +
>         mutex_unlock(&guc->submission_state.lock);
>
> +       if (create_db) {
> +               /* Error isn't fatal as we don't need a doorbell */
> +               err = create_doorbell(guc, e, true);
> +               if (err)
> +                       release_doorbell_id(guc, e);
> +       }
> +
>         switch (e->class) {
>         case XE_ENGINE_CLASS_RENDER:
>                 sprintf(e->name, "rcs%d", e->guc->id);
> @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
>  {
>         struct drm_gpu_scheduler *sched = &e->guc->sched;
>
> -       XE_BUG_ON(engine_registered(e));
> +       XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
>         XE_BUG_ON(engine_banned(e));
>         XE_BUG_ON(engine_killed(e));
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index 8002734d6f24..bada6c02d6aa 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -13,6 +13,7 @@ struct xe_engine;
>  struct xe_guc;
>
>  int xe_guc_submit_init(struct xe_guc *guc);
> +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
>  void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
>
>  int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> index ac7eec28934d..9ee4d572f4e0 100644
> --- a/drivers/gpu/drm/xe/xe_guc_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> @@ -36,10 +36,14 @@ struct xe_guc {
>                 struct xarray engine_lookup;
>                 /** @guc_ids: used to allocate new guc_ids, single-lrc */
>                 struct ida guc_ids;
> +               /** @doorbell_ids: used to allocate new doorbells */
> +               struct ida doorbell_ids;
>                 /** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
>                 unsigned long *guc_ids_bitmap;
>                 /** @stopped: submissions are stopped */
>                 atomic_t stopped;
> +               /** @num_doorbells: number of doorbells */
> +               int num_doorbells;
>                 /** @lock: protects submission state */
>                 struct mutex lock;
>                 /** @suspend: suspend fence state */
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 02861c26e145..38e9d7c6197b 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
>              TP_ARGS(e)
>  );
>
> +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> +            TP_PROTO(struct xe_engine *e),
> +            TP_ARGS(e)
> +);
> +
>  DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
>              TP_PROTO(struct xe_engine *e),
>              TP_ARGS(e)
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
  2023-05-08 12:40   ` Thomas Hellström
@ 2023-05-22  1:16     ` Matthew Brost
  0 siblings, 0 replies; 126+ messages in thread
From: Matthew Brost @ 2023-05-22  1:16 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Mon, May 08, 2023 at 02:40:24PM +0200, Thomas Hellström wrote:
> An question below, with that addressed (possibly without change)
> 
> although I'm not a scheduler expert and we should ideally have additional
> reviewers,
> 
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> On 5/2/23 02:16, Matthew Brost wrote:
> > DRM_SCHED_POLICY_SINGLE_ENTITY creates a 1 to 1 relationship between
> > scheduler and entity. No priorities or run queue used in this mode.
> > Intended for devices with firmware schedulers.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/scheduler/sched_entity.c | 64 +++++++++++++++++++-----
> >   drivers/gpu/drm/scheduler/sched_fence.c  |  2 +-
> >   drivers/gpu/drm/scheduler/sched_main.c   | 63 ++++++++++++++++++++---
> >   include/drm/gpu_scheduler.h              |  8 +++
> >   4 files changed, 115 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 2300b2fc06ab..8b70900c54cc 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -83,6 +83,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> >   	memset(entity, 0, sizeof(struct drm_sched_entity));
> >   	INIT_LIST_HEAD(&entity->list);
> >   	entity->rq = NULL;
> > +	entity->single_sched = NULL;
> >   	entity->guilty = guilty;
> >   	entity->num_sched_list = num_sched_list;
> >   	entity->priority = priority;
> > @@ -91,7 +92,15 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> >   	RB_CLEAR_NODE(&entity->rb_tree_node);
> >   	if(num_sched_list) {
> > -		entity->rq = &sched_list[0]->sched_rq[entity->priority];
> > +		if (sched_list[0]->sched_policy !=
> > +		    DRM_SCHED_POLICY_SINGLE_ENTITY) {
> > +			entity->rq = &sched_list[0]->sched_rq[entity->priority];
> > +		} else {
> > +			if (num_sched_list != 1 || sched_list[0]->single_entity)
> > +				return -EINVAL;
> > +			sched_list[0]->single_entity = entity;
> > +			entity->single_sched = sched_list[0];
> > +		}
> >   	}
> >   	init_completion(&entity->entity_idle);
> > @@ -125,7 +134,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >   				    struct drm_gpu_scheduler **sched_list,
> >   				    unsigned int num_sched_list)
> >   {
> > -	WARN_ON(!num_sched_list || !sched_list);
> 
> Is there a way to get to the drm device so we can use drm_WARN_ON() here and
> below? I figure not?
> 

Correct, no way to resolve the DRM device.

Matt

> Thanks,
> 
> Thomas
> 
> 
> > +	WARN_ON(!num_sched_list || !sched_list ||
> > +		!!entity->single_sched);
> >   	entity->sched_list = sched_list;
> >   	entity->num_sched_list = num_sched_list;
> > @@ -195,13 +205,15 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
> >   {
> >   	struct drm_sched_job *job;
> >   	struct dma_fence *prev;
> > +	bool single_entity = !!entity->single_sched;
> > -	if (!entity->rq)
> > +	if (!entity->rq && !single_entity)
> >   		return;
> >   	spin_lock(&entity->rq_lock);
> >   	entity->stopped = true;
> > -	drm_sched_rq_remove_entity(entity->rq, entity);
> > +	if (!single_entity)
> > +		drm_sched_rq_remove_entity(entity->rq, entity);
> >   	spin_unlock(&entity->rq_lock);
> >   	/* Make sure this entity is not used by the scheduler at the moment */
> > @@ -223,6 +235,20 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
> >   	dma_fence_put(prev);
> >   }
> > +/**
> > + * drm_sched_entity_to_scheduler - Schedule entity to GPU scheduler
> > + * @entity: scheduler entity
> > + *
> > + * Returns GPU scheduler for the entity
> > + */
> > +struct drm_gpu_scheduler *
> > +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity)
> > +{
> > +	bool single_entity = !!entity->single_sched;
> > +
> > +	return single_entity ? entity->single_sched : entity->rq->sched;
> > +}
> > +
> >   /**
> >    * drm_sched_entity_flush - Flush a context entity
> >    *
> > @@ -240,11 +266,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
> >   	struct drm_gpu_scheduler *sched;
> >   	struct task_struct *last_user;
> >   	long ret = timeout;
> > +	bool single_entity = !!entity->single_sched;
> > -	if (!entity->rq)
> > +	if (!entity->rq && !single_entity)
> >   		return 0;
> > -	sched = entity->rq->sched;
> > +	sched = drm_sched_entity_to_scheduler(entity);
> >   	/**
> >   	 * The client will not queue more IBs during this fini, consume existing
> >   	 * queued IBs or discard them on SIGKILL
> > @@ -337,7 +364,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
> >   		container_of(cb, struct drm_sched_entity, cb);
> >   	drm_sched_entity_clear_dep(f, cb);
> > -	drm_sched_wakeup(entity->rq->sched);
> > +	drm_sched_wakeup(drm_sched_entity_to_scheduler(entity));
> >   }
> >   /**
> > @@ -351,6 +378,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
> >   void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >   				   enum drm_sched_priority priority)
> >   {
> > +	WARN_ON(!!entity->single_sched);
> > +
> >   	spin_lock(&entity->rq_lock);
> >   	entity->priority = priority;
> >   	spin_unlock(&entity->rq_lock);
> > @@ -363,7 +392,7 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
> >    */
> >   static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> >   {
> > -	struct drm_gpu_scheduler *sched = entity->rq->sched;
> > +	struct drm_gpu_scheduler *sched = drm_sched_entity_to_scheduler(entity);
> >   	struct dma_fence *fence = entity->dependency;
> >   	struct drm_sched_fence *s_fence;
> > @@ -456,7 +485,8 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >   	 * Update the entity's location in the min heap according to
> >   	 * the timestamp of the next job, if any.
> >   	 */
> > -	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
> > +	if (drm_sched_entity_to_scheduler(entity)->sched_policy ==
> > +	    DRM_SCHED_POLICY_FIFO) {
> >   		struct drm_sched_job *next;
> >   		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > @@ -473,6 +503,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >   	struct drm_gpu_scheduler *sched;
> >   	struct drm_sched_rq *rq;
> > +	WARN_ON(!!entity->single_sched);
> > +
> >   	/* single possible engine and already selected */
> >   	if (!entity->sched_list)
> >   		return;
> > @@ -522,16 +554,21 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >   void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> >   {
> >   	struct drm_sched_entity *entity = sched_job->entity;
> > +	bool single_entity = !!entity->single_sched;
> >   	bool first;
> >   	trace_drm_sched_job(sched_job, entity);
> > -	atomic_inc(entity->rq->sched->score);
> > +	if (!single_entity)
> > +		atomic_inc(entity->rq->sched->score);
> >   	WRITE_ONCE(entity->last_user, current->group_leader);
> >   	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
> >   	sched_job->submit_ts = ktime_get();
> >   	/* first job wakes up scheduler */
> >   	if (first) {
> > +		struct drm_gpu_scheduler *sched =
> > +			drm_sched_entity_to_scheduler(entity);
> > +
> >   		/* Add the entity to the run queue */
> >   		spin_lock(&entity->rq_lock);
> >   		if (entity->stopped) {
> > @@ -541,13 +578,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> >   			return;
> >   		}
> > -		drm_sched_rq_add_entity(entity->rq, entity);
> > +		if (!single_entity)
> > +			drm_sched_rq_add_entity(entity->rq, entity);
> >   		spin_unlock(&entity->rq_lock);
> > -		if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> > +		if (sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> >   			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
> > -		drm_sched_wakeup(entity->rq->sched);
> > +		drm_sched_wakeup(sched);
> >   	}
> >   }
> >   EXPORT_SYMBOL(drm_sched_entity_push_job);
> > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > index 7fd869520ef2..1ba5056851dd 100644
> > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > @@ -167,7 +167,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
> >   {
> >   	unsigned seq;
> > -	fence->sched = entity->rq->sched;
> > +	fence->sched = drm_sched_entity_to_scheduler(entity);
> >   	seq = atomic_inc_return(&entity->fence_seq);
> >   	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >   		       &fence->lock, entity->fence_context, seq);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 6777a2db554f..870568d94f1f 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -32,7 +32,8 @@
> >    * backend operations to the scheduler like submitting a job to hardware run queue,
> >    * returning the dependencies of a job etc.
> >    *
> > - * The organisation of the scheduler is the following:
> > + * The organisation of the scheduler is the following for scheduling policies
> > + * DRM_SCHED_POLICY_RR and DRM_SCHED_POLICY_FIFO:
> >    *
> >    * 1. Each hw run queue has one scheduler
> >    * 2. Each scheduler has multiple run queues with different priorities
> > @@ -41,7 +42,22 @@
> >    * 4. Entities themselves maintain a queue of jobs that will be scheduled on
> >    *    the hardware.
> >    *
> > - * The jobs in a entity are always scheduled in the order that they were pushed.
> > + * The organisation of the scheduler is the following for scheduling policy
> > + * DRM_SCHED_POLICY_SINGLE_ENTITY:
> > + *
> > + * 1. One to one relationship between scheduler and entity
> > + * 2. No priorities implemented per scheduler (single job queue)
> > + * 3. No run queues in scheduler rather jobs are directly dequeued from entity
> > + * 4. The entity maintains a queue of jobs that will be scheduled on the
> > + * hardware
> > + *
> > + * The jobs in an entity are always scheduled in the order that they were pushed
> > + * regardless of scheduling policy.
> > + *
> > + * A policy of DRM_SCHED_POLICY_RR or DRM_SCHED_POLICY_FIFO is expected to be used
> > + * when the KMD is scheduling directly on the hardware while a scheduling policy
> > + * of DRM_SCHED_POLICY_SINGLE_ENTITY is expected to be used when there is a
> > + * firmware scheduler.
> >    */
> >   #include <linux/wait.h>
> > @@ -92,6 +108,8 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
> >   void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> >   {
> > +	WARN_ON(!!entity->single_sched);
> > +
> >   	/*
> >   	 * Both locks need to be grabbed, one to protect from entity->rq change
> >   	 * for entity from within concurrent drm_sched_entity_select_rq and the
> > @@ -122,6 +140,8 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> >   static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> >   			      struct drm_sched_rq *rq)
> >   {
> > +	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
> > +
> >   	spin_lock_init(&rq->lock);
> >   	INIT_LIST_HEAD(&rq->entities);
> >   	rq->rb_tree_root = RB_ROOT_CACHED;
> > @@ -140,6 +160,8 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> >   void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> >   			     struct drm_sched_entity *entity)
> >   {
> > +	WARN_ON(!!entity->single_sched);
> > +
> >   	if (!list_empty(&entity->list))
> >   		return;
> > @@ -162,6 +184,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> >   void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> >   				struct drm_sched_entity *entity)
> >   {
> > +	WARN_ON(!!entity->single_sched);
> > +
> >   	if (list_empty(&entity->list))
> >   		return;
> > @@ -691,7 +715,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >   		       struct drm_sched_entity *entity,
> >   		       void *owner)
> >   {
> > -	if (!entity->rq)
> > +	if (!entity->rq && !entity->single_sched)
> >   		return -ENOENT;
> >   	job->entity = entity;
> > @@ -724,13 +748,16 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> >   {
> >   	struct drm_gpu_scheduler *sched;
> >   	struct drm_sched_entity *entity = job->entity;
> > +	bool single_entity = !!entity->single_sched;
> >   	BUG_ON(!entity);
> > -	drm_sched_entity_select_rq(entity);
> > -	sched = entity->rq->sched;
> > +	if (!single_entity)
> > +		drm_sched_entity_select_rq(entity);
> > +	sched = drm_sched_entity_to_scheduler(entity);
> >   	job->sched = sched;
> > -	job->s_priority = entity->rq - sched->sched_rq;
> > +	if (!single_entity)
> > +		job->s_priority = entity->rq - sched->sched_rq;
> >   	job->id = atomic64_inc_return(&sched->job_id_count);
> >   	drm_sched_fence_init(job->s_fence, job->entity);
> > @@ -954,6 +981,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> >   	if (!drm_sched_ready(sched))
> >   		return NULL;
> > +	if (sched->single_entity) {
> > +		if (drm_sched_entity_is_ready(sched->single_entity))
> > +			return sched->single_entity;
> > +
> > +		return NULL;
> > +	}
> > +
> >   	/* Kernel run queue has higher priority than normal run queue*/
> >   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> >   		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> > @@ -1210,6 +1244,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   		return -EINVAL;
> >   	sched->ops = ops;
> > +	sched->single_entity = NULL;
> >   	sched->hw_submission_limit = hw_submission;
> >   	sched->name = name;
> >   	sched->run_wq = run_wq ? : system_wq;
> > @@ -1222,7 +1257,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >   		sched->sched_policy = default_drm_sched_policy;
> >   	else
> >   		sched->sched_policy = sched_policy;
> > -	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> > +	for (i = DRM_SCHED_PRIORITY_MIN; sched_policy !=
> > +	     DRM_SCHED_POLICY_SINGLE_ENTITY && i < DRM_SCHED_PRIORITY_COUNT;
> > +	     i++)
> >   		drm_sched_rq_init(sched, &sched->sched_rq[i]);
> >   	init_waitqueue_head(&sched->job_scheduled);
> > @@ -1255,7 +1292,15 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> >   	drm_sched_run_wq_stop(sched);
> > -	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> > +	if (sched->single_entity) {
> > +		spin_lock(&sched->single_entity->rq_lock);
> > +		sched->single_entity->stopped = true;
> > +		spin_unlock(&sched->single_entity->rq_lock);
> > +	}
> > +
> > +	for (i = DRM_SCHED_PRIORITY_COUNT - 1; sched->sched_policy !=
> > +	     DRM_SCHED_POLICY_SINGLE_ENTITY && i >= DRM_SCHED_PRIORITY_MIN;
> > +	     i--) {
> >   		struct drm_sched_rq *rq = &sched->sched_rq[i];
> >   		if (!rq)
> > @@ -1299,6 +1344,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
> >   	struct drm_sched_entity *entity;
> >   	struct drm_gpu_scheduler *sched = bad->sched;
> > +	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
> > +
> >   	/* don't change @bad's karma if it's from KERNEL RQ,
> >   	 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
> >   	 * corrupt but keep in mind that kernel jobs always considered good.
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 3df801401028..669d6520cd3a 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -70,6 +70,7 @@ enum drm_sched_policy {
> >   	DRM_SCHED_POLICY_DEFAULT,
> >   	DRM_SCHED_POLICY_RR,
> >   	DRM_SCHED_POLICY_FIFO,
> > +	DRM_SCHED_POLICY_SINGLE_ENTITY,
> >   	DRM_SCHED_POLICY_COUNT,
> >   };
> > @@ -103,6 +104,9 @@ struct drm_sched_entity {
> >   	 */
> >   	struct drm_sched_rq		*rq;
> > +	/** @single_sched: Single scheduler */
> > +	struct drm_gpu_scheduler	*single_sched;
> > +
> >   	/**
> >   	 * @sched_list:
> >   	 *
> > @@ -488,6 +492,7 @@ struct drm_sched_backend_ops {
> >    * struct drm_gpu_scheduler - scheduler instance-specific data
> >    *
> >    * @ops: backend operations provided by the driver.
> > + * @single_entity: Single entity for the scheduler
> >    * @hw_submission_limit: the max size of the hardware queue.
> >    * @timeout: the time after which a job is removed from the scheduler.
> >    * @name: name of the ring for which this scheduler is being used.
> > @@ -519,6 +524,7 @@ struct drm_sched_backend_ops {
> >    */
> >   struct drm_gpu_scheduler {
> >   	const struct drm_sched_backend_ops	*ops;
> > +	struct drm_sched_entity		*single_entity;
> >   	uint32_t			hw_submission_limit;
> >   	long				timeout;
> >   	const char			*name;
> > @@ -604,6 +610,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> >   			  struct drm_gpu_scheduler **sched_list,
> >   			  unsigned int num_sched_list,
> >   			  atomic_t *guilty);
> > +struct drm_gpu_scheduler *
> > +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity);
> >   long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
> >   void drm_sched_entity_fini(struct drm_sched_entity *entity);
> >   void drm_sched_entity_destroy(struct drm_sched_entity *entity);
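
FWIW, given the new single_sched field, the helper declared here
presumably boils down to something like the following (a sketch, not a
hunk from the patch):

	struct drm_gpu_scheduler *
	drm_sched_entity_to_scheduler(struct drm_sched_entity *entity)
	{
		/* 1:1 entity/scheduler mode has no run queue to consult */
		return entity->single_sched ? entity->single_sched :
					      entity->rq->sched;
	}

which is what lets drm_sched_job_arm() above pick the scheduler without
going through a run queue when the SINGLE_ENTITY policy is used.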

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-05-21 12:32   ` Oded Gabbay
@ 2023-06-08 19:30     ` Matthew Brost
  2023-06-12 13:01       ` Oded Gabbay
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Brost @ 2023-06-08 19:30 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: intel-xe, Faith Ekstrand

On Sun, May 21, 2023 at 03:32:10PM +0300, Oded Gabbay wrote:
> On Sun, May 21, 2023 at 3:18 PM Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > We have 256 doorbells (on most platforms) that we can allocate to bypass
> > using the H2G channel for submission. This will avoid contention on the
> > CT mutex.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> > ---
> >  drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
> >  drivers/gpu/drm/xe/xe_guc.c              |   6 +
> >  drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
> >  drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
> >  drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
> >  drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
> >  drivers/gpu/drm/xe/xe_trace.h            |   5 +
> >  7 files changed, 315 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > index 37e0ac550931..11b117293a62 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > @@ -109,6 +109,7 @@ struct guc_doorbell_info {
> >
> >  #define DIST_DBS_POPULATED                     XE_REG(0xd08)
> >  #define   DOORBELLS_PER_SQIDI_MASK             REG_GENMASK(23, 16)
> > +#define          DOORBELLS_PER_SQIDI_SHIFT             16
> >  #define   SQIDIS_DOORBELL_EXIST_MASK           REG_GENMASK(15, 0)
> >
> >  #define GUC_BCS_RCS_IER                                XE_REG(0xC550)
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index 89d20faced19..0c87f78a868b 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
> >   */
> >  int xe_guc_init_post_hwconfig(struct xe_guc *guc)
> >  {
> > +       int ret;
> > +
> > +       ret = xe_guc_submit_init_post_hwconfig(guc);
> > +       if (ret)
> > +               return ret;
> > +
> >         return xe_guc_ads_init_post_hwconfig(&guc->ads);
> >  }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > index 5d83132034a6..420b7f53e649 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > @@ -12,6 +12,7 @@
> >  #include <drm/gpu_scheduler.h>
> >
> >  struct dma_fence;
> > +struct xe_bo;
> >  struct xe_engine;
> >
> >  /**
> > @@ -37,6 +38,10 @@ struct xe_guc_engine {
> >         struct work_struct fini_async;
> >         /** @resume_time: time of last resume */
> >         u64 resume_time;
> > +       /** @doorbell_bo: BO for memory doorbell */
> > +       struct xe_bo *doorbell_bo;
> > +       /** @doorbell_offset: MMIO doorbell offset */
> > +       u32 doorbell_offset;
> >         /** @state: GuC specific state for this xe_engine */
> >         atomic_t state;
> >         /** @wqi_head: work queue item tail */
> > @@ -45,6 +50,8 @@ struct xe_guc_engine {
> >         u32 wqi_tail;
> >         /** @id: GuC id for this xe_engine */
> >         u16 id;
> > +       /** @doorbell_id: doorbell id */
> > +       u16 doorbell_id;
> >         /** @suspend_wait: wait queue used to wait on pending suspends */
> >         wait_queue_head_t suspend_wait;
> >         /** @suspend_pending: a suspend of the engine is pending */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 0a41f5d04f6d..1b6f36b04cd1 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -13,7 +13,10 @@
> >
> >  #include <drm/drm_managed.h>
> >
> > +#include "regs/xe_guc_regs.h"
> >  #include "regs/xe_lrc_layout.h"
> > +
> > +#include "xe_bo.h"
> >  #include "xe_device.h"
> >  #include "xe_engine.h"
> >  #include "xe_force_wake.h"
> > @@ -26,12 +29,22 @@
> >  #include "xe_lrc.h"
> >  #include "xe_macros.h"
> >  #include "xe_map.h"
> > +#include "xe_mmio.h"
> >  #include "xe_mocs.h"
> >  #include "xe_ring_ops_types.h"
> >  #include "xe_sched_job.h"
> >  #include "xe_trace.h"
> >  #include "xe_vm.h"
> >
> > +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> > +#define HAS_GUC_DIST_DB(xe) \
> > +       (GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> > +
> > +#define GUC_NUM_HW_DOORBELLS 256
> > +
> > +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> > +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> > +
> >  static struct xe_gt *
> >  guc_to_gt(struct xe_guc *guc)
> >  {
> > @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
> >  #define ENGINE_STATE_SUSPENDED         (1 << 5)
> >  #define ENGINE_STATE_RESET             (1 << 6)
> >  #define ENGINE_STATE_KILLED            (1 << 7)
> > +#define ENGINE_STATE_DB_REGISTERED     (1 << 8)
> >
> >  static bool engine_registered(struct xe_engine *e)
> >  {
> > @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
> >         atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
> >  }
> >
> > +static bool engine_doorbell_registered(struct xe_engine *e)
> > +{
> > +       return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> > +}
> > +
> > +static void set_engine_doorbell_registered(struct xe_engine *e)
> > +{
> > +       atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> > +}
> > +
> >  static bool engine_killed_or_banned(struct xe_engine *e)
> >  {
> >         return engine_killed(e) || engine_banned(e);
> > @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
> >
> >         xa_destroy(&guc->submission_state.engine_lookup);
> >         ida_destroy(&guc->submission_state.guc_ids);
> > +       ida_destroy(&guc->submission_state.doorbell_ids);
> >         bitmap_free(guc->submission_state.guc_ids_bitmap);
> >  }
> >
> > @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
> >         mutex_init(&guc->submission_state.lock);
> >         xa_init(&guc->submission_state.engine_lookup);
> >         ida_init(&guc->submission_state.guc_ids);
> > +       ida_init(&guc->submission_state.doorbell_ids);
> >
> >         spin_lock_init(&guc->submission_state.suspend.lock);
> >         guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> > @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
> >         return 0;
> >  }
> >
> > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> > +{
> > +       if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> > +               u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> > +                                              DIST_DBS_POPULATED.reg);
> > +               u32 num_sqidi =
> > +                       hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> > +               u32 doorbells_per_sqidi =
> > +                       ((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> > +                        DOORBELLS_PER_SQIDI_MASK) + 1;
> > +
> > +               guc->submission_state.num_doorbells =
> > +                       num_sqidi * doorbells_per_sqidi;
> > +       } else {
> > +               guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       int ret;
> > +
> > +       lockdep_assert_held(&guc->submission_state.lock);
> > +
> > +       e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > +       ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> > +                            guc->submission_state.num_doorbells, GFP_NOWAIT);
> > +       if (ret < 0)
> > +               return false;
> > +
> > +       e->guc->doorbell_id = ret;
> > +
> > +       return true;
> > +}
> > +
> > +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       mutex_lock(&guc->submission_state.lock);
> > +       ida_simple_remove(&guc->submission_state.doorbell_ids,
> > +                         e->guc->doorbell_id);
> > +       mutex_unlock(&guc->submission_state.lock);
> > +
> > +       e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > +}
> > +
> > +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> > +                            u64 gpa, u32 gtt_addr)
> > +{
> > +       u32 action[] = {
> > +               XE_GUC_ACTION_ALLOCATE_DOORBELL,
> > +               guc_id,
> > +               doorbell_id,
> > +               lower_32_bits(gpa),
> > +               upper_32_bits(gpa),
> > +               gtt_addr
> > +       };
> > +
> > +       return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> > +}
> > +
> > +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> > +{
> > +       u32 action[] = {
> > +               XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> > +               guc_id
> > +       };
> > +
> > +       xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> > +}
> > +
> > +static bool has_doorbell(struct xe_engine *e)
> > +{
> > +       return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> > +}
> > +
> > +#define doorbell_read(guc_, e_, field_) ({                     \
> > +       struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;  \
> > +       iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);     \
> > +       xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,           \
> > +                                 struct guc_doorbell_info, field_); \
> > +       })
> > +#define doorbell_write(guc_, e_, field_, val_) ({              \
> > +       struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;  \
> > +       iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);     \
> > +       xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,           \
> > +                                 struct guc_doorbell_info, field_, val_); \
> > +       })
> > +
> > +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       struct xe_device *xe = guc_to_xe(guc);
> > +
> > +       /* GuC does the initialization with distributed and MMIO doorbells */
> > +       if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> > +               doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> > +               doorbell_write(guc, e, cookie, 0);
> > +       }
> > +}
> > +
> > +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> > +           xe_device_mem_access_ongoing(guc_to_xe(guc)))
> > +               doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> > +}
> > +
> > +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       if (has_doorbell(e)) {
> > +               release_doorbell_id(guc, e);
> > +               xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> > +       }
> > +}
> > +
> > +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       u32 cookie;
> > +
> > +       cookie = doorbell_read(guc, e, cookie);
> > +       doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> > +
> > +       XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> > +}
> > +
> > +#define GUC_MMIO_DOORBELL_RING_ACK     0xACEDBEEF
> > +#define GUC_MMIO_DOORBELL_RING_NACK    0xDEADBEEF
> > +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> > +{
> > +       u32 db_value;
> > +
> > +       db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> > +                                 doorbell_offset);
> > +
> > +       /*
> > +        * The read from the doorbell page will return ack/nack. We don't remove
> > +        * doorbells from active clients so we don't expect to ever get a nack.
> > +        * XXX: if doorbell is lost, re-acquire it?
> > +        */
> > +       XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> > +       XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> > +}
> > +
> > +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > +{
> > +       XE_BUG_ON(!has_doorbell(e));
> > +
> > +       if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> > +               ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> > +       else
> > +               ring_memory_doorbell(guc, e);
> > +
> > +       trace_xe_engine_ring_db(e);
> > +}
> > +
> > +static void register_engine(struct xe_engine *e);
> > +
> > +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> > +{
> > +       struct xe_gt *gt = guc_to_gt(guc);
> > +       struct xe_device *xe = gt_to_xe(gt);
> > +       u64 gpa;
> > +       u32 gtt_addr;
> > +       int ret;
> > +
> > +       XE_BUG_ON(!has_doorbell(e));
> > +
> > +       if (HAS_GUC_MMIO_DB(xe)) {
> > +               e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
> I think there is an implied assumption here (and in the code below)
> that PAGE_SIZE is always 4KB, which is problematic in non-x86
> architectures.
> If there is no limitation on xe being used solely on x86_64, then I
> think it would be better to change this to SZ_4K
>

Will fix.
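
A rough sketch of what that change could look like (illustrative only,
not the actual follow-up patch):

	-               e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
	+               e->guc->doorbell_offset = SZ_4K * e->guc->doorbell_id;

and likewise for the other PAGE_SIZE uses below that assume a 4K page,
so the doorbell layout no longer depends on the CPU page size.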
 
> > +               gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> > +               gtt_addr = 0;
> > +       } else {
> > +               struct xe_bo *bo;
> > +
> > +               if (!e->guc->doorbell_bo) {
> > +                       bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> > +                                                 ttm_bo_type_kernel,
> > +                                                 XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> > +                                                 XE_BO_CREATE_GGTT_BIT);
> > +                       if (IS_ERR(bo))
> > +                               return PTR_ERR(bo);
> > +
> > +                       e->guc->doorbell_bo = bo;
> > +               } else {
> > +                       bo = e->guc->doorbell_bo;
> > +               }
> > +
> > +               init_doorbell(guc, e);
> > +               gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> > +               gtt_addr = xe_bo_ggtt_addr(bo);
> > +       }
> > +
> > +       if (init && e->flags & ENGINE_FLAG_KERNEL)
> > +               return 0;
> > +
> > +       register_engine(e);
> > +       ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> > +                               gtt_addr);
> > +       if (ret < 0) {
> > +               fini_doorbell(guc, e);
> > +               return ret;
> > +       }
> > +
> > +       /*
> > +        * In distributed doorbells, guc is returning the cacheline selected
> > +        * by HW as part of the 7bit data from the allocate doorbell command:
> > +        *  bit [22]   - Cacheline allocated
> > +        *  bit [21:16] - Cacheline offset address
> > +        * (bit 21 must be zero, or our assumption of only using half a page is
> > +        * no longer correct).
> > +        */
> > +       if (HAS_GUC_DIST_DB(xe)) {
> > +               u32 dd_cacheline_info;
> > +
> > +               XE_WARN_ON(!(ret & BIT(22)));
> > +               XE_WARN_ON(ret & BIT(21));
> > +
> > +               dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> > +               e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
> I don't understand something. We have 256 doorbells, but here you
> overrun the doorbell_offset, where dd_cacheline_info can be 0-31
> (because bit 21 is always 0).
> This has overlap... Some doorbells will get the same address.

Yes, there is an 8-to-1 multiplexing behind the scenes here.

Matt
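
(Spelling out the arithmetic, based only on the comment quoted above and
assuming the usual 64-byte cacheline: bits [21:16] carry the cacheline
offset and bit 21 must be zero, so the GuC can hand back 32 distinct
cacheline slots. 32 slots * 64 bytes = 2K, i.e. the "half a page" the
comment refers to, and 256 doorbell IDs shared across 32 slots gives the
8-to-1 multiplexing.)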

> > +
> > +               /* and verify db status was updated correctly by the guc fw */
> > +               XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> > +                          GUC_DOORBELL_ENABLED);
> > +       }
> > +
> > +       set_engine_doorbell_registered(e);
> > +
> > +       return 0;
> > +}
> > +
> >  static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
> >  {
> >         int ret;
> > @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
> >         u32 num_g2h = 0;
> >         int len = 0;
> >         bool extra_submit = false;
> > +       bool enable = false;
> >
> >         XE_BUG_ON(!engine_registered(e));
> >
> > @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
> >                 num_g2h = 1;
> >                 if (xe_engine_is_parallel(e))
> >                         extra_submit = true;
> > +               enable = true;
> >
> >                 e->guc->resume_time = RESUME_PENDING;
> >                 set_engine_pending_enable(e);
> > @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
> >                 trace_xe_engine_submit(e);
> >         }
> >
> > -       xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > +       if (enable || !engine_doorbell_registered(e))
> > +               xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > +       else
> > +               ring_doorbell(guc, e);
> >
> >         if (extra_submit) {
> >                 len = 0;
> > @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> >         trace_xe_sched_job_run(job);
> >
> >         if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> > -               if (!engine_registered(e))
> > -                       register_engine(e);
> > +               if (!engine_registered(e)) {
> > +                       if (has_doorbell(e)) {
> > +                               int err = create_doorbell(engine_to_guc(e), e,
> > +                                                         false);
> > +
> > +                               /* Not fatal, but let's warn */
> > +                               XE_WARN_ON(err);
> > +                       } else {
> > +                               register_engine(e);
> > +                       }
> > +               }
> >                 if (!lr)        /* Written in IOCTL */
> >                         e->ring_ops->emit_job(job);
> >                 submit_engine(e);
> > @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> >         MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
> >         int ret;
> >
> > +       if (has_doorbell(e)) {
> > +               fini_doorbell(guc, e);
> > +               deallocate_doorbell(guc, e->guc->id);
> > +       }
> > +
> >         set_min_preemption_timeout(guc, e);
> >         smp_rmb();
> >         ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> > @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
> >                 cancel_work_sync(&ge->lr_tdr);
> >         if (e->flags & ENGINE_FLAG_PERSISTENT)
> >                 xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> > +       destroy_doorbell(guc, e);
> >         release_guc_id(guc, e);
> >         drm_sched_entity_fini(&ge->entity);
> >         drm_sched_fini(&ge->sched);
> > @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
> >         struct xe_guc_engine *ge;
> >         long timeout;
> >         int err;
> > +       bool create_db = false;
> >
> >         XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
> >
> > @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
> >         if (guc_read_stopped(guc))
> >                 drm_sched_stop(sched, NULL);
> >
> > +       create_db = alloc_doorbell_id(guc, e);
> > +
> >         mutex_unlock(&guc->submission_state.lock);
> >
> > +       if (create_db) {
> > +               /* Error isn't fatal as we don't need a doorbell */
> > +               err = create_doorbell(guc, e, true);
> > +               if (err)
> > +                       release_doorbell_id(guc, e);
> > +       }
> > +
> >         switch (e->class) {
> >         case XE_ENGINE_CLASS_RENDER:
> >                 sprintf(e->name, "rcs%d", e->guc->id);
> > @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
> >  {
> >         struct drm_gpu_scheduler *sched = &e->guc->sched;
> >
> > -       XE_BUG_ON(engine_registered(e));
> > +       XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
> >         XE_BUG_ON(engine_banned(e));
> >         XE_BUG_ON(engine_killed(e));
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > index 8002734d6f24..bada6c02d6aa 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > @@ -13,6 +13,7 @@ struct xe_engine;
> >  struct xe_guc;
> >
> >  int xe_guc_submit_init(struct xe_guc *guc);
> > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
> >  void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
> >
> >  int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> > index ac7eec28934d..9ee4d572f4e0 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> > @@ -36,10 +36,14 @@ struct xe_guc {
> >                 struct xarray engine_lookup;
> >                 /** @guc_ids: used to allocate new guc_ids, single-lrc */
> >                 struct ida guc_ids;
> > +               /** @doorbell_ids: used to allocate new doorbells */
> > +               struct ida doorbell_ids;
> >                 /** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
> >                 unsigned long *guc_ids_bitmap;
> >                 /** @stopped: submissions are stopped */
> >                 atomic_t stopped;
> > +               /** @num_doorbells: number of doorbells */
> > +               int num_doorbells;
> >                 /** @lock: protects submission state */
> >                 struct mutex lock;
> >                 /** @suspend: suspend fence state */
> > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > index 02861c26e145..38e9d7c6197b 100644
> > --- a/drivers/gpu/drm/xe/xe_trace.h
> > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
> >              TP_ARGS(e)
> >  );
> >
> > +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> > +            TP_PROTO(struct xe_engine *e),
> > +            TP_ARGS(e)
> > +);
> > +
> >  DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
> >              TP_PROTO(struct xe_engine *e),
> >              TP_ARGS(e)
> > --
> > 2.34.1
> >
> >

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible
  2023-06-08 19:30     ` Matthew Brost
@ 2023-06-12 13:01       ` Oded Gabbay
  0 siblings, 0 replies; 126+ messages in thread
From: Oded Gabbay @ 2023-06-12 13:01 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Faith Ekstrand

On Thu, Jun 8, 2023 at 10:31 PM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Sun, May 21, 2023 at 03:32:10PM +0300, Oded Gabbay wrote:
> > On Sun, May 21, 2023 at 3:18 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > >
> > > We have 256 doorbells (on most platforms) that we can allocate to bypass
> > > using the H2G channel for submission. This will avoid contention on the
> > > CT mutex.
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> > > ---
> > >  drivers/gpu/drm/xe/regs/xe_guc_regs.h    |   1 +
> > >  drivers/gpu/drm/xe/xe_guc.c              |   6 +
> > >  drivers/gpu/drm/xe/xe_guc_engine_types.h |   7 +
> > >  drivers/gpu/drm/xe/xe_guc_submit.c       | 295 ++++++++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_guc_submit.h       |   1 +
> > >  drivers/gpu/drm/xe/xe_guc_types.h        |   4 +
> > >  drivers/gpu/drm/xe/xe_trace.h            |   5 +
> > >  7 files changed, 315 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > > index 37e0ac550931..11b117293a62 100644
> > > --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > > +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > > @@ -109,6 +109,7 @@ struct guc_doorbell_info {
> > >
> > >  #define DIST_DBS_POPULATED                     XE_REG(0xd08)
> > >  #define   DOORBELLS_PER_SQIDI_MASK             REG_GENMASK(23, 16)
> > > +#define          DOORBELLS_PER_SQIDI_SHIFT             16
> > >  #define   SQIDIS_DOORBELL_EXIST_MASK           REG_GENMASK(15, 0)
> > >
> > >  #define GUC_BCS_RCS_IER                                XE_REG(0xC550)
> > > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > > index 89d20faced19..0c87f78a868b 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > > @@ -297,6 +297,12 @@ int xe_guc_init(struct xe_guc *guc)
> > >   */
> > >  int xe_guc_init_post_hwconfig(struct xe_guc *guc)
> > >  {
> > > +       int ret;
> > > +
> > > +       ret = xe_guc_submit_init_post_hwconfig(guc);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > >         return xe_guc_ads_init_post_hwconfig(&guc->ads);
> > >  }
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_types.h b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > > index 5d83132034a6..420b7f53e649 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_engine_types.h
> > > @@ -12,6 +12,7 @@
> > >  #include <drm/gpu_scheduler.h>
> > >
> > >  struct dma_fence;
> > > +struct xe_bo;
> > >  struct xe_engine;
> > >
> > >  /**
> > > @@ -37,6 +38,10 @@ struct xe_guc_engine {
> > >         struct work_struct fini_async;
> > >         /** @resume_time: time of last resume */
> > >         u64 resume_time;
> > > +       /** @doorbell_bo: BO for memory doorbell */
> > > +       struct xe_bo *doorbell_bo;
> > > +       /** @doorbell_offset: MMIO doorbell offset */
> > > +       u32 doorbell_offset;
> > >         /** @state: GuC specific state for this xe_engine */
> > >         atomic_t state;
> > >         /** @wqi_head: work queue item tail */
> > > @@ -45,6 +50,8 @@ struct xe_guc_engine {
> > >         u32 wqi_tail;
> > >         /** @id: GuC id for this xe_engine */
> > >         u16 id;
> > > +       /** @doorbell_id: doorbell id */
> > > +       u16 doorbell_id;
> > >         /** @suspend_wait: wait queue used to wait on pending suspends */
> > >         wait_queue_head_t suspend_wait;
> > >         /** @suspend_pending: a suspend of the engine is pending */
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 0a41f5d04f6d..1b6f36b04cd1 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -13,7 +13,10 @@
> > >
> > >  #include <drm/drm_managed.h>
> > >
> > > +#include "regs/xe_guc_regs.h"
> > >  #include "regs/xe_lrc_layout.h"
> > > +
> > > +#include "xe_bo.h"
> > >  #include "xe_device.h"
> > >  #include "xe_engine.h"
> > >  #include "xe_force_wake.h"
> > > @@ -26,12 +29,22 @@
> > >  #include "xe_lrc.h"
> > >  #include "xe_macros.h"
> > >  #include "xe_map.h"
> > > +#include "xe_mmio.h"
> > >  #include "xe_mocs.h"
> > >  #include "xe_ring_ops_types.h"
> > >  #include "xe_sched_job.h"
> > >  #include "xe_trace.h"
> > >  #include "xe_vm.h"
> > >
> > > +#define HAS_GUC_MMIO_DB(xe) (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1250)
> > > +#define HAS_GUC_DIST_DB(xe) \
> > > +       (GRAPHICS_VERx100(xe) >= 1200 && !HAS_GUC_MMIO_DB(xe))
> > > +
> > > +#define GUC_NUM_HW_DOORBELLS 256
> > > +
> > > +#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
> > > +#define GUC_MMIO_DB_BAR_SIZE SZ_4M
> > > +
> > >  static struct xe_gt *
> > >  guc_to_gt(struct xe_guc *guc)
> > >  {
> > > @@ -63,6 +76,7 @@ engine_to_guc(struct xe_engine *e)
> > >  #define ENGINE_STATE_SUSPENDED         (1 << 5)
> > >  #define ENGINE_STATE_RESET             (1 << 6)
> > >  #define ENGINE_STATE_KILLED            (1 << 7)
> > > +#define ENGINE_STATE_DB_REGISTERED     (1 << 8)
> > >
> > >  static bool engine_registered(struct xe_engine *e)
> > >  {
> > > @@ -179,6 +193,16 @@ static void set_engine_killed(struct xe_engine *e)
> > >         atomic_or(ENGINE_STATE_KILLED, &e->guc->state);
> > >  }
> > >
> > > +static bool engine_doorbell_registered(struct xe_engine *e)
> > > +{
> > > +       return atomic_read(&e->guc->state) & ENGINE_STATE_DB_REGISTERED;
> > > +}
> > > +
> > > +static void set_engine_doorbell_registered(struct xe_engine *e)
> > > +{
> > > +       atomic_or(ENGINE_STATE_DB_REGISTERED, &e->guc->state);
> > > +}
> > > +
> > >  static bool engine_killed_or_banned(struct xe_engine *e)
> > >  {
> > >         return engine_killed(e) || engine_banned(e);
> > > @@ -190,6 +214,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
> > >
> > >         xa_destroy(&guc->submission_state.engine_lookup);
> > >         ida_destroy(&guc->submission_state.guc_ids);
> > > +       ida_destroy(&guc->submission_state.doorbell_ids);
> > >         bitmap_free(guc->submission_state.guc_ids_bitmap);
> > >  }
> > >
> > > @@ -230,6 +255,7 @@ int xe_guc_submit_init(struct xe_guc *guc)
> > >         mutex_init(&guc->submission_state.lock);
> > >         xa_init(&guc->submission_state.engine_lookup);
> > >         ida_init(&guc->submission_state.guc_ids);
> > > +       ida_init(&guc->submission_state.doorbell_ids);
> > >
> > >         spin_lock_init(&guc->submission_state.suspend.lock);
> > >         guc->submission_state.suspend.context = dma_fence_context_alloc(1);
> > > @@ -243,6 +269,237 @@ int xe_guc_submit_init(struct xe_guc *guc)
> > >         return 0;
> > >  }
> > >
> > > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc)
> > > +{
> > > +       if (HAS_GUC_DIST_DB(guc_to_xe(guc))) {
> > > +               u32 distdbreg = xe_mmio_read32(guc_to_gt(guc),
> > > +                                              DIST_DBS_POPULATED.reg);
> > > +               u32 num_sqidi =
> > > +                       hweight32(distdbreg & SQIDIS_DOORBELL_EXIST_MASK);
> > > +               u32 doorbells_per_sqidi =
> > > +                       ((distdbreg >> DOORBELLS_PER_SQIDI_SHIFT) &
> > > +                        DOORBELLS_PER_SQIDI_MASK) + 1;
> > > +
> > > +               guc->submission_state.num_doorbells =
> > > +                       num_sqidi * doorbells_per_sqidi;
> > > +       } else {
> > > +               guc->submission_state.num_doorbells = GUC_NUM_HW_DOORBELLS;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static bool alloc_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       int ret;
> > > +
> > > +       lockdep_assert_held(&guc->submission_state.lock);
> > > +
> > > +       e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > > +       ret = ida_simple_get(&guc->submission_state.doorbell_ids, 0,
> > > +                            guc->submission_state.num_doorbells, GFP_NOWAIT);
> > > +       if (ret < 0)
> > > +               return false;
> > > +
> > > +       e->guc->doorbell_id = ret;
> > > +
> > > +       return true;
> > > +}
> > > +
> > > +static void release_doorbell_id(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       mutex_lock(&guc->submission_state.lock);
> > > +       ida_simple_remove(&guc->submission_state.doorbell_ids,
> > > +                         e->guc->doorbell_id);
> > > +       mutex_unlock(&guc->submission_state.lock);
> > > +
> > > +       e->guc->doorbell_id = GUC_NUM_HW_DOORBELLS;
> > > +}
> > > +
> > > +static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, u16 doorbell_id,
> > > +                            u64 gpa, u32 gtt_addr)
> > > +{
> > > +       u32 action[] = {
> > > +               XE_GUC_ACTION_ALLOCATE_DOORBELL,
> > > +               guc_id,
> > > +               doorbell_id,
> > > +               lower_32_bits(gpa),
> > > +               upper_32_bits(gpa),
> > > +               gtt_addr
> > > +       };
> > > +
> > > +       return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
> > > +}
> > > +
> > > +static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
> > > +{
> > > +       u32 action[] = {
> > > +               XE_GUC_ACTION_DEALLOCATE_DOORBELL,
> > > +               guc_id
> > > +       };
> > > +
> > > +       xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> > > +}
> > > +
> > > +static bool has_doorbell(struct xe_engine *e)
> > > +{
> > > +       return e->guc->doorbell_id != GUC_NUM_HW_DOORBELLS;
> > > +}
> > > +
> > > +#define doorbell_read(guc_, e_, field_) ({                     \
> > > +       struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;  \
> > > +       iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);     \
> > > +       xe_map_rd_field(guc_to_xe((guc_)), &_vmap, 0,           \
> > > +                                 struct guc_doorbell_info, field_); \
> > > +       })
> > > +#define doorbell_write(guc_, e_, field_, val_) ({              \
> > > +       struct iosys_map _vmap = (e_)->guc->doorbell_bo->vmap;  \
> > > +       iosys_map_incr(&_vmap, (e_)->guc->doorbell_offset);     \
> > > +       xe_map_wr_field(guc_to_xe((guc_)), &_vmap, 0,           \
> > > +                                 struct guc_doorbell_info, field_, val_); \
> > > +       })
> > > +
> > > +static void init_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       struct xe_device *xe = guc_to_xe(guc);
> > > +
> > > +       /* GuC does the initialization with distributed and MMIO doorbells */
> > > +       if (!HAS_GUC_DIST_DB(xe) && !HAS_GUC_MMIO_DB(xe)) {
> > > +               doorbell_write(guc, e, db_status, GUC_DOORBELL_ENABLED);
> > > +               doorbell_write(guc, e, cookie, 0);
> > > +       }
> > > +}
> > > +
> > > +static void fini_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       if (!HAS_GUC_MMIO_DB(guc_to_xe(guc)) &&
> > > +           xe_device_mem_access_ongoing(guc_to_xe(guc)))
> > > +               doorbell_write(guc, e, db_status, GUC_DOORBELL_DISABLED);
> > > +}
> > > +
> > > +static void destroy_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       if (has_doorbell(e)) {
> > > +               release_doorbell_id(guc, e);
> > > +               xe_bo_unpin_map_no_vm(e->guc->doorbell_bo);
> > > +       }
> > > +}
> > > +
> > > +static void ring_memory_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       u32 cookie;
> > > +
> > > +       cookie = doorbell_read(guc, e, cookie);
> > > +       doorbell_write(guc, e, cookie, cookie + 1 ?: cookie + 2);
> > > +
> > > +       XE_WARN_ON(doorbell_read(guc, e, db_status) != GUC_DOORBELL_ENABLED);
> > > +}
> > > +
> > > +#define GUC_MMIO_DOORBELL_RING_ACK     0xACEDBEEF
> > > +#define GUC_MMIO_DOORBELL_RING_NACK    0xDEADBEEF
> > > +static void ring_mmio_doorbell(struct xe_guc *guc, u32 doorbell_offset)
> > > +{
> > > +       u32 db_value;
> > > +
> > > +       db_value = xe_mmio_read32(guc_to_gt(guc), GUC_MMIO_DB_BAR_OFFSET +
> > > +                                 doorbell_offset);
> > > +
> > > +       /*
> > > +        * The read from the doorbell page will return ack/nack. We don't remove
> > > +        * doorbells from active clients so we don't expect to ever get a nack.
> > > +        * XXX: if doorbell is lost, re-acquire it?
> > > +        */
> > > +       XE_WARN_ON(db_value == GUC_MMIO_DOORBELL_RING_NACK);
> > > +       XE_WARN_ON(db_value != GUC_MMIO_DOORBELL_RING_ACK);
> > > +}
> > > +
> > > +static void ring_doorbell(struct xe_guc *guc, struct xe_engine *e)
> > > +{
> > > +       XE_BUG_ON(!has_doorbell(e));
> > > +
> > > +       if (HAS_GUC_MMIO_DB(guc_to_xe(guc)))
> > > +               ring_mmio_doorbell(guc, e->guc->doorbell_offset);
> > > +       else
> > > +               ring_memory_doorbell(guc, e);
> > > +
> > > +       trace_xe_engine_ring_db(e);
> > > +}
> > > +
> > > +static void register_engine(struct xe_engine *e);
> > > +
> > > +static int create_doorbell(struct xe_guc *guc, struct xe_engine *e, bool init)
> > > +{
> > > +       struct xe_gt *gt = guc_to_gt(guc);
> > > +       struct xe_device *xe = gt_to_xe(gt);
> > > +       u64 gpa;
> > > +       u32 gtt_addr;
> > > +       int ret;
> > > +
> > > +       XE_BUG_ON(!has_doorbell(e));
> > > +
> > > +       if (HAS_GUC_MMIO_DB(xe)) {
> > > +               e->guc->doorbell_offset = PAGE_SIZE * e->guc->doorbell_id;
> > I think there is an implied assumption here (and in the code below)
> > that PAGE_SIZE is always 4KB, which is problematic in non-x86
> > architectures.
> > If there is no limitation on xe being used solely on x86_64, then I
> > think it would be better to change this to SZ_4K
> >
>
> Will fix.
>
> > > +               gpa = GUC_MMIO_DB_BAR_OFFSET + e->guc->doorbell_offset;
> > > +               gtt_addr = 0;
> > > +       } else {
> > > +               struct xe_bo *bo;
> > > +
> > > +               if (!e->guc->doorbell_bo) {
> > > +                       bo = xe_bo_create_pin_map(xe, gt, NULL, PAGE_SIZE,
> > > +                                                 ttm_bo_type_kernel,
> > > +                                                 XE_BO_CREATE_VRAM_IF_DGFX(gt) |
> > > +                                                 XE_BO_CREATE_GGTT_BIT);
> > > +                       if (IS_ERR(bo))
> > > +                               return PTR_ERR(bo);
> > > +
> > > +                       e->guc->doorbell_bo = bo;
> > > +               } else {
> > > +                       bo = e->guc->doorbell_bo;
> > > +               }
> > > +
> > > +               init_doorbell(guc, e);
> > > +               gpa = xe_bo_main_addr(bo, PAGE_SIZE);
> > > +               gtt_addr = xe_bo_ggtt_addr(bo);
> > > +       }
> > > +
> > > +       if (init && e->flags & ENGINE_FLAG_KERNEL)
> > > +               return 0;
> > > +
> > > +       register_engine(e);
> > > +       ret = allocate_doorbell(guc, e->guc->id, e->guc->doorbell_id, gpa,
> > > +                               gtt_addr);
> > > +       if (ret < 0) {
> > > +               fini_doorbell(guc, e);
> > > +               return ret;
> > > +       }
> > > +
> > > +       /*
> > > +        * In distributed doorbells, guc is returning the cacheline selected
> > > +        * by HW as part of the 7bit data from the allocate doorbell command:
> > > +        *  bit [22]   - Cacheline allocated
> > > +        *  bit [21:16] - Cacheline offset address
> > > +        * (bit 21 must be zero, or our assumption of only using half a page is
> > > +        * no longer correct).
> > > +        */
> > > +       if (HAS_GUC_DIST_DB(xe)) {
> > > +               u32 dd_cacheline_info;
> > > +
> > > +               XE_WARN_ON(!(ret & BIT(22)));
> > > +               XE_WARN_ON(ret & BIT(21));
> > > +
> > > +               dd_cacheline_info = FIELD_GET(GENMASK(21, 16), ret);
> > > +               e->guc->doorbell_offset = dd_cacheline_info * cache_line_size();
> > I don't understand something. We have 256 doorbells, but here you
> > overrun the doorbell_offset, where dd_cacheline_info can be 0-31
> > (because bit 21 is always 0).
> > This has overlap... Some doorbells will get the same address.
>
> Yes, there is an 8-to-1 multiplexing behind the scenes here.
>
> Matt
Could you please elaborate more on what is happening behind the scenes?
Maybe add that explanation as a comment on this line?

Oded
>
> > > +
> > > +               /* and verify db status was updated correctly by the guc fw */
> > > +               XE_WARN_ON(doorbell_read(guc, e, db_status) !=
> > > +                          GUC_DOORBELL_ENABLED);
> > > +       }
> > > +
> > > +       set_engine_doorbell_registered(e);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static int alloc_guc_id(struct xe_guc *guc, struct xe_engine *e)
> > >  {
> > >         int ret;
> > > @@ -623,6 +880,7 @@ static void submit_engine(struct xe_engine *e)
> > >         u32 num_g2h = 0;
> > >         int len = 0;
> > >         bool extra_submit = false;
> > > +       bool enable = false;
> > >
> > >         XE_BUG_ON(!engine_registered(e));
> > >
> > > @@ -642,6 +900,7 @@ static void submit_engine(struct xe_engine *e)
> > >                 num_g2h = 1;
> > >                 if (xe_engine_is_parallel(e))
> > >                         extra_submit = true;
> > > +               enable = true;
> > >
> > >                 e->guc->resume_time = RESUME_PENDING;
> > >                 set_engine_pending_enable(e);
> > > @@ -653,7 +912,10 @@ static void submit_engine(struct xe_engine *e)
> > >                 trace_xe_engine_submit(e);
> > >         }
> > >
> > > -       xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > > +       if (enable || !engine_doorbell_registered(e))
> > > +               xe_guc_ct_send(&guc->ct, action, len, g2h_len, num_g2h);
> > > +       else
> > > +               ring_doorbell(guc, e);
> > >
> > >         if (extra_submit) {
> > >                 len = 0;
> > > @@ -678,8 +940,17 @@ guc_engine_run_job(struct drm_sched_job *drm_job)
> > >         trace_xe_sched_job_run(job);
> > >
> > >         if (!engine_killed_or_banned(e) && !xe_sched_job_is_error(job)) {
> > > -               if (!engine_registered(e))
> > > -                       register_engine(e);
> > > +               if (!engine_registered(e)) {
> > > +                       if (has_doorbell(e)) {
> > > +                               int err = create_doorbell(engine_to_guc(e), e,
> > > +                                                         false);
> > > +
> > > +                               /* Not fatal, but let's warn */
> > > +                               XE_WARN_ON(err);
> > > +                       } else {
> > > +                               register_engine(e);
> > > +                       }
> > > +               }
> > >                 if (!lr)        /* Written in IOCTL */
> > >                         e->ring_ops->emit_job(job);
> > >                 submit_engine(e);
> > > @@ -722,6 +993,11 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> > >         MAKE_SCHED_CONTEXT_ACTION(e, DISABLE);
> > >         int ret;
> > >
> > > +       if (has_doorbell(e)) {
> > > +               fini_doorbell(guc, e);
> > > +               deallocate_doorbell(guc, e->guc->id);
> > > +       }
> > > +
> > >         set_min_preemption_timeout(guc, e);
> > >         smp_rmb();
> > >         ret = wait_event_timeout(guc->ct.wq, !engine_pending_enable(e) ||
> > > @@ -958,6 +1234,7 @@ static void __guc_engine_fini_async(struct work_struct *w)
> > >                 cancel_work_sync(&ge->lr_tdr);
> > >         if (e->flags & ENGINE_FLAG_PERSISTENT)
> > >                 xe_device_remove_persistent_engines(gt_to_xe(e->gt), e);
> > > +       destroy_doorbell(guc, e);
> > >         release_guc_id(guc, e);
> > >         drm_sched_entity_fini(&ge->entity);
> > >         drm_sched_fini(&ge->sched);
> > > @@ -1136,6 +1413,7 @@ static int guc_engine_init(struct xe_engine *e)
> > >         struct xe_guc_engine *ge;
> > >         long timeout;
> > >         int err;
> > > +       bool create_db = false;
> > >
> > >         XE_BUG_ON(!xe_device_guc_submission_enabled(guc_to_xe(guc)));
> > >
> > > @@ -1177,8 +1455,17 @@ static int guc_engine_init(struct xe_engine *e)
> > >         if (guc_read_stopped(guc))
> > >                 drm_sched_stop(sched, NULL);
> > >
> > > +       create_db = alloc_doorbell_id(guc, e);
> > > +
> > >         mutex_unlock(&guc->submission_state.lock);
> > >
> > > +       if (create_db) {
> > > +               /* Error isn't fatal as we don't need a doorbell */
> > > +               err = create_doorbell(guc, e, true);
> > > +               if (err)
> > > +                       release_doorbell_id(guc, e);
> > > +       }
> > > +
> > >         switch (e->class) {
> > >         case XE_ENGINE_CLASS_RENDER:
> > >                 sprintf(e->name, "rcs%d", e->guc->id);
> > > @@ -1302,7 +1589,7 @@ static int guc_engine_set_job_timeout(struct xe_engine *e, u32 job_timeout_ms)
> > >  {
> > >         struct drm_gpu_scheduler *sched = &e->guc->sched;
> > >
> > > -       XE_BUG_ON(engine_registered(e));
> > > +       XE_BUG_ON(engine_registered(e) && !has_doorbell(e));
> > >         XE_BUG_ON(engine_banned(e));
> > >         XE_BUG_ON(engine_killed(e));
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > > index 8002734d6f24..bada6c02d6aa 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > > @@ -13,6 +13,7 @@ struct xe_engine;
> > >  struct xe_guc;
> > >
> > >  int xe_guc_submit_init(struct xe_guc *guc);
> > > +int xe_guc_submit_init_post_hwconfig(struct xe_guc *guc);
> > >  void xe_guc_submit_print(struct xe_guc *guc, struct drm_printer *p);
> > >
> > >  int xe_guc_submit_reset_prepare(struct xe_guc *guc);
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> > > index ac7eec28934d..9ee4d572f4e0 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> > > @@ -36,10 +36,14 @@ struct xe_guc {
> > >                 struct xarray engine_lookup;
> > >                 /** @guc_ids: used to allocate new guc_ids, single-lrc */
> > >                 struct ida guc_ids;
> > > +               /** @doorbell_ids: used to allocate new doorbells */
> > > +               struct ida doorbell_ids;
> > >                 /** @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */
> > >                 unsigned long *guc_ids_bitmap;
> > >                 /** @stopped: submissions are stopped */
> > >                 atomic_t stopped;
> > > +               /** @num_doorbells: number of doorbells */
> > > +               int num_doorbells;
> > >                 /** @lock: protects submission state */
> > >                 struct mutex lock;
> > >                 /** @suspend: suspend fence state */
> > > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > > index 02861c26e145..38e9d7c6197b 100644
> > > --- a/drivers/gpu/drm/xe/xe_trace.h
> > > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > > @@ -149,6 +149,11 @@ DEFINE_EVENT(xe_engine, xe_engine_submit,
> > >              TP_ARGS(e)
> > >  );
> > >
> > > +DEFINE_EVENT(xe_engine, xe_engine_ring_db,
> > > +            TP_PROTO(struct xe_engine *e),
> > > +            TP_ARGS(e)
> > > +);
> > > +
> > >  DEFINE_EVENT(xe_engine, xe_engine_scheduling_enable,
> > >              TP_PROTO(struct xe_engine *e),
> > >              TP_ARGS(e)
> > > --
> > > 2.34.1
> > >
> > >

^ permalink raw reply	[flat|nested] 126+ messages in thread

end of thread, other threads:[~2023-06-12 13:01 UTC | newest]

Thread overview: 126+ messages
2023-05-02  0:16 [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Matthew Brost
2023-05-02  0:16 ` [Intel-xe] [PATCH v2 01/31] drm/sched: Add run_wq argument to drm_sched_init Matthew Brost
2023-05-03 12:03   ` Thomas Hellström
2023-05-03 15:06     ` Matthew Brost
2023-05-05 18:24       ` Rodrigo Vivi
2023-05-02  0:16 ` [Intel-xe] [PATCH v2 02/31] drm/sched: Move schedule policy to scheduler Matthew Brost
2023-05-03 12:13   ` Thomas Hellström
2023-05-03 15:11     ` Matthew Brost
2023-05-02  0:16 ` [Intel-xe] [PATCH v2 03/31] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
2023-05-08 12:40   ` Thomas Hellström
2023-05-22  1:16     ` Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 04/31] drm/xe: Use DRM_SCHED_POLICY_SINGLE_ENTITY mode Matthew Brost
2023-05-08 12:41   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 05/31] drm/xe: Long running job update Matthew Brost
2023-05-05 18:36   ` Rodrigo Vivi
2023-05-08  1:14     ` Matthew Brost
2023-05-08 13:14   ` Thomas Hellström
2023-05-09 14:56     ` Matthew Brost
2023-05-09 15:21       ` Thomas Hellström
2023-05-09 22:16         ` Matthew Brost
2023-05-10  8:15           ` Thomas Hellström
2023-05-09 22:21     ` Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 06/31] drm/xe: Ensure LR engines are not persistent Matthew Brost
2023-05-05 18:38   ` Rodrigo Vivi
2023-05-08  1:03     ` Matthew Brost
2023-05-09 12:21   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 07/31] drm/xe: Only try to lock external BOs in VM bind Matthew Brost
2023-05-05 18:40   ` Rodrigo Vivi
2023-05-08  1:08     ` Matthew Brost
2023-05-08  1:15       ` Christopher Snowhill
2023-05-08 21:34       ` Rodrigo Vivi
2023-05-09 12:29         ` Thomas Hellström
2023-05-10 23:25           ` Matthew Brost
2023-05-11  7:43             ` Thomas Hellström
2023-05-08  1:17   ` Christopher Snowhill
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move Matthew Brost
2023-05-08 21:39   ` Rodrigo Vivi
2023-05-09 22:09     ` Matthew Brost
2023-05-10  1:37       ` Rodrigo Vivi
2023-05-09 12:47   ` Thomas Hellström
2023-05-09 22:05     ` Matthew Brost
2023-05-10  8:14       ` Thomas Hellström
2023-05-10 18:40         ` Matthew Brost
2023-05-11  7:24           ` Thomas Hellström
2023-05-11 14:11             ` Matthew Brost
2023-05-12  9:03               ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 09/31] drm/xe/guc: Read HXG fields from DW1 of G2H response Matthew Brost
2023-05-05 18:50   ` Rodrigo Vivi
2023-05-09 12:49   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 10/31] drm/xe/guc: Return the lower part of blocking H2G message Matthew Brost
2023-05-05 18:52   ` Rodrigo Vivi
2023-05-08  1:10     ` Matthew Brost
2023-05-08  9:20       ` Michal Wajdeczko
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 11/31] drm/xe/guc: Use doorbells for submission if possible Matthew Brost
2023-05-08 21:42   ` Rodrigo Vivi
2023-05-10  0:49     ` Matthew Brost
2023-05-09 13:00   ` Thomas Hellström
2023-05-10  0:51     ` Matthew Brost
2023-05-21 12:32   ` Oded Gabbay
2023-06-08 19:30     ` Matthew Brost
2023-06-12 13:01       ` Oded Gabbay
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 12/31] drm/xe/guc: Print doorbell ID in GuC engine debugfs entry Matthew Brost
2023-05-05 18:55   ` Rodrigo Vivi
2023-05-09 13:01     ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 13/31] maple_tree: split up MA_STATE() macro Matthew Brost
2023-05-09 13:21   ` Thomas Hellström
2023-05-10  0:29     ` Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 14/31] maple_tree: Export mas_preallocate Matthew Brost
2023-05-09 13:33   ` Thomas Hellström
2023-05-10  0:31     ` Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 15/31] drm: manager to keep track of GPUs VA mappings Matthew Brost
2023-05-09 13:49   ` Thomas Hellström
2023-05-10  0:55     ` Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 16/31] drm/xe: Port Xe to GPUVA Matthew Brost
2023-05-09 13:52   ` Thomas Hellström
2023-05-11  2:41     ` Matthew Brost
2023-05-11  7:39       ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 17/31] drm/xe: NULL binding implementation Matthew Brost
2023-05-09 14:34   ` Rodrigo Vivi
2023-05-11  2:52     ` Matthew Brost
2023-05-09 15:17   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 18/31] drm/xe: Avoid doing rebinds Matthew Brost
2023-05-09 14:48   ` Rodrigo Vivi
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 19/31] drm/xe: Reduce the number list links in xe_vma Matthew Brost
2023-05-08 21:43   ` Rodrigo Vivi
2023-05-11  8:38   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 20/31] drm/xe: Optimize size of xe_vma allocation Matthew Brost
2023-05-05 19:37   ` Rodrigo Vivi
2023-05-08  1:21     ` Matthew Brost
2023-05-11  9:05   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 21/31] drm/gpuva: Add drm device to GPUVA manager Matthew Brost
2023-05-05 19:39   ` Rodrigo Vivi
2023-05-11  9:06     ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 22/31] drm/gpuva: Move dma-resv " Matthew Brost
2023-05-11  9:10   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 23/31] drm/gpuva: Add support for extobj Matthew Brost
2023-05-11  9:35   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 24/31] drm/xe: Userptr refactor Matthew Brost
2023-05-05 19:41   ` Rodrigo Vivi
2023-05-11  9:46   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 25/31] drm: execution context for GEM buffers v3 Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 26/31] drm/exec: Always compile drm_exec Matthew Brost
2023-05-09 14:45   ` Rodrigo Vivi
2023-05-10  0:37     ` Matthew Brost
2023-05-10  0:38     ` Matthew Brost
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 27/31] drm/xe: Use drm_exec for locking rather than TTM exec helpers Matthew Brost
2023-05-05 19:42   ` Rodrigo Vivi
2023-05-11 10:01   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 28/31] drm/xe: Allow dma-fences as in-syncs for compute / faulting VM Matthew Brost
2023-05-05 19:43   ` Rodrigo Vivi
2023-05-08  1:19     ` Matthew Brost
2023-05-08 21:29       ` Rodrigo Vivi
2023-05-11 10:03   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 29/31] drm/xe: Allow compute VMs to output dma-fences on binds Matthew Brost
2023-05-09 14:50   ` Rodrigo Vivi
2023-05-11 10:04   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 30/31] drm/xe: remove async worker, sync binds, new error handling Matthew Brost
2023-05-17 16:53   ` Thomas Hellström
2023-05-02  0:17 ` [Intel-xe] [PATCH v2 31/31] drm/xe/uapi: Add some VM bind kernel doc Matthew Brost
2023-05-05 19:45   ` Rodrigo Vivi
2023-05-11 10:14     ` Thomas Hellström
2023-05-02  0:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for Upstreaming prep / all of mbrosts patches (rev2) Patchwork
2023-05-02  1:54   ` Christopher Snowhill (kode54)
2023-05-02  1:59   ` Christopher Snowhill (kode54)
2023-05-03 12:37 ` [Intel-xe] [PATCH v2 00/31] Upstreaming prep / all of mbrosts patches Thomas Hellström
2023-05-03 15:27   ` Matthew Brost
