* [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction
@ 2024-03-21 11:37 Thomas Hellström
  2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström

We were not allowing eviction / shrinking of completely unbound
local objects. Since unbinding is the UMD's primary means of
freeing up memory in local VM overcommit situations, fix this.

The functionality added here also opens up the possibility of
evicting purgeable local objects with upcoming changes.

To make this work properly, rebinding needs to be moved into the
while-not-all-locked drm-exec loop, since rebinding may allocate
gpu page-table bos and thus cause evictions, which in turn force us
to re-run validation (as sketched below).
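
As a rough sketch (simplified from patch 5; error handling and the
fence-slot reservation are trimmed), the exec-time validation then
becomes:

	static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
	{
		struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
		int ret;

		/* Re-validate until rebinding no longer evicts anything. */
		do {
			ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
			if (ret)
				return ret;

			/* May allocate page-table bos and thus evict again. */
			ret = xe_vm_rebind(vm, false);
			if (ret)
				return ret;
		} while (!list_empty(&vm->gpuvm.evict.list));

		return 0;
	}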

This is done in patch 5, but while writing that patch, a couple
of rebinding flaws were discovered.

1) When saving the rebinding fence we always presumed the
   rebinding fences were ordered. That is not true, and is
   fixed in patch 2, where we attach the rebind fences as
   kernel fences to the vm's resv.
2) In fact, TLB invalidation fences may currently not be assumed to be
   ordered at all. This is fixed in patch 3.
3) The combination of the fixes for 1) and 2) makes the rebind
   of each vma wait for the TLB invalidation of the previous
   rebind, which is unnecessary and incurs added latency.
   This is fixed in patch 1, where rebind TLB invalidation is
   moved to the ring ops.

Patch 4 reserves dma-resv fence slots where they are actually needed
rather than relying on fragile up-front predictions for binds.
Patch 6 helps us decide which local bos can be evicted during validation.
Patch 7 allows eviction of completely unbound local bos.

Thomas Hellström (7):
  drm/xe: Use ring ops TLB invalidation for rebinds
  drm/xe: Rework rebinding
  drm/xe: Make TLB invalidation fences unordered
  drm/xe: Reserve fences where needed
  drm/xe: Move vma rebinding to the drm_exec locking loop
  drm/xe/bo: Forward the decision to evict local objects during
    validation
  drm/xe/bo: Allow eviction of unbound local bos

 drivers/gpu/drm/xe/display/xe_fb_pin.c      |  2 +-
 drivers/gpu/drm/xe/tests/xe_bo.c            |  6 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       |  4 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c       |  2 +-
 drivers/gpu/drm/xe/xe_bo.c                  | 40 ++++++---
 drivers/gpu/drm/xe/xe_bo.h                  |  2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c             |  2 +-
 drivers/gpu/drm/xe/xe_exec.c                | 74 +++++------------
 drivers/gpu/drm/xe/xe_exec_queue_types.h    |  2 +
 drivers/gpu/drm/xe/xe_ggtt.c                |  2 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |  5 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |  1 -
 drivers/gpu/drm/xe/xe_gt_types.h            |  7 --
 drivers/gpu/drm/xe/xe_pt.c                  | 24 ++++--
 drivers/gpu/drm/xe/xe_ring_ops.c            | 11 +--
 drivers/gpu/drm/xe/xe_sched_job.c           | 11 +++
 drivers/gpu/drm/xe/xe_sched_job_types.h     |  2 +
 drivers/gpu/drm/xe/xe_vm.c                  | 90 ++++++++++-----------
 drivers/gpu/drm/xe/xe_vm.h                  |  5 +-
 drivers/gpu/drm/xe/xe_vm_types.h            | 18 ++++-
 20 files changed, 159 insertions(+), 151 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable

They can actually complete out-of-order, so allocate a unique
fence context for each fence.
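
Fences sharing a dma_fence context are treated as an ordered timeline,
an assumption that no longer holds when invalidations can complete
out-of-order. Condensed from the diff below, each fence therefore gets
its own context and a trivial seqno:

	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
		       &gt->tlb_invalidation.lock,
		       dma_fence_context_alloc(1), 1);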

Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 1 -
 drivers/gpu/drm/xe/xe_gt_types.h            | 7 -------
 drivers/gpu/drm/xe/xe_pt.c                  | 3 +--
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index a3c4ffba679d..787cba5e49a1 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -63,7 +63,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
 	spin_lock_init(&gt->tlb_invalidation.pending_lock);
 	spin_lock_init(&gt->tlb_invalidation.lock);
-	gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
 	INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
 			  xe_gt_tlb_fence_timeout);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index f6da2ad9719f..2143dffcaf11 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -179,13 +179,6 @@ struct xe_gt {
 		 * xe_gt_tlb_fence_timeout after the timeut interval is over.
 		 */
 		struct delayed_work fence_tdr;
-		/** @tlb_invalidation.fence_context: context for TLB invalidation fences */
-		u64 fence_context;
-		/**
-		 * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
-		 * tlb_invalidation.lock
-		 */
-		u32 fence_seqno;
 		/** @tlb_invalidation.lock: protects TLB invalidation fences */
 		spinlock_t lock;
 	} tlb_invalidation;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8d3922d2206e..a5bb8d20f815 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
 	spin_lock_irq(&gt->tlb_invalidation.lock);
 	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
 		       &gt->tlb_invalidation.lock,
-		       gt->tlb_invalidation.fence_context,
-		       ++gt->tlb_invalidation.fence_seqno);
+		       dma_fence_context_alloc(1), 1);
 	spin_unlock_irq(&gt->tlb_invalidation.lock);
 
 	INIT_LIST_HEAD(&ifence->base.link);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
  2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 19:09   ` Matthew Brost
  2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable

For each rebind we insert a GuC TLB invalidation and add a
corresponding unordered TLB invalidation fence. This might
add a huge number of TLB invalidation fences to wait for, so
rather than doing that, defer the TLB invalidation to the
next ring ops for each affected exec queue. Since the TLB
is invalidated on exec_queue switch, we need to invalidate
once for each affected exec_queue.
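
In short, the mechanism (condensed from the diff below): the bind path
bumps a per-vm seqno instead of allocating an invalidation fence, and
xe_sched_job_arm() latches it per exec queue to decide whether the ring
ops must flush the TLB ahead of the batch:

	/* __xe_pt_bind_vma(): request a deferred flush for rebinds. */
	vm->tlb_flush_seqno++;

	/* xe_sched_job_arm(): flush once per exec queue that is behind. */
	if (vm->tlb_flush_seqno != q->tlb_flush_seqno) {
		q->tlb_flush_seqno = vm->tlb_flush_seqno;
		job->ring_ops_flush_tlb = true;
	}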

Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
 drivers/gpu/drm/xe/xe_pt.c               |  5 +++--
 drivers/gpu/drm/xe/xe_ring_ops.c         | 11 ++++-------
 drivers/gpu/drm/xe/xe_sched_job.c        | 11 +++++++++++
 drivers/gpu/drm/xe/xe_sched_job_types.h  |  2 ++
 drivers/gpu/drm/xe/xe_vm_types.h         |  5 +++++
 6 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 62b3d9d1d7cd..891ad30e906f 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -148,6 +148,8 @@ struct xe_exec_queue {
 	const struct xe_ring_ops *ring_ops;
 	/** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
 	struct drm_sched_entity *entity;
+	/** @tlb_flush_seqno: The seqno of the last rebind tlb flush performed */
+	u64 tlb_flush_seqno;
 	/** @lrc: logical ring context for this exec queue */
 	struct xe_lrc lrc[];
 };
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8d3922d2206e..21bc0d13fccf 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 	 * non-faulting LR, in particular on user-space batch buffer chaining,
 	 * it needs to be done here.
 	 */
-	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
-	    (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
+	if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
 		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
 		if (!ifence)
 			return ERR_PTR(-ENOMEM);
+	} else if (rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) {
+		vm->tlb_flush_seqno++;
 	}
 
 	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index c4edffcd4a32..5b2b37b59813 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
 {
 	u32 dw[MAX_JOB_SIZE_DW], i = 0;
 	u32 ppgtt_flag = get_ppgtt_flag(job);
-	struct xe_vm *vm = job->q->vm;
 	struct xe_gt *gt = job->q->gt;
 
-	if (vm && vm->batch_invalidate_tlb) {
+	if (job->ring_ops_flush_tlb) {
 		dw[i++] = preparser_disable(true);
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 					seqno, true, dw, i);
@@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
-	struct xe_vm *vm = job->q->vm;
 
 	dw[i++] = preparser_disable(true);
 
@@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 			i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
 	}
 
-	if (vm && vm->batch_invalidate_tlb)
+	if (job->ring_ops_flush_tlb)
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 					seqno, true, dw, i);
 
 	dw[i++] = preparser_disable(false);
 
-	if (!vm || !vm->batch_invalidate_tlb)
+	if (!job->ring_ops_flush_tlb)
 		i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 					seqno, dw, i);
 
@@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
-	struct xe_vm *vm = job->q->vm;
 	u32 mask_flags = 0;
 
 	dw[i++] = preparser_disable(true);
@@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 		mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
 
 	/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
-	i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
+	i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
 
 	/* hsdes: 1809175790 */
 	if (has_aux_ccs(xe))
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 8151ddafb940..d55458d915a9 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
 
 void xe_sched_job_arm(struct xe_sched_job *job)
 {
+	struct xe_exec_queue *q = job->q;
+	struct xe_vm *vm = q->vm;
+
+	if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
+	    vm->tlb_flush_seqno != q->tlb_flush_seqno) {
+		q->tlb_flush_seqno = vm->tlb_flush_seqno;
+		job->ring_ops_flush_tlb = true;
+	} else if (vm && vm->batch_invalidate_tlb) {
+		job->ring_ops_flush_tlb = true;
+	}
+
 	drm_sched_job_arm(&job->drm);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index b1d83da50a53..5e12724219fd 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -39,6 +39,8 @@ struct xe_sched_job {
 	} user_fence;
 	/** @migrate_flush_flags: Additional flush flags for migration jobs */
 	u32 migrate_flush_flags;
+	/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
+	bool ring_ops_flush_tlb;
 	/** @batch_addr: batch buffer address of job */
 	u64 batch_addr[];
 };
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index ae5fb565f6bf..5747f136d24d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -264,6 +264,11 @@ struct xe_vm {
 		bool capture_once;
 	} error_capture;
 
+	/**
+	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
+	 * protected by the vm resv.
+	 */
+	u64 tlb_flush_seqno;
 	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] drm/xe: Rework rebinding
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
  2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
  2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 19:14   ` Matthew Brost
  2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost, stable

Instead of handling the vm's rebind fence separately,
which is error-prone if the rebind fences are not strictly ordered,
attach the rebind fences as kernel fences to the vm's resv.
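
Condensed from the diff below: the bind path now adds rebind fences
with DMA_RESV_USAGE_KERNEL, so the exec path simply waits behind the
vm resv's kernel fences instead of tracking a separate rebind fence:

	/* __xe_pt_bind_vma(): rebinds are now kernel fences on the resv. */
	dma_resv_add_fence(xe_vm_resv(vm), fence,
			   rebind || last_munmap_rebind ?
			   DMA_RESV_USAGE_KERNEL :
			   DMA_RESV_USAGE_BOOKKEEP);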

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c     | 31 +++----------------------------
 drivers/gpu/drm/xe/xe_pt.c       |  2 +-
 drivers/gpu/drm/xe/xe_vm.c       | 27 +++++++++------------------
 drivers/gpu/drm/xe/xe_vm.h       |  2 +-
 drivers/gpu/drm/xe/xe_vm_types.h |  3 ---
 5 files changed, 14 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 7692ebfe7d47..759497d4a102 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -152,7 +152,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_exec *exec = &vm_exec.exec;
 	u32 i, num_syncs = 0, num_ufence = 0;
 	struct xe_sched_job *job;
-	struct dma_fence *rebind_fence;
 	struct xe_vm *vm;
 	bool write_locked, skip_retry = false;
 	ktime_t end = 0;
@@ -294,35 +293,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	 * Rebind any invalidated userptr or evicted BOs in the VM, non-compute
 	 * VM mode only.
 	 */
-	rebind_fence = xe_vm_rebind(vm, false);
-	if (IS_ERR(rebind_fence)) {
-		err = PTR_ERR(rebind_fence);
+	err = xe_vm_rebind(vm, false);
+	if (err)
 		goto err_put_job;
-	}
-
-	/*
-	 * We store the rebind_fence in the VM so subsequent execs don't get
-	 * scheduled before the rebinds of userptrs / evicted BOs is complete.
-	 */
-	if (rebind_fence) {
-		dma_fence_put(vm->rebind_fence);
-		vm->rebind_fence = rebind_fence;
-	}
-	if (vm->rebind_fence) {
-		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			     &vm->rebind_fence->flags)) {
-			dma_fence_put(vm->rebind_fence);
-			vm->rebind_fence = NULL;
-		} else {
-			dma_fence_get(vm->rebind_fence);
-			err = drm_sched_job_add_dependency(&job->drm,
-							   vm->rebind_fence);
-			if (err)
-				goto err_put_job;
-		}
-	}
 
-	/* Wait behind munmap style rebinds */
+	/* Wait behind rebinds */
 	if (!xe_vm_in_lr_mode(vm)) {
 		err = drm_sched_job_add_resv_dependencies(&job->drm,
 							  xe_vm_resv(vm),
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 21bc0d13fccf..0484ed5b495f 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1298,7 +1298,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 		}
 
 		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
+		dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
 				   last_munmap_rebind ?
 				   DMA_RESV_USAGE_KERNEL :
 				   DMA_RESV_USAGE_BOOKKEEP);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 80d43d75b1da..35fba6e3f889 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -522,7 +522,6 @@ static void preempt_rebind_work_func(struct work_struct *w)
 {
 	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
 	struct drm_exec exec;
-	struct dma_fence *rebind_fence;
 	unsigned int fence_count = 0;
 	LIST_HEAD(preempt_fences);
 	ktime_t end = 0;
@@ -568,18 +567,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	if (err)
 		goto out_unlock;
 
-	rebind_fence = xe_vm_rebind(vm, true);
-	if (IS_ERR(rebind_fence)) {
-		err = PTR_ERR(rebind_fence);
+	err = xe_vm_rebind(vm, true);
+	if (err)
 		goto out_unlock;
-	}
-
-	if (rebind_fence) {
-		dma_fence_wait(rebind_fence, false);
-		dma_fence_put(rebind_fence);
-	}
 
-	/* Wait on munmap style VM unbinds */
+	/* Wait on rebinds and munmap style VM unbinds */
 	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
 				     DMA_RESV_USAGE_KERNEL,
 				     false, MAX_SCHEDULE_TIMEOUT);
@@ -773,14 +765,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	       struct xe_sync_entry *syncs, u32 num_syncs,
 	       bool first_op, bool last_op);
 
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
-	struct dma_fence *fence = NULL;
+	struct dma_fence *fence;
 	struct xe_vma *vma, *next;
 
 	lockdep_assert_held(&vm->lock);
 	if (xe_vm_in_lr_mode(vm) && !rebind_worker)
-		return NULL;
+		return 0;
 
 	xe_vm_assert_held(vm);
 	list_for_each_entry_safe(vma, next, &vm->rebind_list,
@@ -788,17 +780,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 		xe_assert(vm->xe, vma->tile_present);
 
 		list_del_init(&vma->combined_links.rebind);
-		dma_fence_put(fence);
 		if (rebind_worker)
 			trace_xe_vma_rebind_worker(vma);
 		else
 			trace_xe_vma_rebind_exec(vma);
 		fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
 		if (IS_ERR(fence))
-			return fence;
+			return PTR_ERR(fence);
+		dma_fence_put(fence);
 	}
 
-	return fence;
+	return 0;
 }
 
 static void xe_vma_free(struct xe_vma *vma)
@@ -1588,7 +1580,6 @@ static void vm_destroy_work_func(struct work_struct *w)
 		XE_WARN_ON(vm->pt_root[id]);
 
 	trace_xe_vm_free(vm);
-	dma_fence_put(vm->rebind_fence);
 	kfree(vm);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 6df1f1c7f85d..4853354336f2 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
 
 int xe_vm_userptr_check_repin(struct xe_vm *vm);
 
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
 
 int xe_vm_invalidate_vma(struct xe_vma *vma);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5747f136d24d..badf3945083d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -177,9 +177,6 @@ struct xe_vm {
 	 */
 	struct list_head rebind_list;
 
-	/** @rebind_fence: rebind fence from execbuf */
-	struct dma_fence *rebind_fence;
-
 	/**
 	 * @destroy_work: worker to destroy VM, needed as a dma_fence signaling
 	 * from an irq context can be last put and the destroy needs to be able
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (2 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable

For each rebind we insert a GuC TLB invalidation and add a
corresponding unordered TLB invalidation fence. This might
add a huge number of TLB invalidation fences to wait for, so
rather than doing that, defer the TLB invalidation to the
next ring ops for each affected exec queue. Since the TLB
is invalidated on exec_queue switch, we need to invalidate
once for each affected exec_queue.

Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
 drivers/gpu/drm/xe/xe_pt.c               |  5 +++--
 drivers/gpu/drm/xe/xe_ring_ops.c         | 11 ++++-------
 drivers/gpu/drm/xe/xe_sched_job.c        | 11 +++++++++++
 drivers/gpu/drm/xe/xe_sched_job_types.h  |  2 ++
 drivers/gpu/drm/xe/xe_vm_types.h         |  5 +++++
 6 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 62b3d9d1d7cd..891ad30e906f 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -148,6 +148,8 @@ struct xe_exec_queue {
 	const struct xe_ring_ops *ring_ops;
 	/** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
 	struct drm_sched_entity *entity;
+	/** @tlb_flush_seqno: The seqno of the last rebind tlb flush performed */
+	u64 tlb_flush_seqno;
 	/** @lrc: logical ring context for this exec queue */
 	struct xe_lrc lrc[];
 };
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index a5bb8d20f815..109c26e5689e 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1253,11 +1253,12 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 	 * non-faulting LR, in particular on user-space batch buffer chaining,
 	 * it needs to be done here.
 	 */
-	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
-	    (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
+	if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
 		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
 		if (!ifence)
 			return ERR_PTR(-ENOMEM);
+	} else if (rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) {
+		vm->tlb_flush_seqno++;
 	}
 
 	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index c4edffcd4a32..5b2b37b59813 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
 {
 	u32 dw[MAX_JOB_SIZE_DW], i = 0;
 	u32 ppgtt_flag = get_ppgtt_flag(job);
-	struct xe_vm *vm = job->q->vm;
 	struct xe_gt *gt = job->q->gt;
 
-	if (vm && vm->batch_invalidate_tlb) {
+	if (job->ring_ops_flush_tlb) {
 		dw[i++] = preparser_disable(true);
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 					seqno, true, dw, i);
@@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
-	struct xe_vm *vm = job->q->vm;
 
 	dw[i++] = preparser_disable(true);
 
@@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 			i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
 	}
 
-	if (vm && vm->batch_invalidate_tlb)
+	if (job->ring_ops_flush_tlb)
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 					seqno, true, dw, i);
 
 	dw[i++] = preparser_disable(false);
 
-	if (!vm || !vm->batch_invalidate_tlb)
+	if (!job->ring_ops_flush_tlb)
 		i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 					seqno, dw, i);
 
@@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
-	struct xe_vm *vm = job->q->vm;
 	u32 mask_flags = 0;
 
 	dw[i++] = preparser_disable(true);
@@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 		mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
 
 	/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
-	i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
+	i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
 
 	/* hsdes: 1809175790 */
 	if (has_aux_ccs(xe))
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 8151ddafb940..d55458d915a9 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
 
 void xe_sched_job_arm(struct xe_sched_job *job)
 {
+	struct xe_exec_queue *q = job->q;
+	struct xe_vm *vm = q->vm;
+
+	if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
+	    vm->tlb_flush_seqno != q->tlb_flush_seqno) {
+		q->tlb_flush_seqno = vm->tlb_flush_seqno;
+		job->ring_ops_flush_tlb = true;
+	} else if (vm && vm->batch_invalidate_tlb) {
+		job->ring_ops_flush_tlb = true;
+	}
+
 	drm_sched_job_arm(&job->drm);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index b1d83da50a53..5e12724219fd 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -39,6 +39,8 @@ struct xe_sched_job {
 	} user_fence;
 	/** @migrate_flush_flags: Additional flush flags for migration jobs */
 	u32 migrate_flush_flags;
+	/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
+	bool ring_ops_flush_tlb;
 	/** @batch_addr: batch buffer address of job */
 	u64 batch_addr[];
 };
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index ae5fb565f6bf..5747f136d24d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -264,6 +264,11 @@ struct xe_vm {
 		bool capture_once;
 	} error_capture;
 
+	/**
+	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
+	 * protected by the vm resv.
+	 */
+	u64 tlb_flush_seqno;
 	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (3 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-22 17:55   ` Matthew Brost
  2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Reserve fences where needed Thomas Hellström
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable

They can actually complete out-of-order, so allocate a unique
fence context for each fence.

Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 1 -
 drivers/gpu/drm/xe/xe_gt_types.h            | 7 -------
 drivers/gpu/drm/xe/xe_pt.c                  | 3 +--
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index a3c4ffba679d..787cba5e49a1 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -63,7 +63,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
 	spin_lock_init(&gt->tlb_invalidation.pending_lock);
 	spin_lock_init(&gt->tlb_invalidation.lock);
-	gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
 	INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
 			  xe_gt_tlb_fence_timeout);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index f6da2ad9719f..2143dffcaf11 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -179,13 +179,6 @@ struct xe_gt {
 		 * xe_gt_tlb_fence_timeout after the timeut interval is over.
 		 */
 		struct delayed_work fence_tdr;
-		/** @tlb_invalidation.fence_context: context for TLB invalidation fences */
-		u64 fence_context;
-		/**
-		 * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
-		 * tlb_invalidation.lock
-		 */
-		u32 fence_seqno;
 		/** @tlb_invalidation.lock: protects TLB invalidation fences */
 		spinlock_t lock;
 	} tlb_invalidation;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0484ed5b495f..045a8c0845ba 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
 	spin_lock_irq(&gt->tlb_invalidation.lock);
 	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
 		       &gt->tlb_invalidation.lock,
-		       gt->tlb_invalidation.fence_context,
-		       ++gt->tlb_invalidation.fence_seqno);
+		       dma_fence_context_alloc(1), 1);
 	spin_unlock_irq(&gt->tlb_invalidation.lock);
 
 	INIT_LIST_HEAD(&ifence->base.link);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] drm/xe: Reserve fences where needed
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (4 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 20:10   ` Matthew Brost
  2024-03-21 11:37 ` [PATCH 4/7] " Thomas Hellström
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Auld

Reserve fence slots where actually needed rather than trying to
predict how many fence slots will be needed over a complete
wound-wait transaction.
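
Condensed from the diff below, the reservation now happens in the
(un)bind paths themselves, right before the fences are added, instead
of being guessed up front in the exec and preempt-worker paths:

	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
	if (err)
		goto err;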

Fixes: 29f424eb8702 ("drm/xe/exec: move fence reservation")
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c         | 27 +---------------
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  3 +-
 drivers/gpu/drm/xe/xe_pt.c           | 14 ++++++++
 drivers/gpu/drm/xe/xe_vm.c           | 48 +++++++++++++---------------
 drivers/gpu/drm/xe/xe_vm.h           |  3 +-
 5 files changed, 40 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 7692ebfe7d47..511d23426a5e 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -96,41 +96,16 @@
 
 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 {
-	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
 	struct drm_gem_object *obj;
 	unsigned long index;
-	int num_fences;
 	int ret;
 
 	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
 	if (ret)
 		return ret;
 
-	/*
-	 * 1 fence slot for the final submit, and 1 more for every per-tile for
-	 * GPU bind and 1 extra for CPU bind. Note that there are potentially
-	 * many vma per object/dma-resv, however the fence slot will just be
-	 * re-used, since they are largely the same timeline and the seqno
-	 * should be in order. In the case of CPU bind there is dummy fence used
-	 * for all CPU binds, so no need to have a per-tile slot for that.
-	 */
-	num_fences = 1 + 1 + vm->xe->info.tile_count;
-
-	/*
-	 * We don't know upfront exactly how many fence slots we will need at
-	 * the start of the exec, since the TTM bo_validate above can consume
-	 * numerous fence slots. Also due to how the dma_resv_reserve_fences()
-	 * works it only ensures that at least that many fence slots are
-	 * available i.e if there are already 10 slots available and we reserve
-	 * two more, it can just noop without reserving anything.  With this it
-	 * is quite possible that TTM steals some of the fence slots and then
-	 * when it comes time to do the vma binding and final exec stage we are
-	 * lacking enough fence slots, leading to some nasty BUG_ON() when
-	 * adding the fences. Hence just add our own fences here, after the
-	 * validate stage.
-	 */
 	drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
-		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		ret = dma_resv_reserve_fences(obj->resv, 1);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 241c294270d9..fa9e9853c53b 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -100,10 +100,9 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 {
 	struct xe_bo *bo = xe_vma_bo(vma);
 	struct xe_vm *vm = xe_vma_vm(vma);
-	unsigned int num_shared = 2; /* slots for bind + move */
 	int err;
 
-	err = xe_vm_prepare_vma(exec, vma, num_shared);
+	err = xe_vm_lock_vma(exec, vma);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 109c26e5689e..7add08b2954e 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1235,6 +1235,13 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 	err = xe_pt_prepare_bind(tile, vma, entries, &num_entries);
 	if (err)
 		goto err;
+
+	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
+	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
+	if (err)
+		goto err;
+
 	xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
 
 	xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
@@ -1576,6 +1583,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 	struct dma_fence *fence = NULL;
 	struct invalidation_fence *ifence;
 	struct xe_range_fence *rfence;
+	int err;
 
 	LLIST_HEAD(deferred);
 
@@ -1593,6 +1601,12 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 	xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries,
 				   num_entries);
 
+	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
+	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
+	if (err)
+		return ERR_PTR(err);
+
 	ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
 	if (!ifence)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 80d43d75b1da..cff8db605743 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -485,14 +485,11 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 				 bool *done)
 {
+	struct drm_gem_object *obj;
+	unsigned long index;
 	int err;
 
-	/*
-	 * 1 fence for each preempt fence plus a fence for each tile from a
-	 * possible rebind
-	 */
-	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, vm->preempt.num_exec_queues +
-				   vm->xe->info.tile_count);
+	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0);
 	if (err)
 		return err;
 
@@ -507,7 +504,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 		return 0;
 	}
 
-	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, vm->preempt.num_exec_queues);
+	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);
 	if (err)
 		return err;
 
@@ -515,7 +512,17 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 	if (err)
 		return err;
 
-	return drm_gpuvm_validate(&vm->gpuvm, exec);
+	err = drm_gpuvm_validate(&vm->gpuvm, exec);
+	if (err)
+		return err;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		err = dma_resv_reserve_fences(obj->resv, vm->preempt.num_exec_queues);
+		if (err)
+			return err;
+	}
+
+	return 0;
 }
 
 static void preempt_rebind_work_func(struct work_struct *w)
@@ -1004,35 +1011,26 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 }
 
 /**
- * xe_vm_prepare_vma() - drm_exec utility to lock a vma
+ * xe_vm_lock_vma() - drm_exec utility to lock a vma
  * @exec: The drm_exec object we're currently locking for.
  * @vma: The vma for witch we want to lock the vm resv and any attached
  * object's resv.
- * @num_shared: The number of dma-fence slots to pre-allocate in the
- * objects' reservation objects.
  *
  * Return: 0 on success, negative error code on error. In particular
  * may return -EDEADLK on WW transaction contention and -EINTR if
  * an interruptible wait is terminated by a signal.
  */
-int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
-		      unsigned int num_shared)
+int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_bo *bo = xe_vma_bo(vma);
 	int err;
 
 	XE_WARN_ON(!vm);
-	if (num_shared)
-		err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
-	else
-		err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
-	if (!err && bo && !bo->vm) {
-		if (num_shared)
-			err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
-		else
-			err = drm_exec_lock_obj(exec, &bo->ttm.base);
-	}
+
+	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
+	if (!err && bo && !bo->vm)
+		err = drm_exec_lock_obj(exec, &bo->ttm.base);
 
 	return err;
 }
@@ -1044,7 +1042,7 @@ static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 
 	drm_exec_init(&exec, 0, 0);
 	drm_exec_until_all_locked(&exec) {
-		err = xe_vm_prepare_vma(&exec, vma, 0);
+		err = xe_vm_lock_vma(&exec, vma);
 		drm_exec_retry_on_contention(&exec);
 		if (XE_WARN_ON(err))
 			break;
@@ -2511,7 +2509,7 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
 
 	lockdep_assert_held_write(&vm->lock);
 
-	err = xe_vm_prepare_vma(exec, vma, 1);
+	err = xe_vm_lock_vma(exec, vma);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 6df1f1c7f85d..47f3960120a0 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -242,8 +242,7 @@ bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
 
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
 
-int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
-		      unsigned int num_shared);
+int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
 
 /**
  * xe_vm_resv() - Return's the vm's reservation object
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] drm/xe: Reserve fences where needed
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (5 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Reserve fences where needed Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 11:37 ` [PATCH 4/7] drm/xe: Rework rebinding Thomas Hellström
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Auld

Reserve fence slots where actually needed rather than trying to
predict how many fence slots will be needed over a complete
wound-wait transaction.

Fixes: 29f424eb8702 ("drm/xe/exec: move fence reservation")
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c         | 27 +---------------
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  3 +-
 drivers/gpu/drm/xe/xe_pt.c           | 14 ++++++++
 drivers/gpu/drm/xe/xe_vm.c           | 48 +++++++++++++---------------
 drivers/gpu/drm/xe/xe_vm.h           |  3 +-
 5 files changed, 40 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 759497d4a102..397a49b731f1 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -96,41 +96,16 @@
 
 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 {
-	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
 	struct drm_gem_object *obj;
 	unsigned long index;
-	int num_fences;
 	int ret;
 
 	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
 	if (ret)
 		return ret;
 
-	/*
-	 * 1 fence slot for the final submit, and 1 more for every per-tile for
-	 * GPU bind and 1 extra for CPU bind. Note that there are potentially
-	 * many vma per object/dma-resv, however the fence slot will just be
-	 * re-used, since they are largely the same timeline and the seqno
-	 * should be in order. In the case of CPU bind there is dummy fence used
-	 * for all CPU binds, so no need to have a per-tile slot for that.
-	 */
-	num_fences = 1 + 1 + vm->xe->info.tile_count;
-
-	/*
-	 * We don't know upfront exactly how many fence slots we will need at
-	 * the start of the exec, since the TTM bo_validate above can consume
-	 * numerous fence slots. Also due to how the dma_resv_reserve_fences()
-	 * works it only ensures that at least that many fence slots are
-	 * available i.e if there are already 10 slots available and we reserve
-	 * two more, it can just noop without reserving anything.  With this it
-	 * is quite possible that TTM steals some of the fence slots and then
-	 * when it comes time to do the vma binding and final exec stage we are
-	 * lacking enough fence slots, leading to some nasty BUG_ON() when
-	 * adding the fences. Hence just add our own fences here, after the
-	 * validate stage.
-	 */
 	drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
-		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		ret = dma_resv_reserve_fences(obj->resv, 1);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 241c294270d9..fa9e9853c53b 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -100,10 +100,9 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 {
 	struct xe_bo *bo = xe_vma_bo(vma);
 	struct xe_vm *vm = xe_vma_vm(vma);
-	unsigned int num_shared = 2; /* slots for bind + move */
 	int err;
 
-	err = xe_vm_prepare_vma(exec, vma, num_shared);
+	err = xe_vm_lock_vma(exec, vma);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 045a8c0845ba..92cd660fbcef 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1235,6 +1235,13 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 	err = xe_pt_prepare_bind(tile, vma, entries, &num_entries);
 	if (err)
 		goto err;
+
+	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
+	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
+	if (err)
+		goto err;
+
 	xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
 
 	xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
@@ -1576,6 +1583,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 	struct dma_fence *fence = NULL;
 	struct invalidation_fence *ifence;
 	struct xe_range_fence *rfence;
+	int err;
 
 	LLIST_HEAD(deferred);
 
@@ -1593,6 +1601,12 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 	xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries,
 				   num_entries);
 
+	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
+	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
+	if (err)
+		return ERR_PTR(err);
+
 	ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
 	if (!ifence)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 35fba6e3f889..33a85e702e2c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -485,14 +485,11 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 				 bool *done)
 {
+	struct drm_gem_object *obj;
+	unsigned long index;
 	int err;
 
-	/*
-	 * 1 fence for each preempt fence plus a fence for each tile from a
-	 * possible rebind
-	 */
-	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, vm->preempt.num_exec_queues +
-				   vm->xe->info.tile_count);
+	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0);
 	if (err)
 		return err;
 
@@ -507,7 +504,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 		return 0;
 	}
 
-	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, vm->preempt.num_exec_queues);
+	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);
 	if (err)
 		return err;
 
@@ -515,7 +512,17 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 	if (err)
 		return err;
 
-	return drm_gpuvm_validate(&vm->gpuvm, exec);
+	err = drm_gpuvm_validate(&vm->gpuvm, exec);
+	if (err)
+		return err;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		err = dma_resv_reserve_fences(obj->resv, vm->preempt.num_exec_queues);
+		if (err)
+			return err;
+	}
+
+	return 0;
 }
 
 static void preempt_rebind_work_func(struct work_struct *w)
@@ -996,35 +1003,26 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 }
 
 /**
- * xe_vm_prepare_vma() - drm_exec utility to lock a vma
+ * xe_vm_lock_vma() - drm_exec utility to lock a vma
  * @exec: The drm_exec object we're currently locking for.
  * @vma: The vma for witch we want to lock the vm resv and any attached
  * object's resv.
- * @num_shared: The number of dma-fence slots to pre-allocate in the
- * objects' reservation objects.
  *
  * Return: 0 on success, negative error code on error. In particular
  * may return -EDEADLK on WW transaction contention and -EINTR if
  * an interruptible wait is terminated by a signal.
  */
-int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
-		      unsigned int num_shared)
+int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_bo *bo = xe_vma_bo(vma);
 	int err;
 
 	XE_WARN_ON(!vm);
-	if (num_shared)
-		err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
-	else
-		err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
-	if (!err && bo && !bo->vm) {
-		if (num_shared)
-			err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
-		else
-			err = drm_exec_lock_obj(exec, &bo->ttm.base);
-	}
+
+	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
+	if (!err && bo && !bo->vm)
+		err = drm_exec_lock_obj(exec, &bo->ttm.base);
 
 	return err;
 }
@@ -1036,7 +1034,7 @@ static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 
 	drm_exec_init(&exec, 0, 0);
 	drm_exec_until_all_locked(&exec) {
-		err = xe_vm_prepare_vma(&exec, vma, 0);
+		err = xe_vm_lock_vma(&exec, vma);
 		drm_exec_retry_on_contention(&exec);
 		if (XE_WARN_ON(err))
 			break;
@@ -2502,7 +2500,7 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
 
 	lockdep_assert_held_write(&vm->lock);
 
-	err = xe_vm_prepare_vma(exec, vma, 1);
+	err = xe_vm_lock_vma(exec, vma);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 4853354336f2..20009d8b4702 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -242,8 +242,7 @@ bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
 
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
 
-int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
-		      unsigned int num_shared);
+int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
 
 /**
  * xe_vm_resv() - Return's the vm's reservation object
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] drm/xe: Rework rebinding
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (6 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 4/7] " Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 11:37 ` [PATCH 5/7] drm/xe: Move vma rebinding to the drm_exec locking loop Thomas Hellström
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost, stable

Instead of handling the vm's rebind fence separately,
which is error-prone if the rebind fences are not strictly ordered,
attach the rebind fences as kernel fences to the vm's resv.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c     | 31 +++----------------------------
 drivers/gpu/drm/xe/xe_pt.c       |  2 +-
 drivers/gpu/drm/xe/xe_vm.c       | 27 +++++++++------------------
 drivers/gpu/drm/xe/xe_vm.h       |  2 +-
 drivers/gpu/drm/xe/xe_vm_types.h |  3 ---
 5 files changed, 14 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 511d23426a5e..397a49b731f1 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -127,7 +127,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_exec *exec = &vm_exec.exec;
 	u32 i, num_syncs = 0, num_ufence = 0;
 	struct xe_sched_job *job;
-	struct dma_fence *rebind_fence;
 	struct xe_vm *vm;
 	bool write_locked, skip_retry = false;
 	ktime_t end = 0;
@@ -269,35 +268,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	 * Rebind any invalidated userptr or evicted BOs in the VM, non-compute
 	 * VM mode only.
 	 */
-	rebind_fence = xe_vm_rebind(vm, false);
-	if (IS_ERR(rebind_fence)) {
-		err = PTR_ERR(rebind_fence);
+	err = xe_vm_rebind(vm, false);
+	if (err)
 		goto err_put_job;
-	}
-
-	/*
-	 * We store the rebind_fence in the VM so subsequent execs don't get
-	 * scheduled before the rebinds of userptrs / evicted BOs is complete.
-	 */
-	if (rebind_fence) {
-		dma_fence_put(vm->rebind_fence);
-		vm->rebind_fence = rebind_fence;
-	}
-	if (vm->rebind_fence) {
-		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			     &vm->rebind_fence->flags)) {
-			dma_fence_put(vm->rebind_fence);
-			vm->rebind_fence = NULL;
-		} else {
-			dma_fence_get(vm->rebind_fence);
-			err = drm_sched_job_add_dependency(&job->drm,
-							   vm->rebind_fence);
-			if (err)
-				goto err_put_job;
-		}
-	}
 
-	/* Wait behind munmap style rebinds */
+	/* Wait behind rebinds */
 	if (!xe_vm_in_lr_mode(vm)) {
 		err = drm_sched_job_add_resv_dependencies(&job->drm,
 							  xe_vm_resv(vm),
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 7add08b2954e..92cd660fbcef 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1304,7 +1304,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 		}
 
 		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
+		dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
 				   last_munmap_rebind ?
 				   DMA_RESV_USAGE_KERNEL :
 				   DMA_RESV_USAGE_BOOKKEEP);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index cff8db605743..33a85e702e2c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -529,7 +529,6 @@ static void preempt_rebind_work_func(struct work_struct *w)
 {
 	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
 	struct drm_exec exec;
-	struct dma_fence *rebind_fence;
 	unsigned int fence_count = 0;
 	LIST_HEAD(preempt_fences);
 	ktime_t end = 0;
@@ -575,18 +574,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	if (err)
 		goto out_unlock;
 
-	rebind_fence = xe_vm_rebind(vm, true);
-	if (IS_ERR(rebind_fence)) {
-		err = PTR_ERR(rebind_fence);
+	err = xe_vm_rebind(vm, true);
+	if (err)
 		goto out_unlock;
-	}
-
-	if (rebind_fence) {
-		dma_fence_wait(rebind_fence, false);
-		dma_fence_put(rebind_fence);
-	}
 
-	/* Wait on munmap style VM unbinds */
+	/* Wait on rebinds and munmap style VM unbinds */
 	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
 				     DMA_RESV_USAGE_KERNEL,
 				     false, MAX_SCHEDULE_TIMEOUT);
@@ -780,14 +772,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	       struct xe_sync_entry *syncs, u32 num_syncs,
 	       bool first_op, bool last_op);
 
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
-	struct dma_fence *fence = NULL;
+	struct dma_fence *fence;
 	struct xe_vma *vma, *next;
 
 	lockdep_assert_held(&vm->lock);
 	if (xe_vm_in_lr_mode(vm) && !rebind_worker)
-		return NULL;
+		return 0;
 
 	xe_vm_assert_held(vm);
 	list_for_each_entry_safe(vma, next, &vm->rebind_list,
@@ -795,17 +787,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 		xe_assert(vm->xe, vma->tile_present);
 
 		list_del_init(&vma->combined_links.rebind);
-		dma_fence_put(fence);
 		if (rebind_worker)
 			trace_xe_vma_rebind_worker(vma);
 		else
 			trace_xe_vma_rebind_exec(vma);
 		fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
 		if (IS_ERR(fence))
-			return fence;
+			return PTR_ERR(fence);
+		dma_fence_put(fence);
 	}
 
-	return fence;
+	return 0;
 }
 
 static void xe_vma_free(struct xe_vma *vma)
@@ -1586,7 +1578,6 @@ static void vm_destroy_work_func(struct work_struct *w)
 		XE_WARN_ON(vm->pt_root[id]);
 
 	trace_xe_vm_free(vm);
-	dma_fence_put(vm->rebind_fence);
 	kfree(vm);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 47f3960120a0..20009d8b4702 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
 
 int xe_vm_userptr_check_repin(struct xe_vm *vm);
 
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
 
 int xe_vm_invalidate_vma(struct xe_vma *vma);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5747f136d24d..badf3945083d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -177,9 +177,6 @@ struct xe_vm {
 	 */
 	struct list_head rebind_list;
 
-	/** @rebind_fence: rebind fence from execbuf */
-	struct dma_fence *rebind_fence;
-
 	/**
 	 * @destroy_work: worker to destroy VM, needed as a dma_fence signaling
 	 * from an irq context can be last put and the destroy needs to be able
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] drm/xe: Move vma rebinding to the drm_exec locking loop
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (7 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 4/7] drm/xe: Rework rebinding Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 11:37 ` [PATCH 6/7] drm/xe/bo: Forward the decision to evict local objects during validation Thomas Hellström
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström

Rebinding might allocate page-table bos, causing evictions.
To support blocking locking during these evictions,
perform the rebinding in the drm_exec locking loop.

Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external- and evicted objects")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c | 13 ++++++++++---
 drivers/gpu/drm/xe/xe_vm.c   | 12 +++++++++---
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 397a49b731f1..e00910246741 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -96,13 +96,20 @@
 
 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 {
+	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
 	struct drm_gem_object *obj;
 	unsigned long index;
 	int ret;
 
-	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
-	if (ret)
-		return ret;
+	do {
+		ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
+		if (ret)
+			return ret;
+
+		ret = xe_vm_rebind(vm, false);
+		if (ret)
+			return ret;
+	} while (!list_empty(&vm->gpuvm.evict.list));
 
 	drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
 		ret = dma_resv_reserve_fences(obj->resv, 1);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 33a85e702e2c..f9da055180b3 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -512,9 +512,15 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 	if (err)
 		return err;
 
-	err = drm_gpuvm_validate(&vm->gpuvm, exec);
-	if (err)
-		return err;
+	do {
+		err = drm_gpuvm_validate(&vm->gpuvm, exec);
+		if (err)
+			return err;
+
+		err = xe_vm_rebind(vm, true);
+		if (err)
+			return err;
+	} while (!list_empty(&vm->gpuvm.evict.list));
 
 	drm_exec_for_each_locked_object(exec, index, obj) {
 		err = dma_resv_reserve_fences(obj->resv, vm->preempt.num_exec_queues);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] drm/xe/bo: Forward the decision to evict local objects during validation
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (8 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 5/7] drm/xe: Move vma rebinding to the drm_exec locking loop Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 11:37 ` [PATCH 7/7] drm/xe/bo: Allow eviction of unbound local bos Thomas Hellström
  2024-03-21 13:44 ` ✗ CI.Patch_applied: failure for drm/xe: Rework rebinding and allow same-vm eviction Patchwork
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost,
	Lucas De Marchi, Oded Gabbay

Currently we refuse to evict the VM's local objects. However, eviction
is necessary for some objects, most notably completely unbound ones.
Forward this decision to the TTM eviction_valuable() callback so that
it can be made per object.

Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external- and evicted objects")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c |  2 +-
 drivers/gpu/drm/xe/tests/xe_bo.c       |  6 ++---
 drivers/gpu/drm/xe/tests/xe_dma_buf.c  |  4 +--
 drivers/gpu/drm/xe/tests/xe_migrate.c  |  2 +-
 drivers/gpu/drm/xe/xe_bo.c             | 36 ++++++++++++++++++--------
 drivers/gpu/drm/xe/xe_bo.h             |  2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c        |  2 +-
 drivers/gpu/drm/xe/xe_exec.c           | 11 +++++---
 drivers/gpu/drm/xe/xe_ggtt.c           |  2 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c   |  2 +-
 drivers/gpu/drm/xe/xe_vm.c             | 15 +++++++----
 drivers/gpu/drm/xe/xe_vm_types.h       | 10 +++++++
 12 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 2a50a7eaaa31..1bf50f694110 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -288,7 +288,7 @@ static struct i915_vma *__xe_pin_fb_vma(struct intel_framebuffer *fb,
 	if (IS_DGFX(xe))
 		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
 	else
-		ret = xe_bo_validate(bo, NULL, true);
+		ret = xe_bo_validate(bo, bo->vm);
 	if (!ret)
 		ttm_bo_pin(&bo->ttm);
 	ttm_bo_unreserve(&bo->ttm);
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 0926a1c2eb86..5410cb1780a6 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -28,7 +28,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 	u32 offset;
 
 	/* Move bo to VRAM if not already there. */
-	ret = xe_bo_validate(bo, NULL, false);
+	ret = xe_bo_validate(bo, NULL);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate bo.\n");
 		return ret;
@@ -274,7 +274,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 		if (i) {
 			down_read(&vm->lock);
 			xe_vm_lock(vm, false);
-			err = xe_bo_validate(bo, bo->vm, false);
+			err = xe_bo_validate(bo, bo->vm);
 			xe_vm_unlock(vm);
 			up_read(&vm->lock);
 			if (err) {
@@ -283,7 +283,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 				goto cleanup_all;
 			}
 			xe_bo_lock(external, false);
-			err = xe_bo_validate(external, NULL, false);
+			err = xe_bo_validate(external, NULL);
 			xe_bo_unlock(external);
 			if (err) {
 				KUNIT_FAIL(test, "external bo valid err=%pe\n",
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index 9f6d571d7fa9..37bcf812f3ca 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -80,7 +80,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 	}
 
 	/* Re-validate the importer. This should move also exporter in. */
-	ret = xe_bo_validate(imported, NULL, false);
+	ret = xe_bo_validate(imported, NULL);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
@@ -156,7 +156,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 
 			/* Is everything where we expect it to be? */
 			xe_bo_lock(import_bo, false);
-			err = xe_bo_validate(import_bo, NULL, false);
+			err = xe_bo_validate(import_bo, NULL);
 
 			/* Pinning in VRAM is not allowed. */
 			if (!is_dynamic(params) &&
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index ce531498f57f..97735a3c66ab 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -120,7 +120,7 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 		return;
 	}
 
-	err = xe_bo_validate(remote, NULL, false);
+	err = xe_bo_validate(remote, NULL);
 	if (err) {
 		KUNIT_FAIL(test, "Failed to validate system bo for %s: %li\n",
 			   str, err);
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 9298546909b5..db4cd1da8ef3 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1019,6 +1019,23 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
 	}
 }
 
+static bool xe_bo_eviction_valuable(struct ttm_buffer_object *ttm_bo,
+				    const struct ttm_place *place)
+{
+	if (xe_bo_is_xe_bo(ttm_bo)) {
+		struct xe_bo *xe_bo = ttm_to_xe_bo(ttm_bo);
+		struct xe_vm *vm = xe_bo->vm;
+
+		if (vm && !drm_gpuvm_is_extobj(&vm->gpuvm, &ttm_bo->base) &&
+		    vm->is_validating) {
+			xe_vm_assert_held(vm);
+			return false;
+		}
+	}
+
+	return ttm_bo_eviction_valuable(ttm_bo, place);
+}
+
 const struct ttm_device_funcs xe_ttm_funcs = {
 	.ttm_tt_create = xe_ttm_tt_create,
 	.ttm_tt_populate = xe_ttm_tt_populate,
@@ -1029,7 +1046,7 @@ const struct ttm_device_funcs xe_ttm_funcs = {
 	.io_mem_reserve = xe_ttm_io_mem_reserve,
 	.io_mem_pfn = xe_ttm_io_mem_pfn,
 	.release_notify = xe_ttm_bo_release_notify,
-	.eviction_valuable = ttm_bo_eviction_valuable,
+	.eviction_valuable = xe_bo_eviction_valuable,
 	.delete_mem_notify = xe_ttm_bo_delete_mem_notify,
 };
 
@@ -1631,7 +1648,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	xe_assert(xe, xe_bo_is_user(bo));
 
 	if (!xe_bo_is_pinned(bo)) {
-		err = xe_bo_validate(bo, NULL, false);
+		err = xe_bo_validate(bo, NULL);
 		if (err)
 			return err;
 
@@ -1675,7 +1692,7 @@ int xe_bo_pin(struct xe_bo *bo)
 	/* We only expect at most 1 pin */
 	xe_assert(xe, !xe_bo_is_pinned(bo));
 
-	err = xe_bo_validate(bo, NULL, false);
+	err = xe_bo_validate(bo, bo->vm);
 	if (err)
 		return err;
 
@@ -1772,19 +1789,17 @@ void xe_bo_unpin(struct xe_bo *bo)
  * xe_bo_validate() - Make sure the bo is in an allowed placement
  * @bo: The bo,
  * @vm: Pointer to a the vm the bo shares a locked dma_resv object with, or
- *      NULL. Used together with @allow_res_evict.
- * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
- *                   reservation object.
+ *      NULL.
  *
  * Make sure the bo is in allowed placement, migrating it if necessary. If
  * needed, other bos will be evicted. If bos selected for eviction shares
- * the @vm's reservation object, they can be evicted iff @allow_res_evict is
- * set to true, otherwise they will be bypassed.
+ * the @vm's reservation object, they can be evicted if the
+ * xe_bo_eviction_valuable() function allows it.
  *
  * Return: 0 on success, negative error code on failure. May return
  * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
  */
-int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
+int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -1792,10 +1807,9 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 	};
 
 	if (vm) {
-		lockdep_assert_held(&vm->lock);
 		xe_vm_assert_held(vm);
 
-		ctx.allow_res_evict = allow_res_evict;
+		ctx.allow_res_evict = true;
 		ctx.resv = xe_vm_resv(vm);
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 52e441f77e96..18776a70c843 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -193,7 +193,7 @@ int xe_bo_pin_external(struct xe_bo *bo);
 int xe_bo_pin(struct xe_bo *bo);
 void xe_bo_unpin_external(struct xe_bo *bo);
 void xe_bo_unpin(struct xe_bo *bo);
-int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
+int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm);
 
 static inline bool xe_bo_is_pinned(struct xe_bo *bo)
 {
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 5b26af21e029..f1dc2bc5179b 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -102,7 +102,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
 		if (!attach->peer2peer)
 			r = xe_bo_migrate(bo, XE_PL_TT);
 		else
-			r = xe_bo_validate(bo, NULL, false);
+			r = xe_bo_validate(bo, NULL);
 		if (r)
 			return ERR_PTR(r);
 	}
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index e00910246741..e4074a9e870f 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -101,23 +101,27 @@ static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 	unsigned long index;
 	int ret;
 
+	vm->is_validating = true;
 	do {
 		ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
 		if (ret)
-			return ret;
+			goto out;
 
 		ret = xe_vm_rebind(vm, false);
 		if (ret)
-			return ret;
+			goto out;
 	} while (!list_empty(&vm->gpuvm.evict.list));
 
 	drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
 		ret = dma_resv_reserve_fences(obj->resv, 1);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
 	return 0;
+out:
+	vm->is_validating = false;
+	return ret;
 }
 
 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
@@ -340,6 +344,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		xe_sched_job_put(job);
 err_exec:
+	vm->is_validating = false;
 	drm_exec_fini(exec);
 err_unlock_list:
 	up_read(&vm->lock);
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index f54523d7d03c..312dbfcf9de5 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -406,7 +406,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 		return 0;
 	}
 
-	err = xe_bo_validate(bo, NULL, false);
+	err = xe_bo_validate(bo, bo->vm);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index fa9e9853c53b..090f38848b23 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -118,7 +118,7 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 			return err;
 	} else if (bo) {
 		/* Create backing store if needed */
-		err = xe_bo_validate(bo, vm, true);
+		err = xe_bo_validate(bo, vm);
 		if (err)
 			return err;
 	}
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index f9da055180b3..4424b867e29d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -474,7 +474,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
 			       &vm->rebind_list);
 
-	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
+	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm);
 	if (ret)
 		return ret;
 
@@ -512,23 +512,27 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 	if (err)
 		return err;
 
+	vm->is_validating = true;
 	do {
 		err = drm_gpuvm_validate(&vm->gpuvm, exec);
 		if (err)
-			return err;
+			goto out;
 
 		err = xe_vm_rebind(vm, true);
 		if (err)
-			return err;
+			goto out;
 	} while (!list_empty(&vm->gpuvm.evict.list));
 
 	drm_exec_for_each_locked_object(exec, index, obj) {
 		err = dma_resv_reserve_fences(obj->resv, vm->preempt.num_exec_queues);
 		if (err)
-			return err;
+			goto out;
 	}
 
 	return 0;
+out:
+	vm->is_validating = false;
+	return err;
 }
 
 static void preempt_rebind_work_func(struct work_struct *w)
@@ -617,6 +621,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	up_read(&vm->userptr.notifier_lock);
 
 out_unlock:
+	vm->is_validating = false;
 	drm_exec_fini(&exec);
 out_unlock_outer:
 	if (err == -EAGAIN) {
@@ -1848,7 +1853,7 @@ static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue
 	xe_bo_assert_held(bo);
 
 	if (bo && immediate) {
-		err = xe_bo_validate(bo, vm, true);
+		err = xe_bo_validate(bo, vm);
 		if (err)
 			return err;
 	}
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index badf3945083d..0d286da148c0 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -268,6 +268,16 @@ struct xe_vm {
 	u64 tlb_flush_seqno;
 	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
 	bool batch_invalidate_tlb;
+
+	/**
+	 * @is_validating: Whether we are validating the vm's local objects.
+	 * This field is protected by the vm's resv. Note that this
+	 * is needed only since TTM doesn't forward the ttm_operation_ctx to the
+	 * eviction_valuable() callback, so if / when that is in place, this
+	 * should be removed.
+	 */
+	bool is_validating;
+
 	/** @xef: XE file handle for tracking this VM's drm client */
 	struct xe_file *xef;
 };
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] drm/xe/bo: Allow eviction of unbound local bos
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (9 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 6/7] drm/xe/bo: Forward the decision to evict local objects during validation Thomas Hellström
@ 2024-03-21 11:37 ` Thomas Hellström
  2024-03-21 13:44 ` ✗ CI.Patch_applied: failure for drm/xe: Rework rebinding and allow same-vm eviction Patchwork
  11 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 11:37 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost,
	Lucas De Marchi, Oded Gabbay

Local bos that don't have any gpu_vmas set up should be allowed
to be evicted. Support that.

Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external- and evicted objects")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index db4cd1da8ef3..87e6aff52ca6 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1029,10 +1029,14 @@ static bool xe_bo_eviction_valuable(struct ttm_buffer_object *ttm_bo,
 		if (vm && !drm_gpuvm_is_extobj(&vm->gpuvm, &ttm_bo->base) &&
 		    vm->is_validating) {
 			xe_vm_assert_held(vm);
+			/* Not bound to the vm? */
+			if (list_empty(&ttm_bo->base.gpuva.list))
+				goto allow;
 			return false;
 		}
 	}
 
+allow:
 	return ttm_bo_eviction_valuable(ttm_bo, place);
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* ✗ CI.Patch_applied: failure for drm/xe: Rework rebinding and allow same-vm eviction
  2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
                   ` (10 preceding siblings ...)
  2024-03-21 11:37 ` [PATCH 7/7] drm/xe/bo: Allow eviction of unbound local bos Thomas Hellström
@ 2024-03-21 13:44 ` Patchwork
  11 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2024-03-21 13:44 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: drm/xe: Rework rebinding and allow same-vm eviction
URL   : https://patchwork.freedesktop.org/series/131421/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 9ae91289c9ed drm-tip: 2024y-03m-21d-13h-03m-03s UTC integration manifest
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_exec.c:96
error: drivers/gpu/drm/xe/xe_exec.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_gt_pagefault.c:100
error: drivers/gpu/drm/xe/xe_gt_pagefault.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_pt.c:1235
error: drivers/gpu/drm/xe/xe_pt.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_vm.c:485
error: drivers/gpu/drm/xe/xe_vm.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_vm.h:242
error: drivers/gpu/drm/xe/xe_vm.h: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: drm/xe: Use ring ops TLB invalidation for rebinds
Applying: drm/xe: Rework rebinding
Applying: drm/xe: Reserve fences where needed
Applying: drm/xe: Reserve fences where needed
Patch failed at 0004 drm/xe: Reserve fences where needed
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
@ 2024-03-21 19:09   ` Matthew Brost
  2024-03-21 21:14     ` Thomas Hellström
  2024-03-21 21:55     ` Thomas Hellström
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Brost @ 2024-03-21 19:09 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, stable

On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> For each rebind we insert a GuC TLB invalidation and add a
> corresponding unordered TLB invalidation fence. This might
> add a huge number of TLB invalidation fences to wait for so
> rather than doing that, defer the TLB invalidation to the
> next ring ops for each affected exec queue. Since the TLB
> is invalidated on exec_queue switch, we need to invalidate
> once for each affected exec_queue.
> 
> Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
>  drivers/gpu/drm/xe/xe_pt.c               |  5 +++--
>  drivers/gpu/drm/xe/xe_ring_ops.c         | 11 ++++-------
>  drivers/gpu/drm/xe/xe_sched_job.c        | 11 +++++++++++
>  drivers/gpu/drm/xe/xe_sched_job_types.h  |  2 ++
>  drivers/gpu/drm/xe/xe_vm_types.h         |  5 +++++
>  6 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 62b3d9d1d7cd..891ad30e906f 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -148,6 +148,8 @@ struct xe_exec_queue {
>  	const struct xe_ring_ops *ring_ops;
>  	/** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
>  	struct drm_sched_entity *entity;
> +	/** @tlb_flush_seqno: The seqno of the last rebind tlb flush performed */
> +	u64 tlb_flush_seqno;
>  	/** @lrc: logical ring context for this exec queue */
>  	struct xe_lrc lrc[];
>  };
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 8d3922d2206e..21bc0d13fccf 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
>  	 * non-faulting LR, in particular on user-space batch buffer chaining,
>  	 * it needs to be done here.
>  	 */
> -	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
> -	    (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
> +	if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {

I looked into why this works in fault mode: we disallow scratch pages in
fault mode. I thought at one point we had an implementation for that [1]
but it looks like it never got merged. Something to keep an eye on.

[1] https://patchwork.freedesktop.org/series/120480/

>  		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
>  		if (!ifence)
>  			return ERR_PTR(-ENOMEM);
> +	} else if (rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) {
> +		vm->tlb_flush_seqno++;

Can we unwind this if / else clause a bit?

I think batch_invalidate_tlb can only be true if !xe_vm_in_lr_mode(vm).

So else if 'rebind && !xe_vm_in_lr_mode(vm)' should work. Also, if
batch_invalidate_tlb is true we always issue a TLB invalidation anyway,
so incrementing the seqno is harmless too.
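
Roughly what I have in mind (an untested sketch, assuming
batch_invalidate_tlb is never set for LR vms):

	if (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm)) {
		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
		if (!ifence)
			return ERR_PTR(-ENOMEM);
	} else if (rebind && !xe_vm_in_lr_mode(vm)) {
		/* Harmless to bump this even when batch_invalidate_tlb is set */
		vm->tlb_flush_seqno++;
	}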

Side note, I'd be remiss if I didn't mention that I really do not like
updating these functions (__xe_pt_bind_vma / __xe_pt_unbind_vma) as they
are going away / being reworked here [2] in order to implement 1 job per
IOCTL / proper error handling.

[2] https://patchwork.freedesktop.org/series/125608/

>  	}
>  
>  	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index c4edffcd4a32..5b2b37b59813 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
>  {
>  	u32 dw[MAX_JOB_SIZE_DW], i = 0;
>  	u32 ppgtt_flag = get_ppgtt_flag(job);
> -	struct xe_vm *vm = job->q->vm;
>  	struct xe_gt *gt = job->q->gt;
>  
> -	if (vm && vm->batch_invalidate_tlb) {
> +	if (job->ring_ops_flush_tlb) {
>  		dw[i++] = preparser_disable(true);
>  		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
>  					seqno, true, dw, i);
> @@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
>  	struct xe_gt *gt = job->q->gt;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
> -	struct xe_vm *vm = job->q->vm;
>  
>  	dw[i++] = preparser_disable(true);
>  
> @@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
>  			i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
>  	}
>  
> -	if (vm && vm->batch_invalidate_tlb)
> +	if (job->ring_ops_flush_tlb)
>  		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
>  					seqno, true, dw, i);
>  
>  	dw[i++] = preparser_disable(false);
>  
> -	if (!vm || !vm->batch_invalidate_tlb)
> +	if (!job->ring_ops_flush_tlb)
>  		i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
>  					seqno, dw, i);
>  
> @@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>  	struct xe_gt *gt = job->q->gt;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> -	struct xe_vm *vm = job->q->vm;
>  	u32 mask_flags = 0;
>  
>  	dw[i++] = preparser_disable(true);
> @@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>  		mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
>  
>  	/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
> -	i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
> +	i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
>  
>  	/* hsdes: 1809175790 */
>  	if (has_aux_ccs(xe))
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index 8151ddafb940..d55458d915a9 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
>  
>  void xe_sched_job_arm(struct xe_sched_job *job)
>  {
> +	struct xe_exec_queue *q = job->q;
> +	struct xe_vm *vm = q->vm;
> +
> +	if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
> +	    vm->tlb_flush_seqno != q->tlb_flush_seqno) {
> +		q->tlb_flush_seqno = vm->tlb_flush_seqno;
> +		job->ring_ops_flush_tlb = true;
> +	} else if (vm && vm->batch_invalidate_tlb) {
> +		job->ring_ops_flush_tlb = true;
> +	}
> +

Can we simplify this too?

	if (vm && (vm->batch_invalidate_tlb || (vm->tlb_flush_seqno != q->tlb_flush_seqno))) {
		q->tlb_flush_seqno = vm->tlb_flush_seqno;
		job->ring_ops_flush_tlb = true;
	}

I think this works since migration jobs (xe_sched_job_is_migration) use
emit_migration_job_gen12, which doesn't look at job->ring_ops_flush_tlb,
so there is no need to check xe_sched_job_is_migration.

Also no need to check xe_vm_in_lr_mode, as we wouldn't increment the
seqno above if that were true.

Lastly, it's harmless to increment q->tlb_flush_seqno in the case of
batch_invalidate_tlb being true.

>  	drm_sched_job_arm(&job->drm);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
> index b1d83da50a53..5e12724219fd 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> @@ -39,6 +39,8 @@ struct xe_sched_job {
>  	} user_fence;
>  	/** @migrate_flush_flags: Additional flush flags for migration jobs */
>  	u32 migrate_flush_flags;
> +	/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
> +	bool ring_ops_flush_tlb;

How about JOB_FLAG_FLUSH_TLB rather than a new field? See
JOB_FLAG_SUBMIT flag usage.
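
Untested, and the exact bit / member names are only a guess on my part,
but I'm thinking of something along these lines:

	/* Next to JOB_FLAG_SUBMIT in xe_sched_job.h: */
	#define JOB_FLAG_FLUSH_TLB	(DMA_FENCE_FLAG_USER_BITS + 1)

	/* ...and in xe_sched_job_arm(): */
	if (vm && (vm->batch_invalidate_tlb ||
		   vm->tlb_flush_seqno != q->tlb_flush_seqno)) {
		q->tlb_flush_seqno = vm->tlb_flush_seqno;
		set_bit(JOB_FLAG_FLUSH_TLB, &job->fence->flags);
	}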

Matt

>  	/** @batch_addr: batch buffer address of job */
>  	u64 batch_addr[];
>  };
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index ae5fb565f6bf..5747f136d24d 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -264,6 +264,11 @@ struct xe_vm {
>  		bool capture_once;
>  	} error_capture;
>  
> +	/**
> +	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
> +	 * protected by the vm resv.
> +	 */
> +	u64 tlb_flush_seqno;
>  	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
>  	bool batch_invalidate_tlb;
>  	/** @xef: XE file handle for tracking this VM's drm client */
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] drm/xe: Rework rebinding
  2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
@ 2024-03-21 19:14   ` Matthew Brost
  2024-03-21 21:16     ` Thomas Hellström
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Brost @ 2024-03-21 19:14 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, Rodrigo Vivi, stable

On Thu, Mar 21, 2024 at 12:37:12PM +0100, Thomas Hellström wrote:
> Instead of handling the vm's rebind fence separately,
> which is error prone if they are not strictly ordered,
> attach rebind fences as kernel fences to the vm's resv.
> 

See my comment on the previous patch; I don't like the updates to
__xe_pt_bind_vma, but I guess I can live with them. Otherwise LGTM.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec.c     | 31 +++----------------------------
>  drivers/gpu/drm/xe/xe_pt.c       |  2 +-
>  drivers/gpu/drm/xe/xe_vm.c       | 27 +++++++++------------------
>  drivers/gpu/drm/xe/xe_vm.h       |  2 +-
>  drivers/gpu/drm/xe/xe_vm_types.h |  3 ---
>  5 files changed, 14 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 7692ebfe7d47..759497d4a102 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -152,7 +152,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	struct drm_exec *exec = &vm_exec.exec;
>  	u32 i, num_syncs = 0, num_ufence = 0;
>  	struct xe_sched_job *job;
> -	struct dma_fence *rebind_fence;
>  	struct xe_vm *vm;
>  	bool write_locked, skip_retry = false;
>  	ktime_t end = 0;
> @@ -294,35 +293,11 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	 * Rebind any invalidated userptr or evicted BOs in the VM, non-compute
>  	 * VM mode only.
>  	 */
> -	rebind_fence = xe_vm_rebind(vm, false);
> -	if (IS_ERR(rebind_fence)) {
> -		err = PTR_ERR(rebind_fence);
> +	err = xe_vm_rebind(vm, false);
> +	if (err)
>  		goto err_put_job;
> -	}
> -
> -	/*
> -	 * We store the rebind_fence in the VM so subsequent execs don't get
> -	 * scheduled before the rebinds of userptrs / evicted BOs is complete.
> -	 */
> -	if (rebind_fence) {
> -		dma_fence_put(vm->rebind_fence);
> -		vm->rebind_fence = rebind_fence;
> -	}
> -	if (vm->rebind_fence) {
> -		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> -			     &vm->rebind_fence->flags)) {
> -			dma_fence_put(vm->rebind_fence);
> -			vm->rebind_fence = NULL;
> -		} else {
> -			dma_fence_get(vm->rebind_fence);
> -			err = drm_sched_job_add_dependency(&job->drm,
> -							   vm->rebind_fence);
> -			if (err)
> -				goto err_put_job;
> -		}
> -	}
>  
> -	/* Wait behind munmap style rebinds */
> +	/* Wait behind rebinds */
>  	if (!xe_vm_in_lr_mode(vm)) {
>  		err = drm_sched_job_add_resv_dependencies(&job->drm,
>  							  xe_vm_resv(vm),
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 21bc0d13fccf..0484ed5b495f 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1298,7 +1298,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
>  		}
>  
>  		/* add shared fence now for pagetable delayed destroy */
> -		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
> +		dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
>  				   last_munmap_rebind ?
>  				   DMA_RESV_USAGE_KERNEL :
>  				   DMA_RESV_USAGE_BOOKKEEP);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 80d43d75b1da..35fba6e3f889 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -522,7 +522,6 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  {
>  	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
>  	struct drm_exec exec;
> -	struct dma_fence *rebind_fence;
>  	unsigned int fence_count = 0;
>  	LIST_HEAD(preempt_fences);
>  	ktime_t end = 0;
> @@ -568,18 +567,11 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  	if (err)
>  		goto out_unlock;
>  
> -	rebind_fence = xe_vm_rebind(vm, true);
> -	if (IS_ERR(rebind_fence)) {
> -		err = PTR_ERR(rebind_fence);
> +	err = xe_vm_rebind(vm, true);
> +	if (err)
>  		goto out_unlock;
> -	}
> -
> -	if (rebind_fence) {
> -		dma_fence_wait(rebind_fence, false);
> -		dma_fence_put(rebind_fence);
> -	}
>  
> -	/* Wait on munmap style VM unbinds */
> +	/* Wait on rebinds and munmap style VM unbinds */
>  	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
>  				     DMA_RESV_USAGE_KERNEL,
>  				     false, MAX_SCHEDULE_TIMEOUT);
> @@ -773,14 +765,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
>  	       struct xe_sync_entry *syncs, u32 num_syncs,
>  	       bool first_op, bool last_op);
>  
> -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
>  {
> -	struct dma_fence *fence = NULL;
> +	struct dma_fence *fence;
>  	struct xe_vma *vma, *next;
>  
>  	lockdep_assert_held(&vm->lock);
>  	if (xe_vm_in_lr_mode(vm) && !rebind_worker)
> -		return NULL;
> +		return 0;
>  
>  	xe_vm_assert_held(vm);
>  	list_for_each_entry_safe(vma, next, &vm->rebind_list,
> @@ -788,17 +780,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
>  		xe_assert(vm->xe, vma->tile_present);
>  
>  		list_del_init(&vma->combined_links.rebind);
> -		dma_fence_put(fence);
>  		if (rebind_worker)
>  			trace_xe_vma_rebind_worker(vma);
>  		else
>  			trace_xe_vma_rebind_exec(vma);
>  		fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
>  		if (IS_ERR(fence))
> -			return fence;
> +			return PTR_ERR(fence);
> +		dma_fence_put(fence);
>  	}
>  
> -	return fence;
> +	return 0;
>  }
>  
>  static void xe_vma_free(struct xe_vma *vma)
> @@ -1588,7 +1580,6 @@ static void vm_destroy_work_func(struct work_struct *w)
>  		XE_WARN_ON(vm->pt_root[id]);
>  
>  	trace_xe_vm_free(vm);
> -	dma_fence_put(vm->rebind_fence);
>  	kfree(vm);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 6df1f1c7f85d..4853354336f2 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
>  
>  int xe_vm_userptr_check_repin(struct xe_vm *vm);
>  
> -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
> +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
>  
>  int xe_vm_invalidate_vma(struct xe_vma *vma);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 5747f136d24d..badf3945083d 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -177,9 +177,6 @@ struct xe_vm {
>  	 */
>  	struct list_head rebind_list;
>  
> -	/** @rebind_fence: rebind fence from execbuf */
> -	struct dma_fence *rebind_fence;
> -
>  	/**
>  	 * @destroy_work: worker to destroy VM, needed as a dma_fence signaling
>  	 * from an irq context can be last put and the destroy needs to be able
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] drm/xe: Reserve fences where needed
  2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Reserve fences where needed Thomas Hellström
@ 2024-03-21 20:10   ` Matthew Brost
  2024-03-21 21:34     ` Thomas Hellström
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Brost @ 2024-03-21 20:10 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, Matthew Auld

On Thu, Mar 21, 2024 at 12:37:15PM +0100, Thomas Hellström wrote:
> Reserve fence slots where actually needed rather than trying to
> predict how many fence slots will be needed over a complete
> wound-wait transaction.
> 

You've included this patch twice in the series.

> Fixes: 29f424eb8702 ("drm/xe/exec: move fence reservation")
> Cc: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec.c         | 27 +---------------
>  drivers/gpu/drm/xe/xe_gt_pagefault.c |  3 +-
>  drivers/gpu/drm/xe/xe_pt.c           | 14 ++++++++
>  drivers/gpu/drm/xe/xe_vm.c           | 48 +++++++++++++---------------
>  drivers/gpu/drm/xe/xe_vm.h           |  3 +-
>  5 files changed, 40 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 7692ebfe7d47..511d23426a5e 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -96,41 +96,16 @@
>  
>  static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
>  {
> -	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
>  	struct drm_gem_object *obj;
>  	unsigned long index;
> -	int num_fences;
>  	int ret;
>  
>  	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
>  	if (ret)
>  		return ret;
>  
> -	/*
> -	 * 1 fence slot for the final submit, and 1 more for every per-tile for
> -	 * GPU bind and 1 extra for CPU bind. Note that there are potentially
> -	 * many vma per object/dma-resv, however the fence slot will just be
> -	 * re-used, since they are largely the same timeline and the seqno
> -	 * should be in order. In the case of CPU bind there is dummy fence used
> -	 * for all CPU binds, so no need to have a per-tile slot for that.
> -	 */
> -	num_fences = 1 + 1 + vm->xe->info.tile_count;
> -
> -	/*
> -	 * We don't know upfront exactly how many fence slots we will need at
> -	 * the start of the exec, since the TTM bo_validate above can consume
> -	 * numerous fence slots. Also due to how the dma_resv_reserve_fences()
> -	 * works it only ensures that at least that many fence slots are
> -	 * available i.e if there are already 10 slots available and we reserve
> -	 * two more, it can just noop without reserving anything.  With this it
> -	 * is quite possible that TTM steals some of the fence slots and then
> -	 * when it comes time to do the vma binding and final exec stage we are
> -	 * lacking enough fence slots, leading to some nasty BUG_ON() when
> -	 * adding the fences. Hence just add our own fences here, after the
> -	 * validate stage.
> -	 */
>  	drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
> -		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> +		ret = dma_resv_reserve_fences(obj->resv, 1);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index 241c294270d9..fa9e9853c53b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -100,10 +100,9 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>  {
>  	struct xe_bo *bo = xe_vma_bo(vma);
>  	struct xe_vm *vm = xe_vma_vm(vma);
> -	unsigned int num_shared = 2; /* slots for bind + move */
>  	int err;
>  
> -	err = xe_vm_prepare_vma(exec, vma, num_shared);
> +	err = xe_vm_lock_vma(exec, vma);
>  	if (err)
>  		return err;
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 109c26e5689e..7add08b2954e 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1235,6 +1235,13 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
>  	err = xe_pt_prepare_bind(tile, vma, entries, &num_entries);
>  	if (err)
>  		goto err;
> +
> +	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
> +	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
> +		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
> +	if (err)
> +		goto err;
> +

Again not a huge fan of updating __xe_pt_bind_vma / __xe_pt_unbind_vma
here because these are going away [1] but again conceptually this is
more or less correct the place.

[1] https://patchwork.freedesktop.org/series/125608/

>  	xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
>  
>  	xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
> @@ -1576,6 +1583,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
>  	struct dma_fence *fence = NULL;
>  	struct invalidation_fence *ifence;
>  	struct xe_range_fence *rfence;
> +	int err;
>  
>  	LLIST_HEAD(deferred);
>  
> @@ -1593,6 +1601,12 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
>  	xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries,
>  				   num_entries);
>  
> +	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
> +	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
> +		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
> +	if (err)
> +		return ERR_PTR(err);
> +
>  	ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
>  	if (!ifence)
>  		return ERR_PTR(-ENOMEM);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 80d43d75b1da..cff8db605743 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -485,14 +485,11 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
>  static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
>  				 bool *done)
>  {
> +	struct drm_gem_object *obj;
> +	unsigned long index;
>  	int err;
>  
> -	/*
> -	 * 1 fence for each preempt fence plus a fence for each tile from a
> -	 * possible rebind
> -	 */
> -	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, vm->preempt.num_exec_queues +
> -				   vm->xe->info.tile_count);
> +	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0);
>  	if (err)
>  		return err;
>  
> @@ -507,7 +504,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
>  		return 0;
>  	}
>  
> -	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, vm->preempt.num_exec_queues);
> +	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);

I think this needs to be 1 as drm_gpuvm_validate needs a fence slot,
right?

>  	if (err)
>  		return err;
>  
> @@ -515,7 +512,17 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
>  	if (err)
>  		return err;
>  
> -	return drm_gpuvm_validate(&vm->gpuvm, exec);
> +	err = drm_gpuvm_validate(&vm->gpuvm, exec);
> +	if (err)
> +		return err;
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		err = dma_resv_reserve_fences(obj->resv, vm->preempt.num_exec_queues);

What if we only have external BOs? Do we allocate slots in the VM's
dma-resv with this loop?

Also, what is the behavior of:

dma_resv_reserve_fences(obj, 1);
dma_resv_reserve_fences(obj, 1);

Do we have 2 slots now? I.e. do calls to reserve slots accumulate?

Lastly, strictly speaking I don't think we need this patch in this
series, do we? It doesn't appear to be fixing anything, unless I'm
missing something.

Matt

> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
>  }
>  
>  static void preempt_rebind_work_func(struct work_struct *w)
> @@ -1004,35 +1011,26 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
>  }
>  
>  /**
> - * xe_vm_prepare_vma() - drm_exec utility to lock a vma
> + * xe_vm_lock_vma() - drm_exec utility to lock a vma
>   * @exec: The drm_exec object we're currently locking for.
>   * @vma: The vma for witch we want to lock the vm resv and any attached
>   * object's resv.
> - * @num_shared: The number of dma-fence slots to pre-allocate in the
> - * objects' reservation objects.
>   *
>   * Return: 0 on success, negative error code on error. In particular
>   * may return -EDEADLK on WW transaction contention and -EINTR if
>   * an interruptible wait is terminated by a signal.
>   */
> -int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
> -		      unsigned int num_shared)
> +int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
>  {
>  	struct xe_vm *vm = xe_vma_vm(vma);
>  	struct xe_bo *bo = xe_vma_bo(vma);
>  	int err;
>  
>  	XE_WARN_ON(!vm);
> -	if (num_shared)
> -		err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
> -	else
> -		err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
> -	if (!err && bo && !bo->vm) {
> -		if (num_shared)
> -			err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
> -		else
> -			err = drm_exec_lock_obj(exec, &bo->ttm.base);
> -	}
> +
> +	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
> +	if (!err && bo && !bo->vm)
> +		err = drm_exec_lock_obj(exec, &bo->ttm.base);
>  
>  	return err;
>  }
> @@ -1044,7 +1042,7 @@ static void xe_vma_destroy_unlocked(struct xe_vma *vma)
>  
>  	drm_exec_init(&exec, 0, 0);
>  	drm_exec_until_all_locked(&exec) {
> -		err = xe_vm_prepare_vma(&exec, vma, 0);
> +		err = xe_vm_lock_vma(&exec, vma);
>  		drm_exec_retry_on_contention(&exec);
>  		if (XE_WARN_ON(err))
>  			break;
> @@ -2511,7 +2509,7 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> -	err = xe_vm_prepare_vma(exec, vma, 1);
> +	err = xe_vm_lock_vma(exec, vma);
>  	if (err)
>  		return err;
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 6df1f1c7f85d..47f3960120a0 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -242,8 +242,7 @@ bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
>  
>  int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
>  
> -int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
> -		      unsigned int num_shared);
> +int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
>  
>  /**
>   * xe_vm_resv() - Return's the vm's reservation object
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 19:09   ` Matthew Brost
@ 2024-03-21 21:14     ` Thomas Hellström
  2024-03-21 21:21       ` Thomas Hellström
  2024-03-21 21:55     ` Thomas Hellström
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:14 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, stable

Hi, Matthew,

Thanks for reviewing, please see inline.

On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> > For each rebind we insert a GuC TLB invalidation and add a
> > corresponding unordered TLB invalidation fence. This might
> > add a huge number of TLB invalidation fences to wait for so
> > rather than doing that, defer the TLB invalidation to the
> > next ring ops for each affected exec queue. Since the TLB
> > is invalidated on exec_queue switch, we need to invalidate
> > once for each affected exec_queue.
> > 
> > Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after
> > rebinds issued from execs")
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.8+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
> >  drivers/gpu/drm/xe/xe_pt.c               |  5 +++--
> >  drivers/gpu/drm/xe/xe_ring_ops.c         | 11 ++++-------
> >  drivers/gpu/drm/xe/xe_sched_job.c        | 11 +++++++++++
> >  drivers/gpu/drm/xe/xe_sched_job_types.h  |  2 ++
> >  drivers/gpu/drm/xe/xe_vm_types.h         |  5 +++++
> >  6 files changed, 27 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index 62b3d9d1d7cd..891ad30e906f 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -148,6 +148,8 @@ struct xe_exec_queue {
> >  	const struct xe_ring_ops *ring_ops;
> >  	/** @entity: DRM sched entity for this exec queue (1 to 1
> > relationship) */
> >  	struct drm_sched_entity *entity;
> > +	/** @tlb_flush_seqno: The seqno of the last rebind tlb
> > flush performed */
> > +	u64 tlb_flush_seqno;
> >  	/** @lrc: logical ring context for this exec queue */
> >  	struct xe_lrc lrc[];
> >  };
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > b/drivers/gpu/drm/xe/xe_pt.c
> > index 8d3922d2206e..21bc0d13fccf 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > struct xe_vma *vma, struct xe_exec_queue
> >  	 * non-faulting LR, in particular on user-space batch
> > buffer chaining,
> >  	 * it needs to be done here.
> >  	 */
> > -	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm-
> > >batch_invalidate_tlb) ||
> > -	    (!rebind && xe_vm_has_scratch(vm) &&
> > xe_vm_in_preempt_fence_mode(vm))) {
> > +	if ((!rebind && xe_vm_has_scratch(vm) &&
> > xe_vm_in_preempt_fence_mode(vm))) {
> 
> Looked why this works in fault mode, we disallow scratch page in
> fault
> mode. I thought at one point we had implementation for that [1] but
> it
> looks like it never got merged. Some to keep an eye on.
> 
> [1] https://patchwork.freedesktop.org/series/120480/
> 
> >  		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> >  		if (!ifence)
> >  			return ERR_PTR(-ENOMEM);
> > +	} else if (rebind && !xe_vm_in_lr_mode(vm) && !vm-
> > >batch_invalidate_tlb) {
> > +		vm->tlb_flush_seqno++;
> 
> Can we unwind this if / else clause a bit?
> 
> I think batch_invalidate_tlb can only be true if
> !xe_vm_in_lr_mode(vm).
> 
> So else if 'rebind && !xe_vm_in_lr_mode(vm)' should work. Also if
> batch_invalidate_tlb is we true we always issue TLB invalidate
> anyways
> and incrementing the seqno is harmles too.

Yes, although I don't really like making assumptions in one place about
what the code does with a certain variable elsewhere. At some point in
the future people might change that, or say "Hey, they unnecessarily
increment the seqno here", or forget a branch. I could add asserts about
batch_invalidate_tlb, though, to avoid such future mishaps, along the
lines of the sketch below.
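
Something like this (just a sketch) next to the seqno increment in
__xe_pt_bind_vma, to document the assumption rather than silently
relying on it:

	xe_assert(vm->xe, !vm->batch_invalidate_tlb || !xe_vm_in_lr_mode(vm));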


> 
> Side note, I'd be remiss if I didn't mention that I really do not
> like
> updating these functions (__xe_pt_bind_vma / __xe_pt_unbind_vma) as
> they
> are going away / being reworked here [2] in order to implement 1 job
> per
> IOCTL / proper error handling.
> 
> [2] https://patchwork.freedesktop.org/series/125608/

Fully understandable. That's why I asked whether you thought it
clashed. However, I want this backported to 6.8, so I don't see any
other way of doing it.

/Thomas


> 
> >  	}
> >  
> >  	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index c4edffcd4a32..5b2b37b59813 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct
> > xe_sched_job *job, struct xe_lrc *lrc
> >  {
> >  	u32 dw[MAX_JOB_SIZE_DW], i = 0;
> >  	u32 ppgtt_flag = get_ppgtt_flag(job);
> > -	struct xe_vm *vm = job->q->vm;
> >  	struct xe_gt *gt = job->q->gt;
> >  
> > -	if (vm && vm->batch_invalidate_tlb) {
> > +	if (job->ring_ops_flush_tlb) {
> >  		dw[i++] = preparser_disable(true);
> >  		i =
> > emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> >  					seqno, true, dw, i);
> > @@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct
> > xe_sched_job *job, struct xe_lrc *lrc,
> >  	struct xe_gt *gt = job->q->gt;
> >  	struct xe_device *xe = gt_to_xe(gt);
> >  	bool decode = job->q->class ==
> > XE_ENGINE_CLASS_VIDEO_DECODE;
> > -	struct xe_vm *vm = job->q->vm;
> >  
> >  	dw[i++] = preparser_disable(true);
> >  
> > @@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct
> > xe_sched_job *job, struct xe_lrc *lrc,
> >  			i = emit_aux_table_inv(gt, VE0_AUX_INV,
> > dw, i);
> >  	}
> >  
> > -	if (vm && vm->batch_invalidate_tlb)
> > +	if (job->ring_ops_flush_tlb)
> >  		i =
> > emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> >  					seqno, true, dw, i);
> >  
> >  	dw[i++] = preparser_disable(false);
> >  
> > -	if (!vm || !vm->batch_invalidate_tlb)
> > +	if (!job->ring_ops_flush_tlb)
> >  		i =
> > emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> >  					seqno, dw, i);
> >  
> > @@ -317,7 +315,6 @@ static void
> > __emit_job_gen12_render_compute(struct xe_sched_job *job,
> >  	struct xe_gt *gt = job->q->gt;
> >  	struct xe_device *xe = gt_to_xe(gt);
> >  	bool lacks_render = !(gt->info.engine_mask &
> > XE_HW_ENGINE_RCS_MASK);
> > -	struct xe_vm *vm = job->q->vm;
> >  	u32 mask_flags = 0;
> >  
> >  	dw[i++] = preparser_disable(true);
> > @@ -327,7 +324,7 @@ static void
> > __emit_job_gen12_render_compute(struct xe_sched_job *job,
> >  		mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> >  
> >  	/* See __xe_pt_bind_vma() for a discussion on TLB
> > invalidations. */
> > -	i = emit_pipe_invalidate(mask_flags, vm && vm-
> > >batch_invalidate_tlb, dw, i);
> > +	i = emit_pipe_invalidate(mask_flags, job-
> > >ring_ops_flush_tlb, dw, i);
> >  
> >  	/* hsdes: 1809175790 */
> >  	if (has_aux_ccs(xe))
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> > b/drivers/gpu/drm/xe/xe_sched_job.c
> > index 8151ddafb940..d55458d915a9 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -250,6 +250,17 @@ bool xe_sched_job_completed(struct
> > xe_sched_job *job)
> >  
> >  void xe_sched_job_arm(struct xe_sched_job *job)
> >  {
> > +	struct xe_exec_queue *q = job->q;
> > +	struct xe_vm *vm = q->vm;
> > +
> > +	if (vm && !xe_sched_job_is_migration(q) &&
> > !xe_vm_in_lr_mode(vm) &&
> > +	    vm->tlb_flush_seqno != q->tlb_flush_seqno) {
> > +		q->tlb_flush_seqno = vm->tlb_flush_seqno;
> > +		job->ring_ops_flush_tlb = true;
> > +	} else if (vm && vm->batch_invalidate_tlb) {
> > +		job->ring_ops_flush_tlb = true;
> > +	}
> > +
> 
> Can we simplify this too?
> 
> 	if (vm && (vm->batch_invalidate_tlb || (vm->tlb_flush_seqno
> != q->tlb_flush_seqno))) {
> 		q->tlb_flush_seqno = vm->tlb_flush_seqno;
> 		job->ring_ops_flush_tlb = true;
> 	}
> 
> I think this works as xe_sched_job_is_migration has
> emit_migration_job_gen12 which doesn't look at job-
> >ring_ops_flush_tlb,
> so no need to xe_sched_job_is_migration.
> 
> Also no need to check xe_vm_in_lr_mode as we wouldn'y increment the
> seqno above if that true.
> 
> Lastly, harmless to increment q->tlb_flush_seqno in the case of
> batch_invalidate_tlb being true.
> 
> >  	drm_sched_job_arm(&job->drm);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index b1d83da50a53..5e12724219fd 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -39,6 +39,8 @@ struct xe_sched_job {
> >  	} user_fence;
> >  	/** @migrate_flush_flags: Additional flush flags for
> > migration jobs */
> >  	u32 migrate_flush_flags;
> > +	/** @ring_ops_flush_tlb: The ring ops need to flush TLB
> > before payload. */
> > +	bool ring_ops_flush_tlb;
> 
> How about JOB_FLAG_FLUSH_TLB rather than a new field? See
> JOB_FLAG_SUBMIT flag usage.
> 
> Matt
> 
> >  	/** @batch_addr: batch buffer address of job */
> >  	u64 batch_addr[];
> >  };
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index ae5fb565f6bf..5747f136d24d 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -264,6 +264,11 @@ struct xe_vm {
> >  		bool capture_once;
> >  	} error_capture;
> >  
> > +	/**
> > +	 * @tlb_flush_seqno: Required TLB flush seqno for the next
> > exec.
> > +	 * protected by the vm resv.
> > +	 */
> > +	u64 tlb_flush_seqno;
> >  	/** @batch_invalidate_tlb: Always invalidate TLB before
> > batch start */
> >  	bool batch_invalidate_tlb;
> >  	/** @xef: XE file handle for tracking this VM's drm client
> > */
> > -- 
> > 2.44.0
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] drm/xe: Rework rebinding
  2024-03-21 19:14   ` Matthew Brost
@ 2024-03-21 21:16     ` Thomas Hellström
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:16 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Rodrigo Vivi, stable

On Thu, 2024-03-21 at 19:14 +0000, Matthew Brost wrote:
> On Thu, Mar 21, 2024 at 12:37:12PM +0100, Thomas Hellström wrote:
> > Instead of handling the vm's rebind fence separately,
> > which is error prone if they are not strictly ordered,
> > attach rebind fences as kernel fences to the vm's resv.
> > 
> 
> See my comment on the previous patch: I don't like the updates to
> __xe_pt_bind_vma, but I guess I can live with it. Otherwise LGTM.

Thanks, yeah same reason there, unfortunately.

/Thomas


> 
> With that:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
> > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel
> > GPUs")
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.8+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_exec.c     | 31 +++-------------------------
> > ---
> >  drivers/gpu/drm/xe/xe_pt.c       |  2 +-
> >  drivers/gpu/drm/xe/xe_vm.c       | 27 +++++++++------------------
> >  drivers/gpu/drm/xe/xe_vm.h       |  2 +-
> >  drivers/gpu/drm/xe/xe_vm_types.h |  3 ---
> >  5 files changed, 14 insertions(+), 51 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 7692ebfe7d47..759497d4a102 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -152,7 +152,6 @@ int xe_exec_ioctl(struct drm_device *dev, void
> > *data, struct drm_file *file)
> >  	struct drm_exec *exec = &vm_exec.exec;
> >  	u32 i, num_syncs = 0, num_ufence = 0;
> >  	struct xe_sched_job *job;
> > -	struct dma_fence *rebind_fence;
> >  	struct xe_vm *vm;
> >  	bool write_locked, skip_retry = false;
> >  	ktime_t end = 0;
> > @@ -294,35 +293,11 @@ int xe_exec_ioctl(struct drm_device *dev,
> > void *data, struct drm_file *file)
> >  	 * Rebind any invalidated userptr or evicted BOs in the
> > VM, non-compute
> >  	 * VM mode only.
> >  	 */
> > -	rebind_fence = xe_vm_rebind(vm, false);
> > -	if (IS_ERR(rebind_fence)) {
> > -		err = PTR_ERR(rebind_fence);
> > +	err = xe_vm_rebind(vm, false);
> > +	if (err)
> >  		goto err_put_job;
> > -	}
> > -
> > -	/*
> > -	 * We store the rebind_fence in the VM so subsequent execs
> > don't get
> > -	 * scheduled before the rebinds of userptrs / evicted BOs
> > is complete.
> > -	 */
> > -	if (rebind_fence) {
> > -		dma_fence_put(vm->rebind_fence);
> > -		vm->rebind_fence = rebind_fence;
> > -	}
> > -	if (vm->rebind_fence) {
> > -		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > -			     &vm->rebind_fence->flags)) {
> > -			dma_fence_put(vm->rebind_fence);
> > -			vm->rebind_fence = NULL;
> > -		} else {
> > -			dma_fence_get(vm->rebind_fence);
> > -			err = drm_sched_job_add_dependency(&job->drm,
> > -							   vm->rebind_fence);
> > -			if (err)
> > -				goto err_put_job;
> > -		}
> > -	}
> >  
> > -	/* Wait behind munmap style rebinds */
> > +	/* Wait behind rebinds */
> >  	if (!xe_vm_in_lr_mode(vm)) {
> >  		err = drm_sched_job_add_resv_dependencies(&job->drm,
> >  							  xe_vm_resv(vm),
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > b/drivers/gpu/drm/xe/xe_pt.c
> > index 21bc0d13fccf..0484ed5b495f 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1298,7 +1298,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct
> > xe_vma *vma, struct xe_exec_queue
> >  		}
> >  
> >  		/* add shared fence now for pagetable delayed destroy */
> > -		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
> > +		dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
> >  				   last_munmap_rebind ?
> >  				   DMA_RESV_USAGE_KERNEL :
> >  				   DMA_RESV_USAGE_BOOKKEEP);
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index 80d43d75b1da..35fba6e3f889 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -522,7 +522,6 @@ static void preempt_rebind_work_func(struct
> > work_struct *w)
> >  {
> >  	struct xe_vm *vm = container_of(w, struct xe_vm,
> > preempt.rebind_work);
> >  	struct drm_exec exec;
> > -	struct dma_fence *rebind_fence;
> >  	unsigned int fence_count = 0;
> >  	LIST_HEAD(preempt_fences);
> >  	ktime_t end = 0;
> > @@ -568,18 +567,11 @@ static void preempt_rebind_work_func(struct
> > work_struct *w)
> >  	if (err)
> >  		goto out_unlock;
> >  
> > -	rebind_fence = xe_vm_rebind(vm, true);
> > -	if (IS_ERR(rebind_fence)) {
> > -		err = PTR_ERR(rebind_fence);
> > +	err = xe_vm_rebind(vm, true);
> > +	if (err)
> >  		goto out_unlock;
> > -	}
> > -
> > -	if (rebind_fence) {
> > -		dma_fence_wait(rebind_fence, false);
> > -		dma_fence_put(rebind_fence);
> > -	}
> >  
> > -	/* Wait on munmap style VM unbinds */
> > +	/* Wait on rebinds and munmap style VM unbinds */
> >  	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
> >  				     DMA_RESV_USAGE_KERNEL,
> >  				     false, MAX_SCHEDULE_TIMEOUT);
> > @@ -773,14 +765,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct
> > xe_exec_queue *q,
> >  	       struct xe_sync_entry *syncs, u32 num_syncs,
> >  	       bool first_op, bool last_op);
> >  
> > -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool
> > rebind_worker)
> > +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> >  {
> > -	struct dma_fence *fence = NULL;
> > +	struct dma_fence *fence;
> >  	struct xe_vma *vma, *next;
> >  
> >  	lockdep_assert_held(&vm->lock);
> >  	if (xe_vm_in_lr_mode(vm) && !rebind_worker)
> > -		return NULL;
> > +		return 0;
> >  
> >  	xe_vm_assert_held(vm);
> >  	list_for_each_entry_safe(vma, next, &vm->rebind_list,
> > @@ -788,17 +780,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm
> > *vm, bool rebind_worker)
> >  		xe_assert(vm->xe, vma->tile_present);
> >  
> >  		list_del_init(&vma->combined_links.rebind);
> > -		dma_fence_put(fence);
> >  		if (rebind_worker)
> >  			trace_xe_vma_rebind_worker(vma);
> >  		else
> >  			trace_xe_vma_rebind_exec(vma);
> >  		fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false,
> > false);
> >  		if (IS_ERR(fence))
> > -			return fence;
> > +			return PTR_ERR(fence);
> > +		dma_fence_put(fence);
> >  	}
> >  
> > -	return fence;
> > +	return 0;
> >  }
> >  
> >  static void xe_vma_free(struct xe_vma *vma)
> > @@ -1588,7 +1580,6 @@ static void vm_destroy_work_func(struct
> > work_struct *w)
> >  		XE_WARN_ON(vm->pt_root[id]);
> >  
> >  	trace_xe_vm_free(vm);
> > -	dma_fence_put(vm->rebind_fence);
> >  	kfree(vm);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > b/drivers/gpu/drm/xe/xe_vm.h
> > index 6df1f1c7f85d..4853354336f2 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm
> > *vm);
> >  
> >  int xe_vm_userptr_check_repin(struct xe_vm *vm);
> >  
> > -struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool
> > rebind_worker);
> > +int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
> >  
> >  int xe_vm_invalidate_vma(struct xe_vma *vma);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 5747f136d24d..badf3945083d 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -177,9 +177,6 @@ struct xe_vm {
> >  	 */
> >  	struct list_head rebind_list;
> >  
> > -	/** @rebind_fence: rebind fence from execbuf */
> > -	struct dma_fence *rebind_fence;
> > -
> >  	/**
> >  	 * @destroy_work: worker to destroy VM, needed as a
> > dma_fence signaling
> >  	 * from an irq context can be last put and the destroy
> > needs to be able
> > -- 
> > 2.44.0
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 21:14     ` Thomas Hellström
@ 2024-03-21 21:21       ` Thomas Hellström
  2024-03-22  2:36         ` Matthew Brost
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:21 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, stable

On Thu, 2024-03-21 at 22:14 +0100, Thomas Hellström wrote:
> Hi, Matthew,
> 
> Thanks for reviewing, please see inline.
> 
> On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> > On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> > > For each rebind we insert a GuC TLB invalidation and add a
> > > corresponding unordered TLB invalidation fence. This might
> > > add a huge number of TLB invalidation fences to wait for so
> > > rather than doing that, defer the TLB invalidation to the
> > > next ring ops for each affected exec queue. Since the TLB
> > > is invalidated on exec_queue switch, we need to invalidate
> > > once for each affected exec_queue.
> > > 
> > > Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after
> > > rebinds issued from execs")
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: <stable@vger.kernel.org> # v6.8+
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
> > >  drivers/gpu/drm/xe/xe_pt.c               |  5 +++--
> > >  drivers/gpu/drm/xe/xe_ring_ops.c         | 11 ++++-------
> > >  drivers/gpu/drm/xe/xe_sched_job.c        | 11 +++++++++++
> > >  drivers/gpu/drm/xe/xe_sched_job_types.h  |  2 ++
> > >  drivers/gpu/drm/xe/xe_vm_types.h         |  5 +++++
> > >  6 files changed, 27 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index 62b3d9d1d7cd..891ad30e906f 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -148,6 +148,8 @@ struct xe_exec_queue {
> > >  	const struct xe_ring_ops *ring_ops;
> > >  	/** @entity: DRM sched entity for this exec queue (1 to
> > > 1
> > > relationship) */
> > >  	struct drm_sched_entity *entity;
> > > +	/** @tlb_flush_seqno: The seqno of the last rebind tlb
> > > flush performed */
> > > +	u64 tlb_flush_seqno;
> > >  	/** @lrc: logical ring context for this exec queue */
> > >  	struct xe_lrc lrc[];
> > >  };
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > b/drivers/gpu/drm/xe/xe_pt.c
> > > index 8d3922d2206e..21bc0d13fccf 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > > struct xe_vma *vma, struct xe_exec_queue
> > >  	 * non-faulting LR, in particular on user-space batch
> > > buffer chaining,
> > >  	 * it needs to be done here.
> > >  	 */
> > > -	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm-

While I remember it: when looking at your series I noticed that this
line got moved there incorrectly. It looks like you used
xe_vm_in_lr_mode() rather than !xe_vm_in_lr_mode().

Thomas


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] drm/xe: Reserve fences where needed
  2024-03-21 20:10   ` Matthew Brost
@ 2024-03-21 21:34     ` Thomas Hellström
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:34 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Matthew Auld

On Thu, 2024-03-21 at 20:10 +0000, Matthew Brost wrote:
> On Thu, Mar 21, 2024 at 12:37:15PM +0100, Thomas Hellström wrote:
> > Reserve fence slots where actually needed rather than trying to
> > predict how many fence slots will be needed over a complete
> > wound-wait transaction.
> > 
> 
> You include this patch twice in the series.

Yes, the whole series got mixed up due to me forgetting to remove the
old series before sending patches out.

Please don't bother continuing to review until v2 is out!

Some answers below.

> 
> > Fixes: 29f424eb8702 ("drm/xe/exec: move fence reservation")
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_exec.c         | 27 +---------------
> >  drivers/gpu/drm/xe/xe_gt_pagefault.c |  3 +-
> >  drivers/gpu/drm/xe/xe_pt.c           | 14 ++++++++
> >  drivers/gpu/drm/xe/xe_vm.c           | 48 +++++++++++++-----------
> > ----
> >  drivers/gpu/drm/xe/xe_vm.h           |  3 +-
> >  5 files changed, 40 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 7692ebfe7d47..511d23426a5e 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -96,41 +96,16 @@
> >  
> >  static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
> >  {
> > -	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm,
> > gpuvm);
> >  	struct drm_gem_object *obj;
> >  	unsigned long index;
> > -	int num_fences;
> >  	int ret;
> >  
> >  	ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
> >  	if (ret)
> >  		return ret;
> >  
> > -	/*
> > -	 * 1 fence slot for the final submit, and 1 more for every
> > per-tile for
> > -	 * GPU bind and 1 extra for CPU bind. Note that there are
> > potentially
> > -	 * many vma per object/dma-resv, however the fence slot
> > will just be
> > -	 * re-used, since they are largely the same timeline and
> > the seqno
> > -	 * should be in order. In the case of CPU bind there is
> > dummy fence used
> > -	 * for all CPU binds, so no need to have a per-tile slot
> > for that.
> > -	 */
> > -	num_fences = 1 + 1 + vm->xe->info.tile_count;
> > -
> > -	/*
> > -	 * We don't know upfront exactly how many fence slots we
> > will need at
> > -	 * the start of the exec, since the TTM bo_validate above
> > can consume
> > -	 * numerous fence slots. Also due to how the
> > dma_resv_reserve_fences()
> > -	 * works it only ensures that at least that many fence
> > slots are
> > -	 * available i.e if there are already 10 slots available
> > and we reserve
> > -	 * two more, it can just noop without reserving anything. 
> > With this it
> > -	 * is quite possible that TTM steals some of the fence
> > slots and then
> > -	 * when it comes time to do the vma binding and final exec
> > stage we are
> > -	 * lacking enough fence slots, leading to some nasty
> > BUG_ON() when
> > -	 * adding the fences. Hence just add our own fences here,
> > after the
> > -	 * validate stage.
> > -	 */
> >  	drm_exec_for_each_locked_object(&vm_exec->exec, index,
> > obj) {
> > -		ret = dma_resv_reserve_fences(obj->resv,
> > num_fences);
> > +		ret = dma_resv_reserve_fences(obj->resv, 1);
> >  		if (ret)
> >  			return ret;
> >  	}
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > index 241c294270d9..fa9e9853c53b 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > @@ -100,10 +100,9 @@ static int xe_pf_begin(struct drm_exec *exec,
> > struct xe_vma *vma,
> >  {
> >  	struct xe_bo *bo = xe_vma_bo(vma);
> >  	struct xe_vm *vm = xe_vma_vm(vma);
> > -	unsigned int num_shared = 2; /* slots for bind + move */
> >  	int err;
> >  
> > -	err = xe_vm_prepare_vma(exec, vma, num_shared);
> > +	err = xe_vm_lock_vma(exec, vma);
> >  	if (err)
> >  		return err;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > b/drivers/gpu/drm/xe/xe_pt.c
> > index 109c26e5689e..7add08b2954e 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1235,6 +1235,13 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > struct xe_vma *vma, struct xe_exec_queue
> >  	err = xe_pt_prepare_bind(tile, vma, entries,
> > &num_entries);
> >  	if (err)
> >  		goto err;
> > +
> > +	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
> > +	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
> > +		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
> > +	if (err)
> > +		goto err;
> > +
> 
> Again, not a huge fan of updating __xe_pt_bind_vma / __xe_pt_unbind_vma
> here because these are going away [1], but again conceptually this is
> more or less the correct place.
> 
> [1] https://patchwork.freedesktop.org/series/125608/
> 
> >  	xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
> >  
> >  	xe_vm_dbg_print_entries(tile_to_xe(tile), entries,
> > num_entries);
> > @@ -1576,6 +1583,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile,
> > struct xe_vma *vma, struct xe_exec_queu
> >  	struct dma_fence *fence = NULL;
> >  	struct invalidation_fence *ifence;
> >  	struct xe_range_fence *rfence;
> > +	int err;
> >  
> >  	LLIST_HEAD(deferred);
> >  
> > @@ -1593,6 +1601,12 @@ __xe_pt_unbind_vma(struct xe_tile *tile,
> > struct xe_vma *vma, struct xe_exec_queu
> >  	xe_pt_calc_rfence_interval(vma, &unbind_pt_update,
> > entries,
> >  				   num_entries);
> >  
> > +	err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
> > +	if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
> > +		err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
> > +	if (err)
> > +		return ERR_PTR(err);
> > +
> >  	ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> >  	if (!ifence)
> >  		return ERR_PTR(-ENOMEM);
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index 80d43d75b1da..cff8db605743 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -485,14 +485,11 @@ static int xe_gpuvm_validate(struct
> > drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> >  static int xe_preempt_work_begin(struct drm_exec *exec, struct
> > xe_vm *vm,
> >  				 bool *done)
> >  {
> > +	struct drm_gem_object *obj;
> > +	unsigned long index;
> >  	int err;
> >  
> > -	/*
> > -	 * 1 fence for each preempt fence plus a fence for each
> > tile from a
> > -	 * possible rebind
> > -	 */
> > -	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, vm->preempt.num_exec_queues +
> > -				   vm->xe->info.tile_count);
> > +	err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0);
> >  	if (err)
> >  		return err;
> >  
> > @@ -507,7 +504,7 @@ static int xe_preempt_work_begin(struct
> > drm_exec *exec, struct xe_vm *vm,
> >  		return 0;
> >  	}
> >  
> > -	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, vm->preempt.num_exec_queues);
> > +	err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);
> 
> I think this needs to be 1 as drm_gpuvm_validate needs a fence slot,
> right?

You mean for the bo move? It should be allocated by TTM on demand.

> 
> >  	if (err)
> >  		return err;
> >  
> > @@ -515,7 +512,17 @@ static int xe_preempt_work_begin(struct
> > drm_exec *exec, struct xe_vm *vm,
> >  	if (err)
> >  		return err;
> >  
> > -	return drm_gpuvm_validate(&vm->gpuvm, exec);
> > +	err = drm_gpuvm_validate(&vm->gpuvm, exec);
> > +	if (err)
> > +		return err;
> > +
> > +	drm_exec_for_each_locked_object(exec, index, obj) {
> > +		err = dma_resv_reserve_fences(obj->resv, vm->preempt.num_exec_queues);
> 
> What if we only have external BOs? Do we allocate slots in the VM's
> dma-resv with this loop?

Yes we do.

> 
> Also, what is the behavior of:
> 
> dma_resv_reserve_fences(obj, 1);
> dma_resv_reserve_fences(obj, 1);
> 
> Do we have 2 slots now? I.e., do calls to reserve slots accumulate?

Nope, there is only guaranteed to be 1. (There have been several
attempts to change those semantics, but I don't know what happened to
those TBH). I think we need to follow the principle:

"Anything that attaches a fence needs to reserve it, and no recursion,
to be future-proof".
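
Just to illustrate that principle with a minimal, hypothetical sketch
(not code from the series; assumes "resv" is already locked and "fence"
is the fence about to be attached):

	int err;

	/* Reserve a slot in the same path that attaches the fence. */
	err = dma_resv_reserve_fences(resv, 1);
	if (err)
		return err;

	/* Earlier reservations may already have been consumed, e.g. by a
	 * move fence, so don't rely on them here. */
	dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_KERNEL);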


> 
> Lastly, strictly speaking I don't think we need this patch in this
> series, do we? It doesn't appear to be fixing anything, unless I'm
> missing something.

It's actually the removed comment:

" ...however the fence slot will just be
* re-used, since they are largely the same timeline and the seqno
* should be in order. In the case of CPU bind there is dummy
fence used..." 

That made me scared. With the change of invalidation fences, that is
not true anymore, and with multiple tiles we now do the right thing
with this patch AFAICT.

/Thomas



> 
> Matt
> 
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	return 0;
> >  }
> >  
> >  static void preempt_rebind_work_func(struct work_struct *w)
> > @@ -1004,35 +1011,26 @@ static void xe_vma_destroy(struct xe_vma
> > *vma, struct dma_fence *fence)
> >  }
> >  
> >  /**
> > - * xe_vm_prepare_vma() - drm_exec utility to lock a vma
> > + * xe_vm_lock_vma() - drm_exec utility to lock a vma
> >   * @exec: The drm_exec object we're currently locking for.
> >   * @vma: The vma for witch we want to lock the vm resv and any
> > attached
> >   * object's resv.
> > - * @num_shared: The number of dma-fence slots to pre-allocate in
> > the
> > - * objects' reservation objects.
> >   *
> >   * Return: 0 on success, negative error code on error. In
> > particular
> >   * may return -EDEADLK on WW transaction contention and -EINTR if
> >   * an interruptible wait is terminated by a signal.
> >   */
> > -int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
> > -		      unsigned int num_shared)
> > +int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
> >  {
> >  	struct xe_vm *vm = xe_vma_vm(vma);
> >  	struct xe_bo *bo = xe_vma_bo(vma);
> >  	int err;
> >  
> >  	XE_WARN_ON(!vm);
> > -	if (num_shared)
> > -		err = drm_exec_prepare_obj(exec, xe_vm_obj(vm),
> > num_shared);
> > -	else
> > -		err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
> > -	if (!err && bo && !bo->vm) {
> > -		if (num_shared)
> > -			err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
> > -		else
> > -			err = drm_exec_lock_obj(exec, &bo->ttm.base);
> > -	}
> > +
> > +	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
> > +	if (!err && bo && !bo->vm)
> > +		err = drm_exec_lock_obj(exec, &bo->ttm.base);
> >  
> >  	return err;
> >  }
> > @@ -1044,7 +1042,7 @@ static void xe_vma_destroy_unlocked(struct
> > xe_vma *vma)
> >  
> >  	drm_exec_init(&exec, 0, 0);
> >  	drm_exec_until_all_locked(&exec) {
> > -		err = xe_vm_prepare_vma(&exec, vma, 0);
> > +		err = xe_vm_lock_vma(&exec, vma);
> >  		drm_exec_retry_on_contention(&exec);
> >  		if (XE_WARN_ON(err))
> >  			break;
> > @@ -2511,7 +2509,7 @@ static int op_execute(struct drm_exec *exec,
> > struct xe_vm *vm,
> >  
> >  	lockdep_assert_held_write(&vm->lock);
> >  
> > -	err = xe_vm_prepare_vma(exec, vma, 1);
> > +	err = xe_vm_lock_vma(exec, vma);
> >  	if (err)
> >  		return err;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > b/drivers/gpu/drm/xe/xe_vm.h
> > index 6df1f1c7f85d..47f3960120a0 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -242,8 +242,7 @@ bool xe_vm_validate_should_retry(struct
> > drm_exec *exec, int err, ktime_t *end);
> >  
> >  int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int
> > gt_id);
> >  
> > -int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
> > -		      unsigned int num_shared);
> > +int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
> >  
> >  /**
> >   * xe_vm_resv() - Return's the vm's reservation object
> > -- 
> > 2.44.0
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 19:09   ` Matthew Brost
  2024-03-21 21:14     ` Thomas Hellström
@ 2024-03-21 21:55     ` Thomas Hellström
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2024-03-21 21:55 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, stable

On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> 
> 
> Can we simplify this too?
> 
> 	if (vm && (vm->batch_invalidate_tlb || (vm->tlb_flush_seqno != q->tlb_flush_seqno))) {
> 		q->tlb_flush_seqno = vm->tlb_flush_seqno;
> 		job->ring_ops_flush_tlb = true;
> 	}
> 
> I think this works as migration jobs (xe_sched_job_is_migration) use
> emit_migration_job_gen12, which doesn't look at job->ring_ops_flush_tlb,
> so there is no need to check xe_sched_job_is_migration.
> 
> Also, no need to check xe_vm_in_lr_mode, as we wouldn't increment the
> seqno above if that were true.
> 
> Lastly, it is harmless to update q->tlb_flush_seqno in the case of
> batch_invalidate_tlb being true.

I think I can simplify it a bit. The problem is that neither the
migration vm nor LR mode grabs the vm->resv at arm time, so we would
access the seqnos unlocked and potentially get caught by clever static
analyzers.

I actually had an assert for that at one point. I should probably re-
add it.
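
Roughly something like the below (just a sketch; the exact condition
and the placement of the assert would need double-checking):

	if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm)) {
		/* Only read / update the seqnos with the vm resv held. */
		xe_vm_assert_held(vm);
		if (vm->batch_invalidate_tlb ||
		    vm->tlb_flush_seqno != q->tlb_flush_seqno) {
			q->tlb_flush_seqno = vm->tlb_flush_seqno;
			job->ring_ops_flush_tlb = true;
		}
	}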

/Thomas


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds
  2024-03-21 21:21       ` Thomas Hellström
@ 2024-03-22  2:36         ` Matthew Brost
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Brost @ 2024-03-22  2:36 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, stable

On Thu, Mar 21, 2024 at 10:21:07PM +0100, Thomas Hellström wrote:
> On Thu, 2024-03-21 at 22:14 +0100, Thomas Hellström wrote:
> > Hi, Matthew,
> > 
> > Thanks for reviewing, please see inline.
> > 
> > On Thu, 2024-03-21 at 19:09 +0000, Matthew Brost wrote:
> > > On Thu, Mar 21, 2024 at 12:37:11PM +0100, Thomas Hellström wrote:
> > > > For each rebind we insert a GuC TLB invalidation and add a
> > > > corresponding unordered TLB invalidation fence. This might
> > > > add a huge number of TLB invalidation fences to wait for so
> > > > rather than doing that, defer the TLB invalidation to the
> > > > next ring ops for each affected exec queue. Since the TLB
> > > > is invalidated on exec_queue switch, we need to invalidate
> > > > once for each affected exec_queue.
> > > > 
> > > > Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after
> > > > rebinds issued from execs")
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: <stable@vger.kernel.org> # v6.8+
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
> > > >  drivers/gpu/drm/xe/xe_pt.c               |  5 +++--
> > > >  drivers/gpu/drm/xe/xe_ring_ops.c         | 11 ++++-------
> > > >  drivers/gpu/drm/xe/xe_sched_job.c        | 11 +++++++++++
> > > >  drivers/gpu/drm/xe/xe_sched_job_types.h  |  2 ++
> > > >  drivers/gpu/drm/xe/xe_vm_types.h         |  5 +++++
> > > >  6 files changed, 27 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > index 62b3d9d1d7cd..891ad30e906f 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > @@ -148,6 +148,8 @@ struct xe_exec_queue {
> > > >  	const struct xe_ring_ops *ring_ops;
> > > >  	/** @entity: DRM sched entity for this exec queue (1 to
> > > > 1
> > > > relationship) */
> > > >  	struct drm_sched_entity *entity;
> > > > +	/** @tlb_flush_seqno: The seqno of the last rebind tlb
> > > > flush performed */
> > > > +	u64 tlb_flush_seqno;
> > > >  	/** @lrc: logical ring context for this exec queue */
> > > >  	struct xe_lrc lrc[];
> > > >  };
> > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > > b/drivers/gpu/drm/xe/xe_pt.c
> > > > index 8d3922d2206e..21bc0d13fccf 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > @@ -1254,11 +1254,12 @@ __xe_pt_bind_vma(struct xe_tile *tile,
> > > > struct xe_vma *vma, struct xe_exec_queue
> > > >  	 * non-faulting LR, in particular on user-space batch
> > > > buffer chaining,
> > > >  	 * it needs to be done here.
> > > >  	 */
> > > > -	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm-
> 
> While I remember it: when looking at your series I noticed that this
> line got moved there incorrectly. It looks like you used
> xe_vm_in_lr_mode() rather than !xe_vm_in_lr_mode().
> 

Thanks, indeed I did invert that logic. Will fix in my next rev which
I'll rebase after this is merged.

Matt

> Thomas
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered
  2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
@ 2024-03-22 17:55   ` Matthew Brost
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Brost @ 2024-03-22 17:55 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, stable

On Thu, Mar 21, 2024 at 12:37:14PM +0100, Thomas Hellström wrote:
> They can actually complete out-of-order, so allocate a unique
> fence context for each fence.
> 

Yes, indeed these can complete out of order on different xe_exec_queues,
but should be ordered within an xe_exec_queue.

In addition to this patch I think we will need [1] too.

This patch LGTM though, so with that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/patch/582006/?series=125608&rev=5

> Fixes: 5387e865d90e ("drm/xe: Add TLB invalidation fence after rebinds issued from execs")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 1 -
>  drivers/gpu/drm/xe/xe_gt_types.h            | 7 -------
>  drivers/gpu/drm/xe/xe_pt.c                  | 3 +--
>  3 files changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index a3c4ffba679d..787cba5e49a1 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -63,7 +63,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
>  	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
>  	spin_lock_init(&gt->tlb_invalidation.pending_lock);
>  	spin_lock_init(&gt->tlb_invalidation.lock);
> -	gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
>  	INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
>  			  xe_gt_tlb_fence_timeout);
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> index f6da2ad9719f..2143dffcaf11 100644
> --- a/drivers/gpu/drm/xe/xe_gt_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> @@ -179,13 +179,6 @@ struct xe_gt {
>  		 * xe_gt_tlb_fence_timeout after the timeut interval is over.
>  		 */
>  		struct delayed_work fence_tdr;
> -		/** @tlb_invalidation.fence_context: context for TLB invalidation fences */
> -		u64 fence_context;
> -		/**
> -		 * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
> -		 * tlb_invalidation.lock
> -		 */
> -		u32 fence_seqno;
>  		/** @tlb_invalidation.lock: protects TLB invalidation fences */
>  		spinlock_t lock;
>  	} tlb_invalidation;
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 0484ed5b495f..045a8c0845ba 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
>  	spin_lock_irq(&gt->tlb_invalidation.lock);
>  	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
>  		       &gt->tlb_invalidation.lock,
> -		       gt->tlb_invalidation.fence_context,
> -		       ++gt->tlb_invalidation.fence_seqno);
> +		       dma_fence_context_alloc(1), 1);
>  	spin_unlock_irq(&gt->tlb_invalidation.lock);
>  
>  	INIT_LIST_HEAD(&ifence->base.link);
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-03-22 17:56 UTC | newest]

Thread overview: 23+ messages
2024-03-21 11:37 [PATCH 0/7] drm/xe: Rework rebinding and allow same-vm eviction Thomas Hellström
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
2024-03-21 11:37 ` [PATCH 1/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
2024-03-21 19:09   ` Matthew Brost
2024-03-21 21:14     ` Thomas Hellström
2024-03-21 21:21       ` Thomas Hellström
2024-03-22  2:36         ` Matthew Brost
2024-03-21 21:55     ` Thomas Hellström
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Rework rebinding Thomas Hellström
2024-03-21 19:14   ` Matthew Brost
2024-03-21 21:16     ` Thomas Hellström
2024-03-21 11:37 ` [PATCH 2/7] drm/xe: Use ring ops TLB invalidation for rebinds Thomas Hellström
2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Make TLB invalidation fences unordered Thomas Hellström
2024-03-22 17:55   ` Matthew Brost
2024-03-21 11:37 ` [PATCH 3/7] drm/xe: Reserve fences where needed Thomas Hellström
2024-03-21 20:10   ` Matthew Brost
2024-03-21 21:34     ` Thomas Hellström
2024-03-21 11:37 ` [PATCH 4/7] " Thomas Hellström
2024-03-21 11:37 ` [PATCH 4/7] drm/xe: Rework rebinding Thomas Hellström
2024-03-21 11:37 ` [PATCH 5/7] drm/xe: Move vma rebinding to the drm_exec locking loop Thomas Hellström
2024-03-21 11:37 ` [PATCH 6/7] drm/xe/bo: Forward the decision to evict local objects during validation Thomas Hellström
2024-03-21 11:37 ` [PATCH 7/7] drm/xe/bo: Allow eviction of unbound local bos Thomas Hellström
2024-03-21 13:44 ` ✗ CI.Patch_applied: failure for drm/xe: Rework rebinding and allow same-vm eviction Patchwork
