* [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
@ 2023-10-31 13:40 Tatsuyuki Ishi
  2023-10-31 13:40 ` [PATCH 1/6] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

In Vulkan, it is the application's responsibility to perform adequate
synchronization before a sparse unmap, replace or BO destroy operation.
This adds an option to the AMDGPU_VA_OP ioctls to disable the redundant
implicit sync that happens on sparse unmap or replace operations.
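
For illustration, here is a minimal userspace sketch of an explicitly-synced
unmap (a sketch assuming libdrm's drmCommandWriteRead; bo_handle, va_addr
and size are placeholders):

  struct drm_amdgpu_gem_va va = {
          .handle = bo_handle,
          .operation = AMDGPU_VA_OP_UNMAP,
          .flags = AMDGPU_VM_EXPLICIT_SYNC, /* the new opt-out flag */
          .va_address = va_addr,
          .map_size = size,
  };
  int r = drmCommandWriteRead(fd, DRM_AMDGPU_GEM_VA, &va, sizeof(va));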

This yields a significant reduction in stutter in Forza Horizon 5 and
Forza Horizon 4, two games that were particularly affected by
sparse-binding-related stutter.

This patchset also addresses a tangential issue: some changes were not
being flushed immediately after the ioctls, but were deferred to be
processed on the next CS submission, which also resulted in stalls.
A refactor of the state machine is included to achieve this.

Compared to the previous series [1], this specifically targets the VM
operations and keeps everything else intact, including implicit sync on
kernel-initiated moves.

I've been able to pass a full Vulkan CTS run on Navi 10 with this.

Userspace code for this is available at [2] and a branch for the kernel
code is available at [3].

[1]: https://lore.kernel.org/all/20230821062005.109771-1-ishitatsuyuki@gmail.com/
[2]: https://gitlab.freedesktop.org/ishitatsuyuki/mesa/-/commits/vm-explicit-sync
[3]: https://github.com/ishitatsuyuki/linux/tree/explicit-sync-drm-misc-next

Tatsuyuki Ishi (6):
  drm/amdgpu: Don't implicit sync PRT maps.
  drm/amdgpu: Separate eviction from VM status.
  drm/amdgpu: Flush VM updates for split bindings eagerly.
  drm/amdgpu: Remove redundant state change after validation.
  drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  drm/amdgpu: Bump amdgpu driver version.

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |  32 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 185 ++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |  27 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c     |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  18 +-
 include/uapi/drm/amdgpu_drm.h                 |   2 +
 11 files changed, 165 insertions(+), 120 deletions(-)

-- 
2.42.0



* [PATCH 1/6] drm/amdgpu: Don't implicit sync PRT maps.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
@ 2023-10-31 13:40 ` Tatsuyuki Ishi
  2023-10-31 13:40 ` [PATCH 2/6] drm/amdgpu: Separate eviction from VM status Tatsuyuki Ishi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

PRT maps carry AMDGPU_PTE_PRT but not AMDGPU_PTE_VALID, so the old check
misclassified them as unmaps. They are map operations rather than unmaps,
and there is no point in doing implicit synchronization for them.

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f5daadcec865..7b9762f1cddd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -902,7 +902,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	/* Implicitly sync to command submissions in the same VM before
 	 * unmapping. Sync to moving fences before mapping.
 	 */
-	if (!(flags & AMDGPU_PTE_VALID))
+	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
 		sync_mode = AMDGPU_SYNC_EQ_OWNER;
 	else
 		sync_mode = AMDGPU_SYNC_EXPLICIT;
-- 
2.42.0



* [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  2023-10-31 13:40 ` [PATCH 1/6] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
@ 2023-10-31 13:40 ` Tatsuyuki Ishi
  2023-10-31 13:55   ` Christian König
  2023-10-31 23:52   ` kernel test robot
  2023-10-31 13:40 ` [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly Tatsuyuki Ishi
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

In short, eviction never really belonged in the vm_status state machine.
Even when evicted, a BO could belong to either the moved or done state.
The "evicted" state needed to handle both cases, which only added confusion.

Additionally, there were inconsistencies in the definition of an evicted
BO. Some places decided it based on the `evict` parameter passed from the
TTM move callback, while others updated it based on whether the BO got its
optimal placement. The latter is more accurate for our use case, and with
this refactor the evicted state is determined solely by that rule.
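
As a sketch, this is the placement-based rule the refactor settles on
(mirroring the check moved into amdgpu_vm_bo_invalidate below):

  /* A BO counts as evicted iff its current placement is not one of
   * its preferred domains. */
  uint32_t mem_type = bo->tbo.resource->mem_type;
  bool evicted = !(bo->preferred_domains &
                   amdgpu_mem_type_to_domain(mem_type));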

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 67 +++++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |  1 +
 3 files changed, 29 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 7b9762f1cddd..dd6f72e2a1d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -174,19 +174,23 @@ int amdgpu_vm_set_pasid(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  * State for PDs/PTs and per VM BOs which are not at the location they should
  * be.
  */
-static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base *vm_bo)
+static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evicted)
 {
 	struct amdgpu_vm *vm = vm_bo->vm;
 	struct amdgpu_bo *bo = vm_bo->bo;
 
-	vm_bo->moved = true;
 	spin_lock(&vm_bo->vm->status_lock);
-	if (bo->tbo.type == ttm_bo_type_kernel)
-		list_move(&vm_bo->vm_status, &vm->evicted);
-	else
-		list_move_tail(&vm_bo->vm_status, &vm->evicted);
+	if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+		if (bo->tbo.type == ttm_bo_type_kernel)
+			list_move(&vm_bo->eviction_status, &vm->evicted);
+		else
+			list_move_tail(&vm_bo->eviction_status, &vm->evicted);
+	} else {
+		list_del_init(&vm_bo->eviction_status);
+	}
 	spin_unlock(&vm_bo->vm->status_lock);
 }
+
 /**
  * amdgpu_vm_bo_moved - vm_bo is moved
  *
@@ -310,6 +314,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	base->bo = bo;
 	base->next = NULL;
 	INIT_LIST_HEAD(&base->vm_status);
+	INIT_LIST_HEAD(&base->eviction_status);
 
 	if (!bo)
 		return;
@@ -336,7 +341,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	 * is currently evicted. add the bo to the evicted list to make sure it
 	 * is validated on next vm use to avoid fault.
 	 * */
-	amdgpu_vm_bo_evicted(base);
+	amdgpu_vm_bo_set_evicted(base, true);
 }
 
 /**
@@ -460,7 +465,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	while (!list_empty(&vm->evicted)) {
 		bo_base = list_first_entry(&vm->evicted,
 					   struct amdgpu_vm_bo_base,
-					   vm_status);
+					   eviction_status);
 		spin_unlock(&vm->status_lock);
 
 		bo = bo_base->bo;
@@ -1034,7 +1039,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
 	list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status)
 		amdgpu_vm_bo_get_memory(bo_va, stats);
 
-	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status)
+	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
 		amdgpu_vm_bo_get_memory(bo_va, stats);
 
 	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
@@ -1153,21 +1158,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 			return r;
 	}
 
-	/* If the BO is not in its preferred location add it back to
-	 * the evicted list so that it gets validated again on the
-	 * next command submission.
-	 */
-	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
-		uint32_t mem_type = bo->tbo.resource->mem_type;
-
-		if (!(bo->preferred_domains &
-		      amdgpu_mem_type_to_domain(mem_type)))
-			amdgpu_vm_bo_evicted(&bo_va->base);
-		else
-			amdgpu_vm_bo_idle(&bo_va->base);
-	} else {
+	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
+		amdgpu_vm_bo_idle(&bo_va->base);
+	else
 		amdgpu_vm_bo_done(&bo_va->base);
-	}
 
 	list_splice_init(&bo_va->invalids, &bo_va->valids);
 	bo_va->cleared = clear;
@@ -1883,6 +1877,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 
 	spin_lock(&vm->status_lock);
 	list_del(&bo_va->base.vm_status);
+	list_del(&bo_va->base.eviction_status);
 	spin_unlock(&vm->status_lock);
 
 	list_for_each_entry_safe(mapping, next, &bo_va->valids, list) {
@@ -1959,13 +1954,18 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
 	if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo))
 		bo = bo->parent;
 
+	/* If the BO is not in its preferred location add it back to
+	 * the evicted list so that it gets validated again on the
+	 * next command submission.
+	 */
+	uint32_t mem_type = bo->tbo.resource->mem_type;
+	bool suboptimal = !(bo->preferred_domains &
+			 amdgpu_mem_type_to_domain(mem_type));
+
 	for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
 		struct amdgpu_vm *vm = bo_base->vm;
 
-		if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
-			amdgpu_vm_bo_evicted(bo_base);
-			continue;
-		}
+		amdgpu_vm_bo_set_evicted(bo_base, suboptimal);
 
 		if (bo_base->moved)
 			continue;
@@ -2648,13 +2648,11 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
 {
 	struct amdgpu_bo_va *bo_va, *tmp;
 	u64 total_idle = 0;
-	u64 total_evicted = 0;
 	u64 total_relocated = 0;
 	u64 total_moved = 0;
 	u64 total_invalidated = 0;
 	u64 total_done = 0;
 	unsigned int total_idle_objs = 0;
-	unsigned int total_evicted_objs = 0;
 	unsigned int total_relocated_objs = 0;
 	unsigned int total_moved_objs = 0;
 	unsigned int total_invalidated_objs = 0;
@@ -2671,15 +2669,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
 	total_idle_objs = id;
 	id = 0;
 
-	seq_puts(m, "\tEvicted BOs:\n");
-	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		total_evicted += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
-	}
-	total_evicted_objs = id;
-	id = 0;
-
 	seq_puts(m, "\tRelocated BOs:\n");
 	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
 		if (!bo_va->base.bo)
@@ -2718,8 +2707,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
 
 	seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
 		   total_idle_objs);
-	seq_printf(m, "\tTotal evicted size:     %12lld\tobjs:\t%d\n", total_evicted,
-		   total_evicted_objs);
 	seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
 		   total_relocated_objs);
 	seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 204ab13184ed..d9ab97eabda9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -156,6 +156,7 @@ struct amdgpu_vm_bo_base {
 
 	/* protected by spinlock */
 	struct list_head		vm_status;
+	struct list_head		eviction_status;
 
 	/* protected by the BO being reserved */
 	bool				moved;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 96d601e209b8..f78f4040f466 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -652,6 +652,7 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 
 	spin_lock(&entry->vm->status_lock);
 	list_del(&entry->vm_status);
+	list_del(&entry->eviction_status);
 	spin_unlock(&entry->vm->status_lock);
 	amdgpu_bo_unref(&entry->bo);
 }
-- 
2.42.0



* [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  2023-10-31 13:40 ` [PATCH 1/6] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
  2023-10-31 13:40 ` [PATCH 2/6] drm/amdgpu: Separate eviction from VM status Tatsuyuki Ishi
@ 2023-10-31 13:40 ` Tatsuyuki Ishi
  2023-10-31 13:57   ` Christian König
  2023-11-01  1:18   ` kernel test robot
  2023-10-31 13:40 ` [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation Tatsuyuki Ishi
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

The current amdgpu_gem_va_update_vm only tries to perform updates for the
BO specified in the GEM ioctl; however, when a binding is split, the
adjacent bindings also need to be updated. Such updates currently end up
deferred until the next submission, which causes stalls.

Introduce a new state "dirty", shared between per-VM BOs and traditional
BOs, containing all BOs that have pending updates in `invalids`.
amdgpu_gem_va_update_vm will now simply flush any pending updates for BOs
in the dirty state.
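
For illustration (hypothetical addresses), clearing the middle of an
existing mapping splits it into two remnants:

  existing map:  [0x100000 .......................... 0x140000)
  clear range:             [0x110000 .... 0x120000)
  remnants:      [0x100000, 0x110000)    [0x120000, 0x140000)

Both remnants get fresh entries in bo_va->invalids, so their bo_va moves to
the dirty state and is flushed right away by amdgpu_gem_va_update_vm rather
than on the next CS.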

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66 ++++++++++++++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++
 3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a1b15d0d6c48..01d3a97248b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
  * vital here, so they are not reported back to userspace.
  */
 static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
-				    struct amdgpu_vm *vm,
-				    struct amdgpu_bo_va *bo_va,
-				    uint32_t operation)
+				    struct amdgpu_vm *vm)
 {
+	struct amdgpu_bo_va *bo_va;
 	int r;
 
 	if (!amdgpu_vm_ready(vm))
@@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 	if (r)
 		goto error;
 
-	if (operation == AMDGPU_VA_OP_MAP ||
-	    operation == AMDGPU_VA_OP_REPLACE) {
+	spin_lock(&vm->status_lock);
+	while (!list_empty(&vm->dirty)) {
+		bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
+					 base.vm_status);
+		spin_unlock(&vm->status_lock);
+
 		r = amdgpu_vm_bo_update(adev, bo_va, false);
 		if (r)
 			goto error;
+		spin_lock(&vm->status_lock);
 	}
+	spin_unlock(&vm->status_lock);
 
 	r = amdgpu_vm_update_pdes(adev, vm, false);
 
@@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
-		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
-					args->operation);
+		amdgpu_gem_va_update_vm(adev, &fpriv->vm);
 
 error:
 	drm_exec_fini(&exec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dd6f72e2a1d6..01d31891cd05 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evict
 	spin_unlock(&vm_bo->vm->status_lock);
 }
 
+/**
+ * amdgpu_vm_bo_dirty - vm_bo is dirty
+ *
+ * @vm_bo: vm_bo which is dirty
+ *
+ * State for normal and per VM BOs that are not moved, but have new entries in
+ * bo_va->invalids.
+ */
+static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
+{
+	spin_lock(&vm_bo->vm->status_lock);
+	list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
+	spin_unlock(&vm_bo->vm->status_lock);
+}
+
 /**
  * amdgpu_vm_bo_moved - vm_bo is moved
  *
@@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
 	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
 		amdgpu_vm_bo_get_memory(bo_va, stats);
 
+	list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
+
 	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
 		amdgpu_vm_bo_get_memory(bo_va, stats);
 
@@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
 			dma_resv_unlock(resv);
 		spin_lock(&vm->status_lock);
 	}
+
+	while (!list_empty(&vm->dirty)) {
+		bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
+					 base.vm_status);
+		spin_unlock(&vm->status_lock);
+
+		r = amdgpu_vm_bo_update(adev, bo_va, false);
+		if (r)
+			return r;
+		spin_lock(&vm->status_lock);
+	}
 	spin_unlock(&vm->status_lock);
 
 	return 0;
@@ -1476,19 +1505,16 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
 				    struct amdgpu_bo_va_mapping *mapping)
 {
 	struct amdgpu_vm *vm = bo_va->base.vm;
-	struct amdgpu_bo *bo = bo_va->base.bo;
 
 	mapping->bo_va = bo_va;
 	list_add(&mapping->list, &bo_va->invalids);
 	amdgpu_vm_it_insert(mapping, &vm->va);
+	if (!bo_va->base.moved)
+		amdgpu_vm_bo_dirty(&bo_va->base);
 
 	if (mapping->flags & AMDGPU_PTE_PRT)
 		amdgpu_vm_prt_get(adev);
 
-	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-	    !bo_va->base.moved) {
-		amdgpu_vm_bo_moved(&bo_va->base);
-	}
 	trace_amdgpu_vm_bo_map(bo_va, mapping);
 }
 
@@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 			before->flags = tmp->flags;
 			before->bo_va = tmp->bo_va;
 			list_add(&before->list, &tmp->bo_va->invalids);
+			if (!tmp->bo_va->base.moved)
+				amdgpu_vm_bo_dirty(&tmp->bo_va->base);
 		}
 
 		/* Remember mapping split at the end */
@@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 			after->flags = tmp->flags;
 			after->bo_va = tmp->bo_va;
 			list_add(&after->list, &tmp->bo_va->invalids);
+			if (!tmp->bo_va->base.moved)
+				amdgpu_vm_bo_dirty(&tmp->bo_va->base);
 		}
 
 		list_del(&tmp->list);
@@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 
 	/* Insert partial mapping before the range */
 	if (!list_empty(&before->list)) {
-		struct amdgpu_bo *bo = before->bo_va->base.bo;
-
 		amdgpu_vm_it_insert(before, &vm->va);
 		if (before->flags & AMDGPU_PTE_PRT)
 			amdgpu_vm_prt_get(adev);
-
-		if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-		    !before->bo_va->base.moved)
-			amdgpu_vm_bo_moved(&before->bo_va->base);
 	} else {
 		kfree(before);
 	}
 
 	/* Insert partial mapping after the range */
 	if (!list_empty(&after->list)) {
-		struct amdgpu_bo *bo = after->bo_va->base.bo;
-
 		amdgpu_vm_it_insert(after, &vm->va);
 		if (after->flags & AMDGPU_PTE_PRT)
 			amdgpu_vm_prt_get(adev);
-
-		if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-		    !after->bo_va->base.moved)
-			amdgpu_vm_bo_moved(&after->bo_va->base);
 	} else {
 		kfree(after);
 	}
@@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp
 	INIT_LIST_HEAD(&vm->evicted);
 	INIT_LIST_HEAD(&vm->relocated);
 	INIT_LIST_HEAD(&vm->moved);
+	INIT_LIST_HEAD(&vm->dirty);
 	INIT_LIST_HEAD(&vm->idle);
 	INIT_LIST_HEAD(&vm->invalidated);
 	spin_lock_init(&vm->status_lock);
@@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
 {
 	struct amdgpu_bo_va *bo_va, *tmp;
 	u64 total_idle = 0;
+	u64 total_dirty = 0;
 	u64 total_relocated = 0;
 	u64 total_moved = 0;
 	u64 total_invalidated = 0;
 	u64 total_done = 0;
 	unsigned int total_idle_objs = 0;
+	unsigned int total_dirty_objs = 0;
 	unsigned int total_relocated_objs = 0;
 	unsigned int total_moved_objs = 0;
 	unsigned int total_invalidated_objs = 0;
@@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
 	total_idle_objs = id;
 	id = 0;
 
+	seq_puts(m, "\tDirty BOs:\n");
+	list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status) {
+		if (!bo_va->base.bo)
+			continue;
+		total_dirty += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
+	}
+	total_dirty_objs = id;
+	id = 0;
+
 	seq_puts(m, "\tRelocated BOs:\n");
 	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
 		if (!bo_va->base.bo)
@@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
 
 	seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
 		   total_idle_objs);
+	seq_printf(m, "\tTotal dirty size:       %12lld\tobjs:\t%d\n", total_dirty,
+		   total_dirty_objs);
 	seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
 		   total_relocated_objs);
 	seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index d9ab97eabda9..f91d4fcf80b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -276,6 +276,9 @@ struct amdgpu_vm {
 	/* per VM BOs moved, but not yet updated in the PT */
 	struct list_head	moved;
 
+	/* normal and per VM BOs that are not moved, but have new PT entries */
+	struct list_head	dirty;
+
 	/* All BOs of this VM not currently in the state machine */
 	struct list_head	idle;
 
-- 
2.42.0



* [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
                   ` (2 preceding siblings ...)
  2023-10-31 13:40 ` [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly Tatsuyuki Ishi
@ 2023-10-31 13:40 ` Tatsuyuki Ishi
  2023-10-31 14:01   ` Christian König
  2023-10-31 13:40 ` [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

All the state changes are handled in the TTM move callback; doing it again
here just leads to more confusion.

The table update remains here because it needs to be done exactly once,
while doing it in the move callback would result in it getting triggered
twice: once by the actual BO and once by the shadow BO.
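
For context, a sketch of the shadow fold-back in amdgpu_vm_bo_invalidate
that would make the table update run twice if it lived in the move callback:

  /* Both the PT BO and its shadow take the TTM move-notify path; the
   * shadow is folded back onto its parent, so a map_table() there
   * would trigger once per BO, i.e. twice per page table move. */
  if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo))
          bo = bo->parent;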

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 01d31891cd05..50f7cee639ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -495,12 +495,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 				return r;
 		}
 
-		if (bo->tbo.type != ttm_bo_type_kernel) {
-			amdgpu_vm_bo_moved(bo_base);
-		} else {
+		if (bo->tbo.type == ttm_bo_type_kernel)
 			vm->update_funcs->map_table(to_amdgpu_bo_vm(bo));
-			amdgpu_vm_bo_relocated(bo_base);
-		}
+
 		spin_lock(&vm->status_lock);
 	}
 	spin_unlock(&vm->status_lock);
-- 
2.42.0



* [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
                   ` (3 preceding siblings ...)
  2023-10-31 13:40 ` [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation Tatsuyuki Ishi
@ 2023-10-31 13:40 ` Tatsuyuki Ishi
  2023-10-31 14:14   ` Michel Dänzer
  2023-11-01  2:42   ` kernel test robot
  2023-10-31 13:40 ` [PATCH 6/6] drm/amdgpu: Bump amdgpu driver version Tatsuyuki Ishi
  2023-11-02 14:04 ` [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  6 siblings, 2 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

In Vulkan, it is the application's responsibility to perform adequate
synchronization before a sparse unmap, replace or BO destroy operation.
Until now, the kernel applied the same rule as implicitly-synchronized
APIs like OpenGL, which, with per-VM BOs, made page table updates stall
the queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
userspace drivers to opt out of this behavior, while still ensuring
adequate implicit sync happens for kernel-initiated updates (e.g. BO
moves).

We record whether to use implicit sync or not for each freed mapping. To
avoid increasing the mapping struct's size, this is union-ized with the
interval tree field, which is unused after the unmap.
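
As a sketch, the lifecycle that makes the union safe (mirroring the
amdgpu_vm_bo_unmap hunk below):

  amdgpu_vm_it_remove(mapping, &vm->va); /* __subtree_last now dead */
  mapping->bo_va = NULL;
  mapping->sync_unmap = sync_unmap;      /* reuses the same storage */
  list_add(&mapping->list, &vm->freed);  /* consumed by clear_freed */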

The reason this is done with a GEM ioctl flag, instead of as a VM- or
context-global setting, is that the current libdrm implementation shares
the DRM handle even between different kinds of drivers (radeonsi vs. radv).

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 55 +++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 23 ++++----
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 18 +++---
 include/uapi/drm/amdgpu_drm.h                 |  2 +
 9 files changed, 74 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d6daf8d2bfa..10e129bff977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1196,7 +1196,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
 	struct amdgpu_device *adev = entry->adev;
 	struct amdgpu_vm *vm = bo_va->base.vm;
 
-	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
+	amdgpu_vm_bo_unmap(adev, bo_va, entry->va, true);
 
 	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 720011019741..612279e65bff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -122,7 +122,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		}
 	}
 
-	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr);
+	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr, true);
 	if (r) {
 		DRM_ERROR("failed to do bo_unmap on static CSA, err=%d\n", r);
 		goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 01d3a97248b0..0d9496a06947 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -672,9 +672,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	const uint32_t valid_flags = AMDGPU_VM_DELAY_UPDATE |
 		AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
 		AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_MASK |
-		AMDGPU_VM_PAGE_NOALLOC;
+		AMDGPU_VM_PAGE_NOALLOC | AMDGPU_VM_EXPLICIT_SYNC;
 	const uint32_t prt_flags = AMDGPU_VM_DELAY_UPDATE |
-		AMDGPU_VM_PAGE_PRT;
+		AMDGPU_VM_PAGE_PRT | AMDGPU_VM_EXPLICIT_SYNC;
 
 	struct drm_amdgpu_gem_va *args = data;
 	struct drm_gem_object *gobj;
@@ -685,6 +685,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	struct drm_exec exec;
 	uint64_t va_flags;
 	uint64_t vm_size;
+	bool sync_unmap;
 	int r = 0;
 
 	if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
@@ -720,6 +721,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	sync_unmap = !(args->flags & AMDGPU_VM_EXPLICIT_SYNC);
+
 	switch (args->operation) {
 	case AMDGPU_VA_OP_MAP:
 	case AMDGPU_VA_OP_UNMAP:
@@ -779,19 +782,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 				     va_flags);
 		break;
 	case AMDGPU_VA_OP_UNMAP:
-		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address);
+		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address,
+				       sync_unmap);
 		break;
 
 	case AMDGPU_VA_OP_CLEAR:
 		r = amdgpu_vm_bo_clear_mappings(adev, &fpriv->vm,
 						args->va_address,
-						args->map_size);
+						args->map_size, sync_unmap);
 		break;
 	case AMDGPU_VA_OP_REPLACE:
 		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
 		r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
 					     args->offset_in_bo, args->map_size,
-					     va_flags);
+					     va_flags, sync_unmap);
 		break;
 	default:
 		break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index f3ee83cdf97e..28be03f1bbcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -67,7 +67,12 @@ struct amdgpu_bo_va_mapping {
 	struct rb_node			rb;
 	uint64_t			start;
 	uint64_t			last;
-	uint64_t			__subtree_last;
+	union {
+		/* BOs in interval tree only */
+		uint64_t		__subtree_last;
+		/* Freed BOs only */
+		bool			sync_unmap;
+	};
 	uint64_t			offset;
 	uint64_t			flags;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 2fd1bfb35916..e71443c8c59b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -276,6 +276,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
 			     __field(long, last)
 			     __field(u64, offset)
 			     __field(u64, flags)
+			     __field(bool, sync_unmap)
 			     ),
 
 	    TP_fast_assign(
@@ -284,10 +285,11 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
 			   __entry->last = mapping->last;
 			   __entry->offset = mapping->offset;
 			   __entry->flags = mapping->flags;
+			   __entry->sync_unmap = mapping->sync_unmap;
 			   ),
-	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
+	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx, sync_unmap=%d",
 		      __entry->bo, __entry->start, __entry->last,
-		      __entry->offset, __entry->flags)
+		      __entry->offset, __entry->flags, __entry->sync_unmap)
 );
 
 DECLARE_EVENT_CLASS(amdgpu_vm_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 50f7cee639ac..a15463e0bbc5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -861,6 +861,7 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
  * @immediate: immediate submission in a page fault
  * @unlocked: unlocked invalidation during MM callback
  * @flush_tlb: trigger tlb invalidation after update completed
+ * @sync_unmap: wait for BO users before unmapping
  * @resv: fences we need to sync to
  * @start: start of mapped range
  * @last: last mapped entry
@@ -878,8 +879,9 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
  */
 int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   bool immediate, bool unlocked, bool flush_tlb,
-			   struct dma_resv *resv, uint64_t start, uint64_t last,
-			   uint64_t flags, uint64_t offset, uint64_t vram_base,
+			   bool sync_unmap, struct dma_resv *resv,
+			   uint64_t start, uint64_t last, uint64_t flags,
+			   uint64_t offset, uint64_t vram_base,
 			   struct ttm_resource *res, dma_addr_t *pages_addr,
 			   struct dma_fence **fence)
 {
@@ -919,7 +921,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	/* Implicitly sync to command submissions in the same VM before
 	 * unmapping. Sync to moving fences before mapping.
 	 */
-	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
+	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) && sync_unmap)
 		sync_mode = AMDGPU_SYNC_EQ_OWNER;
 	else
 		sync_mode = AMDGPU_SYNC_EXPLICIT;
@@ -1165,10 +1167,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 		trace_amdgpu_vm_bo_update(mapping);
 
 		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb,
-					   resv, mapping->start, mapping->last,
-					   update_flags, mapping->offset,
-					   vram_base, mem, pages_addr,
-					   last_update);
+					   true, resv, mapping->start,
+					   mapping->last, update_flags,
+					   mapping->offset, vram_base, mem,
+					   pages_addr, last_update);
 		if (r)
 			return r;
 	}
@@ -1349,7 +1351,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 		    mapping->start < AMDGPU_GMC_HOLE_START)
 			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
 
-		r = amdgpu_vm_update_range(adev, vm, false, false, true, resv,
+		r = amdgpu_vm_update_range(adev, vm, false, false, true,
+					   mapping->sync_unmap, resv,
 					   mapping->start, mapping->last,
 					   init_pte_value, 0, 0, NULL, NULL,
 					   &f);
@@ -1589,6 +1592,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
  * @offset: requested offset in the BO
  * @size: BO size in bytes
  * @flags: attributes of pages (read/write/valid/etc.)
+ * @sync_unmap: wait for BO users before replacing existing mapping
  *
  * Add a mapping of the BO at the specefied addr into the VM. Replace existing
  * mappings as we do so.
@@ -1599,9 +1603,9 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
  * Object has to be reserved and unreserved outside!
  */
 int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
-			     struct amdgpu_bo_va *bo_va,
-			     uint64_t saddr, uint64_t offset,
-			     uint64_t size, uint64_t flags)
+			     struct amdgpu_bo_va *bo_va, uint64_t saddr,
+			     uint64_t offset, uint64_t size, uint64_t flags,
+			     bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_bo *bo = bo_va->base.bo;
@@ -1625,7 +1629,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 	if (!mapping)
 		return -ENOMEM;
 
-	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
+	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size, sync_unmap);
 	if (r) {
 		kfree(mapping);
 		return r;
@@ -1658,9 +1662,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
  *
  * Object has to be reserved and unreserved outside!
  */
-int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
-		       struct amdgpu_bo_va *bo_va,
-		       uint64_t saddr)
+int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
+		       uint64_t saddr, bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_vm *vm = bo_va->base.vm;
@@ -1688,6 +1691,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
 	list_del(&mapping->list);
 	amdgpu_vm_it_remove(mapping, &vm->va);
 	mapping->bo_va = NULL;
+	mapping->sync_unmap = sync_unmap;
 	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 
 	if (valid)
@@ -1706,6 +1710,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
  * @vm: VM structure to use
  * @saddr: start of the range
  * @size: size of the range
+ * @sync_unmap: wait for BO users before unmapping
  *
  * Remove all mappings in a range, split them as appropriate.
  *
@@ -1713,8 +1718,8 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
  * 0 for success, error for failure.
  */
 int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
-				struct amdgpu_vm *vm,
-				uint64_t saddr, uint64_t size)
+				struct amdgpu_vm *vm, uint64_t saddr,
+				uint64_t size, bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
 	LIST_HEAD(removed);
@@ -1782,6 +1787,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 		    tmp->last = eaddr;
 
 		tmp->bo_va = NULL;
+		tmp->sync_unmap = sync_unmap;
 		list_add(&tmp->list, &vm->freed);
 		trace_amdgpu_vm_bo_unmap(NULL, tmp);
 	}
@@ -1899,6 +1905,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 		list_del(&mapping->list);
 		amdgpu_vm_it_remove(mapping, &vm->va);
 		mapping->bo_va = NULL;
+		mapping->sync_unmap = true;
 		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 		list_add(&mapping->list, &vm->freed);
 	}
@@ -2481,20 +2488,19 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	struct amdgpu_device *adev = drm_to_adev(dev);
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 
-	/* No valid flags defined yet */
-	if (args->in.flags)
-		return -EINVAL;
-
 	switch (args->in.op) {
 	case AMDGPU_VM_OP_RESERVE_VMID:
+		if (args->in.flags)
+			return -EINVAL;
 		/* We only have requirement to reserve vmid from gfxhub */
 		if (!fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)]) {
 			amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(0));
 			fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)] = true;
 		}
-
 		break;
 	case AMDGPU_VM_OP_UNRESERVE_VMID:
+		if (args->in.flags)
+			return -EINVAL;
 		if (fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)]) {
 			amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(0));
 			fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)] = false;
@@ -2633,8 +2639,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 		goto error_unlock;
 	}
 
-	r = amdgpu_vm_update_range(adev, vm, true, false, false, NULL, addr,
-				   addr, flags, value, 0, NULL, NULL, NULL);
+	r = amdgpu_vm_update_range(adev, vm, true, false, false, true, NULL,
+				   addr, addr, flags, value, 0, NULL, NULL,
+				   NULL);
 	if (r)
 		goto error_unlock;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index f91d4fcf80b8..3574987595d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -427,12 +427,12 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 			    struct amdgpu_vm *vm, struct amdgpu_bo *bo);
 int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   bool immediate, bool unlocked, bool flush_tlb,
-			   struct dma_resv *resv, uint64_t start, uint64_t last,
-			   uint64_t flags, uint64_t offset, uint64_t vram_base,
+			   bool sync_unmap, struct dma_resv *resv,
+			   uint64_t start, uint64_t last, uint64_t flags,
+			   uint64_t offset, uint64_t vram_base,
 			   struct ttm_resource *res, dma_addr_t *pages_addr,
 			   struct dma_fence **fence);
-int amdgpu_vm_bo_update(struct amdgpu_device *adev,
-			struct amdgpu_bo_va *bo_va,
+int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 			bool clear);
 bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
 void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
@@ -448,15 +448,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 		     uint64_t addr, uint64_t offset,
 		     uint64_t size, uint64_t flags);
 int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
-			     struct amdgpu_bo_va *bo_va,
-			     uint64_t addr, uint64_t offset,
-			     uint64_t size, uint64_t flags);
-int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
-		       struct amdgpu_bo_va *bo_va,
-		       uint64_t addr);
+			     struct amdgpu_bo_va *bo_va, uint64_t addr,
+			     uint64_t offset, uint64_t size, uint64_t flags,
+			     bool sync_unmap);
+int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
+		       uint64_t addr, bool sync_unmap);
 int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
-				struct amdgpu_vm *vm,
-				uint64_t saddr, uint64_t size);
+				struct amdgpu_vm *vm, uint64_t saddr,
+				uint64_t size, bool sync_unmap);
 struct amdgpu_bo_va_mapping *amdgpu_vm_bo_lookup_mapping(struct amdgpu_vm *vm,
 							 uint64_t addr);
 void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index bb16b795d1bc..6eb4a0a4bc84 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1291,9 +1291,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 	pr_debug("[0x%llx 0x%llx]\n", start, last);
 
-	return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start,
-				      last, init_pte_value, 0, 0, NULL, NULL,
-				      fence);
+	return amdgpu_vm_update_range(adev, vm, false, true, true, true, NULL,
+				      start, last, init_pte_value, 0, 0, NULL,
+				      NULL, fence);
 }
 
 static int
@@ -1398,12 +1398,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
 		 * different memory partition based on fpfn/lpfn, we should use
 		 * same vm_manager.vram_base_offset regardless memory partition.
 		 */
-		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
-					   last_start, prange->start + i,
-					   pte_flags,
-					   (last_start - prange->start) << PAGE_SHIFT,
-					   bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
-					   NULL, dma_addr, &vm->last_update);
+		r = amdgpu_vm_update_range(
+			adev, vm, false, false, flush_tlb, true, NULL,
+			last_start, prange->start + i, pte_flags,
+			(last_start - prange->start) << PAGE_SHIFT,
+			bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
+			NULL, dma_addr, &vm->last_update);
 
 		for (j = last_start - prange->start; j <= i; j++)
 			dma_addr[j] |= last_domain;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f477eda6a2b8..3cdcc299956e 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -556,6 +556,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_RW		(5 << 5)
 /* don't allocate MALL */
 #define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
+/* don't sync on unmap */
+#define AMDGPU_VM_EXPLICIT_SYNC		(1 << 10)
 
 struct drm_amdgpu_gem_va {
 	/** GEM object handle */
-- 
2.42.0



* [PATCH 6/6] drm/amdgpu: Bump amdgpu driver version.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
                   ` (4 preceding siblings ...)
  2023-10-31 13:40 ` [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
@ 2023-10-31 13:40 ` Tatsuyuki Ishi
  2023-11-02 14:04 ` [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  6 siblings, 0 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 13:40 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

Bump the version so userspace can detect the new explicit sync
functionality without having to probe the ioctl.
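
For illustration, a minimal userspace check (a sketch assuming libdrm's
drmGetVersion; the 3.55 threshold matches the bump below):

  drmVersionPtr ver = drmGetVersion(fd);
  int has_explicit_sync = ver &&
          (ver->version_major > 3 ||
           (ver->version_major == 3 && ver->version_minor >= 55));
  drmFreeVersion(ver);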

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 81edf66dbea8..2aa406dee192 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -113,9 +113,10 @@
  *            gl1c_cache_size, gl2c_cache_size, mall_size, enabled_rb_pipes_mask_hi
  *   3.53.0 - Support for GFX11 CP GFX shadowing
  *   3.54.0 - Add AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS support
+ *   3.55.0 - Add AMDGPU_VM_EXPLICIT_SYNC flag for GEM operations.
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	54
+#define KMS_DRIVER_MINOR	55
 #define KMS_DRIVER_PATCHLEVEL	0
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
-- 
2.42.0



* Re: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.
  2023-10-31 13:40 ` [PATCH 2/6] drm/amdgpu: Separate eviction from VM status Tatsuyuki Ishi
@ 2023-10-31 13:55   ` Christian König
  2023-10-31 14:39     ` Tatsuyuki Ishi
  2023-10-31 23:52   ` kernel test robot
  1 sibling, 1 reply; 34+ messages in thread
From: Christian König @ 2023-10-31 13:55 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx

On 31.10.23 14:40, Tatsuyuki Ishi wrote:
> In short, eviction never really belonged in the vm_status state machine.

I strongly disagree to that.

> Even when evicted, a BO could belong to either the moved or done state.
> The "evicted" state needed to handle both cases, which only added confusion.
>
> Additionally, there were inconsistencies in the definition of an evicted
> BO. Some places decided it based on the `evict` parameter passed from the
> TTM move callback, while others updated it based on whether the BO got its
> optimal placement. The latter is more accurate for our use case, and with
> this refactor the evicted state is determined solely by that rule.

That strongly sounds like you don't understand what the evicted state is
good for.

The evicted state is for page directories, page tables and per VM BOs
which need to move around before doing the next CS.

Please explain further what you are trying to do here.

Regards,
Christian.

>
> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 67 +++++++++--------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |  1 +
>   3 files changed, 29 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 7b9762f1cddd..dd6f72e2a1d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -174,19 +174,23 @@ int amdgpu_vm_set_pasid(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>    * State for PDs/PTs and per VM BOs which are not at the location they should
>    * be.
>    */
> -static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base *vm_bo)
> +static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evicted)
>   {
>   	struct amdgpu_vm *vm = vm_bo->vm;
>   	struct amdgpu_bo *bo = vm_bo->bo;
>   
> -	vm_bo->moved = true;
>   	spin_lock(&vm_bo->vm->status_lock);
> -	if (bo->tbo.type == ttm_bo_type_kernel)
> -		list_move(&vm_bo->vm_status, &vm->evicted);
> -	else
> -		list_move_tail(&vm_bo->vm_status, &vm->evicted);
> +	if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
> +		if (bo->tbo.type == ttm_bo_type_kernel)
> +			list_move(&vm_bo->eviction_status, &vm->evicted);
> +		else
> +			list_move_tail(&vm_bo->eviction_status, &vm->evicted);
> +	} else {
> +		list_del_init(&vm_bo->eviction_status);
> +	}
>   	spin_unlock(&vm_bo->vm->status_lock);
>   }
> +
>   /**
>    * amdgpu_vm_bo_moved - vm_bo is moved
>    *
> @@ -310,6 +314,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>   	base->bo = bo;
>   	base->next = NULL;
>   	INIT_LIST_HEAD(&base->vm_status);
> +	INIT_LIST_HEAD(&base->eviction_status);
>   
>   	if (!bo)
>   		return;
> @@ -336,7 +341,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>   	 * is currently evicted. add the bo to the evicted list to make sure it
>   	 * is validated on next vm use to avoid fault.
>   	 * */
> -	amdgpu_vm_bo_evicted(base);
> +	amdgpu_vm_bo_set_evicted(base, true);
>   }
>   
>   /**
> @@ -460,7 +465,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   	while (!list_empty(&vm->evicted)) {
>   		bo_base = list_first_entry(&vm->evicted,
>   					   struct amdgpu_vm_bo_base,
> -					   vm_status);
> +					   eviction_status);
>   		spin_unlock(&vm->status_lock);
>   
>   		bo = bo_base->bo;
> @@ -1034,7 +1039,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>   	list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status)
>   		amdgpu_vm_bo_get_memory(bo_va, stats);
>   
> -	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status)
> +	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>   		amdgpu_vm_bo_get_memory(bo_va, stats);
>   
>   	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
> @@ -1153,21 +1158,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>   			return r;
>   	}
>   
> -	/* If the BO is not in its preferred location add it back to
> -	 * the evicted list so that it gets validated again on the
> -	 * next command submission.
> -	 */
> -	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
> -		uint32_t mem_type = bo->tbo.resource->mem_type;
> -
> -		if (!(bo->preferred_domains &
> -		      amdgpu_mem_type_to_domain(mem_type)))
> -			amdgpu_vm_bo_evicted(&bo_va->base);
> -		else
> -			amdgpu_vm_bo_idle(&bo_va->base);
> -	} else {
> +	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
> +		amdgpu_vm_bo_idle(&bo_va->base);
> +	else
>   		amdgpu_vm_bo_done(&bo_va->base);
> -	}
>   
>   	list_splice_init(&bo_va->invalids, &bo_va->valids);
>   	bo_va->cleared = clear;
> @@ -1883,6 +1877,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>   
>   	spin_lock(&vm->status_lock);
>   	list_del(&bo_va->base.vm_status);
> +	list_del(&bo_va->base.eviction_status);
>   	spin_unlock(&vm->status_lock);
>   
>   	list_for_each_entry_safe(mapping, next, &bo_va->valids, list) {
> @@ -1959,13 +1954,18 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>   	if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo))
>   		bo = bo->parent;
>   
> +	/* If the BO is not in its preferred location add it back to
> +	 * the evicted list so that it gets validated again on the
> +	 * next command submission.
> +	 */
> +	uint32_t mem_type = bo->tbo.resource->mem_type;
> +	bool suboptimal = !(bo->preferred_domains &
> +			 amdgpu_mem_type_to_domain(mem_type));
> +
>   	for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
>   		struct amdgpu_vm *vm = bo_base->vm;
>   
> -		if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
> -			amdgpu_vm_bo_evicted(bo_base);
> -			continue;
> -		}
> +		amdgpu_vm_bo_set_evicted(bo_base, suboptimal);
>   
>   		if (bo_base->moved)
>   			continue;
> @@ -2648,13 +2648,11 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>   {
>   	struct amdgpu_bo_va *bo_va, *tmp;
>   	u64 total_idle = 0;
> -	u64 total_evicted = 0;
>   	u64 total_relocated = 0;
>   	u64 total_moved = 0;
>   	u64 total_invalidated = 0;
>   	u64 total_done = 0;
>   	unsigned int total_idle_objs = 0;
> -	unsigned int total_evicted_objs = 0;
>   	unsigned int total_relocated_objs = 0;
>   	unsigned int total_moved_objs = 0;
>   	unsigned int total_invalidated_objs = 0;
> @@ -2671,15 +2669,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>   	total_idle_objs = id;
>   	id = 0;
>   
> -	seq_puts(m, "\tEvicted BOs:\n");
> -	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status) {
> -		if (!bo_va->base.bo)
> -			continue;
> -		total_evicted += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
> -	}
> -	total_evicted_objs = id;
> -	id = 0;
> -
>   	seq_puts(m, "\tRelocated BOs:\n");
>   	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>   		if (!bo_va->base.bo)
> @@ -2718,8 +2707,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>   
>   	seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>   		   total_idle_objs);
> -	seq_printf(m, "\tTotal evicted size:     %12lld\tobjs:\t%d\n", total_evicted,
> -		   total_evicted_objs);
>   	seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>   		   total_relocated_objs);
>   	seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 204ab13184ed..d9ab97eabda9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -156,6 +156,7 @@ struct amdgpu_vm_bo_base {
>   
>   	/* protected by spinlock */
>   	struct list_head		vm_status;
> +	struct list_head		eviction_status;
>   
>   	/* protected by the BO being reserved */
>   	bool				moved;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> index 96d601e209b8..f78f4040f466 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> @@ -652,6 +652,7 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
>   
>   	spin_lock(&entry->vm->status_lock);
>   	list_del(&entry->vm_status);
> +	list_del(&entry->eviction_status);
>   	spin_unlock(&entry->vm->status_lock);
>   	amdgpu_bo_unref(&entry->bo);
>   }



* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 13:40 ` [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly Tatsuyuki Ishi
@ 2023-10-31 13:57   ` Christian König
  2023-10-31 13:59     ` Bas Nieuwenhuizen
  2023-11-01  1:18   ` kernel test robot
  1 sibling, 1 reply; 34+ messages in thread
From: Christian König @ 2023-10-31 13:57 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx

On 31.10.23 14:40, Tatsuyuki Ishi wrote:
> The current amdgpu_gem_va_update_vm only tries to perform updates for the
> BO specified in the GEM ioctl; however, when a binding is split, the
> adjacent bindings also need to be updated. Such updates currently end up
> deferred until the next submission, which causes stalls.

Yeah, that deferral is a necessity. The hardware simply doesn't support
what you're trying to do here in all cases.

So this approach won't work in general.

Regards,
Christian.

>
> Introduce a new state "dirty", shared between per-VM BOs and traditional
> BOs, containing all BOs that have pending updates in `invalids`.
> amdgpu_gem_va_update_vm will now simply flush any pending updates for BOs
> in the dirty state.
>
> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66 ++++++++++++++++++-------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++
>   3 files changed, 63 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index a1b15d0d6c48..01d3a97248b0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
>    * vital here, so they are not reported back to userspace.
>    */
>   static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
> -				    struct amdgpu_vm *vm,
> -				    struct amdgpu_bo_va *bo_va,
> -				    uint32_t operation)
> +				    struct amdgpu_vm *vm)
>   {
> +	struct amdgpu_bo_va *bo_va;
>   	int r;
>   
>   	if (!amdgpu_vm_ready(vm))
> @@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>   	if (r)
>   		goto error;
>   
> -	if (operation == AMDGPU_VA_OP_MAP ||
> -	    operation == AMDGPU_VA_OP_REPLACE) {
> +	spin_lock(&vm->status_lock);
> +	while (!list_empty(&vm->dirty)) {
> +		bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
> +					 base.vm_status);
> +		spin_unlock(&vm->status_lock);
> +
>   		r = amdgpu_vm_bo_update(adev, bo_va, false);
>   		if (r)
>   			goto error;
> +		spin_lock(&vm->status_lock);
>   	}
> +	spin_unlock(&vm->status_lock);
>   
>   	r = amdgpu_vm_update_pdes(adev, vm, false);
>   
> @@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>   		break;
>   	}
>   	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
> -		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
> -					args->operation);
> +		amdgpu_gem_va_update_vm(adev, &fpriv->vm);
>   
>   error:
>   	drm_exec_fini(&exec);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index dd6f72e2a1d6..01d31891cd05 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evict
>   	spin_unlock(&vm_bo->vm->status_lock);
>   }
>   
> +/**
> + * amdgpu_vm_bo_dirty - vm_bo is dirty
> + *
> + * @vm_bo: vm_bo which is dirty
> + *
> + * State for normal and per VM BOs that are not moved, but have new entries in
> + * bo_va->invalids.
> + */
> +static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
> +{
> +	spin_lock(&vm_bo->vm->status_lock);
> +	list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
> +	spin_unlock(&vm_bo->vm->status_lock);
> +}
> +
>   /**
>    * amdgpu_vm_bo_moved - vm_bo is moved
>    *
> @@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>   	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>   		amdgpu_vm_bo_get_memory(bo_va, stats);
>   
> +	list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status)
> +		amdgpu_vm_bo_get_memory(bo_va, stats);
> +
>   	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
>   		amdgpu_vm_bo_get_memory(bo_va, stats);
>   
> @@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
>   			dma_resv_unlock(resv);
>   		spin_lock(&vm->status_lock);
>   	}
> +
> +	while (!list_empty(&vm->dirty)) {
> +		bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
> +					 base.vm_status);
> +		spin_unlock(&vm->status_lock);
> +
> +		r = amdgpu_vm_bo_update(adev, bo_va, false);
> +		if (r)
> +			return r;
> +		spin_lock(&vm->status_lock);
> +	}
>   	spin_unlock(&vm->status_lock);
>   
>   	return 0;
> @@ -1476,19 +1505,16 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
>   				    struct amdgpu_bo_va_mapping *mapping)
>   {
>   	struct amdgpu_vm *vm = bo_va->base.vm;
> -	struct amdgpu_bo *bo = bo_va->base.bo;
>   
>   	mapping->bo_va = bo_va;
>   	list_add(&mapping->list, &bo_va->invalids);
>   	amdgpu_vm_it_insert(mapping, &vm->va);
> +	if (!bo_va->base.moved)
> +		amdgpu_vm_bo_dirty(&bo_va->base);
>   
>   	if (mapping->flags & AMDGPU_PTE_PRT)
>   		amdgpu_vm_prt_get(adev);
>   
> -	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
> -	    !bo_va->base.moved) {
> -		amdgpu_vm_bo_moved(&bo_va->base);
> -	}
>   	trace_amdgpu_vm_bo_map(bo_va, mapping);
>   }
>   
> @@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>   			before->flags = tmp->flags;
>   			before->bo_va = tmp->bo_va;
>   			list_add(&before->list, &tmp->bo_va->invalids);
> +			if (!tmp->bo_va->base.moved)
> +				amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>   		}
>   
>   		/* Remember mapping split at the end */
> @@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>   			after->flags = tmp->flags;
>   			after->bo_va = tmp->bo_va;
>   			list_add(&after->list, &tmp->bo_va->invalids);
> +			if (!tmp->bo_va->base.moved)
> +				amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>   		}
>   
>   		list_del(&tmp->list);
> @@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>   
>   	/* Insert partial mapping before the range */
>   	if (!list_empty(&before->list)) {
> -		struct amdgpu_bo *bo = before->bo_va->base.bo;
> -
>   		amdgpu_vm_it_insert(before, &vm->va);
>   		if (before->flags & AMDGPU_PTE_PRT)
>   			amdgpu_vm_prt_get(adev);
> -
> -		if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
> -		    !before->bo_va->base.moved)
> -			amdgpu_vm_bo_moved(&before->bo_va->base);
>   	} else {
>   		kfree(before);
>   	}
>   
>   	/* Insert partial mapping after the range */
>   	if (!list_empty(&after->list)) {
> -		struct amdgpu_bo *bo = after->bo_va->base.bo;
> -
>   		amdgpu_vm_it_insert(after, &vm->va);
>   		if (after->flags & AMDGPU_PTE_PRT)
>   			amdgpu_vm_prt_get(adev);
> -
> -		if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
> -		    !after->bo_va->base.moved)
> -			amdgpu_vm_bo_moved(&after->bo_va->base);
>   	} else {
>   		kfree(after);
>   	}
> @@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp
>   	INIT_LIST_HEAD(&vm->evicted);
>   	INIT_LIST_HEAD(&vm->relocated);
>   	INIT_LIST_HEAD(&vm->moved);
> +	INIT_LIST_HEAD(&vm->dirty);
>   	INIT_LIST_HEAD(&vm->idle);
>   	INIT_LIST_HEAD(&vm->invalidated);
>   	spin_lock_init(&vm->status_lock);
> @@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>   {
>   	struct amdgpu_bo_va *bo_va, *tmp;
>   	u64 total_idle = 0;
> +	u64 total_dirty = 0;
>   	u64 total_relocated = 0;
>   	u64 total_moved = 0;
>   	u64 total_invalidated = 0;
>   	u64 total_done = 0;
>   	unsigned int total_idle_objs = 0;
> +	unsigned int total_dirty_objs = 0;
>   	unsigned int total_relocated_objs = 0;
>   	unsigned int total_moved_objs = 0;
>   	unsigned int total_invalidated_objs = 0;
> @@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>   	total_idle_objs = id;
>   	id = 0;
>   
> +	seq_puts(m, "\tDirty BOs:\n");
> +	list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status) {
> +		if (!bo_va->base.bo)
> +			continue;
> +		total_dirty += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
> +	}
> +	total_dirty_objs = id;
> +	id = 0;
> +
>   	seq_puts(m, "\tRelocated BOs:\n");
>   	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>   		if (!bo_va->base.bo)
> @@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>   
>   	seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>   		   total_idle_objs);
> +	seq_printf(m, "\tTotal dirty size:       %12lld\tobjs:\t%d\n", total_dirty,
> +		   total_dirty_objs);
>   	seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>   		   total_relocated_objs);
>   	seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index d9ab97eabda9..f91d4fcf80b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -276,6 +276,9 @@ struct amdgpu_vm {
>   	/* per VM BOs moved, but not yet updated in the PT */
>   	struct list_head	moved;
>   
> +	/* normal and per VM BOs that are not moved, but have new PT entries */
> +	struct list_head	dirty;
> +
>   	/* All BOs of this VM not currently in the state machine */
>   	struct list_head	idle;
>   
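
As a reading aid for the locking in the amdgpu_gem_va_update_vm loop above:
the per-BO update can sleep, so the status lock is dropped around it and the
loop restarts from the list head every iteration. A user-space analogue of
the pattern, with illustrative names (unlike this sketch, the real code does
not pop the entry; amdgpu_vm_bo_update moves the bo_va to another status
list, which is what makes the loop terminate):

#include <pthread.h>

struct entry { struct entry *next; };

static struct entry *dirty_head;
static pthread_mutex_t status_lock = PTHREAD_MUTEX_INITIALIZER;

static int update_entry(struct entry *e)
{
        (void)e;        /* stands in for amdgpu_vm_bo_update(); may block */
        return 0;
}

static int flush_dirty(void)
{
        int r = 0;

        pthread_mutex_lock(&status_lock);
        while (dirty_head) {
                struct entry *e = dirty_head;

                dirty_head = e->next;           /* detach under the lock */
                pthread_mutex_unlock(&status_lock);

                r = update_entry(e);            /* sleepable, lock dropped */
                if (r)
                        return r;
                pthread_mutex_lock(&status_lock);
        }
        pthread_mutex_unlock(&status_lock);
        return r;
}

int main(void)
{
        struct entry e2 = { 0 }, e1 = { &e2 };

        dirty_head = &e1;
        return flush_dirty();
}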


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 13:57   ` Christian König
@ 2023-10-31 13:59     ` Bas Nieuwenhuizen
  2023-10-31 14:07       ` Christian König
  0 siblings, 1 reply; 34+ messages in thread
From: Bas Nieuwenhuizen @ 2023-10-31 13:59 UTC (permalink / raw)
  To: Christian König; +Cc: Tatsuyuki Ishi, amd-gfx, dri-devel

On Tue, Oct 31, 2023 at 2:57 PM Christian König <christian.koenig@amd.com>
wrote:

> Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi:
> > The current amdgpu_gem_va_update_vm only tries to perform updates for the
> > BO specified in the GEM ioctl; however, when a binding is split, the
> > adjacent bindings also need to be updated. Such updates currently end up
> > getting deferred until the next submission, which causes stalls.
>
> Yeah, that is a necessity. The hardware simply doesn't support what you
> try to do here in all cases.
>

What can the hardware not do here? Is this just needing to wait for TLB
flushes before we can free pagetables? Can we just delay that?


>
> So this approach won't work in general.
>
> Regards,
> Christian.
>
> [...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation.
  2023-10-31 13:40 ` [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation Tatsuyuki Ishi
@ 2023-10-31 14:01   ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2023-10-31 14:01 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx

Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi:
> All the state changes are handled in the TTM move callback; doing it again
> here just leads to more confusion.

The state move here is because we need to track which PDs/PTs are 
already validated and which have new locations reflected in the PDEs.

With this change here you will sooner or later run into PDE corruption.

>
> The table update remains here because it needs to be done exactly once,
> while doing it in the move callback will result in it getting triggered twice,
> once by the actual BO and once by the shadow BO.

The table update isn't done in the move callback because you can't take 
the appropriate locks there.

Regards,
Christian.
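
To make the disputed transition concrete, a simplified and non-authoritative
model of what the removed code did (the enum only mirrors the vm_status
lists; the driver's real types differ):

#include <stdbool.h>
#include <stdio.h>

enum vm_bo_state {
        VM_BO_EVICTED,          /* backing store not where it should be */
        VM_BO_RELOCATED,        /* PD/PT validated, parent PDEs still stale */
        VM_BO_MOVED,            /* per-VM BO validated, its PTEs still stale */
        VM_BO_IDLE,             /* nothing pending */
};

/* The transition removed by patch 4 from amdgpu_vm_validate_pt_bos(): after
 * validation, page tables were remembered as "relocated" so their PDEs get
 * rewritten, while regular per-VM BOs went to "moved". */
static enum vm_bo_state after_validate(bool is_page_table)
{
        return is_page_table ? VM_BO_RELOCATED : VM_BO_MOVED;
}

int main(void)
{
        printf("PT after validate -> %d (relocated)\n", after_validate(true));
        printf("BO after validate -> %d (moved)\n", after_validate(false));
        return 0;
}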


>
> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 01d31891cd05..50f7cee639ac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -495,12 +495,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   				return r;
>   		}
>   
> -		if (bo->tbo.type != ttm_bo_type_kernel) {
> -			amdgpu_vm_bo_moved(bo_base);
> -		} else {
> +		if (bo->tbo.type == ttm_bo_type_kernel)
>   			vm->update_funcs->map_table(to_amdgpu_bo_vm(bo));
> -			amdgpu_vm_bo_relocated(bo_base);
> -		}
> +
>   		spin_lock(&vm->status_lock);
>   	}
>   	spin_unlock(&vm->status_lock);


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 13:59     ` Bas Nieuwenhuizen
@ 2023-10-31 14:07       ` Christian König
  2023-10-31 14:17         ` Bas Nieuwenhuizen
                           ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Christian König @ 2023-10-31 14:07 UTC (permalink / raw)
  To: Bas Nieuwenhuizen; +Cc: Tatsuyuki Ishi, amd-gfx, dri-devel

Am 31.10.23 um 14:59 schrieb Bas Nieuwenhuizen:
>
>
> On Tue, Oct 31, 2023 at 2:57 PM Christian König 
> <christian.koenig@amd.com> wrote:
>
>     Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi:
>     > The current amdgpu_gem_va_update_vm only tries to perform updates
>     > for the BO specified in the GEM ioctl; however, when a binding is
>     > split, the adjacent bindings also need to be updated. Such updates
>     > currently end up getting deferred until the next submission, which
>     > causes stalls.
>
>     Yeah, that is a necessity. The hardware simply doesn't support what
>     you try to do here in all cases.
>
>
> What can the hardware not do here? Is this just needing to wait for
> TLB flushes before we can free pagetables? Can we just delay that?

On some hardware generations (especially Navi1x, but also everything 
older than Polaris) you can't invalidate the TLB while it is in use.

For Polaris and older it just means that you don't have a guarantee that 
the shader can't access the memory any more. So delaying the free 
operation helps here.

But for Navi1x it's a workaround for a hardware bug. If you try to 
invalidate the TLB while it is in use, you can potentially trigger
memory accesses to random addresses.

That's why we still delay TLB invalidations to the next CS and use a
new VMID for each submission instead of invalidating the old one.

I'm currently working on changing that for Navi2x and newer (maybe Vega 
as well), but this is something you can really only do on some hw 
generations after validating that it works.

Regards,
Christian.
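
A toy model of the deferral scheme described above, with entirely invented
structure, only to make it concrete: the invalidation is recorded per VMID
and executed only once a fence shows that ID to be idle, while each new
submission simply grabs a different ID.

#include <stdbool.h>
#include <stdio.h>

#define NUM_VMIDS 16

struct vmid {
        bool needs_flush;       /* mappings changed while this ID was live */
        unsigned long fence;    /* seqno of the last submission using it */
};

static struct vmid vmids[NUM_VMIDS];

/* At CS time: pick an ID the GPU is provably done with, and only then
 * perform the TLB flush that was deferred for it. */
static int grab_vmid(unsigned long completed_seqno)
{
        for (int i = 0; i < NUM_VMIDS; i++) {
                if (vmids[i].fence > completed_seqno)
                        continue;               /* TLB may still be in use */
                if (vmids[i].needs_flush) {
                        printf("flushing now-idle vmid %d\n", i);
                        vmids[i].needs_flush = false;
                }
                return i;
        }
        return -1;                              /* all in flight: wait first */
}

int main(void)
{
        vmids[0].fence = 5;             /* submission still running */
        vmids[1].needs_flush = true;    /* had an unmap while in use */
        printf("got vmid %d\n", grab_vmid(4));
        return 0;
}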

> [...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 13:40 ` [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
@ 2023-10-31 14:14   ` Michel Dänzer
  2023-10-31 14:20     ` Bas Nieuwenhuizen
  2023-10-31 14:34     ` Christian König
  2023-11-01  2:42   ` kernel test robot
  1 sibling, 2 replies; 34+ messages in thread
From: Michel Dänzer @ 2023-10-31 14:14 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx; +Cc: christian.koenig

On 10/31/23 14:40, Tatsuyuki Ishi wrote:
> In Vulkan, it is the application's responsibility to perform adequate
> synchronization before a sparse unmap, replace or BO destroy operation.
> Until now, the kernel applied the same rule as implicitly-synchronized
> APIs like OpenGL, which with per-VM BOs made page table updates stall the
> queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
> drivers to opt-out of this behavior, while still ensuring adequate implicit
> sync happens for kernel-initiated updates (e.g. BO moves).
> 
> We record whether to use implicit sync or not for each freed mapping. To
> avoid increasing the mapping struct's size, this is union-ized with the
> interval tree field which is unused after the unmap.
> 
> The reason this is done with a GEM ioctl flag, instead of being a VM /
> context global setting, is that the current libdrm implementation shares
> the DRM handle even between different kinds of drivers (radeonsi vs radv).

Different drivers always use separate contexts though, even with the same DRM file description, don't they?

FWIW, RADV will also want explicit sync in the CS ioctl.
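
For concreteness, opting a VA operation out of implicit sync with the new
flag would look roughly like this from userspace. struct drm_amdgpu_gem_va
and AMDGPU_VA_OP_UNMAP are existing UAPI; AMDGPU_VM_EXPLICIT_SYNC comes from
this series, so the value below is a placeholder, not a released header:

#include <stdint.h>
#include <sys/ioctl.h>
#include <amdgpu_drm.h>         /* libdrm's copy of the kernel UAPI */

#ifndef AMDGPU_VM_EXPLICIT_SYNC
#define AMDGPU_VM_EXPLICIT_SYNC (1 << 11)       /* placeholder value */
#endif

/* Unmap a sparse binding without the kernel waiting on implicit fences;
 * the caller promises it already synchronized, as Vulkan requires. */
static int unmap_explicit_sync(int drm_fd, uint32_t bo_handle,
                               uint64_t va, uint64_t size)
{
        struct drm_amdgpu_gem_va req = {
                .handle     = bo_handle,
                .operation  = AMDGPU_VA_OP_UNMAP,
                .flags      = AMDGPU_VM_EXPLICIT_SYNC,
                .va_address = va,
                .map_size   = size,
        };

        return ioctl(drm_fd, DRM_IOCTL_AMDGPU_GEM_VA, &req);
}

Per the commit message, the kernel records this choice for each freed
mapping, so only the unmap/replace/destroy paths change behavior.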


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 14:07       ` Christian König
@ 2023-10-31 14:17         ` Bas Nieuwenhuizen
  2023-10-31 14:39           ` Tatsuyuki Ishi
  2023-11-02  2:36         ` Lang Yu
  2023-11-06  7:56         ` Tatsuyuki Ishi
  2 siblings, 1 reply; 34+ messages in thread
From: Bas Nieuwenhuizen @ 2023-10-31 14:17 UTC (permalink / raw)
  To: Christian König; +Cc: Tatsuyuki Ishi, amd-gfx, dri-devel

On Tue, Oct 31, 2023 at 3:08 PM Christian König <christian.koenig@amd.com>
wrote:

> Am 31.10.23 um 14:59 schrieb Bas Nieuwenhuizen:
>
>
>
> On Tue, Oct 31, 2023 at 2:57 PM Christian König <christian.koenig@amd.com>
> wrote:
>
>> Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi:
>> > The current amdgpu_gem_va_update_vm only tries to perform updates for
>> > the BO specified in the GEM ioctl; however, when a binding is split,
>> > the adjacent bindings also need to be updated. Such updates currently
>> > end up getting deferred until the next submission, which causes stalls.
>>
>> Yeah, that is a necessity. The hardware simply doesn't support what you
>> try to do here in all cases.
>>
>
> What can the hardware not do here? Is this just needing to wait for TLB
> flushes before we can free pagetables? Can we just delay that?
>
>
> On some hardware generations (especially Navi1x, but also everything older
> than Polaris) you can't invalidate the TLB while it is in use.
>
> For Polaris and older it just means that you don't have a guarantee that
> the shader can't access the memory any more. So delaying the free operation
> helps here.
>
> But for Navi1x it's a workaround for a hardware bug. If you try to
> invalidate the TLB while it is in use, you can potentially trigger memory
> accesses to random addresses.
>
> That's why we still delay TLB invalidations to the next CS and use a new
> VMID for each submission instead of invalidating the old one.
>
> I'm currently working on changing that for Navi2x and newer (maybe Vega as
> well), but this is something you can really only do on some hw generations
> after validating that it works.
>

I think as long as we make sure all significant work gets done
asynchronously, doing the TLB flushing on the next submit (/submissions,
one per queue?) is fine for our purposes.

(As an aside, after thinking some more I *think* we also need some work to
make these maps/unmaps (VALID->PRT and PRT->VALID) atomic, as I think it is
valid Vulkan to make these race. As such I'm speculating we'd need a bit
more reworking there too, not just a late free of the lower level
pagetables)

- Bas

> [...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 14:14   ` Michel Dänzer
@ 2023-10-31 14:20     ` Bas Nieuwenhuizen
  2023-10-31 14:34     ` Christian König
  1 sibling, 0 replies; 34+ messages in thread
From: Bas Nieuwenhuizen @ 2023-10-31 14:20 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: Tatsuyuki Ishi, amd-gfx, dri-devel, christian.koenig

On Tue, Oct 31, 2023 at 3:14 PM Michel Dänzer <michel.daenzer@mailbox.org>
wrote:

> On 10/31/23 14:40, Tatsuyuki Ishi wrote:
> > In Vulkan, it is the application's responsibility to perform adequate
> > synchronization before a sparse unmap, replace or BO destroy operation.
> > Until now, the kernel applied the same rule as implicitly-synchronized
> > APIs like OpenGL, which with per-VM BOs made page table updates stall the
> > queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
> > drivers to opt-out of this behavior, while still ensuring adequate
> implicit
> > sync happens for kernel-initiated updates (e.g. BO moves).
> >
> > We record whether to use implicit sync or not for each freed mapping. To
> > avoid increasing the mapping struct's size, this is union-ized with the
> > interval tree field which is unused after the unmap.
> >
> > The reason this is done with a GEM ioctl flag, instead of being a VM /
> > context global setting, is that the current libdrm implementation shares
> > the DRM handle even between different kinds of drivers (radeonsi vs radv).
>
> Different drivers always use separate contexts though, even with the same
> DRM file description, don't they?
>
> FWIW, RADV will also want explicit sync in the CS ioctl.
>
I think a crucial problem is that VA ioctls don't take a context, so a
per-context flag doesn't solve this (the previous attempt used it because
all the sync changes were on the CS submit side and not the VA ioctl side).
So I'd still like to solve that side for RADV, but I think the VA ioctl
flag makes sense here if we need to do anything different VA-ioctl-wise.


> --
> Earthling Michel Dänzer            |                  https://redhat.com
> Libre software enthusiast          |         Mesa and Xwayland developer
>
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 14:14   ` Michel Dänzer
  2023-10-31 14:20     ` Bas Nieuwenhuizen
@ 2023-10-31 14:34     ` Christian König
  2023-10-31 14:56       ` Michel Dänzer
  1 sibling, 1 reply; 34+ messages in thread
From: Christian König @ 2023-10-31 14:34 UTC (permalink / raw)
  To: Michel Dänzer, Tatsuyuki Ishi, dri-devel, amd-gfx


Am 31.10.23 um 15:14 schrieb Michel Dänzer:
> On 10/31/23 14:40, Tatsuyuki Ishi wrote:
>> In Vulkan, it is the application's responsibility to perform adequate
>> synchronization before a sparse unmap, replace or BO destroy operation.
>> Until now, the kernel applied the same rule as implicitly-synchronized
>> APIs like OpenGL, which with per-VM BOs made page table updates stall the
>> queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
>> drivers to opt-out of this behavior, while still ensuring adequate implicit
>> sync happens for kernel-initiated updates (e.g. BO moves).
>>
>> We record whether to use implicit sync or not for each freed mapping. To
>> avoid increasing the mapping struct's size, this is union-ized with the
>> interval tree field which is unused after the unmap.
>>
>> The reason this is done with a GEM ioctl flag, instead of being a VM /
>> context global setting, is that the current libdrm implementation shares
>> the DRM handle even between different kinds of drivers (radeonsi vs radv).
> Different drivers always use separate contexts though, even with the same DRM file description, don't they?

Separate contexts don't help here since the VA space is shared between 
the two.

>
> FWIW, RADV will also want explicit sync in the CS ioctl.
You can replace that with the DMA-buf IOCTLs like Faith is planning to
do for NVK.

Regards,
Christian.
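
(For reference, the DMA-buf IOCTLs in question are presumably the sync_file
export/import ioctls on a dma-buf fd; a minimal sketch, error handling
omitted:)

#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Pull the current implicit fences out of a dma-buf as a sync_file fd,
 * which explicit-sync userspace can then wait on or pass along. */
static int export_read_fences(int dmabuf_fd)
{
        struct dma_buf_export_sync_file exp = { .flags = DMA_BUF_SYNC_READ };

        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &exp) < 0)
                return -1;
        return exp.fd;
}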


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.
  2023-10-31 13:55   ` Christian König
@ 2023-10-31 14:39     ` Tatsuyuki Ishi
  2023-10-31 14:44       ` Christian König
  0 siblings, 1 reply; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 14:39 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel



> On Oct 31, 2023, at 22:55, Christian König <christian.koenig@amd.com> wrote:
> 
> Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi:
>> In short, eviction never really belonged to the vm_status state machine.
> 
> I strongly disagree to that.
> 
>> Even when evicted, the BO could belong to either the moved or done state.
>> The "evicted" state needed to handle both cases, causing greater confusion.
>> 
>> Additionally, there were inconsistencies in the definition of an evicted
>> BO. Some places were based on the `evict` parameter passed from the TTM move
>> callback, while others were updated based on whether the BO got its
>> optimal placement. The second is more accurate for our use case. With this
>> refactor, the evicted state is solely determined by the second rule.
> 
> That strongly sounds like you don't understand what the evicted state is good for.
> 
> The evicted state is for page directories, page tables and per VM BOs which need to move around before doing the next CS.
> 
> Please further explain what you try to do here.

This is mainly an attempt to address an inconsistency in the definition of “eviction”. The TTM move callback sets the evicted state when the move happens through ttm_bo_evict. This is, however, not the only way a BO might end up outside its preferred domains.

amdgpu_vm_bo_update later updates the eviction state based on whether the BO is in its preferred domains. In my understanding, this includes all cases where the BO is evicted through ttm_bo_evict. Therefore, we should apply this definition right from the move callback, not only after amdgpu_vm_bo_update has been called at least once.

Tatsuyuki.
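
The two competing definitions, sketched side by side with simplified types;
the placement check mirrors the preferred_domains test visible in the hunk
quoted further down:

#include <stdbool.h>

struct bo {
        unsigned int preferred_domains; /* allowed placements, bitmask */
        unsigned int current_domain;    /* where TTM actually put it */
};

/* Definition 1: whatever the TTM move callback's `evict` parameter said
 * the last time the buffer moved. */
static bool evicted_by_ttm_flag(bool ttm_evict_param)
{
        return ttm_evict_param;
}

/* Definition 2, the one this patch standardizes on: the BO sits outside
 * all of its preferred placements, no matter how it got there. */
static bool evicted_by_placement(const struct bo *bo)
{
        return !(bo->preferred_domains & bo->current_domain);
}

int main(void)
{
        /* e.g. a BO that prefers VRAM (0x4) but was placed in GTT (0x2) */
        struct bo bo = { .preferred_domains = 0x4, .current_domain = 0x2 };

        return (evicted_by_placement(&bo) && !evicted_by_ttm_flag(false)) ? 0 : 1;
}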

> 
> Regards,
> Christian.
> 
>> 
>> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 67 +++++++++--------------
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |  1 +
>>  3 files changed, 29 insertions(+), 40 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 7b9762f1cddd..dd6f72e2a1d6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -174,19 +174,23 @@ int amdgpu_vm_set_pasid(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>   * State for PDs/PTs and per VM BOs which are not at the location they should
>>   * be.
>>   */
>> -static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base *vm_bo)
>> +static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evicted)
>>  {
>>  	struct amdgpu_vm *vm = vm_bo->vm;
>>  	struct amdgpu_bo *bo = vm_bo->bo;
>>  -	vm_bo->moved = true;
>>  	spin_lock(&vm_bo->vm->status_lock);
>> -	if (bo->tbo.type == ttm_bo_type_kernel)
>> -		list_move(&vm_bo->vm_status, &vm->evicted);
>> -	else
>> -		list_move_tail(&vm_bo->vm_status, &vm->evicted);
>> +	if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
>> +		if (bo->tbo.type == ttm_bo_type_kernel)
>> +			list_move(&vm_bo->eviction_status, &vm->evicted);
>> +		else
>> +			list_move_tail(&vm_bo->eviction_status, &vm->evicted);
>> +	} else {
>> +		list_del_init(&vm_bo->eviction_status);
>> +	}
>>  	spin_unlock(&vm_bo->vm->status_lock);
>>  }
>> +
>>  /**
>>   * amdgpu_vm_bo_moved - vm_bo is moved
>>   *
>> @@ -310,6 +314,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>  	base->bo = bo;
>>  	base->next = NULL;
>>  	INIT_LIST_HEAD(&base->vm_status);
>> +	INIT_LIST_HEAD(&base->eviction_status);
>>    	if (!bo)
>>  		return;
>> @@ -336,7 +341,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>  	 * is currently evicted. add the bo to the evicted list to make sure it
>>  	 * is validated on next vm use to avoid fault.
>>  	 * */
>> -	amdgpu_vm_bo_evicted(base);
>> +	amdgpu_vm_bo_set_evicted(base, true);
>>  }
>>    /**
>> @@ -460,7 +465,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>  	while (!list_empty(&vm->evicted)) {
>>  		bo_base = list_first_entry(&vm->evicted,
>>  					   struct amdgpu_vm_bo_base,
>> -					   vm_status);
>> +					   eviction_status);
>>  		spin_unlock(&vm->status_lock);
>>    		bo = bo_base->bo;
>> @@ -1034,7 +1039,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>>  	list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status)
>>  		amdgpu_vm_bo_get_memory(bo_va, stats);
>>  -	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status)
>> +	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>>  		amdgpu_vm_bo_get_memory(bo_va, stats);
>>    	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
>> @@ -1153,21 +1158,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>  			return r;
>>  	}
>>  -	/* If the BO is not in its preferred location add it back to
>> -	 * the evicted list so that it gets validated again on the
>> -	 * next command submission.
>> -	 */
>> -	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
>> -		uint32_t mem_type = bo->tbo.resource->mem_type;
>> -
>> -		if (!(bo->preferred_domains &
>> -		      amdgpu_mem_type_to_domain(mem_type)))
>> -			amdgpu_vm_bo_evicted(&bo_va->base);
>> -		else
>> -			amdgpu_vm_bo_idle(&bo_va->base);
>> -	} else {
>> +	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
>> +		amdgpu_vm_bo_idle(&bo_va->base);
>> +	else
>>  		amdgpu_vm_bo_done(&bo_va->base);
>> -	}
>>    	list_splice_init(&bo_va->invalids, &bo_va->valids);
>>  	bo_va->cleared = clear;
>> @@ -1883,6 +1877,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>>    	spin_lock(&vm->status_lock);
>>  	list_del(&bo_va->base.vm_status);
>> +	list_del(&bo_va->base.eviction_status);
>>  	spin_unlock(&vm->status_lock);
>>    	list_for_each_entry_safe(mapping, next, &bo_va->valids, list) {
>> @@ -1959,13 +1954,18 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>>  	if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo))
>>  		bo = bo->parent;
>>  +	/* If the BO is not in its preferred location add it back to
>> +	 * the evicted list so that it gets validated again on the
>> +	 * next command submission.
>> +	 */
>> +	uint32_t mem_type = bo->tbo.resource->mem_type;
>> +	bool suboptimal = !(bo->preferred_domains &
>> +			 amdgpu_mem_type_to_domain(mem_type));
>> +
>>  	for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
>>  		struct amdgpu_vm *vm = bo_base->vm;
>>  -		if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
>> -			amdgpu_vm_bo_evicted(bo_base);
>> -			continue;
>> -		}
>> +		amdgpu_vm_bo_set_evicted(bo_base, suboptimal);
>>    		if (bo_base->moved)
>>  			continue;
>> @@ -2648,13 +2648,11 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>  {
>>  	struct amdgpu_bo_va *bo_va, *tmp;
>>  	u64 total_idle = 0;
>> -	u64 total_evicted = 0;
>>  	u64 total_relocated = 0;
>>  	u64 total_moved = 0;
>>  	u64 total_invalidated = 0;
>>  	u64 total_done = 0;
>>  	unsigned int total_idle_objs = 0;
>> -	unsigned int total_evicted_objs = 0;
>>  	unsigned int total_relocated_objs = 0;
>>  	unsigned int total_moved_objs = 0;
>>  	unsigned int total_invalidated_objs = 0;
>> @@ -2671,15 +2669,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>  	total_idle_objs = id;
>>  	id = 0;
>>  -	seq_puts(m, "\tEvicted BOs:\n");
>> -	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status) {
>> -		if (!bo_va->base.bo)
>> -			continue;
>> -		total_evicted += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
>> -	}
>> -	total_evicted_objs = id;
>> -	id = 0;
>> -
>>  	seq_puts(m, "\tRelocated BOs:\n");
>>  	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>>  		if (!bo_va->base.bo)
>> @@ -2718,8 +2707,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>    	seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>>  		   total_idle_objs);
>> -	seq_printf(m, "\tTotal evicted size:     %12lld\tobjs:\t%d\n", total_evicted,
>> -		   total_evicted_objs);
>>  	seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>>  		   total_relocated_objs);
>>  	seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 204ab13184ed..d9ab97eabda9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -156,6 +156,7 @@ struct amdgpu_vm_bo_base {
>>    	/* protected by spinlock */
>>  	struct list_head		vm_status;
>> +	struct list_head		eviction_status;
>>    	/* protected by the BO being reserved */
>>  	bool				moved;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>> index 96d601e209b8..f78f4040f466 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>> @@ -652,6 +652,7 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
>>    	spin_lock(&entry->vm->status_lock);
>>  	list_del(&entry->vm_status);
>> +	list_del(&entry->eviction_status);
>>  	spin_unlock(&entry->vm->status_lock);
>>  	amdgpu_bo_unref(&entry->bo);
>>  }
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 14:17         ` Bas Nieuwenhuizen
@ 2023-10-31 14:39           ` Tatsuyuki Ishi
  0 siblings, 0 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-10-31 14:39 UTC (permalink / raw)
  To: Bas Nieuwenhuizen; +Cc: amd-gfx, Christian König, dri-devel



> On Oct 31, 2023, at 23:17, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> 
> 
> 
> On Tue, Oct 31, 2023 at 3:08 PM Christian König <christian.koenig@amd.com> wrote:
On 31.10.23 at 14:59, Bas Nieuwenhuizen wrote:
>> 
>> 
>> On Tue, Oct 31, 2023 at 2:57 PM Christian König <christian.koenig@amd.com> wrote:
>> On 31.10.23 at 14:40, Tatsuyuki Ishi wrote:
>> > The current amdgpu_gem_va_update_vm only tries to perform updates for the
>> > BO specified in the GEM ioctl; however, when a binding is split, the
>> > adjacent bindings also need to be updated. Such updates currently end up
>> > getting deferred until the next submission, which causes stalls.
>> 
>> Yeah, that is a necessity. The hardware simply doesn't support what you 
>> try to do here in all cases.
>> 
>> What can the hardware not do here? Is it just that we need to wait for TLB flushes before we can free page tables? Can we just delay that?
> 
> On some hardware generations (especially Navi1x, but also everything older than Polaris) you can't invalidate the TLB while it is in use.
> 
> For Polaris and older it just means that you don't have a guarantee that the shader can't access the memory any more. So delaying the free operation helps here.
> 
> But for Navi1x it's a workaround for a hardware bug. If you try to invalidate the TLB while it is in use you can potentially trigger memory accesses to random addresses.
> 
> That's why we still delay TLB invalidations to the next CS and use a new VMID for each submission instead of invalidating the old one.
> 
> I'm currently working on changing that for Navi2x and newer (maybe Vega as well), but this is something you can really only do on some hw generations after validating that it works.
> 
> I think as long as we make sure all significant work gets done asynchronously, doing the TLB flushing on the next submit (/submissions, one per queue?) is fine for our purposes.

For a bit more context, the performance / frame timing in Forza with just patch 5 wasn’t quite right. As Bas said, ideally we want to perform all the PT updates right away, and only defer the TLB flush.
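To sketch the intended split (illustrative pseudo-code only, not actual
amdgpu code; all names here are made up):

	/* At ioctl time: do the page table update eagerly... */
	r = vm_update_range(vm, range, /* unmap */ true);
	if (r)
		return r;

	/* ...but only record that a TLB flush is owed. The next CS
	 * consumes this and flushes (or switches to a fresh VMID on
	 * hardware that can't invalidate an in-use TLB). */
	vm->tlb_flush_pending = true;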

For now, the state machine part of this patch doesn’t seem to be going in the right direction, so I’ll consider dropping this change.

Tatsuyuki.

> 
> (As an aside, after thinking some more I *think* we also need some work to make these maps/unmaps (VALID->PRT and PRT->VALID) atomic, as I think it is valid Vulkan to make these race. As such I'm speculating we'd need a bit more reworking there too, not just a late free of the lower-level page tables)
> 
> - Bas 
> 
> Regards,
> Christian. 
> 
>>  
>> 
>> So this approach won't work in general.
>> 
>> Regards,
>> Christian.
>> 
>> >
>> > Introduce a new state "dirty", shared between per-VM BOs and traditional
>> > BOs, containing all BOs that have pending updates in `invalids`.
>> > amdgpu_gem_va_update_vm will now simply flush any pending updates for BOs
>> > in the dirty state.
>> >
>> > Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>> > ---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66 ++++++++++++++++++-------
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++
>> >   3 files changed, 63 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> > index a1b15d0d6c48..01d3a97248b0 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> > @@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
>> >    * vital here, so they are not reported back to userspace.
>> >    */
>> >   static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>> > -                                 struct amdgpu_vm *vm,
>> > -                                 struct amdgpu_bo_va *bo_va,
>> > -                                 uint32_t operation)
>> > +                                 struct amdgpu_vm *vm)
>> >   {
>> > +     struct amdgpu_bo_va *bo_va;
>> >       int r;
>> >   
>> >       if (!amdgpu_vm_ready(vm))
>> > @@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>> >       if (r)
>> >               goto error;
>> >   
>> > -     if (operation == AMDGPU_VA_OP_MAP ||
>> > -         operation == AMDGPU_VA_OP_REPLACE) {
>> > +     spin_lock(&vm->status_lock);
>> > +     while (!list_empty(&vm->dirty)) {
>> > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
>> > +                                      base.vm_status);
>> > +             spin_unlock(&vm->status_lock);
>> > +
>> >               r = amdgpu_vm_bo_update(adev, bo_va, false);
>> >               if (r)
>> >                       goto error;
>> > +             spin_lock(&vm->status_lock);
>> >       }
>> > +     spin_unlock(&vm->status_lock);
>> >   
>> >       r = amdgpu_vm_update_pdes(adev, vm, false);
>> >   
>> > @@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>> >               break;
>> >       }
>> >       if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
>> > -             amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
>> > -                                     args->operation);
>> > +             amdgpu_gem_va_update_vm(adev, &fpriv->vm);
>> >   
>> >   error:
>> >       drm_exec_fini(&exec);
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> > index dd6f72e2a1d6..01d31891cd05 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> > @@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evict
>> >       spin_unlock(&vm_bo->vm->status_lock);
>> >   }
>> >   
>> > +/**
>> > + * amdgpu_vm_bo_dirty - vm_bo is dirty
>> > + *
>> > + * @vm_bo: vm_bo which is dirty
>> > + *
>> > + * State for normal and per VM BOs that are not moved, but have new entries in
>> > + * bo_va->invalids.
>> > + */
>> > +static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
>> > +{
>> > +     spin_lock(&vm_bo->vm->status_lock);
>> > +     list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
>> > +     spin_unlock(&vm_bo->vm->status_lock);
>> > +}
>> > +
>> >   /**
>> >    * amdgpu_vm_bo_moved - vm_bo is moved
>> >    *
>> > @@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>> >       list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>> >               amdgpu_vm_bo_get_memory(bo_va, stats);
>> >   
>> > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status)
>> > +             amdgpu_vm_bo_get_memory(bo_va, stats);
>> > +
>> >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
>> >               amdgpu_vm_bo_get_memory(bo_va, stats);
>> >   
>> > @@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
>> >                       dma_resv_unlock(resv);
>> >               spin_lock(&vm->status_lock);
>> >       }
>> > +
>> > +     while (!list_empty(&vm->dirty)) {
>> > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
>> > +                                      base.vm_status);
>> > +             spin_unlock(&vm->status_lock);
>> > +
>> > +             r = amdgpu_vm_bo_update(adev, bo_va, false);
>> > +             if (r)
>> > +                     return r;
>> > +             spin_lock(&vm->status_lock);
>> > +     }
>> >       spin_unlock(&vm->status_lock);
>> >   
>> >       return 0;
>> > @@ -1476,19 +1505,16 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
>> >                                   struct amdgpu_bo_va_mapping *mapping)
>> >   {
>> >       struct amdgpu_vm *vm = bo_va->base.vm;
>> > -     struct amdgpu_bo *bo = bo_va->base.bo;
>> >   
>> >       mapping->bo_va = bo_va;
>> >       list_add(&mapping->list, &bo_va->invalids);
>> >       amdgpu_vm_it_insert(mapping, &vm->va);
>> > +     if (!bo_va->base.moved)
>> > +             amdgpu_vm_bo_dirty(&bo_va->base);
>> >   
>> >       if (mapping->flags & AMDGPU_PTE_PRT)
>> >               amdgpu_vm_prt_get(adev);
>> >   
>> > -     if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>> > -         !bo_va->base.moved) {
>> > -             amdgpu_vm_bo_moved(&bo_va->base);
>> > -     }
>> >       trace_amdgpu_vm_bo_map(bo_va, mapping);
>> >   }
>> >   
>> > @@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>> >                       before->flags = tmp->flags;
>> >                       before->bo_va = tmp->bo_va;
>> >                       list_add(&before->list, &tmp->bo_va->invalids);
>> > +                     if (!tmp->bo_va->base.moved)
>> > +                             amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>> >               }
>> >   
>> >               /* Remember mapping split at the end */
>> > @@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>> >                       after->flags = tmp->flags;
>> >                       after->bo_va = tmp->bo_va;
>> >                       list_add(&after->list, &tmp->bo_va->invalids);
>> > +                     if (!tmp->bo_va->base.moved)
>> > +                             amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>> >               }
>> >   
>> >               list_del(&tmp->list);
>> > @@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>> >   
>> >       /* Insert partial mapping before the range */
>> >       if (!list_empty(&before->list)) {
>> > -             struct amdgpu_bo *bo = before->bo_va->base.bo;
>> > -
>> >               amdgpu_vm_it_insert(before, &vm->va);
>> >               if (before->flags & AMDGPU_PTE_PRT)
>> >                       amdgpu_vm_prt_get(adev);
>> > -
>> > -             if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>> > -                 !before->bo_va->base.moved)
>> > -                     amdgpu_vm_bo_moved(&before->bo_va->base);
>> >       } else {
>> >               kfree(before);
>> >       }
>> >   
>> >       /* Insert partial mapping after the range */
>> >       if (!list_empty(&after->list)) {
>> > -             struct amdgpu_bo *bo = after->bo_va->base.bo;
>> > -
>> >               amdgpu_vm_it_insert(after, &vm->va);
>> >               if (after->flags & AMDGPU_PTE_PRT)
>> >                       amdgpu_vm_prt_get(adev);
>> > -
>> > -             if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>> > -                 !after->bo_va->base.moved)
>> > -                     amdgpu_vm_bo_moved(&after->bo_va->base);
>> >       } else {
>> >               kfree(after);
>> >       }
>> > @@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp
>> >       INIT_LIST_HEAD(&vm->evicted);
>> >       INIT_LIST_HEAD(&vm->relocated);
>> >       INIT_LIST_HEAD(&vm->moved);
>> > +     INIT_LIST_HEAD(&vm->dirty);
>> >       INIT_LIST_HEAD(&vm->idle);
>> >       INIT_LIST_HEAD(&vm->invalidated);
>> >       spin_lock_init(&vm->status_lock);
>> > @@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>> >   {
>> >       struct amdgpu_bo_va *bo_va, *tmp;
>> >       u64 total_idle = 0;
>> > +     u64 total_dirty = 0;
>> >       u64 total_relocated = 0;
>> >       u64 total_moved = 0;
>> >       u64 total_invalidated = 0;
>> >       u64 total_done = 0;
>> >       unsigned int total_idle_objs = 0;
>> > +     unsigned int total_dirty_objs = 0;
>> >       unsigned int total_relocated_objs = 0;
>> >       unsigned int total_moved_objs = 0;
>> >       unsigned int total_invalidated_objs = 0;
>> > @@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>> >       total_idle_objs = id;
>> >       id = 0;
>> >   
>> > +     seq_puts(m, "\tDirty BOs:\n");
>> > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status) {
>> > +             if (!bo_va->base.bo)
>> > +                     continue;
>> > +             total_dirty += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
>> > +     }
>> > +     total_dirty_objs = id;
>> > +     id = 0;
>> > +
>> >       seq_puts(m, "\tRelocated BOs:\n");
>> >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>> >               if (!bo_va->base.bo)
>> > @@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>> >   
>> >       seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>> >                  total_idle_objs);
>> > +     seq_printf(m, "\tTotal dirty size:       %12lld\tobjs:\t%d\n", total_dirty,
>> > +                total_dirty_objs);
>> >       seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>> >                  total_relocated_objs);
>> >       seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> > index d9ab97eabda9..f91d4fcf80b8 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> > @@ -276,6 +276,9 @@ struct amdgpu_vm {
>> >       /* per VM BOs moved, but not yet updated in the PT */
>> >       struct list_head        moved;
>> >   
>> > +     /* normal and per VM BOs that are not moved, but have new PT entries */
>> > +     struct list_head        dirty;
>> > +
>> >       /* All BOs of this VM not currently in the state machine */
>> >       struct list_head        idle;
>> >   



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.
  2023-10-31 14:39     ` Tatsuyuki Ishi
@ 2023-10-31 14:44       ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2023-10-31 14:44 UTC (permalink / raw)
  To: Tatsuyuki Ishi; +Cc: amd-gfx, dri-devel

On 31.10.23 at 15:39, Tatsuyuki Ishi wrote:
>
>> On Oct 31, 2023, at 22:55, Christian König <christian.koenig@amd.com> wrote:
>>
>> On 31.10.23 at 14:40, Tatsuyuki Ishi wrote:
>>> In short, eviction never really belonged to the vm_status state machine.
>> I strongly disagree to that.
>>
>>> Even when evicted, the BO could belong to either the moved or done state.
>>> The "evicted" state needed to handle both cases, causing greater confusion.
>>>
>>> Additionally, there were inconsistencies in the definition of an evicted
>>> BO. Some places are based on the `evict` parameter passed from the TTM move
>>> callback, while the others were updated based on whether the BO got its
>>> optimal placement. The second is more accurate for our use case. With this
>>> refactor, the evicted state is solely determined by the second rule.
>> That strongly sounds like you don't understand what the evicted state is good for.
>>
>> The evicted state is for page directories, page tables and per VM BOs which need to move around before doing the next CS.
>>
>> Please further explain what you try to do here.
> This is mainly an attempt to address inconsistency in the definition of “eviction”. The TTM move callback sets eviction when eviction happens through ttm_bo_evict. This is however not the only way a BO might end up outside its preferred domains.
>
> amdgpu_vm_bo_update later updates the eviction state based on whether the BO is in its preferred domains. In my understanding this includes all cases where the BO is evicted through ttm_bo_evict. Therefore we should apply this definition right from the move callback, not only after amdgpu_vm_bo_update has been called at least once.

No, that is something completely separate. The evicted state just means
that we need to re-validate the BO.

One cause of this is that TTM moved the BO.

But a different cause is that TTM moved the BO, we tried to validate it
but fell back to GTT for now and called amdgpu_vm_bo_update().
amdgpu_vm_bo_update() then moves the BO into the evicted state again so 
that we try to move it into VRAM on the next command submission.

This is purely an optimization done to create enough pressure so that
TTM can do its work.
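Summarized as a lifecycle (an illustration of the loop described above,
not code from the driver):

	TTM evicts or can't optimally place the BO -> vm_bo goes onto vm->evicted
	next CS, amdgpu_vm_validate_pt_bos()       -> best-effort validation,
	                                              may only reach GTT for now
	amdgpu_vm_bo_update()                      -> PTEs updated; placement
	                                              still suboptimal, so back
	                                              onto vm->evicted
	a later CS finds room in VRAM              -> validation succeeds and
	                                              the BO goes idle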

Christian.

>
> Tatsuyuki.
>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 67 +++++++++--------------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |  1 +
>>>   3 files changed, 29 insertions(+), 40 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 7b9762f1cddd..dd6f72e2a1d6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -174,19 +174,23 @@ int amdgpu_vm_set_pasid(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>    * State for PDs/PTs and per VM BOs which are not at the location they should
>>>    * be.
>>>    */
>>> -static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base *vm_bo)
>>> +static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evicted)
>>>   {
>>>   	struct amdgpu_vm *vm = vm_bo->vm;
>>>   	struct amdgpu_bo *bo = vm_bo->bo;
>>>   -	vm_bo->moved = true;
>>>   	spin_lock(&vm_bo->vm->status_lock);
>>> -	if (bo->tbo.type == ttm_bo_type_kernel)
>>> -		list_move(&vm_bo->vm_status, &vm->evicted);
>>> -	else
>>> -		list_move_tail(&vm_bo->vm_status, &vm->evicted);
>>> +	if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
>>> +		if (bo->tbo.type == ttm_bo_type_kernel)
>>> +			list_move(&vm_bo->eviction_status, &vm->evicted);
>>> +		else
>>> +			list_move_tail(&vm_bo->eviction_status, &vm->evicted);
>>> +	} else {
>>> +		list_del_init(&vm_bo->eviction_status);
>>> +	}
>>>   	spin_unlock(&vm_bo->vm->status_lock);
>>>   }
>>> +
>>>   /**
>>>    * amdgpu_vm_bo_moved - vm_bo is moved
>>>    *
>>> @@ -310,6 +314,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>>   	base->bo = bo;
>>>   	base->next = NULL;
>>>   	INIT_LIST_HEAD(&base->vm_status);
>>> +	INIT_LIST_HEAD(&base->eviction_status);
>>>     	if (!bo)
>>>   		return;
>>> @@ -336,7 +341,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>>   	 * is currently evicted. add the bo to the evicted list to make sure it
>>>   	 * is validated on next vm use to avoid fault.
>>>   	 * */
>>> -	amdgpu_vm_bo_evicted(base);
>>> +	amdgpu_vm_bo_set_evicted(base, true);
>>>   }
>>>     /**
>>> @@ -460,7 +465,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>   	while (!list_empty(&vm->evicted)) {
>>>   		bo_base = list_first_entry(&vm->evicted,
>>>   					   struct amdgpu_vm_bo_base,
>>> -					   vm_status);
>>> +					   eviction_status);
>>>   		spin_unlock(&vm->status_lock);
>>>     		bo = bo_base->bo;
>>> @@ -1034,7 +1039,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>>>   	list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status)
>>>   		amdgpu_vm_bo_get_memory(bo_va, stats);
>>>   -	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status)
>>> +	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>>>   		amdgpu_vm_bo_get_memory(bo_va, stats);
>>>     	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
>>> @@ -1153,21 +1158,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>>   			return r;
>>>   	}
>>>   -	/* If the BO is not in its preferred location add it back to
>>> -	 * the evicted list so that it gets validated again on the
>>> -	 * next command submission.
>>> -	 */
>>> -	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
>>> -		uint32_t mem_type = bo->tbo.resource->mem_type;
>>> -
>>> -		if (!(bo->preferred_domains &
>>> -		      amdgpu_mem_type_to_domain(mem_type)))
>>> -			amdgpu_vm_bo_evicted(&bo_va->base);
>>> -		else
>>> -			amdgpu_vm_bo_idle(&bo_va->base);
>>> -	} else {
>>> +	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
>>> +		amdgpu_vm_bo_idle(&bo_va->base);
>>> +	else
>>>   		amdgpu_vm_bo_done(&bo_va->base);
>>> -	}
>>>     	list_splice_init(&bo_va->invalids, &bo_va->valids);
>>>   	bo_va->cleared = clear;
>>> @@ -1883,6 +1877,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>>>     	spin_lock(&vm->status_lock);
>>>   	list_del(&bo_va->base.vm_status);
>>> +	list_del(&bo_va->base.eviction_status);
>>>   	spin_unlock(&vm->status_lock);
>>>     	list_for_each_entry_safe(mapping, next, &bo_va->valids, list) {
>>> @@ -1959,13 +1954,18 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>>>   	if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo))
>>>   		bo = bo->parent;
>>>   +	/* If the BO is not in its preferred location add it back to
>>> +	 * the evicted list so that it gets validated again on the
>>> +	 * next command submission.
>>> +	 */
>>> +	uint32_t mem_type = bo->tbo.resource->mem_type;
>>> +	bool suboptimal = !(bo->preferred_domains &
>>> +			 amdgpu_mem_type_to_domain(mem_type));
>>> +
>>>   	for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
>>>   		struct amdgpu_vm *vm = bo_base->vm;
>>>   -		if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
>>> -			amdgpu_vm_bo_evicted(bo_base);
>>> -			continue;
>>> -		}
>>> +		amdgpu_vm_bo_set_evicted(bo_base, suboptimal);
>>>     		if (bo_base->moved)
>>>   			continue;
>>> @@ -2648,13 +2648,11 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>>   {
>>>   	struct amdgpu_bo_va *bo_va, *tmp;
>>>   	u64 total_idle = 0;
>>> -	u64 total_evicted = 0;
>>>   	u64 total_relocated = 0;
>>>   	u64 total_moved = 0;
>>>   	u64 total_invalidated = 0;
>>>   	u64 total_done = 0;
>>>   	unsigned int total_idle_objs = 0;
>>> -	unsigned int total_evicted_objs = 0;
>>>   	unsigned int total_relocated_objs = 0;
>>>   	unsigned int total_moved_objs = 0;
>>>   	unsigned int total_invalidated_objs = 0;
>>> @@ -2671,15 +2669,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>>   	total_idle_objs = id;
>>>   	id = 0;
>>>   -	seq_puts(m, "\tEvicted BOs:\n");
>>> -	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status) {
>>> -		if (!bo_va->base.bo)
>>> -			continue;
>>> -		total_evicted += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
>>> -	}
>>> -	total_evicted_objs = id;
>>> -	id = 0;
>>> -
>>>   	seq_puts(m, "\tRelocated BOs:\n");
>>>   	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>>>   		if (!bo_va->base.bo)
>>> @@ -2718,8 +2707,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>>     	seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>>>   		   total_idle_objs);
>>> -	seq_printf(m, "\tTotal evicted size:     %12lld\tobjs:\t%d\n", total_evicted,
>>> -		   total_evicted_objs);
>>>   	seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>>>   		   total_relocated_objs);
>>>   	seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 204ab13184ed..d9ab97eabda9 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -156,6 +156,7 @@ struct amdgpu_vm_bo_base {
>>>     	/* protected by spinlock */
>>>   	struct list_head		vm_status;
>>> +	struct list_head		eviction_status;
>>>     	/* protected by the BO being reserved */
>>>   	bool				moved;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>> index 96d601e209b8..f78f4040f466 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>> @@ -652,6 +652,7 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
>>>     	spin_lock(&entry->vm->status_lock);
>>>   	list_del(&entry->vm_status);
>>> +	list_del(&entry->eviction_status);
>>>   	spin_unlock(&entry->vm->status_lock);
>>>   	amdgpu_bo_unref(&entry->bo);
>>>   }


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 14:34     ` Christian König
@ 2023-10-31 14:56       ` Michel Dänzer
  0 siblings, 0 replies; 34+ messages in thread
From: Michel Dänzer @ 2023-10-31 14:56 UTC (permalink / raw)
  To: Christian König, Tatsuyuki Ishi, dri-devel, amd-gfx

On 10/31/23 15:34, Christian König wrote:
> On 31.10.23 at 15:14, Michel Dänzer wrote:
> 
>> FWIW, RADV will also want explicit sync in the CS ioctl.
> You can replace that with the DMA-buf IOCTLs like Faith is planning to do for NVK. 

Those ioctls cannot disable implicit sync for the CS ioctl. They can be used for making implicit sync work correctly for individual BOs though, once implicit sync is disabled for the CS ioctl.
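For readers following along: the ioctls in question are
DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE from
<linux/dma-buf.h>. A minimal usage sketch, error handling omitted (treat
this as an illustration of the uapi, not authoritative documentation):

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Extract the BO's current implicit fences as a sync_file, so an
 * explicit-sync submission can wait on them. DMA_BUF_SYNC_READ asks
 * for the fences a reader would have to wait on, i.e. pending writes. */
int export_implicit_fences(int dmabuf_fd)
{
	struct dma_buf_export_sync_file get = {
		.flags = DMA_BUF_SYNC_READ,
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &get) < 0)
		return -1;
	return get.fd;
}

/* Attach an explicit fence back to the BO as a write fence, so
 * implicit-sync consumers (e.g. a compositor) still wait for it. */
int import_explicit_fence(int dmabuf_fd, int sync_file_fd)
{
	struct dma_buf_import_sync_file set = {
		.flags = DMA_BUF_SYNC_WRITE,
		.fd = sync_file_fd,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &set);
}

As noted above, this covers sharing between processes, but does nothing
about the implicit sync the CS ioctl itself performs.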


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.
  2023-10-31 13:40 ` [PATCH 2/6] drm/amdgpu: Separate eviction from VM status Tatsuyuki Ishi
  2023-10-31 13:55   ` Christian König
@ 2023-10-31 23:52   ` kernel test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kernel test robot @ 2023-10-31 23:52 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx
  Cc: Tatsuyuki Ishi, christian.koenig, oe-kbuild-all

Hi Tatsuyuki,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.6 next-20231031]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tatsuyuki-Ishi/drm-amdgpu-Don-t-implicit-sync-PRT-maps/20231031-224530
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20231031134059.171277-3-ishitatsuyuki%40gmail.com
patch subject: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.
config: arc-randconfig-001-20231101 (https://download.01.org/0day-ci/archive/20231101/202311010709.XbwKjVaq-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231101/202311010709.XbwKjVaq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311010709.XbwKjVaq-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:178: warning: Function parameter or member 'evicted' not described in 'amdgpu_vm_bo_set_evicted'
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:178: warning: expecting prototype for amdgpu_vm_bo_evicted(). Prototype was for amdgpu_vm_bo_set_evicted() instead
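The fix the robot is asking for is mechanical: rename the kernel-doc line
and describe the new parameter, e.g. (suggested wording, not from the
series):

/**
 * amdgpu_vm_bo_set_evicted - change the eviction status of a vm_bo
 *
 * @vm_bo: vm_bo whose eviction status changes
 * @evicted: whether the BO now needs (re-)validation
 *
 * State for PDs/PTs and per VM BOs which are not at the location they should
 * be.
 */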


vim +178 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

dcb388eddb5f1b Nirmoy Das      2021-06-28  168  
bcdc9fd634d1f0 Christian König 2018-08-30  169  /**
bcdc9fd634d1f0 Christian König 2018-08-30  170   * amdgpu_vm_bo_evicted - vm_bo is evicted
bcdc9fd634d1f0 Christian König 2018-08-30  171   *
bcdc9fd634d1f0 Christian König 2018-08-30  172   * @vm_bo: vm_bo which is evicted
bcdc9fd634d1f0 Christian König 2018-08-30  173   *
bcdc9fd634d1f0 Christian König 2018-08-30  174   * State for PDs/PTs and per VM BOs which are not at the location they should
bcdc9fd634d1f0 Christian König 2018-08-30  175   * be.
bcdc9fd634d1f0 Christian König 2018-08-30  176   */
cac82290238e47 Tatsuyuki Ishi  2023-10-31  177  static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evicted)
bcdc9fd634d1f0 Christian König 2018-08-30 @178  {
bcdc9fd634d1f0 Christian König 2018-08-30  179  	struct amdgpu_vm *vm = vm_bo->vm;
bcdc9fd634d1f0 Christian König 2018-08-30  180  	struct amdgpu_bo *bo = vm_bo->bo;
bcdc9fd634d1f0 Christian König 2018-08-30  181  
757eb2bedd08a1 Philip Yang     2022-09-15  182  	spin_lock(&vm_bo->vm->status_lock);
cac82290238e47 Tatsuyuki Ishi  2023-10-31  183  	if (evicted && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
bcdc9fd634d1f0 Christian König 2018-08-30  184  		if (bo->tbo.type == ttm_bo_type_kernel)
cac82290238e47 Tatsuyuki Ishi  2023-10-31  185  			list_move(&vm_bo->eviction_status, &vm->evicted);
bcdc9fd634d1f0 Christian König 2018-08-30  186  		else
cac82290238e47 Tatsuyuki Ishi  2023-10-31  187  			list_move_tail(&vm_bo->eviction_status, &vm->evicted);
cac82290238e47 Tatsuyuki Ishi  2023-10-31  188  	} else {
cac82290238e47 Tatsuyuki Ishi  2023-10-31  189  		list_del_init(&vm_bo->eviction_status);
cac82290238e47 Tatsuyuki Ishi  2023-10-31  190  	}
757eb2bedd08a1 Philip Yang     2022-09-15  191  	spin_unlock(&vm_bo->vm->status_lock);
bcdc9fd634d1f0 Christian König 2018-08-30  192  }
cac82290238e47 Tatsuyuki Ishi  2023-10-31  193  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 13:40 ` [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly Tatsuyuki Ishi
  2023-10-31 13:57   ` Christian König
@ 2023-11-01  1:18   ` kernel test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kernel test robot @ 2023-11-01  1:18 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx
  Cc: Tatsuyuki Ishi, christian.koenig, oe-kbuild-all

Hi Tatsuyuki,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes linus/master v6.6]
[cannot apply to drm/drm-next drm-intel/for-linux-next drm-tip/drm-tip next-20231031]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tatsuyuki-Ishi/drm-amdgpu-Don-t-implicit-sync-PRT-maps/20231031-224530
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20231031134059.171277-4-ishitatsuyuki%40gmail.com
patch subject: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
config: arc-randconfig-001-20231101 (https://download.01.org/0day-ci/archive/20231101/202311010948.G6I55pTu-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231101/202311010948.G6I55pTu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311010948.G6I55pTu-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:608: warning: Excess function parameter 'bo_va' description in 'amdgpu_gem_va_update_vm'
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:608: warning: Excess function parameter 'operation' description in 'amdgpu_gem_va_update_vm'
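Here the fix is a deletion: the rewritten function no longer takes @bo_va
or @operation, so those two kernel-doc lines go away, leaving roughly:

/**
 * amdgpu_gem_va_update_vm - update the bo_va in its VM
 *
 * @adev: amdgpu_device pointer
 * @vm: vm to update
 *
 * Update the bo_va directly after setting its address. Errors are not
 * vital here, so they are not reported back to userspace.
 */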


vim +608 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

d38ceaf99ed015f Alex Deucher        2015-04-20  594  
d38ceaf99ed015f Alex Deucher        2015-04-20  595  /**
d38ceaf99ed015f Alex Deucher        2015-04-20  596   * amdgpu_gem_va_update_vm -update the bo_va in its VM
d38ceaf99ed015f Alex Deucher        2015-04-20  597   *
d38ceaf99ed015f Alex Deucher        2015-04-20  598   * @adev: amdgpu_device pointer
dc54d3d1744d23e Christian König     2017-03-13  599   * @vm: vm to update
d38ceaf99ed015f Alex Deucher        2015-04-20  600   * @bo_va: bo_va to update
dc54d3d1744d23e Christian König     2017-03-13  601   * @operation: map, unmap or clear
d38ceaf99ed015f Alex Deucher        2015-04-20  602   *
2ffdaafb5d5f37b Christian König     2017-01-27  603   * Update the bo_va directly after setting its address. Errors are not
d38ceaf99ed015f Alex Deucher        2015-04-20  604   * vital here, so they are not reported back to userspace.
d38ceaf99ed015f Alex Deucher        2015-04-20  605   */
d38ceaf99ed015f Alex Deucher        2015-04-20  606  static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  607  				    struct amdgpu_vm *vm)
d38ceaf99ed015f Alex Deucher        2015-04-20 @608  {
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  609  	struct amdgpu_bo_va *bo_va;
3f3333f8a0e90ac Christian König     2017-08-03  610  	int r;
d38ceaf99ed015f Alex Deucher        2015-04-20  611  
3f3333f8a0e90ac Christian König     2017-08-03  612  	if (!amdgpu_vm_ready(vm))
3f3333f8a0e90ac Christian König     2017-08-03  613  		return;
e410b5cbabe70b1 Chunming Zhou       2015-12-07  614  
f34678187a33970 Nicolai Hähnle      2017-03-23  615  	r = amdgpu_vm_clear_freed(adev, vm, NULL);
d38ceaf99ed015f Alex Deucher        2015-04-20  616  	if (r)
2ffdaafb5d5f37b Christian König     2017-01-27  617  		goto error;
194a33643b1161f monk.liu            2015-07-22  618  
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  619  	spin_lock(&vm->status_lock);
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  620  	while (!list_empty(&vm->dirty)) {
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  621  		bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  622  					 base.vm_status);
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  623  		spin_unlock(&vm->status_lock);
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  624  
8f8cc3fb43508a2 Christian König     2022-03-17  625  		r = amdgpu_vm_bo_update(adev, bo_va, false);
0abc6878fc2d699 Christian König     2017-09-01  626  		if (r)
0abc6878fc2d699 Christian König     2017-09-01  627  			goto error;
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  628  		spin_lock(&vm->status_lock);
93bab704c1513f8 Gustavo A. R. Silva 2018-02-14  629  	}
ddf1ffe56ab385a Tatsuyuki Ishi      2023-10-31  630  	spin_unlock(&vm->status_lock);
93bab704c1513f8 Gustavo A. R. Silva 2018-02-14  631  
807e2994092c0bd Christian König     2019-03-14  632  	r = amdgpu_vm_update_pdes(adev, vm, false);
0abc6878fc2d699 Christian König     2017-09-01  633  
2ffdaafb5d5f37b Christian König     2017-01-27  634  error:
68fdd3df79ee4bf Christian König     2015-06-16  635  	if (r && r != -ERESTARTSYS)
d38ceaf99ed015f Alex Deucher        2015-04-20  636  		DRM_ERROR("Couldn't update BO_VA (%d)\n", r);
d38ceaf99ed015f Alex Deucher        2015-04-20  637  }
d38ceaf99ed015f Alex Deucher        2015-04-20  638  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 13:40 ` [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  2023-10-31 14:14   ` Michel Dänzer
@ 2023-11-01  2:42   ` kernel test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kernel test robot @ 2023-11-01  2:42 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx
  Cc: Tatsuyuki Ishi, christian.koenig, oe-kbuild-all

Hi Tatsuyuki,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes linus/master v6.6]
[cannot apply to drm/drm-next drm-intel/for-linux-next drm-tip/drm-tip next-20231031]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tatsuyuki-Ishi/drm-amdgpu-Don-t-implicit-sync-PRT-maps/20231031-224530
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20231031134059.171277-6-ishitatsuyuki%40gmail.com
patch subject: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
config: arc-randconfig-001-20231101 (https://download.01.org/0day-ci/archive/20231101/202311011037.Bt6NSYwA-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231101/202311011037.Bt6NSYwA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311011037.Bt6NSYwA-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:178: warning: Function parameter or member 'evicted' not described in 'amdgpu_vm_bo_set_evicted'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:178: warning: expecting prototype for amdgpu_vm_bo_evicted(). Prototype was for amdgpu_vm_bo_set_evicted() instead
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1667: warning: Function parameter or member 'sync_unmap' not described in 'amdgpu_vm_bo_unmap'
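The missing piece here is a one-line description of the new parameter,
e.g. (suggested wording, not from the series):

 * @saddr: where the BO is mapped
 * @sync_unmap: whether the unmap should implicitly sync against GPU work
 *              still using the mapping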


vim +1667 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

d38ceaf99ed015 Alex Deucher      2015-04-20  1650  
d38ceaf99ed015 Alex Deucher      2015-04-20  1651  /**
d38ceaf99ed015 Alex Deucher      2015-04-20  1652   * amdgpu_vm_bo_unmap - remove bo mapping from vm
d38ceaf99ed015 Alex Deucher      2015-04-20  1653   *
d38ceaf99ed015 Alex Deucher      2015-04-20  1654   * @adev: amdgpu_device pointer
d38ceaf99ed015 Alex Deucher      2015-04-20  1655   * @bo_va: bo_va to remove the address from
d38ceaf99ed015 Alex Deucher      2015-04-20  1656   * @saddr: where to the BO is mapped
d38ceaf99ed015 Alex Deucher      2015-04-20  1657   *
d38ceaf99ed015 Alex Deucher      2015-04-20  1658   * Remove a mapping of the BO at the specefied addr from the VM.
7fc48e5912795c Andrey Grodzovsky 2018-06-11  1659   *
7fc48e5912795c Andrey Grodzovsky 2018-06-11  1660   * Returns:
7fc48e5912795c Andrey Grodzovsky 2018-06-11  1661   * 0 for success, error for failure.
d38ceaf99ed015 Alex Deucher      2015-04-20  1662   *
49b02b180a541d Chunming Zhou     2015-11-13  1663   * Object has to be reserved and unreserved outside!
d38ceaf99ed015 Alex Deucher      2015-04-20  1664   */
1550024e9de031 Tatsuyuki Ishi    2023-10-31  1665  int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
1550024e9de031 Tatsuyuki Ishi    2023-10-31  1666  		       uint64_t saddr, bool sync_unmap)
d38ceaf99ed015 Alex Deucher      2015-04-20 @1667  {
d38ceaf99ed015 Alex Deucher      2015-04-20  1668  	struct amdgpu_bo_va_mapping *mapping;
ec681545afe5a4 Christian König   2017-08-01  1669  	struct amdgpu_vm *vm = bo_va->base.vm;
7fc11959018f8b Christian König   2015-07-30  1670  	bool valid = true;
d38ceaf99ed015 Alex Deucher      2015-04-20  1671  
6c7fc503a47f9b Christian König   2015-06-05  1672  	saddr /= AMDGPU_GPU_PAGE_SIZE;
32b41ac21fde8f Christian König   2016-03-08  1673  
7fc11959018f8b Christian König   2015-07-30  1674  	list_for_each_entry(mapping, &bo_va->valids, list) {
a9f87f64525435 Christian König   2017-03-30  1675  		if (mapping->start == saddr)
7fc11959018f8b Christian König   2015-07-30  1676  			break;
7fc11959018f8b Christian König   2015-07-30  1677  	}
7fc11959018f8b Christian König   2015-07-30  1678  
7fc11959018f8b Christian König   2015-07-30  1679  	if (&mapping->list == &bo_va->valids) {
7fc11959018f8b Christian König   2015-07-30  1680  		valid = false;
7fc11959018f8b Christian König   2015-07-30  1681  
7fc11959018f8b Christian König   2015-07-30  1682  		list_for_each_entry(mapping, &bo_va->invalids, list) {
a9f87f64525435 Christian König   2017-03-30  1683  			if (mapping->start == saddr)
d38ceaf99ed015 Alex Deucher      2015-04-20  1684  				break;
d38ceaf99ed015 Alex Deucher      2015-04-20  1685  		}
d38ceaf99ed015 Alex Deucher      2015-04-20  1686  
32b41ac21fde8f Christian König   2016-03-08  1687  		if (&mapping->list == &bo_va->invalids)
d38ceaf99ed015 Alex Deucher      2015-04-20  1688  			return -ENOENT;
d38ceaf99ed015 Alex Deucher      2015-04-20  1689  	}
32b41ac21fde8f Christian König   2016-03-08  1690  
d38ceaf99ed015 Alex Deucher      2015-04-20  1691  	list_del(&mapping->list);
a9f87f64525435 Christian König   2017-03-30  1692  	amdgpu_vm_it_remove(mapping, &vm->va);
aebc5e6f50f770 Christian König   2017-09-06  1693  	mapping->bo_va = NULL;
1550024e9de031 Tatsuyuki Ishi    2023-10-31  1694  	mapping->sync_unmap = sync_unmap;
93e3e4385b69d8 Christian König   2015-06-09  1695  	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
d38ceaf99ed015 Alex Deucher      2015-04-20  1696  
e17841b97587ad Christian König   2016-03-08  1697  	if (valid)
d38ceaf99ed015 Alex Deucher      2015-04-20  1698  		list_add(&mapping->list, &vm->freed);
e17841b97587ad Christian König   2016-03-08  1699  	else
284710fa6c3a5f Christian König   2017-01-30  1700  		amdgpu_vm_free_mapping(adev, vm, mapping,
284710fa6c3a5f Christian König   2017-01-30  1701  				       bo_va->last_pt_update);
d38ceaf99ed015 Alex Deucher      2015-04-20  1702  
d38ceaf99ed015 Alex Deucher      2015-04-20  1703  	return 0;
d38ceaf99ed015 Alex Deucher      2015-04-20  1704  }
d38ceaf99ed015 Alex Deucher      2015-04-20  1705  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 14:07       ` Christian König
  2023-10-31 14:17         ` Bas Nieuwenhuizen
@ 2023-11-02  2:36         ` Lang Yu
  2023-11-02  6:41           ` Christian König
  2023-11-06  7:56         ` Tatsuyuki Ishi
  2 siblings, 1 reply; 34+ messages in thread
From: Lang Yu @ 2023-11-02  2:36 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, Tatsuyuki Ishi, amd-gfx

On 10/31/23, Christian König wrote:
> On 31.10.23 at 14:59, Bas Nieuwenhuizen wrote:
> > 
> > 
> > On Tue, Oct 31, 2023 at 2:57 PM Christian König
> > <christian.koenig@amd.com> wrote:
> > 
> >     On 31.10.23 at 14:40, Tatsuyuki Ishi wrote:
> >     > The current amdgpu_gem_va_update_vm only tries to perform updates
> >     > for the BO specified in the GEM ioctl; however, when a binding is
> >     > split, the adjacent bindings also need to be updated. Such updates
> >     > currently end up getting deferred until the next submission, which
> >     > causes stalls.
> > 
> >     Yeah, that is a necessity. The hardware simply doesn't support
> >     what you
> >     try to do here in all cases.
> > 
> > 
> > What can the hardware not do here? Is it just that we need to wait for TLB
> > flushes before we can free page tables? Can we just delay that?
> 
> On some hardware generations (especially Navi1x, but also everything older
> than Polaris) you can't invalidate the TLB while it is in use.

Hi Christian,

non-legacy invalidation can invalidate the TLB while it is in use.
Right? Thanks.

Regards,
Lang

> For Polaris and older it just means that you don't have a guarantee that the
> shader can't access the memory any more. So delaying the free operation
> helps here.
> 
> But for Navi1x it's a workaround for a hardware bug. If you try to
> invalidate the TLB while it is in use you can potentially trigger memory
> accesses to random addresses.
> 
> That's why we still delay TLB invalidations to the next CS and use a new
> VMID for each submission instead of invalidating the old one.
> 
> I'm currently working on changing that for Navi2x and newer (maybe Vega as
> well), but this is something you can really only do on some hw generations
> after validating that it works.
> 
> Regards,
> Christian.
> 
> > 
> > 
> >     So this approach won't work in general.
> > 
> >     Regards,
> >     Christian.
> > 
> >     >
> >     > Introduce a new state "dirty", shared between per-VM BOs and
> >     > traditional BOs, containing all BOs that have pending updates in
> >     > `invalids`. amdgpu_gem_va_update_vm will now simply flush any
> >     > pending updates for BOs in the dirty state.
> >     >
> >     > Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> >     > ---
> >     >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
> >     >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66 ++++++++++++++++++-------
> >     >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++
> >     >   3 files changed, 63 insertions(+), 24 deletions(-)
> >     >
> >     > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >     > index a1b15d0d6c48..01d3a97248b0 100644
> >     > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >     > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >     > @@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
> >     >    * vital here, so they are not reported back to userspace.
> >     >    */
> >     >   static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
> >     > -                                 struct amdgpu_vm *vm,
> >     > -                                 struct amdgpu_bo_va *bo_va,
> >     > -                                 uint32_t operation)
> >     > +                                 struct amdgpu_vm *vm)
> >     >   {
> >     > +     struct amdgpu_bo_va *bo_va;
> >     >       int r;
> >     >
> >     >       if (!amdgpu_vm_ready(vm))
> >     > @@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
> >     >       if (r)
> >     >               goto error;
> >     >
> >     > -     if (operation == AMDGPU_VA_OP_MAP ||
> >     > -         operation == AMDGPU_VA_OP_REPLACE) {
> >     > +     spin_lock(&vm->status_lock);
> >     > +     while (!list_empty(&vm->dirty)) {
> >     > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
> >     > +                                      base.vm_status);
> >     > +             spin_unlock(&vm->status_lock);
> >     > +
> >     >               r = amdgpu_vm_bo_update(adev, bo_va, false);
> >     >               if (r)
> >     >                       goto error;
> >     > +             spin_lock(&vm->status_lock);
> >     >       }
> >     > +     spin_unlock(&vm->status_lock);
> >     >
> >     >       r = amdgpu_vm_update_pdes(adev, vm, false);
> >     >
> >     > @@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
> >     >               break;
> >     >       }
> >     >       if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
> >     > -             amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
> >     > -                                     args->operation);
> >     > +             amdgpu_gem_va_update_vm(adev, &fpriv->vm);
> >     >
> >     >   error:
> >     >       drm_exec_fini(&exec);
> >     > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >     > index dd6f72e2a1d6..01d31891cd05 100644
> >     > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >     > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >     > @@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evict
> >     >       spin_unlock(&vm_bo->vm->status_lock);
> >     >   }
> >     >
> >     > +/**
> >     > + * amdgpu_vm_bo_dirty - vm_bo is dirty
> >     > + *
> >     > + * @vm_bo: vm_bo which is dirty
> >     > + *
> >     > + * State for normal and per VM BOs that are not moved, but have new entries in
> >     > + * bo_va->invalids.
> >     > + */
> >     > +static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
> >     > +{
> >     > +     spin_lock(&vm_bo->vm->status_lock);
> >     > +     list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
> >     > +     spin_unlock(&vm_bo->vm->status_lock);
> >     > +}
> >     > +
> >     >   /**
> >     >    * amdgpu_vm_bo_moved - vm_bo is moved
> >     >    *
> >     > @@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
> >     >       list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
> >     >               amdgpu_vm_bo_get_memory(bo_va, stats);
> >     >
> >     > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status)
> >     > +             amdgpu_vm_bo_get_memory(bo_va, stats);
> >     > +
> >     >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
> >     >               amdgpu_vm_bo_get_memory(bo_va, stats);
> >     >
> >     > @@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct
> >     amdgpu_device *adev,
> >     >                       dma_resv_unlock(resv);
> >     >               spin_lock(&vm->status_lock);
> >     >       }
> >     > +
> >     > +     while (!list_empty(&vm->dirty)) {
> >     > +             bo_va = list_first_entry(&vm->dirty, struct
> >     amdgpu_bo_va,
> >     > +                                      base.vm_status);
> >     > +             spin_unlock(&vm->status_lock);
> >     > +
> >     > +             r = amdgpu_vm_bo_update(adev, bo_va, false);
> >     > +             if (r)
> >     > +                     return r;
> >     > +             spin_lock(&vm->status_lock);
> >     > +     }
> >     >       spin_unlock(&vm->status_lock);
> >     >
> >     >       return 0;
> >     > @@ -1476,19 +1505,16 @@ static void
> >     amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
> >     >                                   struct amdgpu_bo_va_mapping
> >     *mapping)
> >     >   {
> >     >       struct amdgpu_vm *vm = bo_va->base.vm;
> >     > -     struct amdgpu_bo *bo = bo_va->base.bo;
> >     >
> >     >       mapping->bo_va = bo_va;
> >     >       list_add(&mapping->list, &bo_va->invalids);
> >     >       amdgpu_vm_it_insert(mapping, &vm->va);
> >     > +     if (!bo_va->base.moved)
> >     > +             amdgpu_vm_bo_dirty(&bo_va->base);
> >     >
> >     >       if (mapping->flags & AMDGPU_PTE_PRT)
> >     >               amdgpu_vm_prt_get(adev);
> >     >
> >     > -     if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
> >     > -         !bo_va->base.moved) {
> >     > -             amdgpu_vm_bo_moved(&bo_va->base);
> >     > -     }
> >     >       trace_amdgpu_vm_bo_map(bo_va, mapping);
> >     >   }
> >     >
> >     > @@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct
> >     amdgpu_device *adev,
> >     >                       before->flags = tmp->flags;
> >     >                       before->bo_va = tmp->bo_va;
> >     >                       list_add(&before->list,
> >     &tmp->bo_va->invalids);
> >     > +                     if (!tmp->bo_va->base.moved)
> >     > +  amdgpu_vm_bo_dirty(&tmp->bo_va->base);
> >     >               }
> >     >
> >     >               /* Remember mapping split at the end */
> >     > @@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct
> >     amdgpu_device *adev,
> >     >                       after->flags = tmp->flags;
> >     >                       after->bo_va = tmp->bo_va;
> >     >                       list_add(&after->list, &tmp->bo_va->invalids);
> >     > +                     if (!tmp->bo_va->base.moved)
> >     > +  amdgpu_vm_bo_dirty(&tmp->bo_va->base);
> >     >               }
> >     >
> >     >               list_del(&tmp->list);
> >     > @@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct
> >     amdgpu_device *adev,
> >     >
> >     >       /* Insert partial mapping before the range */
> >     >       if (!list_empty(&before->list)) {
> >     > -             struct amdgpu_bo *bo = before->bo_va->base.bo;
> >     > -
> >     >               amdgpu_vm_it_insert(before, &vm->va);
> >     >               if (before->flags & AMDGPU_PTE_PRT)
> >     >                       amdgpu_vm_prt_get(adev);
> >     > -
> >     > -             if (bo && bo->tbo.base.resv ==
> >     vm->root.bo->tbo.base.resv &&
> >     > -                 !before->bo_va->base.moved)
> >     > -  amdgpu_vm_bo_moved(&before->bo_va->base);
> >     >       } else {
> >     >               kfree(before);
> >     >       }
> >     >
> >     >       /* Insert partial mapping after the range */
> >     >       if (!list_empty(&after->list)) {
> >     > -             struct amdgpu_bo *bo = after->bo_va->base.bo;
> >     > -
> >     >               amdgpu_vm_it_insert(after, &vm->va);
> >     >               if (after->flags & AMDGPU_PTE_PRT)
> >     >                       amdgpu_vm_prt_get(adev);
> >     > -
> >     > -             if (bo && bo->tbo.base.resv ==
> >     vm->root.bo->tbo.base.resv &&
> >     > -                 !after->bo_va->base.moved)
> >     > -  amdgpu_vm_bo_moved(&after->bo_va->base);
> >     >       } else {
> >     >               kfree(after);
> >     >       }
> >     > @@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device
> >     *adev, struct amdgpu_vm *vm, int32_t xcp
> >     >       INIT_LIST_HEAD(&vm->evicted);
> >     >       INIT_LIST_HEAD(&vm->relocated);
> >     >       INIT_LIST_HEAD(&vm->moved);
> >     > +     INIT_LIST_HEAD(&vm->dirty);
> >     >       INIT_LIST_HEAD(&vm->idle);
> >     >       INIT_LIST_HEAD(&vm->invalidated);
> >     >       spin_lock_init(&vm->status_lock);
> >     > @@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct
> >     amdgpu_vm *vm, struct seq_file *m)
> >     >   {
> >     >       struct amdgpu_bo_va *bo_va, *tmp;
> >     >       u64 total_idle = 0;
> >     > +     u64 total_dirty = 0;
> >     >       u64 total_relocated = 0;
> >     >       u64 total_moved = 0;
> >     >       u64 total_invalidated = 0;
> >     >       u64 total_done = 0;
> >     >       unsigned int total_idle_objs = 0;
> >     > +     unsigned int total_dirty_objs = 0;
> >     >       unsigned int total_relocated_objs = 0;
> >     >       unsigned int total_moved_objs = 0;
> >     >       unsigned int total_invalidated_objs = 0;
> >     > @@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct
> >     amdgpu_vm *vm, struct seq_file *m)
> >     >       total_idle_objs = id;
> >     >       id = 0;
> >     >
> >     > +     seq_puts(m, "\tDirty BOs:\n");
> >     > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty,
> >     base.vm_status) {
> >     > +             if (!bo_va->base.bo)
> >     > +                     continue;
> >     > +             total_dirty += amdgpu_bo_print_info(id++,
> >     bo_va->base.bo, m);
> >     > +     }
> >     > +     total_dirty_objs = id;
> >     > +     id = 0;
> >     > +
> >     >       seq_puts(m, "\tRelocated BOs:\n");
> >     >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated,
> >     base.vm_status) {
> >     >               if (!bo_va->base.bo)
> >     > @@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct
> >     amdgpu_vm *vm, struct seq_file *m)
> >     >
> >     >       seq_printf(m, "\tTotal idle size: %12lld\tobjs:\t%d\n",
> >     total_idle,
> >     >                  total_idle_objs);
> >     > +     seq_printf(m, "\tTotal dirty size:  %12lld\tobjs:\t%d\n",
> >     total_dirty,
> >     > +                total_dirty_objs);
> >     >       seq_printf(m, "\tTotal relocated size:
> >      %12lld\tobjs:\t%d\n", total_relocated,
> >     >                  total_relocated_objs);
> >     >       seq_printf(m, "\tTotal moved size:  %12lld\tobjs:\t%d\n",
> >     total_moved,
> >     > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> >     b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> >     > index d9ab97eabda9..f91d4fcf80b8 100644
> >     > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> >     > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> >     > @@ -276,6 +276,9 @@ struct amdgpu_vm {
> >     >       /* per VM BOs moved, but not yet updated in the PT */
> >     >       struct list_head        moved;
> >     >
> >     > +     /* normal and per VM BOs that are not moved, but have new
> >     PT entries */
> >     > +     struct list_head        dirty;
> >     > +
> >     >       /* All BOs of this VM not currently in the state machine */
> >     >       struct list_head        idle;
> >     >
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-11-02  2:36         ` Lang Yu
@ 2023-11-02  6:41           ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2023-11-02  6:41 UTC (permalink / raw)
  To: Lang Yu; +Cc: dri-devel, Tatsuyuki Ishi, amd-gfx

On 02.11.23 at 03:36, Lang Yu wrote:
> On 10/31/ , Christian König wrote:
>> On 31.10.23 at 14:59, Bas Nieuwenhuizen wrote:
>>>
>>> On Tue, Oct 31, 2023 at 2:57 PM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>
>>>      On 31.10.23 at 14:40, Tatsuyuki Ishi wrote:
>>>      > The current amdgpu_gem_va_update_vm only tries to perform
>>>      updates for the
>>>      > BO specified in the GEM ioctl; however, when a binding is split, the
>>>      > adjacent bindings also need to be updated. Such updates
>>>      currently end up
>>>      > getting deferred until next submission which causes stalls.
>>>
>>>      Yeah, that is a necessity. The hardware simply doesn't support
>>>      what you
>>>      try to do here in all cases.
>>>
>>>
>>> What can the hardware not do here? Is this just needing to wait for TLB
>>> flushes before we can free pagetables, can we just delay that?
>> On some hardware generations (especially Navi1x, but also everything older
>> than Polaris) you can't invalidate the TLB while it is in use.
> Hi Christian,
>
> non-legacy invalidation can invalidate the TLB while it is in use.
> Right? Thanks.

Right, the problem is that they are only available starting with Vega 
(for GFX8 they only work for the APUs IIRC).

Regards,
Christian.

>
> Regards,
> Lang
>
>> For Polaris and older it just means that you don't have a guarantee that the
>> shader can't access the memory any more. So delaying the free operation
>> helps here.
>>
>> But for Navi1x it's a workaround for a hardware bug. If you try to
>> invalidate the TLB while it is in use, you can potentially trigger memory
>> accesses to random addresses.
>>
>> That's why we still delay TLB invalidations to the next CS and use a new
>> VMID for each submission instead of invalidating the old one.
>>
>> I'm currently working on changing that for Navi2x and newer (maybe Vega as
>> well), but this is something you can really only do on some hw generations
>> after validating that it works.
>>
>> Regards,
>> Christian.
>>
>>>
>>>      So this approach won't work in general.
>>>
>>>      Regards,
>>>      Christian.
>>>
>>>      >
>>>      > Introduce a new state "dirty", shared between per-VM BOs and
>>>      traditional
>>>      > BOs, containing all BOs that have pending updates in `invalids`.
>>>      > amdgpu_gem_va_update_vm will now simply flush any pending
>>>      updates for BOs
>>>      > in the dirty state.
>>>      >
>>>      > Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>>>      > ---
>>>      >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
>>>      >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66
>>>      ++++++++++++++++++-------
>>>      >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++
>>>      >   3 files changed, 63 insertions(+), 24 deletions(-)
>>>      >
>>>      > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>      b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>      > index a1b15d0d6c48..01d3a97248b0 100644
>>>      > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>      > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>      > @@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct
>>>      drm_device *dev, void *data,
>>>      >    * vital here, so they are not reported back to userspace.
>>>      >    */
>>>      >   static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>>      > -                                 struct amdgpu_vm *vm,
>>>      > -                                 struct amdgpu_bo_va *bo_va,
>>>      > -                                 uint32_t operation)
>>>      > +                                 struct amdgpu_vm *vm)
>>>      >   {
>>>      > +     struct amdgpu_bo_va *bo_va;
>>>      >       int r;
>>>      >
>>>      >       if (!amdgpu_vm_ready(vm))
>>>      > @@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct
>>>      amdgpu_device *adev,
>>>      >       if (r)
>>>      >               goto error;
>>>      >
>>>      > -     if (operation == AMDGPU_VA_OP_MAP ||
>>>      > -         operation == AMDGPU_VA_OP_REPLACE) {
>>>      > +     spin_lock(&vm->status_lock);
>>>      > +     while (!list_empty(&vm->dirty)) {
>>>      > +             bo_va = list_first_entry(&vm->dirty, struct
>>>      amdgpu_bo_va,
>>>      > +                                      base.vm_status);
>>>      > +             spin_unlock(&vm->status_lock);
>>>      > +
>>>      >               r = amdgpu_vm_bo_update(adev, bo_va, false);
>>>      >               if (r)
>>>      >                       goto error;
>>>      > +             spin_lock(&vm->status_lock);
>>>      >       }
>>>      > +     spin_unlock(&vm->status_lock);
>>>      >
>>>      >       r = amdgpu_vm_update_pdes(adev, vm, false);
>>>      >
>>>      > @@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device
>>>      *dev, void *data,
>>>      >               break;
>>>      >       }
>>>      >       if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) &&
>>>      !amdgpu_vm_debug)
>>>      > -             amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
>>>      > -  args->operation);
>>>      > +             amdgpu_gem_va_update_vm(adev, &fpriv->vm);
>>>      >
>>>      >   error:
>>>      >       drm_exec_fini(&exec);
>>>      > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>      b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>      > index dd6f72e2a1d6..01d31891cd05 100644
>>>      > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>      > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>      > @@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct
>>>      amdgpu_vm_bo_base *vm_bo, bool evict
>>>      >       spin_unlock(&vm_bo->vm->status_lock);
>>>      >   }
>>>      >
>>>      > +/**
>>>      > + * amdgpu_vm_bo_dirty - vm_bo is dirty
>>>      > + *
>>>      > + * @vm_bo: vm_bo which is dirty
>>>      > + *
>>>      > + * State for normal and per VM BOs that are not moved, but have
>>>      new entries in
>>>      > + * bo_va->invalids.
>>>      > + */
>>>      > +static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
>>>      > +{
>>>      > +     spin_lock(&vm_bo->vm->status_lock);
>>>      > +     list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
>>>      > +     spin_unlock(&vm_bo->vm->status_lock);
>>>      > +}
>>>      > +
>>>      >   /**
>>>      >    * amdgpu_vm_bo_moved - vm_bo is moved
>>>      >    *
>>>      > @@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm
>>>      *vm,
>>>      >       list_for_each_entry_safe(bo_va, tmp, &vm->evicted,
>>>      base.eviction_status)
>>>      >               amdgpu_vm_bo_get_memory(bo_va, stats);
>>>      >
>>>      > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty,
>>>      base.vm_status)
>>>      > +             amdgpu_vm_bo_get_memory(bo_va, stats);
>>>      > +
>>>      >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated,
>>>      base.vm_status)
>>>      >               amdgpu_vm_bo_get_memory(bo_va, stats);
>>>      >
>>>      > @@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct
>>>      amdgpu_device *adev,
>>>      >                       dma_resv_unlock(resv);
>>>      >               spin_lock(&vm->status_lock);
>>>      >       }
>>>      > +
>>>      > +     while (!list_empty(&vm->dirty)) {
>>>      > +             bo_va = list_first_entry(&vm->dirty, struct
>>>      amdgpu_bo_va,
>>>      > +                                      base.vm_status);
>>>      > +             spin_unlock(&vm->status_lock);
>>>      > +
>>>      > +             r = amdgpu_vm_bo_update(adev, bo_va, false);
>>>      > +             if (r)
>>>      > +                     return r;
>>>      > +             spin_lock(&vm->status_lock);
>>>      > +     }
>>>      >       spin_unlock(&vm->status_lock);
>>>      >
>>>      >       return 0;
>>>      > @@ -1476,19 +1505,16 @@ static void
>>>      amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
>>>      >                                   struct amdgpu_bo_va_mapping
>>>      *mapping)
>>>      >   {
>>>      >       struct amdgpu_vm *vm = bo_va->base.vm;
>>>      > -     struct amdgpu_bo *bo = bo_va->base.bo;
>>>      >
>>>      >       mapping->bo_va = bo_va;
>>>      >       list_add(&mapping->list, &bo_va->invalids);
>>>      >       amdgpu_vm_it_insert(mapping, &vm->va);
>>>      > +     if (!bo_va->base.moved)
>>>      > +             amdgpu_vm_bo_dirty(&bo_va->base);
>>>      >
>>>      >       if (mapping->flags & AMDGPU_PTE_PRT)
>>>      >               amdgpu_vm_prt_get(adev);
>>>      >
>>>      > -     if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>>      > -         !bo_va->base.moved) {
>>>      > -             amdgpu_vm_bo_moved(&bo_va->base);
>>>      > -     }
>>>      >       trace_amdgpu_vm_bo_map(bo_va, mapping);
>>>      >   }
>>>      >
>>>      > @@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct
>>>      amdgpu_device *adev,
>>>      >                       before->flags = tmp->flags;
>>>      >                       before->bo_va = tmp->bo_va;
>>>      >                       list_add(&before->list,
>>>      &tmp->bo_va->invalids);
>>>      > +                     if (!tmp->bo_va->base.moved)
>>>      > +  amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>>>      >               }
>>>      >
>>>      >               /* Remember mapping split at the end */
>>>      > @@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct
>>>      amdgpu_device *adev,
>>>      >                       after->flags = tmp->flags;
>>>      >                       after->bo_va = tmp->bo_va;
>>>      >                       list_add(&after->list, &tmp->bo_va->invalids);
>>>      > +                     if (!tmp->bo_va->base.moved)
>>>      > +  amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>>>      >               }
>>>      >
>>>      >               list_del(&tmp->list);
>>>      > @@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct
>>>      amdgpu_device *adev,
>>>      >
>>>      >       /* Insert partial mapping before the range */
>>>      >       if (!list_empty(&before->list)) {
>>>      > -             struct amdgpu_bo *bo = before->bo_va->base.bo;
>>>      > -
>>>      >               amdgpu_vm_it_insert(before, &vm->va);
>>>      >               if (before->flags & AMDGPU_PTE_PRT)
>>>      >                       amdgpu_vm_prt_get(adev);
>>>      > -
>>>      > -             if (bo && bo->tbo.base.resv ==
>>>      vm->root.bo->tbo.base.resv &&
>>>      > -                 !before->bo_va->base.moved)
>>>      > -  amdgpu_vm_bo_moved(&before->bo_va->base);
>>>      >       } else {
>>>      >               kfree(before);
>>>      >       }
>>>      >
>>>      >       /* Insert partial mapping after the range */
>>>      >       if (!list_empty(&after->list)) {
>>>      > -             struct amdgpu_bo *bo = after->bo_va->base.bo;
>>>      > -
>>>      >               amdgpu_vm_it_insert(after, &vm->va);
>>>      >               if (after->flags & AMDGPU_PTE_PRT)
>>>      >                       amdgpu_vm_prt_get(adev);
>>>      > -
>>>      > -             if (bo && bo->tbo.base.resv ==
>>>      vm->root.bo->tbo.base.resv &&
>>>      > -                 !after->bo_va->base.moved)
>>>      > -  amdgpu_vm_bo_moved(&after->bo_va->base);
>>>      >       } else {
>>>      >               kfree(after);
>>>      >       }
>>>      > @@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device
>>>      *adev, struct amdgpu_vm *vm, int32_t xcp
>>>      >       INIT_LIST_HEAD(&vm->evicted);
>>>      >       INIT_LIST_HEAD(&vm->relocated);
>>>      >       INIT_LIST_HEAD(&vm->moved);
>>>      > +     INIT_LIST_HEAD(&vm->dirty);
>>>      >       INIT_LIST_HEAD(&vm->idle);
>>>      >       INIT_LIST_HEAD(&vm->invalidated);
>>>      >       spin_lock_init(&vm->status_lock);
>>>      > @@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct
>>>      amdgpu_vm *vm, struct seq_file *m)
>>>      >   {
>>>      >       struct amdgpu_bo_va *bo_va, *tmp;
>>>      >       u64 total_idle = 0;
>>>      > +     u64 total_dirty = 0;
>>>      >       u64 total_relocated = 0;
>>>      >       u64 total_moved = 0;
>>>      >       u64 total_invalidated = 0;
>>>      >       u64 total_done = 0;
>>>      >       unsigned int total_idle_objs = 0;
>>>      > +     unsigned int total_dirty_objs = 0;
>>>      >       unsigned int total_relocated_objs = 0;
>>>      >       unsigned int total_moved_objs = 0;
>>>      >       unsigned int total_invalidated_objs = 0;
>>>      > @@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct
>>>      amdgpu_vm *vm, struct seq_file *m)
>>>      >       total_idle_objs = id;
>>>      >       id = 0;
>>>      >
>>>      > +     seq_puts(m, "\tDirty BOs:\n");
>>>      > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty,
>>>      base.vm_status) {
>>>      > +             if (!bo_va->base.bo)
>>>      > +                     continue;
>>>      > +             total_dirty += amdgpu_bo_print_info(id++,
>>>      bo_va->base.bo, m);
>>>      > +     }
>>>      > +     total_dirty_objs = id;
>>>      > +     id = 0;
>>>      > +
>>>      >       seq_puts(m, "\tRelocated BOs:\n");
>>>      >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated,
>>>      base.vm_status) {
>>>      >               if (!bo_va->base.bo)
>>>      > @@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct
>>>      amdgpu_vm *vm, struct seq_file *m)
>>>      >
>>>      >       seq_printf(m, "\tTotal idle size: %12lld\tobjs:\t%d\n",
>>>      total_idle,
>>>      >                  total_idle_objs);
>>>      > +     seq_printf(m, "\tTotal dirty size:  %12lld\tobjs:\t%d\n",
>>>      total_dirty,
>>>      > +                total_dirty_objs);
>>>      >       seq_printf(m, "\tTotal relocated size:
>>>       %12lld\tobjs:\t%d\n", total_relocated,
>>>      >                  total_relocated_objs);
>>>      >       seq_printf(m, "\tTotal moved size:  %12lld\tobjs:\t%d\n",
>>>      total_moved,
>>>      > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>      b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>      > index d9ab97eabda9..f91d4fcf80b8 100644
>>>      > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>      > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>      > @@ -276,6 +276,9 @@ struct amdgpu_vm {
>>>      >       /* per VM BOs moved, but not yet updated in the PT */
>>>      >       struct list_head        moved;
>>>      >
>>>      > +     /* normal and per VM BOs that are not moved, but have new
>>>      PT entries */
>>>      > +     struct list_head        dirty;
>>>      > +
>>>      >       /* All BOs of this VM not currently in the state machine */
>>>      >       struct list_head        idle;
>>>      >
>>>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
                   ` (5 preceding siblings ...)
  2023-10-31 13:40 ` [PATCH 6/6] drm/amdgpu: Bump amdgpu driver version Tatsuyuki Ishi
@ 2023-11-02 14:04 ` Tatsuyuki Ishi
  2023-11-02 14:04   ` [PATCH v2 1/3] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
                     ` (2 more replies)
  6 siblings, 3 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-11-02 14:04 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

In Vulkan, it is the application's responsibility to perform adequate
synchronization before a sparse unmap, replace or BO destroy operation.
This adds an option to AMDGPU_VA_OPs to disable redundant implicit sync
that happens on sparse unmap or replace operations.

This has shown a significant improvement in stutter in Forza Horizon 5
and Forza Horizon 4, two games that previously suffered from significant
sparse-binding-related stutter.

Compared to the previous series [1], this specifically targets the VM
operations and keeps everything else intact, including implicit sync on
kernel-initiated moves.

I've been able to pass a full Vulkan CTS run on Navi 10 with this.

Userspace code for this is available at [2] and a branch for the kernel
code is available at [3].

v2 changes:
- Drop the changes to flush split bindings eagerly, as it's incompatible
  with TLB flush quirks in current hardware. Drop the refactoring
  commits related to that change too.
- Fixed a missing doc warning.
- Removed an accidentally included ioctl change.

[1]: https://lore.kernel.org/all/20230821062005.109771-1-ishitatsuyuki@gmail.com/
[2]: https://gitlab.freedesktop.org/ishitatsuyuki/mesa/-/commits/vm-explicit-sync
[3]: https://github.com/ishitatsuyuki/linux/tree/explicit-sync-drm-misc-next

Tatsuyuki Ishi (3):
  drm/amdgpu: Don't implicit sync PRT maps.
  drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  drm/amdgpu: Bump amdgpu driver version.

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 ++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 47 +++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 23 +++++----
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 18 +++----
 include/uapi/drm/amdgpu_drm.h                 |  2 +
 10 files changed, 73 insertions(+), 51 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/3] drm/amdgpu: Don't implicit sync PRT maps.
  2023-11-02 14:04 ` [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
@ 2023-11-02 14:04   ` Tatsuyuki Ishi
  2023-11-02 14:04   ` [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  2023-11-02 14:04   ` [PATCH v2 3/3] drm/amdgpu: Bump amdgpu driver version Tatsuyuki Ishi
  2 siblings, 0 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-11-02 14:04 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

These are considered map operations rather than unmap operations, and
there is no point in doing implicit synchronization here.

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f5daadcec865..7b9762f1cddd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -902,7 +902,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	/* Implicitly sync to command submissions in the same VM before
 	 * unmapping. Sync to moving fences before mapping.
 	 */
-	if (!(flags & AMDGPU_PTE_VALID))
+	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
 		sync_mode = AMDGPU_SYNC_EQ_OWNER;
 	else
 		sync_mode = AMDGPU_SYNC_EXPLICIT;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-11-02 14:04 ` [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  2023-11-02 14:04   ` [PATCH v2 1/3] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
@ 2023-11-02 14:04   ` Tatsuyuki Ishi
  2023-11-06 13:44     ` Christian König
  2023-11-02 14:04   ` [PATCH v2 3/3] drm/amdgpu: Bump amdgpu driver version Tatsuyuki Ishi
  2 siblings, 1 reply; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-11-02 14:04 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

In Vulkan, it is the application's responsibility to perform adequate
synchronization before a sparse unmap, replace or BO destroy operation.
Until now, the kernel applied the same rule as implicitly synchronized
APIs like OpenGL, which, with per-VM BOs, made page table updates stall the
queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
userspace drivers to opt out of this behavior, while still ensuring adequate
implicit sync happens for kernel-initiated updates (e.g. BO moves).

We record whether to use implicit sync or not for each freed mapping. To
avoid increasing the mapping struct's size, this is union-ized with the
interval tree field which is unused after the unmap.

The reason this is done with a GEM ioctl flag, instead of being a VM /
context global setting, is that the current libdrm implementation shares
the DRM handle even between different kinds of drivers (radeonsi vs radv).
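
(Not part of the patch, but for illustration: a minimal userspace sketch
of an explicitly synced unmap. The helper name and error handling are
hypothetical; it assumes libdrm plus the amdgpu_drm.h UAPI header from
this series.)

    #include <stdint.h>
    #include <string.h>
    #include <xf86drm.h>
    #include <amdgpu_drm.h>

    static int unmap_explicit_sync(int fd, uint32_t bo_handle,
                                   uint64_t va, uint64_t size)
    {
            struct drm_amdgpu_gem_va req;

            memset(&req, 0, sizeof(req));
            req.handle = bo_handle;
            req.operation = AMDGPU_VA_OP_UNMAP;
            /* The app has already synchronized against GPU use of this
             * range, so skip the implicit sync that would otherwise
             * stall the queue. */
            req.flags = AMDGPU_VM_EXPLICIT_SYNC;
            req.va_address = va;
            req.map_size = size;

            return drmCommandWriteRead(fd, DRM_AMDGPU_GEM_VA, &req,
                                       sizeof(req));
    }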

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 ++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 47 +++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 23 +++++----
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 18 +++----
 include/uapi/drm/amdgpu_drm.h                 |  2 +
 9 files changed, 71 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d6daf8d2bfa..10e129bff977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1196,7 +1196,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
 	struct amdgpu_device *adev = entry->adev;
 	struct amdgpu_vm *vm = bo_va->base.vm;
 
-	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
+	amdgpu_vm_bo_unmap(adev, bo_va, entry->va, true);
 
 	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 720011019741..612279e65bff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -122,7 +122,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		}
 	}
 
-	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr);
+	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr, true);
 	if (r) {
 		DRM_ERROR("failed to do bo_unmap on static CSA, err=%d\n", r);
 		goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a1b15d0d6c48..cca68b89754e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -667,9 +667,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	const uint32_t valid_flags = AMDGPU_VM_DELAY_UPDATE |
 		AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
 		AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_MASK |
-		AMDGPU_VM_PAGE_NOALLOC;
+		AMDGPU_VM_PAGE_NOALLOC | AMDGPU_VM_EXPLICIT_SYNC;
 	const uint32_t prt_flags = AMDGPU_VM_DELAY_UPDATE |
-		AMDGPU_VM_PAGE_PRT;
+		AMDGPU_VM_PAGE_PRT | AMDGPU_VM_EXPLICIT_SYNC;
 
 	struct drm_amdgpu_gem_va *args = data;
 	struct drm_gem_object *gobj;
@@ -680,6 +680,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	struct drm_exec exec;
 	uint64_t va_flags;
 	uint64_t vm_size;
+	bool sync_unmap;
 	int r = 0;
 
 	if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
@@ -715,6 +716,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	sync_unmap = !(args->flags & AMDGPU_VM_EXPLICIT_SYNC);
+
 	switch (args->operation) {
 	case AMDGPU_VA_OP_MAP:
 	case AMDGPU_VA_OP_UNMAP:
@@ -774,19 +777,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 				     va_flags);
 		break;
 	case AMDGPU_VA_OP_UNMAP:
-		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address);
+		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address,
+				       sync_unmap);
 		break;
 
 	case AMDGPU_VA_OP_CLEAR:
 		r = amdgpu_vm_bo_clear_mappings(adev, &fpriv->vm,
 						args->va_address,
-						args->map_size);
+						args->map_size, sync_unmap);
 		break;
 	case AMDGPU_VA_OP_REPLACE:
 		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
 		r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
 					     args->offset_in_bo, args->map_size,
-					     va_flags);
+					     va_flags, sync_unmap);
 		break;
 	default:
 		break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index f3ee83cdf97e..28be03f1bbcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -67,7 +67,12 @@ struct amdgpu_bo_va_mapping {
 	struct rb_node			rb;
 	uint64_t			start;
 	uint64_t			last;
-	uint64_t			__subtree_last;
+	union {
+		/* BOs in interval tree only */
+		uint64_t		__subtree_last;
+		/* Freed BOs only */
+		bool			sync_unmap;
+	};
 	uint64_t			offset;
 	uint64_t			flags;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 2fd1bfb35916..e71443c8c59b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -276,6 +276,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
 			     __field(long, last)
 			     __field(u64, offset)
 			     __field(u64, flags)
+			     __field(bool, sync_unmap)
 			     ),
 
 	    TP_fast_assign(
@@ -284,10 +285,11 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
 			   __entry->last = mapping->last;
 			   __entry->offset = mapping->offset;
 			   __entry->flags = mapping->flags;
+			   __entry->sync_unmap = mapping->sync_unmap;
 			   ),
-	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
+	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx, sync_unmap=%d",
 		      __entry->bo, __entry->start, __entry->last,
-		      __entry->offset, __entry->flags)
+		      __entry->offset, __entry->flags, __entry->sync_unmap)
 );
 
 DECLARE_EVENT_CLASS(amdgpu_vm_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 7b9762f1cddd..a74472e16952 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -844,6 +844,7 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
  * @immediate: immediate submission in a page fault
  * @unlocked: unlocked invalidation during MM callback
  * @flush_tlb: trigger tlb invalidation after update completed
+ * @sync_unmap: wait for BO users before unmapping
  * @resv: fences we need to sync to
  * @start: start of mapped range
  * @last: last mapped entry
@@ -861,8 +862,9 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
  */
 int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   bool immediate, bool unlocked, bool flush_tlb,
-			   struct dma_resv *resv, uint64_t start, uint64_t last,
-			   uint64_t flags, uint64_t offset, uint64_t vram_base,
+			   bool sync_unmap, struct dma_resv *resv,
+			   uint64_t start, uint64_t last, uint64_t flags,
+			   uint64_t offset, uint64_t vram_base,
 			   struct ttm_resource *res, dma_addr_t *pages_addr,
 			   struct dma_fence **fence)
 {
@@ -902,7 +904,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	/* Implicitly sync to command submissions in the same VM before
 	 * unmapping. Sync to moving fences before mapping.
 	 */
-	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
+	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) && sync_unmap)
 		sync_mode = AMDGPU_SYNC_EQ_OWNER;
 	else
 		sync_mode = AMDGPU_SYNC_EXPLICIT;
@@ -1145,10 +1147,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 		trace_amdgpu_vm_bo_update(mapping);
 
 		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb,
-					   resv, mapping->start, mapping->last,
-					   update_flags, mapping->offset,
-					   vram_base, mem, pages_addr,
-					   last_update);
+					   true, resv, mapping->start,
+					   mapping->last, update_flags,
+					   mapping->offset, vram_base, mem,
+					   pages_addr, last_update);
 		if (r)
 			return r;
 	}
@@ -1340,7 +1342,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 		    mapping->start < AMDGPU_GMC_HOLE_START)
 			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
 
-		r = amdgpu_vm_update_range(adev, vm, false, false, true, resv,
+		r = amdgpu_vm_update_range(adev, vm, false, false, true,
+					   mapping->sync_unmap, resv,
 					   mapping->start, mapping->last,
 					   init_pte_value, 0, 0, NULL, NULL,
 					   &f);
@@ -1572,6 +1575,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
  * @offset: requested offset in the BO
  * @size: BO size in bytes
  * @flags: attributes of pages (read/write/valid/etc.)
+ * @sync_unmap: wait for BO users before replacing existing mapping
  *
 * Add a mapping of the BO at the specified addr into the VM. Replace existing
  * mappings as we do so.
@@ -1582,9 +1586,9 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
  * Object has to be reserved and unreserved outside!
  */
 int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
-			     struct amdgpu_bo_va *bo_va,
-			     uint64_t saddr, uint64_t offset,
-			     uint64_t size, uint64_t flags)
+			     struct amdgpu_bo_va *bo_va, uint64_t saddr,
+			     uint64_t offset, uint64_t size, uint64_t flags,
+			     bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_bo *bo = bo_va->base.bo;
@@ -1608,7 +1612,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 	if (!mapping)
 		return -ENOMEM;
 
-	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
+	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size, sync_unmap);
 	if (r) {
 		kfree(mapping);
 		return r;
@@ -1633,6 +1637,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
  * @adev: amdgpu_device pointer
  * @bo_va: bo_va to remove the address from
  * @saddr: where to the BO is mapped
+ * @sync_unmap: wait for BO users before unmapping
  *
 * Remove a mapping of the BO at the specified addr from the VM.
  *
@@ -1641,9 +1646,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
  *
  * Object has to be reserved and unreserved outside!
  */
-int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
-		       struct amdgpu_bo_va *bo_va,
-		       uint64_t saddr)
+int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
+		       uint64_t saddr, bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_vm *vm = bo_va->base.vm;
@@ -1671,6 +1675,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
 	list_del(&mapping->list);
 	amdgpu_vm_it_remove(mapping, &vm->va);
 	mapping->bo_va = NULL;
+	mapping->sync_unmap = sync_unmap;
 	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 
 	if (valid)
@@ -1689,6 +1694,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
  * @vm: VM structure to use
  * @saddr: start of the range
  * @size: size of the range
+ * @sync_unmap: wait for BO users before unmapping
  *
  * Remove all mappings in a range, split them as appropriate.
  *
@@ -1696,8 +1702,8 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
  * 0 for success, error for failure.
  */
 int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
-				struct amdgpu_vm *vm,
-				uint64_t saddr, uint64_t size)
+				struct amdgpu_vm *vm, uint64_t saddr,
+				uint64_t size, bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
 	LIST_HEAD(removed);
@@ -1761,6 +1767,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 		    tmp->last = eaddr;
 
 		tmp->bo_va = NULL;
+		tmp->sync_unmap = sync_unmap;
 		list_add(&tmp->list, &vm->freed);
 		trace_amdgpu_vm_bo_unmap(NULL, tmp);
 	}
@@ -1889,6 +1896,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 		list_del(&mapping->list);
 		amdgpu_vm_it_remove(mapping, &vm->va);
 		mapping->bo_va = NULL;
+		mapping->sync_unmap = true;
 		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 		list_add(&mapping->list, &vm->freed);
 	}
@@ -2617,8 +2625,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 		goto error_unlock;
 	}
 
-	r = amdgpu_vm_update_range(adev, vm, true, false, false, NULL, addr,
-				   addr, flags, value, 0, NULL, NULL, NULL);
+	r = amdgpu_vm_update_range(adev, vm, true, false, false, true, NULL,
+				   addr, addr, flags, value, 0, NULL, NULL,
+				   NULL);
 	if (r)
 		goto error_unlock;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 204ab13184ed..73b7b49fdb2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -423,12 +423,12 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 			    struct amdgpu_vm *vm, struct amdgpu_bo *bo);
 int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   bool immediate, bool unlocked, bool flush_tlb,
-			   struct dma_resv *resv, uint64_t start, uint64_t last,
-			   uint64_t flags, uint64_t offset, uint64_t vram_base,
+			   bool sync_unmap, struct dma_resv *resv,
+			   uint64_t start, uint64_t last, uint64_t flags,
+			   uint64_t offset, uint64_t vram_base,
 			   struct ttm_resource *res, dma_addr_t *pages_addr,
 			   struct dma_fence **fence);
-int amdgpu_vm_bo_update(struct amdgpu_device *adev,
-			struct amdgpu_bo_va *bo_va,
+int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 			bool clear);
 bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
 void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
@@ -444,15 +444,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 		     uint64_t addr, uint64_t offset,
 		     uint64_t size, uint64_t flags);
 int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
-			     struct amdgpu_bo_va *bo_va,
-			     uint64_t addr, uint64_t offset,
-			     uint64_t size, uint64_t flags);
-int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
-		       struct amdgpu_bo_va *bo_va,
-		       uint64_t addr);
+			     struct amdgpu_bo_va *bo_va, uint64_t addr,
+			     uint64_t offset, uint64_t size, uint64_t flags,
+			     bool sync_unmap);
+int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
+		       uint64_t addr, bool sync_unmap);
 int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
-				struct amdgpu_vm *vm,
-				uint64_t saddr, uint64_t size);
+				struct amdgpu_vm *vm, uint64_t saddr,
+				uint64_t size, bool sync_unmap);
 struct amdgpu_bo_va_mapping *amdgpu_vm_bo_lookup_mapping(struct amdgpu_vm *vm,
 							 uint64_t addr);
 void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index bb16b795d1bc..6eb4a0a4bc84 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1291,9 +1291,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 	pr_debug("[0x%llx 0x%llx]\n", start, last);
 
-	return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start,
-				      last, init_pte_value, 0, 0, NULL, NULL,
-				      fence);
+	return amdgpu_vm_update_range(adev, vm, false, true, true, true, NULL,
+				      start, last, init_pte_value, 0, 0, NULL,
+				      NULL, fence);
 }
 
 static int
@@ -1398,12 +1398,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
 		 * different memory partition based on fpfn/lpfn, we should use
 		 * same vm_manager.vram_base_offset regardless memory partition.
 		 */
-		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
-					   last_start, prange->start + i,
-					   pte_flags,
-					   (last_start - prange->start) << PAGE_SHIFT,
-					   bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
-					   NULL, dma_addr, &vm->last_update);
+		r = amdgpu_vm_update_range(
+			adev, vm, false, false, flush_tlb, true, NULL,
+			last_start, prange->start + i, pte_flags,
+			(last_start - prange->start) << PAGE_SHIFT,
+			bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
+			NULL, dma_addr, &vm->last_update);
 
 		for (j = last_start - prange->start; j <= i; j++)
 			dma_addr[j] |= last_domain;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f477eda6a2b8..3cdcc299956e 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -556,6 +556,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_RW		(5 << 5)
 /* don't allocate MALL */
 #define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
+/* don't sync on unmap */
+#define AMDGPU_VM_EXPLICIT_SYNC		(1 << 10)
 
 struct drm_amdgpu_gem_va {
 	/** GEM object handle */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 3/3] drm/amdgpu: Bump amdgpu driver version.
  2023-11-02 14:04 ` [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
  2023-11-02 14:04   ` [PATCH v2 1/3] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
  2023-11-02 14:04   ` [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
@ 2023-11-02 14:04   ` Tatsuyuki Ishi
  2 siblings, 0 replies; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-11-02 14:04 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: Tatsuyuki Ishi, christian.koenig

This allows userspace to detect the new explicit sync functionality
without having to try the ioctl.
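
(As an illustrative sketch, not part of the patch: userspace could gate
the feature on the bumped version, assuming libdrm's drmGetVersion() and
that 3.55 remains the minor version this lands in.)

    #include <stdbool.h>
    #include <xf86drm.h>

    static bool has_explicit_sync(int fd)
    {
            drmVersionPtr ver = drmGetVersion(fd);
            bool ok;

            if (!ver)
                    return false;
            ok = ver->version_major == 3 && ver->version_minor >= 55;
            drmFreeVersion(ver);
            return ok;
    }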

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 81edf66dbea8..2aa406dee192 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -113,9 +113,10 @@
  *            gl1c_cache_size, gl2c_cache_size, mall_size, enabled_rb_pipes_mask_hi
  *   3.53.0 - Support for GFX11 CP GFX shadowing
  *   3.54.0 - Add AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS support
+ * - 3.55.0 - Add AMDGPU_VM_EXPLICIT_SYNC flag for GEM operations.
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	54
+#define KMS_DRIVER_MINOR	55
 #define KMS_DRIVER_PATCHLEVEL	0
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-10-31 14:07       ` Christian König
  2023-10-31 14:17         ` Bas Nieuwenhuizen
  2023-11-02  2:36         ` Lang Yu
@ 2023-11-06  7:56         ` Tatsuyuki Ishi
  2023-11-06 13:33           ` Christian König
  2 siblings, 1 reply; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-11-06  7:56 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

> On Oct 31, 2023, at 23:07, Christian König <christian.koenig@amd.com> wrote:
> 
> On 31.10.23 at 14:59, Bas Nieuwenhuizen wrote:
>> 
>> 
>> On Tue, Oct 31, 2023 at 2:57 PM Christian König <christian.koenig@amd.com> wrote:
>>> On 31.10.23 at 14:40, Tatsuyuki Ishi wrote:
>>> > The current amdgpu_gem_va_update_vm only tries to perform updates for the
>>> > BO specified in the GEM ioctl; however, when a binding is split, the
>>> > adjacent bindings also need to be updated. Such updates currently end up
>>> > getting deferred until next submission which causes stalls.
>>> 
>>> Yeah, that is a necessity. The hardware simply doesn't support what you 
>>> try to do here in all cases.
>> 
>> What can the hardware not do here? Is this just needing to wait for TLB flushes before we can free pagetables, can we just delay that?
> 
> On some hardware generations (especially Navi1x, but also everything older than Polaris) you can't invalidate the TLB while it is in use.
> 
> For Polaris and older it just means that you don't have a guarantee that the shader can't access the memory any more. So delaying the free operation helps here.
> 
> But for Navi1x it's a workaround for a hardware bug. If you try to invalidate the TLB while it is in use, you can potentially trigger memory accesses to random addresses.
> 
> That's why we still delay TLB invalidations to the next CS and use a new VMID for each submission instead of invalidating the old one.

Thanks for the information. I looked into the VMID allocation logic and I see that if concurrent_flush is false, then we defer any flush (or VMID reuse that requires a flush) until that VMID is idle.

What patch #3 ends up doing is just performing the PT update right away. Any required TLB update is deferred by amdgpu_vm_update_range through the increment of tlb_seq, and amdgpu_vmid.c is responsible for doing the actual TLB flush in a manner that does not trigger the bug.
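
In pseudocode, the interaction as I understand it (simplified; function
and field names are approximate, not the actual kernel code):

    /* amdgpu_vm_update_range(): write the PTEs now, only record
     * that a TLB flush is due. */
    if (flush_tlb)
            atomic64_inc(&vm->tlb_seq);

    /* later, at CS time in amdgpu_vmid.c */
    if (vmid_needs_flush(id, vm)) {
            if (id_mgr->concurrent_flush)
                    flush_tlb_now(id);      /* safe on this hw generation */
            else
                    grab_idle_vmid(id_mgr); /* defer: reuse only idle VMIDs */
    }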

Can you confirm that this would be fine for the current hardware?

Tatsuyuki.

> 
> I'm currently working on changing that for Navi2x and newer (maybe Vega as well), but this is something you can really only do on some hw generations after validating that it works.
> 
> Regards,
> Christian. 
> 
>>  
>>> 
>>> So this approach won't work in general.
>>> 
>>> Regards,
>>> Christian.
>>> 
>>> >
>>> > Introduce a new state "dirty", shared between per-VM BOs and traditional
>>> > BOs, containing all BOs that have pending updates in `invalids`.
>>> > amdgpu_gem_va_update_vm will now simply flush any pending updates for BOs
>>> > in the dirty state.
>>> >
>>> > Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>>> > ---
>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66 ++++++++++++++++++-------
>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++
>>> >   3 files changed, 63 insertions(+), 24 deletions(-)
>>> >
>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> > index a1b15d0d6c48..01d3a97248b0 100644
>>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> > @@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
>>> >    * vital here, so they are not reported back to userspace.
>>> >    */
>>> >   static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>> > -                                 struct amdgpu_vm *vm,
>>> > -                                 struct amdgpu_bo_va *bo_va,
>>> > -                                 uint32_t operation)
>>> > +                                 struct amdgpu_vm *vm)
>>> >   {
>>> > +     struct amdgpu_bo_va *bo_va;
>>> >       int r;
>>> >   
>>> >       if (!amdgpu_vm_ready(vm))
>>> > @@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>> >       if (r)
>>> >               goto error;
>>> >   
>>> > -     if (operation == AMDGPU_VA_OP_MAP ||
>>> > -         operation == AMDGPU_VA_OP_REPLACE) {
>>> > +     spin_lock(&vm->status_lock);
>>> > +     while (!list_empty(&vm->dirty)) {
>>> > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
>>> > +                                      base.vm_status);
>>> > +             spin_unlock(&vm->status_lock);
>>> > +
>>> >               r = amdgpu_vm_bo_update(adev, bo_va, false);
>>> >               if (r)
>>> >                       goto error;
>>> > +             spin_lock(&vm->status_lock);
>>> >       }
>>> > +     spin_unlock(&vm->status_lock);
>>> >   
>>> >       r = amdgpu_vm_update_pdes(adev, vm, false);
>>> >   
>>> > @@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>> >               break;
>>> >       }
>>> >       if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
>>> > -             amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
>>> > -                                     args->operation);
>>> > +             amdgpu_gem_va_update_vm(adev, &fpriv->vm);
>>> >   
>>> >   error:
>>> >       drm_exec_fini(&exec);
>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> > index dd6f72e2a1d6..01d31891cd05 100644
>>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> > @@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evict
>>> >       spin_unlock(&vm_bo->vm->status_lock);
>>> >   }
>>> >   
>>> > +/**
>>> > + * amdgpu_vm_bo_dirty - vm_bo is dirty
>>> > + *
>>> > + * @vm_bo: vm_bo which is dirty
>>> > + *
>>> > + * State for normal and per VM BOs that are not moved, but have new entries in
>>> > + * bo_va->invalids.
>>> > + */
>>> > +static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
>>> > +{
>>> > +     spin_lock(&vm_bo->vm->status_lock);
>>> > +     list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
>>> > +     spin_unlock(&vm_bo->vm->status_lock);
>>> > +}
>>> > +
>>> >   /**
>>> >    * amdgpu_vm_bo_moved - vm_bo is moved
>>> >    *
>>> > @@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>>> >       list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>>> >               amdgpu_vm_bo_get_memory(bo_va, stats);
>>> >   
>>> > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status)
>>> > +             amdgpu_vm_bo_get_memory(bo_va, stats);
>>> > +
>>> >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
>>> >               amdgpu_vm_bo_get_memory(bo_va, stats);
>>> >   
>>> > @@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
>>> >                       dma_resv_unlock(resv);
>>> >               spin_lock(&vm->status_lock);
>>> >       }
>>> > +
>>> > +     while (!list_empty(&vm->dirty)) {
>>> > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
>>> > +                                      base.vm_status);
>>> > +             spin_unlock(&vm->status_lock);
>>> > +
>>> > +             r = amdgpu_vm_bo_update(adev, bo_va, false);
>>> > +             if (r)
>>> > +                     return r;
>>> > +             spin_lock(&vm->status_lock);
>>> > +     }
>>> >       spin_unlock(&vm->status_lock);
>>> >   
>>> >       return 0;
>>> > @@ -1476,19 +1505,16 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
>>> >                                   struct amdgpu_bo_va_mapping *mapping)
>>> >   {
>>> >       struct amdgpu_vm *vm = bo_va->base.vm;
>>> > -     struct amdgpu_bo *bo = bo_va->base.bo;
>>> >   
>>> >       mapping->bo_va = bo_va;
>>> >       list_add(&mapping->list, &bo_va->invalids);
>>> >       amdgpu_vm_it_insert(mapping, &vm->va);
>>> > +     if (!bo_va->base.moved)
>>> > +             amdgpu_vm_bo_dirty(&bo_va->base);
>>> >   
>>> >       if (mapping->flags & AMDGPU_PTE_PRT)
>>> >               amdgpu_vm_prt_get(adev);
>>> >   
>>> > -     if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>> > -         !bo_va->base.moved) {
>>> > -             amdgpu_vm_bo_moved(&bo_va->base);
>>> > -     }
>>> >       trace_amdgpu_vm_bo_map(bo_va, mapping);
>>> >   }
>>> >   
>>> > @@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>> >                       before->flags = tmp->flags;
>>> >                       before->bo_va = tmp->bo_va;
>>> >                       list_add(&before->list, &tmp->bo_va->invalids);
>>> > +                     if (!tmp->bo_va->base.moved)
>>> > +                             amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>>> >               }
>>> >   
>>> >               /* Remember mapping split at the end */
>>> > @@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>> >                       after->flags = tmp->flags;
>>> >                       after->bo_va = tmp->bo_va;
>>> >                       list_add(&after->list, &tmp->bo_va->invalids);
>>> > +                     if (!tmp->bo_va->base.moved)
>>> > +                             amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>>> >               }
>>> >   
>>> >               list_del(&tmp->list);
>>> > @@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>> >   
>>> >       /* Insert partial mapping before the range */
>>> >       if (!list_empty(&before->list)) {
>>> > -             struct amdgpu_bo *bo = before->bo_va->base.bo;
>>> > -
>>> >               amdgpu_vm_it_insert(before, &vm->va);
>>> >               if (before->flags & AMDGPU_PTE_PRT)
>>> >                       amdgpu_vm_prt_get(adev);
>>> > -
>>> > -             if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>> > -                 !before->bo_va->base.moved)
>>> > -                     amdgpu_vm_bo_moved(&before->bo_va->base);
>>> >       } else {
>>> >               kfree(before);
>>> >       }
>>> >   
>>> >       /* Insert partial mapping after the range */
>>> >       if (!list_empty(&after->list)) {
>>> > -             struct amdgpu_bo *bo = after->bo_va->base.bo;
>>> > -
>>> >               amdgpu_vm_it_insert(after, &vm->va);
>>> >               if (after->flags & AMDGPU_PTE_PRT)
>>> >                       amdgpu_vm_prt_get(adev);
>>> > -
>>> > -             if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>> > -                 !after->bo_va->base.moved)
>>> > -                     amdgpu_vm_bo_moved(&after->bo_va->base);
>>> >       } else {
>>> >               kfree(after);
>>> >       }
>>> > @@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp
>>> >       INIT_LIST_HEAD(&vm->evicted);
>>> >       INIT_LIST_HEAD(&vm->relocated);
>>> >       INIT_LIST_HEAD(&vm->moved);
>>> > +     INIT_LIST_HEAD(&vm->dirty);
>>> >       INIT_LIST_HEAD(&vm->idle);
>>> >       INIT_LIST_HEAD(&vm->invalidated);
>>> >       spin_lock_init(&vm->status_lock);
>>> > @@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>> >   {
>>> >       struct amdgpu_bo_va *bo_va, *tmp;
>>> >       u64 total_idle = 0;
>>> > +     u64 total_dirty = 0;
>>> >       u64 total_relocated = 0;
>>> >       u64 total_moved = 0;
>>> >       u64 total_invalidated = 0;
>>> >       u64 total_done = 0;
>>> >       unsigned int total_idle_objs = 0;
>>> > +     unsigned int total_dirty_objs = 0;
>>> >       unsigned int total_relocated_objs = 0;
>>> >       unsigned int total_moved_objs = 0;
>>> >       unsigned int total_invalidated_objs = 0;
>>> > @@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>> >       total_idle_objs = id;
>>> >       id = 0;
>>> >   
>>> > +     seq_puts(m, "\tDirty BOs:\n");
>>> > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status) {
>>> > +             if (!bo_va->base.bo)
>>> > +                     continue;
>>> > +             total_dirty += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
>>> > +     }
>>> > +     total_dirty_objs = id;
>>> > +     id = 0;
>>> > +
>>> >       seq_puts(m, "\tRelocated BOs:\n");
>>> >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>>> >               if (!bo_va->base.bo)
>>> > @@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>> >   
>>> >       seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>>> >                  total_idle_objs);
>>> > +     seq_printf(m, "\tTotal dirty size:       %12lld\tobjs:\t%d\n", total_dirty,
>>> > +                total_dirty_objs);
>>> >       seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>>> >                  total_relocated_objs);
>>> >       seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> > index d9ab97eabda9..f91d4fcf80b8 100644
>>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> > @@ -276,6 +276,9 @@ struct amdgpu_vm {
>>> >       /* per VM BOs moved, but not yet updated in the PT */
>>> >       struct list_head        moved;
>>> >   
>>> > +     /* normal and per VM BOs that are not moved, but have new PT entries */
>>> > +     struct list_head        dirty;
>>> > +
>>> >       /* All BOs of this VM not currently in the state machine */
>>> >       struct list_head        idle;
>>> >   
>>> 
> 


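The drain loops added above (in amdgpu_gem_va_update_vm and
amdgpu_vm_handle_moved) follow a common kernel pattern worth spelling
out: vm->status_lock is a spinlock, but amdgpu_vm_bo_update() can sleep,
so the lock is dropped around each update and re-acquired before the
list is examined again. A successful update moves the bo_va off the
dirty list, which is what guarantees forward progress. A minimal sketch
of the pattern (simplified; not the literal driver code):

	spin_lock(&vm->status_lock);
	while (!list_empty(&vm->dirty)) {
		bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
					 base.vm_status);
		/* amdgpu_vm_bo_update() may sleep; drop the spinlock */
		spin_unlock(&vm->status_lock);

		r = amdgpu_vm_bo_update(adev, bo_va, false);
		if (r)
			return r;

		/* re-acquire and re-check from the top; the update moved
		 * bo_va off vm->dirty, so the loop terminates */
		spin_lock(&vm->status_lock);
	}
	spin_unlock(&vm->status_lock);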

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.
  2023-11-06  7:56         ` Tatsuyuki Ishi
@ 2023-11-06 13:33           ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2023-11-06 13:33 UTC (permalink / raw)
  To: Tatsuyuki Ishi; +Cc: amd-gfx, dri-devel


On 06.11.23 08:56, Tatsuyuki Ishi wrote:
>
>> On Oct 31, 2023, at 23:07, Christian König <christian.koenig@amd.com> 
>> wrote:
>>
>> On 31.10.23 14:59, Bas Nieuwenhuizen wrote:
>>>
>>>
>>> On Tue, Oct 31, 2023 at 2:57 PM Christian König 
>>> <christian.koenig@amd.com> wrote:
>>>
>>>     On 31.10.23 14:40, Tatsuyuki Ishi wrote:
>>>     > The current amdgpu_gem_va_update_vm only tries to perform updates
>>>     > for the BO specified in the GEM ioctl; however, when a binding is
>>>     > split, the adjacent bindings also need to be updated. Such updates
>>>     > currently end up getting deferred until the next submission, which
>>>     > causes stalls.
>>>
>>>     Yeah, that is a necessity. The hardware simply doesn't support
>>>     what you
>>>     try to do here in all cases.
>>>
>>>
>>> What can the hardware not do here? Is it just that we need to wait for
>>> TLB flushes before we can free page tables? Can we just delay that?
>>
>> On some hardware generations (especially Navi1x, but also everything 
>> older than Polaris) you can't invalidate the TLB while it is in use.
>>
>> For Polaris and older it just means that you don't have a guarantee 
>> that the shader can't access the memory any more. So delaying the 
>> free operation helps here.
>>
>> But for Navi1x it's a workaround for a hardware bug. If you try to 
>> invalidate the TLB while it is in use you can potentially trigger
>> memory accesses to random addresses.
>>
>> That's why we still delay TLB invalidations to the next CS and use a
>> new VMID for each submission instead of invalidating the old one.
>
> Thanks for the information. I looked into the VMID allocation logic 
> and I see that if concurrent_flush is false, then we defer any flush 
> (or VMID reuse that requires a flush) until that VMID is idle.
>
> What patch #3 ends up doing is just performing the PT update right 
> away. Any required TLB update is deferred by amdgpu_vm_update_range 
> through the increment of tlb_seq, and amdgpu_vmid.c is responsible for 
> doing the actual TLB flush in a manner that does not trigger the bug.
>
> Can you confirm that this would be fine for the current hardware?

Yeah, that should work. I'm just thinking about the UAPI a bit; we should
probably improve this to use something like a drm_syncobj instead of
just a flag, to be future proof.

Christian.
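
For reference, the mechanism Tatsuyuki describes boils down to a sequence
counter: page table updates only bump vm->tlb_seq, and the VMID code
flushes (or picks a fresh, idle VMID on hardware where a live VMID cannot
be invalidated) when it notices the counter has moved. A rough sketch,
with illustrative names rather than the actual amdgpu code:

	/* Illustrative only; the real logic lives in amdgpu_vm.c and
	 * amdgpu_vmid.c and differs in detail. */
	struct example_vm {
		atomic64_t	tlb_seq;	/* bumped on every PT update */
	};

	struct example_vmid {
		u64		flushed_seq;	/* tlb_seq at the last flush */
		bool		in_use;		/* still active on the GPU */
	};

	/* Update path: never flushes, only records that a flush is due. */
	static void example_update_range(struct example_vm *vm)
	{
		/* ... write the PTEs ... */
		atomic64_inc(&vm->tlb_seq);
	}

	/* VMID grab at CS time: flush only when it is safe to do so. */
	static bool example_vmid_usable(struct example_vm *vm,
					struct example_vmid *id,
					bool concurrent_flush)
	{
		if (id->flushed_seq == atomic64_read(&vm->tlb_seq))
			return true;	/* no flush pending */
		if (id->in_use && !concurrent_flush)
			return false;	/* caller picks an idle VMID instead */
		/* ... emit the TLB invalidation for this VMID ... */
		id->flushed_seq = atomic64_read(&vm->tlb_seq);
		return true;
	}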

>
> Tatsuyuki.
>
>>
>> I'm currently working on changing that for Navi2x and newer (maybe 
>> Vega as well), but this is something you can really only do on some 
>> hw generations after validating that it works.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>>     So this approach won't work in general.
>>>
>>>     Regards,
>>>     Christian.
>>>
>>>     >
>>>     > Introduce a new state "dirty", shared between per-VM BOs and
>>>     > traditional BOs, containing all BOs that have pending updates in
>>>     > `invalids`. amdgpu_gem_va_update_vm will now simply flush any
>>>     > pending updates for BOs in the dirty state.
>>>     >
>>>     > Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>>>     > ---
>>>     >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++---
>>>     >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 66
>>>     ++++++++++++++++++-------
>>>     >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  | 3 ++
>>>     >   3 files changed, 63 insertions(+), 24 deletions(-)
>>>     >
>>>     > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>     > index a1b15d0d6c48..01d3a97248b0 100644
>>>     > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>     > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>     > @@ -604,10 +604,9 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
>>>     >    * vital here, so they are not reported back to userspace.
>>>     >    */
>>>     >   static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>>     > -                                 struct amdgpu_vm *vm,
>>>     > -                                 struct amdgpu_bo_va *bo_va,
>>>     > -                                 uint32_t operation)
>>>     > +                                 struct amdgpu_vm *vm)
>>>     >   {
>>>     > +     struct amdgpu_bo_va *bo_va;
>>>     >       int r;
>>>     >
>>>     >       if (!amdgpu_vm_ready(vm))
>>>     > @@ -617,12 +616,18 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>>     >       if (r)
>>>     >               goto error;
>>>     >
>>>     > -     if (operation == AMDGPU_VA_OP_MAP ||
>>>     > -         operation == AMDGPU_VA_OP_REPLACE) {
>>>     > +     spin_lock(&vm->status_lock);
>>>     > +     while (!list_empty(&vm->dirty)) {
>>>     > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
>>>     > +                                      base.vm_status);
>>>     > +             spin_unlock(&vm->status_lock);
>>>     > +
>>>     >               r = amdgpu_vm_bo_update(adev, bo_va, false);
>>>     >               if (r)
>>>     >                       goto error;
>>>     > +             spin_lock(&vm->status_lock);
>>>     >       }
>>>     > +     spin_unlock(&vm->status_lock);
>>>     >
>>>     >       r = amdgpu_vm_update_pdes(adev, vm, false);
>>>     >
>>>     > @@ -792,8 +797,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>     >               break;
>>>     >       }
>>>     >       if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) &&
>>>     !amdgpu_vm_debug)
>>>     > -             amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
>>>     > -                                     args->operation);
>>>     > +             amdgpu_gem_va_update_vm(adev, &fpriv->vm);
>>>     >
>>>     >   error:
>>>     >       drm_exec_fini(&exec);
>>>     > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>     > index dd6f72e2a1d6..01d31891cd05 100644
>>>     > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>     > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>     > @@ -191,6 +191,21 @@ static void amdgpu_vm_bo_set_evicted(struct amdgpu_vm_bo_base *vm_bo, bool evict
>>>     >       spin_unlock(&vm_bo->vm->status_lock);
>>>     >   }
>>>     >
>>>     > +/**
>>>     > + * amdgpu_vm_bo_dirty - vm_bo is dirty
>>>     > + *
>>>     > + * @vm_bo: vm_bo which is dirty
>>>     > + *
>>>     > + * State for normal and per VM BOs that are not moved, but have new entries in
>>>     > + * bo_va->invalids.
>>>     > + */
>>>     > +static void amdgpu_vm_bo_dirty(struct amdgpu_vm_bo_base *vm_bo)
>>>     > +{
>>>     > +     spin_lock(&vm_bo->vm->status_lock);
>>>     > +     list_move(&vm_bo->vm_status, &vm_bo->vm->dirty);
>>>     > +     spin_unlock(&vm_bo->vm->status_lock);
>>>     > +}
>>>     > +
>>>     >   /**
>>>     >    * amdgpu_vm_bo_moved - vm_bo is moved
>>>     >    *
>>>     > @@ -1042,6 +1057,9 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
>>>     >       list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.eviction_status)
>>>     >               amdgpu_vm_bo_get_memory(bo_va, stats);
>>>     >
>>>     > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status)
>>>     > +             amdgpu_vm_bo_get_memory(bo_va, stats);
>>>     > +
>>>     >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
>>>     >               amdgpu_vm_bo_get_memory(bo_va, stats);
>>>     >
>>>     > @@ -1411,6 +1429,17 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
>>>     >                       dma_resv_unlock(resv);
>>>     >               spin_lock(&vm->status_lock);
>>>     >       }
>>>     > +
>>>     > +     while (!list_empty(&vm->dirty)) {
>>>     > +             bo_va = list_first_entry(&vm->dirty, struct amdgpu_bo_va,
>>>     > +                                      base.vm_status);
>>>     > +             spin_unlock(&vm->status_lock);
>>>     > +
>>>     > +             r = amdgpu_vm_bo_update(adev, bo_va, false);
>>>     > +             if (r)
>>>     > +                     return r;
>>>     > +             spin_lock(&vm->status_lock);
>>>     > +     }
>>>     >       spin_unlock(&vm->status_lock);
>>>     >
>>>     >       return 0;
>>>     > @@ -1476,19 +1505,16 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev,
>>>     >                                   struct amdgpu_bo_va_mapping *mapping)
>>>     >   {
>>>     >       struct amdgpu_vm *vm = bo_va->base.vm;
>>>     > -     struct amdgpu_bo *bo = bo_va->base.bo;
>>>     >
>>>     >       mapping->bo_va = bo_va;
>>>     >       list_add(&mapping->list, &bo_va->invalids);
>>>     >       amdgpu_vm_it_insert(mapping, &vm->va);
>>>     > +     if (!bo_va->base.moved)
>>>     > +             amdgpu_vm_bo_dirty(&bo_va->base);
>>>     >
>>>     >       if (mapping->flags & AMDGPU_PTE_PRT)
>>>     >               amdgpu_vm_prt_get(adev);
>>>     >
>>>     > -     if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>>     > -         !bo_va->base.moved) {
>>>     > -             amdgpu_vm_bo_moved(&bo_va->base);
>>>     > -     }
>>>     >       trace_amdgpu_vm_bo_map(bo_va, mapping);
>>>     >   }
>>>     >
>>>     > @@ -1725,6 +1751,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>>     >                       before->flags = tmp->flags;
>>>     >                       before->bo_va = tmp->bo_va;
>>>     >                       list_add(&before->list, &tmp->bo_va->invalids);
>>>     > +                     if (!tmp->bo_va->base.moved)
>>>     > +                             amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>>>     >               }
>>>     >
>>>     >               /* Remember mapping split at the end */
>>>     > @@ -1736,6 +1764,8 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>>     >                       after->flags = tmp->flags;
>>>     >                       after->bo_va = tmp->bo_va;
>>>     >                       list_add(&after->list, &tmp->bo_va->invalids);
>>>     > +                     if (!tmp->bo_va->base.moved)
>>>     > +                             amdgpu_vm_bo_dirty(&tmp->bo_va->base);
>>>     >               }
>>>     >
>>>     >               list_del(&tmp->list);
>>>     > @@ -1761,30 +1791,18 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>>     >
>>>     >       /* Insert partial mapping before the range */
>>>     >       if (!list_empty(&before->list)) {
>>>     > -             struct amdgpu_bo *bo = before->bo_va->base.bo;
>>>     > -
>>>     >               amdgpu_vm_it_insert(before, &vm->va);
>>>     >               if (before->flags & AMDGPU_PTE_PRT)
>>>     >                       amdgpu_vm_prt_get(adev);
>>>     > -
>>>     > -             if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>>     > -                 !before->bo_va->base.moved)
>>>     > -                     amdgpu_vm_bo_moved(&before->bo_va->base);
>>>     >       } else {
>>>     >               kfree(before);
>>>     >       }
>>>     >
>>>     >       /* Insert partial mapping after the range */
>>>     >       if (!list_empty(&after->list)) {
>>>     > -             struct amdgpu_bo *bo = after->bo_va->base.bo;
>>>     > -
>>>     >               amdgpu_vm_it_insert(after, &vm->va);
>>>     >               if (after->flags & AMDGPU_PTE_PRT)
>>>     >                       amdgpu_vm_prt_get(adev);
>>>     > -
>>>     > -             if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
>>>     > -                 !after->bo_va->base.moved)
>>>     > -                     amdgpu_vm_bo_moved(&after->bo_va->base);
>>>     >       } else {
>>>     >               kfree(after);
>>>     >       }
>>>     > @@ -2136,6 +2154,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp
>>>     >       INIT_LIST_HEAD(&vm->evicted);
>>>     >       INIT_LIST_HEAD(&vm->relocated);
>>>     >       INIT_LIST_HEAD(&vm->moved);
>>>     > +     INIT_LIST_HEAD(&vm->dirty);
>>>     >       INIT_LIST_HEAD(&vm->idle);
>>>     >       INIT_LIST_HEAD(&vm->invalidated);
>>>     >       spin_lock_init(&vm->status_lock);
>>>     > @@ -2648,11 +2667,13 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>>     >   {
>>>     >       struct amdgpu_bo_va *bo_va, *tmp;
>>>     >       u64 total_idle = 0;
>>>     > +     u64 total_dirty = 0;
>>>     >       u64 total_relocated = 0;
>>>     >       u64 total_moved = 0;
>>>     >       u64 total_invalidated = 0;
>>>     >       u64 total_done = 0;
>>>     >       unsigned int total_idle_objs = 0;
>>>     > +     unsigned int total_dirty_objs = 0;
>>>     >       unsigned int total_relocated_objs = 0;
>>>     >       unsigned int total_moved_objs = 0;
>>>     >       unsigned int total_invalidated_objs = 0;
>>>     > @@ -2669,6 +2690,15 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>>     >       total_idle_objs = id;
>>>     >       id = 0;
>>>     >
>>>     > +     seq_puts(m, "\tDirty BOs:\n");
>>>     > +     list_for_each_entry_safe(bo_va, tmp, &vm->dirty, base.vm_status) {
>>>     > +             if (!bo_va->base.bo)
>>>     > +                     continue;
>>>     > +             total_dirty += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
>>>     > +     }
>>>     > +     total_dirty_objs = id;
>>>     > +     id = 0;
>>>     > +
>>>     >       seq_puts(m, "\tRelocated BOs:\n");
>>>     >       list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
>>>     >               if (!bo_va->base.bo)
>>>     > @@ -2707,6 +2737,8 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
>>>     >
>>>     >       seq_printf(m, "\tTotal idle size:        %12lld\tobjs:\t%d\n", total_idle,
>>>     >                  total_idle_objs);
>>>     > +     seq_printf(m, "\tTotal dirty size:       %12lld\tobjs:\t%d\n", total_dirty,
>>>     > +                total_dirty_objs);
>>>     >       seq_printf(m, "\tTotal relocated size:   %12lld\tobjs:\t%d\n", total_relocated,
>>>     >                  total_relocated_objs);
>>>     >       seq_printf(m, "\tTotal moved size:       %12lld\tobjs:\t%d\n", total_moved,
>>>     > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>     > index d9ab97eabda9..f91d4fcf80b8 100644
>>>     > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>     > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>     > @@ -276,6 +276,9 @@ struct amdgpu_vm {
>>>     >       /* per VM BOs moved, but not yet updated in the PT */
>>>     >       struct list_head        moved;
>>>     >
>>>     > +     /* normal and per VM BOs that are not moved, but have new PT entries */
>>>     > +     struct list_head        dirty;
>>>     > +
>>>     >       /* All BOs of this VM not currently in the state machine */
>>>     >       struct list_head        idle;
>>>     >
>>>
>>
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-11-02 14:04   ` [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
@ 2023-11-06 13:44     ` Christian König
  2023-11-06 15:47       ` Tatsuyuki Ishi
  0 siblings, 1 reply; 34+ messages in thread
From: Christian König @ 2023-11-06 13:44 UTC (permalink / raw)
  To: Tatsuyuki Ishi, dri-devel, amd-gfx

On 02.11.23 15:04, Tatsuyuki Ishi wrote:
> In Vulkan, it is the application's responsibility to perform adequate
> synchronization before a sparse unmap, replace or BO destroy operation.
> Until now, the kernel applied the same rule as implicitly-synchronized
> APIs like OpenGL, which, with per-VM BOs, made page table updates stall
> the queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
> drivers to opt out of this behavior, while still ensuring adequate implicit
> sync happens for kernel-initiated updates (e.g. BO moves).
>
> We record whether to use implicit sync or not for each freed mapping. To
> avoid increasing the mapping struct's size, this is union-ized with the
> interval tree field which is unused after the unmap.
>
> The reason this is done with a GEM ioctl flag, instead of being a VM /
> context global setting, is that the current libdrm implementation shares
> the DRM handle even between different kinds of drivers (radeonsi vs radv).

It would be nice if we could make this more future proof by not using a
flag, but rather a drm_syncobj.

You can extend the drm_amdgpu_gem_va structure by adding a drm_syncobj 
handle and timeline point at the end.

If the syncobj/timeline point results in a fence we give that as input 
dependency the operation has to wait for.

An output fence can come later on as well, but that one is much
harder to handle.

Regards,
Christian.
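
For concreteness, the extension suggested above might look roughly like
this. The two appended fields are hypothetical (nothing like them exists
in the uapi header yet); only drm_syncobj_find_fence() is an existing
helper:

	struct drm_amdgpu_gem_va {
		__u32 handle;			/* GEM object handle */
		__u32 _pad;
		__u32 operation;		/* AMDGPU_VA_OP_* */
		__u32 flags;			/* AMDGPU_VM_* */
		__u64 va_address;
		__u64 offset_in_bo;
		__u64 map_size;
		/* hypothetical additions, appended at the end so old
		 * userspace keeps working: */
		__u32 input_fence_syncobj;	/* 0 = no input fence */
		__u32 pad2;
		__u64 input_fence_point;	/* timeline point, 0 if binary */
	};

	/* Kernel side, in amdgpu_gem_va_ioctl(): resolve the input fence
	 * and make the VA update wait on it before it runs. */
	struct dma_fence *in_fence = NULL;

	if (args->input_fence_syncobj) {
		r = drm_syncobj_find_fence(filp, args->input_fence_syncobj,
					   args->input_fence_point, 0,
					   &in_fence);
		if (r)
			return r;
	}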

>
> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 ++++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  7 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  6 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 47 +++++++++++--------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 23 +++++----
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 18 +++----
>   include/uapi/drm/amdgpu_drm.h                 |  2 +
>   9 files changed, 71 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 7d6daf8d2bfa..10e129bff977 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1196,7 +1196,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
>   	struct amdgpu_device *adev = entry->adev;
>   	struct amdgpu_vm *vm = bo_va->base.vm;
>   
> -	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
> +	amdgpu_vm_bo_unmap(adev, bo_va, entry->va, true);
>   
>   	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> index 720011019741..612279e65bff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> @@ -122,7 +122,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   		}
>   	}
>   
> -	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr);
> +	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr, true);
>   	if (r) {
>   		DRM_ERROR("failed to do bo_unmap on static CSA, err=%d\n", r);
>   		goto error;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index a1b15d0d6c48..cca68b89754e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -667,9 +667,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>   	const uint32_t valid_flags = AMDGPU_VM_DELAY_UPDATE |
>   		AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
>   		AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_MASK |
> -		AMDGPU_VM_PAGE_NOALLOC;
> +		AMDGPU_VM_PAGE_NOALLOC | AMDGPU_VM_EXPLICIT_SYNC;
>   	const uint32_t prt_flags = AMDGPU_VM_DELAY_UPDATE |
> -		AMDGPU_VM_PAGE_PRT;
> +		AMDGPU_VM_PAGE_PRT | AMDGPU_VM_EXPLICIT_SYNC;
>   
>   	struct drm_amdgpu_gem_va *args = data;
>   	struct drm_gem_object *gobj;
> @@ -680,6 +680,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>   	struct drm_exec exec;
>   	uint64_t va_flags;
>   	uint64_t vm_size;
> +	bool sync_unmap;
>   	int r = 0;
>   
>   	if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
> @@ -715,6 +716,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>   		return -EINVAL;
>   	}
>   
> +	sync_unmap = !(args->flags & AMDGPU_VM_EXPLICIT_SYNC);
> +
>   	switch (args->operation) {
>   	case AMDGPU_VA_OP_MAP:
>   	case AMDGPU_VA_OP_UNMAP:
> @@ -774,19 +777,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>   				     va_flags);
>   		break;
>   	case AMDGPU_VA_OP_UNMAP:
> -		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address);
> +		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address,
> +				       sync_unmap);
>   		break;
>   
>   	case AMDGPU_VA_OP_CLEAR:
>   		r = amdgpu_vm_bo_clear_mappings(adev, &fpriv->vm,
>   						args->va_address,
> -						args->map_size);
> +						args->map_size, sync_unmap);
>   		break;
>   	case AMDGPU_VA_OP_REPLACE:
>   		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
>   		r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
>   					     args->offset_in_bo, args->map_size,
> -					     va_flags);
> +					     va_flags, sync_unmap);
>   		break;
>   	default:
>   		break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index f3ee83cdf97e..28be03f1bbcf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -67,7 +67,12 @@ struct amdgpu_bo_va_mapping {
>   	struct rb_node			rb;
>   	uint64_t			start;
>   	uint64_t			last;
> -	uint64_t			__subtree_last;
> +	union {
> +		/* BOs in interval tree only */
> +		uint64_t		__subtree_last;
> +		/* Freed BOs only */
> +		bool			sync_unmap;
> +	};
>   	uint64_t			offset;
>   	uint64_t			flags;
>   };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 2fd1bfb35916..e71443c8c59b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -276,6 +276,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>   			     __field(long, last)
>   			     __field(u64, offset)
>   			     __field(u64, flags)
> +			     __field(bool, sync_unmap)
>   			     ),
>   
>   	    TP_fast_assign(
> @@ -284,10 +285,11 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>   			   __entry->last = mapping->last;
>   			   __entry->offset = mapping->offset;
>   			   __entry->flags = mapping->flags;
> +			   __entry->sync_unmap = mapping->sync_unmap;
>   			   ),
> -	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
> +	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx, sync_unmap=%d",
>   		      __entry->bo, __entry->start, __entry->last,
> -		      __entry->offset, __entry->flags)
> +		      __entry->offset, __entry->flags, __entry->sync_unmap)
>   );
>   
>   DECLARE_EVENT_CLASS(amdgpu_vm_mapping,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 7b9762f1cddd..a74472e16952 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -844,6 +844,7 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
>    * @immediate: immediate submission in a page fault
>    * @unlocked: unlocked invalidation during MM callback
>    * @flush_tlb: trigger tlb invalidation after update completed
> + * @sync_unmap: wait for BO users before unmapping
>    * @resv: fences we need to sync to
>    * @start: start of mapped range
>    * @last: last mapped entry
> @@ -861,8 +862,9 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
>    */
>   int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   			   bool immediate, bool unlocked, bool flush_tlb,
> -			   struct dma_resv *resv, uint64_t start, uint64_t last,
> -			   uint64_t flags, uint64_t offset, uint64_t vram_base,
> +			   bool sync_unmap, struct dma_resv *resv,
> +			   uint64_t start, uint64_t last, uint64_t flags,
> +			   uint64_t offset, uint64_t vram_base,
>   			   struct ttm_resource *res, dma_addr_t *pages_addr,
>   			   struct dma_fence **fence)
>   {
> @@ -902,7 +904,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   	/* Implicitly sync to command submissions in the same VM before
>   	 * unmapping. Sync to moving fences before mapping.
>   	 */
> -	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
> +	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) && sync_unmap)
>   		sync_mode = AMDGPU_SYNC_EQ_OWNER;
>   	else
>   		sync_mode = AMDGPU_SYNC_EXPLICIT;
> @@ -1145,10 +1147,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>   		trace_amdgpu_vm_bo_update(mapping);
>   
>   		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb,
> -					   resv, mapping->start, mapping->last,
> -					   update_flags, mapping->offset,
> -					   vram_base, mem, pages_addr,
> -					   last_update);
> +					   true, resv, mapping->start,
> +					   mapping->last, update_flags,
> +					   mapping->offset, vram_base, mem,
> +					   pages_addr, last_update);
>   		if (r)
>   			return r;
>   	}
> @@ -1340,7 +1342,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>   		    mapping->start < AMDGPU_GMC_HOLE_START)
>   			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>   
> -		r = amdgpu_vm_update_range(adev, vm, false, false, true, resv,
> +		r = amdgpu_vm_update_range(adev, vm, false, false, true,
> +					   mapping->sync_unmap, resv,
>   					   mapping->start, mapping->last,
>   					   init_pte_value, 0, 0, NULL, NULL,
>   					   &f);
> @@ -1572,6 +1575,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>    * @offset: requested offset in the BO
>    * @size: BO size in bytes
>    * @flags: attributes of pages (read/write/valid/etc.)
> + * @sync_unmap: wait for BO users before replacing existing mapping
>    *
>    * Add a mapping of the BO at the specefied addr into the VM. Replace existing
>    * mappings as we do so.
> @@ -1582,9 +1586,9 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>    * Object has to be reserved and unreserved outside!
>    */
>   int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
> -			     struct amdgpu_bo_va *bo_va,
> -			     uint64_t saddr, uint64_t offset,
> -			     uint64_t size, uint64_t flags)
> +			     struct amdgpu_bo_va *bo_va, uint64_t saddr,
> +			     uint64_t offset, uint64_t size, uint64_t flags,
> +			     bool sync_unmap)
>   {
>   	struct amdgpu_bo_va_mapping *mapping;
>   	struct amdgpu_bo *bo = bo_va->base.bo;
> @@ -1608,7 +1612,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>   	if (!mapping)
>   		return -ENOMEM;
>   
> -	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
> +	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size, sync_unmap);
>   	if (r) {
>   		kfree(mapping);
>   		return r;
> @@ -1633,6 +1637,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>    * @adev: amdgpu_device pointer
>    * @bo_va: bo_va to remove the address from
>    * @saddr: where to the BO is mapped
> + * @sync_unmap: wait for BO users before unmapping
>    *
>    * Remove a mapping of the BO at the specefied addr from the VM.
>    *
> @@ -1641,9 +1646,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>    *
>    * Object has to be reserved and unreserved outside!
>    */
> -int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
> -		       struct amdgpu_bo_va *bo_va,
> -		       uint64_t saddr)
> +int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
> +		       uint64_t saddr, bool sync_unmap)
>   {
>   	struct amdgpu_bo_va_mapping *mapping;
>   	struct amdgpu_vm *vm = bo_va->base.vm;
> @@ -1671,6 +1675,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>   	list_del(&mapping->list);
>   	amdgpu_vm_it_remove(mapping, &vm->va);
>   	mapping->bo_va = NULL;
> +	mapping->sync_unmap = sync_unmap;
>   	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>   
>   	if (valid)
> @@ -1689,6 +1694,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>    * @vm: VM structure to use
>    * @saddr: start of the range
>    * @size: size of the range
> + * @sync_unmap: wait for BO users before unmapping
>    *
>    * Remove all mappings in a range, split them as appropriate.
>    *
> @@ -1696,8 +1702,8 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>    * 0 for success, error for failure.
>    */
>   int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
> -				struct amdgpu_vm *vm,
> -				uint64_t saddr, uint64_t size)
> +				struct amdgpu_vm *vm, uint64_t saddr,
> +				uint64_t size, bool sync_unmap)
>   {
>   	struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
>   	LIST_HEAD(removed);
> @@ -1761,6 +1767,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>   		    tmp->last = eaddr;
>   
>   		tmp->bo_va = NULL;
> +		tmp->sync_unmap = sync_unmap;
>   		list_add(&tmp->list, &vm->freed);
>   		trace_amdgpu_vm_bo_unmap(NULL, tmp);
>   	}
> @@ -1889,6 +1896,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>   		list_del(&mapping->list);
>   		amdgpu_vm_it_remove(mapping, &vm->va);
>   		mapping->bo_va = NULL;
> +		mapping->sync_unmap = true;
>   		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>   		list_add(&mapping->list, &vm->freed);
>   	}
> @@ -2617,8 +2625,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
>   		goto error_unlock;
>   	}
>   
> -	r = amdgpu_vm_update_range(adev, vm, true, false, false, NULL, addr,
> -				   addr, flags, value, 0, NULL, NULL, NULL);
> +	r = amdgpu_vm_update_range(adev, vm, true, false, false, true, NULL,
> +				   addr, addr, flags, value, 0, NULL, NULL,
> +				   NULL);
>   	if (r)
>   		goto error_unlock;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 204ab13184ed..73b7b49fdb2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -423,12 +423,12 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>   			    struct amdgpu_vm *vm, struct amdgpu_bo *bo);
>   int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   			   bool immediate, bool unlocked, bool flush_tlb,
> -			   struct dma_resv *resv, uint64_t start, uint64_t last,
> -			   uint64_t flags, uint64_t offset, uint64_t vram_base,
> +			   bool sync_unmap, struct dma_resv *resv,
> +			   uint64_t start, uint64_t last, uint64_t flags,
> +			   uint64_t offset, uint64_t vram_base,
>   			   struct ttm_resource *res, dma_addr_t *pages_addr,
>   			   struct dma_fence **fence);
> -int amdgpu_vm_bo_update(struct amdgpu_device *adev,
> -			struct amdgpu_bo_va *bo_va,
> +int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>   			bool clear);
>   bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
>   void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
> @@ -444,15 +444,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>   		     uint64_t addr, uint64_t offset,
>   		     uint64_t size, uint64_t flags);
>   int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
> -			     struct amdgpu_bo_va *bo_va,
> -			     uint64_t addr, uint64_t offset,
> -			     uint64_t size, uint64_t flags);
> -int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
> -		       struct amdgpu_bo_va *bo_va,
> -		       uint64_t addr);
> +			     struct amdgpu_bo_va *bo_va, uint64_t addr,
> +			     uint64_t offset, uint64_t size, uint64_t flags,
> +			     bool sync_unmap);
> +int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
> +		       uint64_t addr, bool sync_unmap);
>   int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
> -				struct amdgpu_vm *vm,
> -				uint64_t saddr, uint64_t size);
> +				struct amdgpu_vm *vm, uint64_t saddr,
> +				uint64_t size, bool sync_unmap);
>   struct amdgpu_bo_va_mapping *amdgpu_vm_bo_lookup_mapping(struct amdgpu_vm *vm,
>   							 uint64_t addr);
>   void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index bb16b795d1bc..6eb4a0a4bc84 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1291,9 +1291,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   
>   	pr_debug("[0x%llx 0x%llx]\n", start, last);
>   
> -	return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start,
> -				      last, init_pte_value, 0, 0, NULL, NULL,
> -				      fence);
> +	return amdgpu_vm_update_range(adev, vm, false, true, true, true, NULL,
> +				      start, last, init_pte_value, 0, 0, NULL,
> +				      NULL, fence);
>   }
>   
>   static int
> @@ -1398,12 +1398,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
>   		 * different memory partition based on fpfn/lpfn, we should use
>   		 * same vm_manager.vram_base_offset regardless memory partition.
>   		 */
> -		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
> -					   last_start, prange->start + i,
> -					   pte_flags,
> -					   (last_start - prange->start) << PAGE_SHIFT,
> -					   bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
> -					   NULL, dma_addr, &vm->last_update);
> +		r = amdgpu_vm_update_range(
> +			adev, vm, false, false, flush_tlb, true, NULL,
> +			last_start, prange->start + i, pte_flags,
> +			(last_start - prange->start) << PAGE_SHIFT,
> +			bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
> +			NULL, dma_addr, &vm->last_update);
>   
>   		for (j = last_start - prange->start; j <= i; j++)
>   			dma_addr[j] |= last_domain;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index f477eda6a2b8..3cdcc299956e 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -556,6 +556,8 @@ struct drm_amdgpu_gem_op {
>   #define AMDGPU_VM_MTYPE_RW		(5 << 5)
>   /* don't allocate MALL */
>   #define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
> +/* don't sync on unmap */
> +#define AMDGPU_VM_EXPLICIT_SYNC		(1 << 10)
>   
>   struct drm_amdgpu_gem_va {
>   	/** GEM object handle */


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-11-06 13:44     ` Christian König
@ 2023-11-06 15:47       ` Tatsuyuki Ishi
  2023-11-06 19:14         ` Christian König
  0 siblings, 1 reply; 34+ messages in thread
From: Tatsuyuki Ishi @ 2023-11-06 15:47 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel


> On Nov 6, 2023, at 22:44, Christian König <christian.koenig@amd.com> wrote:
> 
>  On 02.11.23 15:04, Tatsuyuki Ishi wrote:
>> In Vulkan, it is the application's responsibility to perform adequate
>> synchronization before a sparse unmap, replace or BO destroy operation.
>> Until now, the kernel applied the same rule as implicitly-synchronized
>> APIs like OpenGL, which, with per-VM BOs, made page table updates stall
>> the queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
>> drivers to opt out of this behavior, while still ensuring adequate implicit
>> sync happens for kernel-initiated updates (e.g. BO moves).
>> 
>> We record whether to use implicit sync or not for each freed mapping. To
>> avoid increasing the mapping struct's size, this is union-ized with the
>> interval tree field which is unused after the unmap.
>> 
>> The reason this is done with a GEM ioctl flag, instead of being a VM /
>> context global setting, is that the current libdrm implementation shares
>> the DRM handle even between different kinds of drivers (radeonsi vs radv).
> 
> It would be nice if we could make this more future proof by not using a flag, but rather a drm_syncobj.

There is asynchronous VM_BIND and synchronous VM_BIND. Using syncobjs addresses asynchronous binds, but what this patch set adds is an explicitly synced synchronous bind.

Even within Vulkan, there are use cases for synchronous binds. This is when a non-sparse BO is destroyed (or created, though that case is not synchronized). Such operations should still be explicit sync, unlike OpenGL, which syncs to previous submissions. So adding asynchronous bind doesn't supersede this need.

I’ve also thought whether we can just make the unmap asynchronous, since the spec requires that destroyed stuff are not accessed in any way, but I think it will complicate behavior when the destruction of BO immediately follows.

We should implement asynchronous bind someday to make vkQueueBindSparse work (even) better, but that will likely involve a larger scope including the scheduler. Getting synchronous but explicitly synced binds should be simpler and a good incremental step.

Tatsuyuki.

> You can extend the drm_amdgpu_gem_va structure by adding a drm_syncobj handle and timeline point at the end.
> 
> If the syncobj/timeline point results in a fence we give that as input dependency the operation has to wait for.
> 
> And output fence can come later on as well, but that one is much more harder to handle.
> 
> Regards,
> Christian.
> 
>> 
>> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>> ---
>>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 ++++--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  7 ++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  6 ++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 47 +++++++++++--------
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 23 +++++----
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 18 +++----
>>  include/uapi/drm/amdgpu_drm.h                 |  2 +
>>  9 files changed, 71 insertions(+), 50 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 7d6daf8d2bfa..10e129bff977 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1196,7 +1196,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
>>  	struct amdgpu_device *adev = entry->adev;
>>  	struct amdgpu_vm *vm = bo_va->base.vm;
>>  -	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
>> +	amdgpu_vm_bo_unmap(adev, bo_va, entry->va, true);
>>    	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
>>  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> index 720011019741..612279e65bff 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> @@ -122,7 +122,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>  		}
>>  	}
>>  -	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr);
>> +	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr, true);
>>  	if (r) {
>>  		DRM_ERROR("failed to do bo_unmap on static CSA, err=%d\n", r);
>>  		goto error;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index a1b15d0d6c48..cca68b89754e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -667,9 +667,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>  	const uint32_t valid_flags = AMDGPU_VM_DELAY_UPDATE |
>>  		AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
>>  		AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_MASK |
>> -		AMDGPU_VM_PAGE_NOALLOC;
>> +		AMDGPU_VM_PAGE_NOALLOC | AMDGPU_VM_EXPLICIT_SYNC;
>>  	const uint32_t prt_flags = AMDGPU_VM_DELAY_UPDATE |
>> -		AMDGPU_VM_PAGE_PRT;
>> +		AMDGPU_VM_PAGE_PRT | AMDGPU_VM_EXPLICIT_SYNC;
>>    	struct drm_amdgpu_gem_va *args = data;
>>  	struct drm_gem_object *gobj;
>> @@ -680,6 +680,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>  	struct drm_exec exec;
>>  	uint64_t va_flags;
>>  	uint64_t vm_size;
>> +	bool sync_unmap;
>>  	int r = 0;
>>    	if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
>> @@ -715,6 +716,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>  		return -EINVAL;
>>  	}
>>  +	sync_unmap = !(args->flags & AMDGPU_VM_EXPLICIT_SYNC);
>> +
>>  	switch (args->operation) {
>>  	case AMDGPU_VA_OP_MAP:
>>  	case AMDGPU_VA_OP_UNMAP:
>> @@ -774,19 +777,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>  				     va_flags);
>>  		break;
>>  	case AMDGPU_VA_OP_UNMAP:
>> -		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address);
>> +		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address,
>> +				       sync_unmap);
>>  		break;
>>    	case AMDGPU_VA_OP_CLEAR:
>>  		r = amdgpu_vm_bo_clear_mappings(adev, &fpriv->vm,
>>  						args->va_address,
>> -						args->map_size);
>> +						args->map_size, sync_unmap);
>>  		break;
>>  	case AMDGPU_VA_OP_REPLACE:
>>  		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
>>  		r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
>>  					     args->offset_in_bo, args->map_size,
>> -					     va_flags);
>> +					     va_flags, sync_unmap);
>>  		break;
>>  	default:
>>  		break;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> index f3ee83cdf97e..28be03f1bbcf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> @@ -67,7 +67,12 @@ struct amdgpu_bo_va_mapping {
>>  	struct rb_node			rb;
>>  	uint64_t			start;
>>  	uint64_t			last;
>> -	uint64_t			__subtree_last;
>> +	union {
>> +		/* BOs in interval tree only */
>> +		uint64_t		__subtree_last;
>> +		/* Freed BOs only */
>> +		bool			sync_unmap;
>> +	};
>>  	uint64_t			offset;
>>  	uint64_t			flags;
>>  };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> index 2fd1bfb35916..e71443c8c59b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> @@ -276,6 +276,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>>  			     __field(long, last)
>>  			     __field(u64, offset)
>>  			     __field(u64, flags)
>> +			     __field(bool, sync_unmap)
>>  			     ),
>>    	    TP_fast_assign(
>> @@ -284,10 +285,11 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>>  			   __entry->last = mapping->last;
>>  			   __entry->offset = mapping->offset;
>>  			   __entry->flags = mapping->flags;
>> +			   __entry->sync_unmap = mapping->sync_unmap;
>>  			   ),
>> -	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
>> +	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx, sync_unmap=%d",
>>  		      __entry->bo, __entry->start, __entry->last,
>> -		      __entry->offset, __entry->flags)
>> +		      __entry->offset, __entry->flags, __entry->sync_unmap)
>>  );
>>    DECLARE_EVENT_CLASS(amdgpu_vm_mapping,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 7b9762f1cddd..a74472e16952 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -844,6 +844,7 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
>>   * @immediate: immediate submission in a page fault
>>   * @unlocked: unlocked invalidation during MM callback
>>   * @flush_tlb: trigger tlb invalidation after update completed
>> + * @sync_unmap: wait for BO users before unmapping
>>   * @resv: fences we need to sync to
>>   * @start: start of mapped range
>>   * @last: last mapped entry
>> @@ -861,8 +862,9 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
>>   */
>>  int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>  			   bool immediate, bool unlocked, bool flush_tlb,
>> -			   struct dma_resv *resv, uint64_t start, uint64_t last,
>> -			   uint64_t flags, uint64_t offset, uint64_t vram_base,
>> +			   bool sync_unmap, struct dma_resv *resv,
>> +			   uint64_t start, uint64_t last, uint64_t flags,
>> +			   uint64_t offset, uint64_t vram_base,
>>  			   struct ttm_resource *res, dma_addr_t *pages_addr,
>>  			   struct dma_fence **fence)
>>  {
>> @@ -902,7 +904,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>  	/* Implicitly sync to command submissions in the same VM before
>>  	 * unmapping. Sync to moving fences before mapping.
>>  	 */
>> -	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
>> +	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) && sync_unmap)
>>  		sync_mode = AMDGPU_SYNC_EQ_OWNER;
>>  	else
>>  		sync_mode = AMDGPU_SYNC_EXPLICIT;
>> @@ -1145,10 +1147,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>  		trace_amdgpu_vm_bo_update(mapping);
>>    		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb,
>> -					   resv, mapping->start, mapping->last,
>> -					   update_flags, mapping->offset,
>> -					   vram_base, mem, pages_addr,
>> -					   last_update);
>> +					   true, resv, mapping->start,
>> +					   mapping->last, update_flags,
>> +					   mapping->offset, vram_base, mem,
>> +					   pages_addr, last_update);
>>  		if (r)
>>  			return r;
>>  	}
>> @@ -1340,7 +1342,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>>  		    mapping->start < AMDGPU_GMC_HOLE_START)
>>  			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>>  -		r = amdgpu_vm_update_range(adev, vm, false, false, true, resv,
>> +		r = amdgpu_vm_update_range(adev, vm, false, false, true,
>> +					   mapping->sync_unmap, resv,
>>  					   mapping->start, mapping->last,
>>  					   init_pte_value, 0, 0, NULL, NULL,
>>  					   &f);
>> @@ -1572,6 +1575,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>>   * @offset: requested offset in the BO
>>   * @size: BO size in bytes
>>   * @flags: attributes of pages (read/write/valid/etc.)
>> + * @sync_unmap: wait for BO users before replacing existing mapping
>>   *
>>   * Add a mapping of the BO at the specefied addr into the VM. Replace existing
>>   * mappings as we do so.
>> @@ -1582,9 +1586,9 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>>   * Object has to be reserved and unreserved outside!
>>   */
>>  int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>> -			     struct amdgpu_bo_va *bo_va,
>> -			     uint64_t saddr, uint64_t offset,
>> -			     uint64_t size, uint64_t flags)
>> +			     struct amdgpu_bo_va *bo_va, uint64_t saddr,
>> +			     uint64_t offset, uint64_t size, uint64_t flags,
>> +			     bool sync_unmap)
>>  {
>>  	struct amdgpu_bo_va_mapping *mapping;
>>  	struct amdgpu_bo *bo = bo_va->base.bo;
>> @@ -1608,7 +1612,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>  	if (!mapping)
>>  		return -ENOMEM;
>>  -	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
>> +	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size, sync_unmap);
>>  	if (r) {
>>  		kfree(mapping);
>>  		return r;
>> @@ -1633,6 +1637,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>   * @adev: amdgpu_device pointer
>>   * @bo_va: bo_va to remove the address from
>>   * @saddr: where to the BO is mapped
>> + * @sync_unmap: wait for BO users before unmapping
>>   *
>>   * Remove a mapping of the BO at the specefied addr from the VM.
>>   *
>> @@ -1641,9 +1646,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>   *
>>   * Object has to be reserved and unreserved outside!
>>   */
>> -int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>> -		       struct amdgpu_bo_va *bo_va,
>> -		       uint64_t saddr)
>> +int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>> +		       uint64_t saddr, bool sync_unmap)
>>  {
>>  	struct amdgpu_bo_va_mapping *mapping;
>>  	struct amdgpu_vm *vm = bo_va->base.vm;
>> @@ -1671,6 +1675,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>  	list_del(&mapping->list);
>>  	amdgpu_vm_it_remove(mapping, &vm->va);
>>  	mapping->bo_va = NULL;
>> +	mapping->sync_unmap = sync_unmap;
>>  	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>>    	if (valid)
>> @@ -1689,6 +1694,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>   * @vm: VM structure to use
>>   * @saddr: start of the range
>>   * @size: size of the range
>> + * @sync_unmap: wait for BO users before unmapping
>>   *
>>   * Remove all mappings in a range, split them as appropriate.
>>   *
>> @@ -1696,8 +1702,8 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>   * 0 for success, error for failure.
>>   */
>>  int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>> -				struct amdgpu_vm *vm,
>> -				uint64_t saddr, uint64_t size)
>> +				struct amdgpu_vm *vm, uint64_t saddr,
>> +				uint64_t size, bool sync_unmap)
>>  {
>>  	struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
>>  	LIST_HEAD(removed);
>> @@ -1761,6 +1767,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>  		    tmp->last = eaddr;
>>    		tmp->bo_va = NULL;
>> +		tmp->sync_unmap = sync_unmap;
>>  		list_add(&tmp->list, &vm->freed);
>>  		trace_amdgpu_vm_bo_unmap(NULL, tmp);
>>  	}
>> @@ -1889,6 +1896,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>>  		list_del(&mapping->list);
>>  		amdgpu_vm_it_remove(mapping, &vm->va);
>>  		mapping->bo_va = NULL;
>> +		mapping->sync_unmap = true;
>>  		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>>  		list_add(&mapping->list, &vm->freed);
>>  	}
>> @@ -2617,8 +2625,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
>>  		goto error_unlock;
>>  	}
>>  -	r = amdgpu_vm_update_range(adev, vm, true, false, false, NULL, addr,
>> -				   addr, flags, value, 0, NULL, NULL, NULL);
>> +	r = amdgpu_vm_update_range(adev, vm, true, false, false, true, NULL,
>> +				   addr, addr, flags, value, 0, NULL, NULL,
>> +				   NULL);
>>  	if (r)
>>  		goto error_unlock;
>>  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 204ab13184ed..73b7b49fdb2e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -423,12 +423,12 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>  			    struct amdgpu_vm *vm, struct amdgpu_bo *bo);
>>  int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>  			   bool immediate, bool unlocked, bool flush_tlb,
>> -			   struct dma_resv *resv, uint64_t start, uint64_t last,
>> -			   uint64_t flags, uint64_t offset, uint64_t vram_base,
>> +			   bool sync_unmap, struct dma_resv *resv,
>> +			   uint64_t start, uint64_t last, uint64_t flags,
>> +			   uint64_t offset, uint64_t vram_base,
>>  			   struct ttm_resource *res, dma_addr_t *pages_addr,
>>  			   struct dma_fence **fence);
>> -int amdgpu_vm_bo_update(struct amdgpu_device *adev,
>> -			struct amdgpu_bo_va *bo_va,
>> +int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>  			bool clear);
>>  bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
>>  void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>> @@ -444,15 +444,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>>  		     uint64_t addr, uint64_t offset,
>>  		     uint64_t size, uint64_t flags);
>>  int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>> -			     struct amdgpu_bo_va *bo_va,
>> -			     uint64_t addr, uint64_t offset,
>> -			     uint64_t size, uint64_t flags);
>> -int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>> -		       struct amdgpu_bo_va *bo_va,
>> -		       uint64_t addr);
>> +			     struct amdgpu_bo_va *bo_va, uint64_t addr,
>> +			     uint64_t offset, uint64_t size, uint64_t flags,
>> +			     bool sync_unmap);
>> +int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>> +		       uint64_t addr, bool sync_unmap);
>>  int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>> -				struct amdgpu_vm *vm,
>> -				uint64_t saddr, uint64_t size);
>> +				struct amdgpu_vm *vm, uint64_t saddr,
>> +				uint64_t size, bool sync_unmap);
>>  struct amdgpu_bo_va_mapping *amdgpu_vm_bo_lookup_mapping(struct amdgpu_vm *vm,
>>  							 uint64_t addr);
>>  void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index bb16b795d1bc..6eb4a0a4bc84 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1291,9 +1291,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>  
>>  	pr_debug("[0x%llx 0x%llx]\n", start, last);
>>  
>> -	return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start,
>> -				      last, init_pte_value, 0, 0, NULL, NULL,
>> -				      fence);
>> +	return amdgpu_vm_update_range(adev, vm, false, true, true, true, NULL,
>> +				      start, last, init_pte_value, 0, 0, NULL,
>> +				      NULL, fence);
>>  }
>>  
>>  static int
>> @@ -1398,12 +1398,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
>>  		 * different memory partition based on fpfn/lpfn, we should use
>>  		 * same vm_manager.vram_base_offset regardless memory partition.
>>  		 */
>> -		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
>> -					   last_start, prange->start + i,
>> -					   pte_flags,
>> -					   (last_start - prange->start) << PAGE_SHIFT,
>> -					   bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
>> -					   NULL, dma_addr, &vm->last_update);
>> +		r = amdgpu_vm_update_range(
>> +			adev, vm, false, false, flush_tlb, true, NULL,
>> +			last_start, prange->start + i, pte_flags,
>> +			(last_start - prange->start) << PAGE_SHIFT,
>> +			bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
>> +			NULL, dma_addr, &vm->last_update);
>>  
>>  		for (j = last_start - prange->start; j <= i; j++)
>>  			dma_addr[j] |= last_domain;
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index f477eda6a2b8..3cdcc299956e 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -556,6 +556,8 @@ struct drm_amdgpu_gem_op {
>>  #define AMDGPU_VM_MTYPE_RW		(5 << 5)
>>  /* don't allocate MALL */
>>  #define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
>> +/* don't sync on unmap */
>> +#define AMDGPU_VM_EXPLICIT_SYNC		(1 << 10)
>>  
>>  struct drm_amdgpu_gem_va {
>>  	/** GEM object handle */



* Re: [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
  2023-11-06 15:47       ` Tatsuyuki Ishi
@ 2023-11-06 19:14         ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2023-11-06 19:14 UTC (permalink / raw)
  To: Tatsuyuki Ishi, Christian König; +Cc: dri-devel, amd-gfx


On 06.11.23 at 16:47, Tatsuyuki Ishi wrote:
>> On Nov 6, 2023, at 22:44, Christian König <christian.koenig@amd.com> wrote:
>>
>> On 02.11.23 at 15:04, Tatsuyuki Ishi wrote:
>>> In Vulkan, it is the application's responsibility to perform adequate
>>> synchronization before a sparse unmap, replace or BO destroy operation.
>>> Until now, the kernel applied the same rule as implicitly-synchronized
>>> APIs like OpenGL, which with per-VM BOs made page table updates stall
>>> the queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag
>>> allows drivers to opt out of this behavior, while still ensuring
>>> adequate implicit sync happens for kernel-initiated updates (e.g. BO
>>> moves).
>>>
>>> We record whether to use implicit sync or not for each freed mapping. To
>>> avoid increasing the mapping struct's size, this is union-ized with the
>>> interval tree field which is unused after the unmap.
>>>
>>> The reason this is done with a GEM ioctl flag, instead of being a VM /
>>> context global setting, is that the current libdrm implementation shares
>>> the DRM handle even between different kinds of drivers (radeonsi vs
>>> radv).
>>
>> It would be nice if we could make this more future-proof by not using
>> a flag, but rather a drm_syncobj.
>
> There is asynchronous VM_BIND and synchronous VM_BIND. Using syncobjs
> addresses asynchronous binds, but what this patch set solves is adding
> an explicitly synced synchronous bind.
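>
> For illustration, a minimal sketch of how userspace might issue such an
> explicitly synced unmap through the GEM VA ioctl. The helper function
> below is hypothetical; only AMDGPU_VM_EXPLICIT_SYNC, struct
> drm_amdgpu_gem_va and drmCommandWriteRead() are interfaces from this
> series and libdrm.
>
> #include <stdint.h>
> #include <string.h>
> #include <xf86drm.h>
> #include <amdgpu_drm.h>
>
> /* Unmap a sparse binding without the implicit wait on prior command
>  * submissions; the application has already synchronized in Vulkan
>  * terms.
>  */
> static int unmap_explicit_sync(int fd, uint32_t gem_handle,
> 			       uint64_t va, uint64_t size)
> {
> 	struct drm_amdgpu_gem_va va_op;
>
> 	memset(&va_op, 0, sizeof(va_op));
> 	va_op.handle = gem_handle;
> 	va_op.operation = AMDGPU_VA_OP_UNMAP;
> 	va_op.flags = AMDGPU_VM_EXPLICIT_SYNC; /* opt out of implicit sync */
> 	va_op.va_address = va;
> 	va_op.map_size = size;
>
> 	return drmCommandWriteRead(fd, DRM_AMDGPU_GEM_VA, &va_op,
> 				   sizeof(va_op));
> }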

All VM updates are asynchronous in the sense that they are queued up and 
don't execute immediately.

If you don't add input/output fences and don't sync implicitly with 
command submission any more, you actually have no idea in userspace when 
they execute.

That doesn't sound like a good idea to me.

>
> Even within Vulkan, there are use cases for synchronous binds. This is 
> when a non-sparse BO is destroyed (or created, but that’s not 
> synchronized). Such operations should still be explicitly synced, 
> unlike OpenGL, where they sync to previous submissions. So adding 
> asynchronous binds doesn’t supersede this need.
>
> I’ve also considered whether we could just make the unmap asynchronous, 
> since the spec requires that destroyed resources are not accessed in 
> any way, but I think it would complicate behavior when the destruction 
> of the BO immediately follows.
>
> We should implement asynchronous bind someday to make 
> vkQueueBindSparse work (even) better, but that will likely involve a 
> larger scope including the scheduler. Getting synchronous but 
> explicitly synced binds should be simpler and a good incremental step.

That's the whole point: I don't think that the flag is simpler or 
cleaner than a fence.

We still need to take into account the implicit sync that can come from 
kernel operations, while at the same time disabling the implicit sync 
that comes from user space submissions.

As far as I can see, the easiest way to do this, one which both doesn't 
break anything currently and still leaves room to extend the interface, 
is to add an input dependency fence.

If you then use a signaled syncpoint as input, you get exactly the 
semantics you desire while we are still able to add an output fence later on.
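
To make that concrete, a rough sketch of the direction proposed here;
none of the extended fields below exist in the current uapi, they are
purely illustrative. drmSyncobjCreate() and DRM_SYNCOBJ_CREATE_SIGNALED
are existing interfaces.

#include <stdint.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

/* Hypothetical extension of the GEM VA args: an input dependency the
 * VM update has to wait for.  None of these fields exist today.
 */
struct drm_amdgpu_gem_va_ext {
	struct drm_amdgpu_gem_va base;	/* existing ioctl args */
	__u32 in_syncobj;		/* hypothetical: 0 = implicit sync */
	__u32 pad;
	__u64 in_syncobj_point;		/* hypothetical timeline point */
};

/* Userspace creates one already-signaled syncobj and passes it whenever
 * the bind should depend on nothing, giving explicit-sync semantics
 * while leaving room for real input (and later output) fences.
 */
static int make_signaled_syncobj(int fd, uint32_t *handle)
{
	return drmSyncobjCreate(fd, DRM_SYNCOBJ_CREATE_SIGNALED, handle);
}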

Regards,
Christian.

>
> Tatsuyuki.
>
>> You can extend the drm_amdgpu_gem_va structure by adding a drm_syncobj
>> handle and timeline point at the end.
>>
>> If the syncobj/timeline point results in a fence, we give that as an
>> input dependency the operation has to wait for.
>>
>> An output fence can come later on as well, but that one is much
>> harder to handle.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
>>> ---
>>>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 ++++--
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  7 ++-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  6 ++-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 47 +++++++++++--------
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 23 +++++----
>>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 18 +++----
>>>  include/uapi/drm/amdgpu_drm.h                 |  2 +
>>>  9 files changed, 71 insertions(+), 50 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 7d6daf8d2bfa..10e129bff977 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -1196,7 +1196,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
>>>  	struct amdgpu_device *adev = entry->adev;
>>>  	struct amdgpu_vm *vm = bo_va->base.vm;
>>>  
>>> -	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
>>> +	amdgpu_vm_bo_unmap(adev, bo_va, entry->va, true);
>>>  
>>>  	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
>>>  
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>> index 720011019741..612279e65bff 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>> @@ -122,7 +122,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>  		}
>>>  	}
>>>  
>>> -	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr);
>>> +	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr, true);
>>>  	if (r) {
>>>  		DRM_ERROR("failed to do bo_unmap on static CSA, err=%d\n", r);
>>>  		goto error;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index a1b15d0d6c48..cca68b89754e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -667,9 +667,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>  	const uint32_t valid_flags = AMDGPU_VM_DELAY_UPDATE |
>>>  		AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
>>>  		AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_MASK |
>>> -		AMDGPU_VM_PAGE_NOALLOC;
>>> +		AMDGPU_VM_PAGE_NOALLOC | AMDGPU_VM_EXPLICIT_SYNC;
>>>  	const uint32_t prt_flags = AMDGPU_VM_DELAY_UPDATE |
>>> -		AMDGPU_VM_PAGE_PRT;
>>> +		AMDGPU_VM_PAGE_PRT | AMDGPU_VM_EXPLICIT_SYNC;
>>>  
>>>  	struct drm_amdgpu_gem_va *args = data;
>>>  	struct drm_gem_object *gobj;
>>> @@ -680,6 +680,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>  	struct drm_exec exec;
>>>  	uint64_t va_flags;
>>>  	uint64_t vm_size;
>>> +	bool sync_unmap;
>>>  	int r = 0;
>>>  
>>>  	if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
>>> @@ -715,6 +716,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>  		return -EINVAL;
>>>  	}
>>>  
>>> +	sync_unmap = !(args->flags & AMDGPU_VM_EXPLICIT_SYNC);
>>> +
>>>  	switch (args->operation) {
>>>  	case AMDGPU_VA_OP_MAP:
>>>  	case AMDGPU_VA_OP_UNMAP:
>>> @@ -774,19 +777,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>  				     va_flags);
>>>  		break;
>>>  	case AMDGPU_VA_OP_UNMAP:
>>> -		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address);
>>> +		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address,
>>> +				       sync_unmap);
>>>  		break;
>>>  
>>>  	case AMDGPU_VA_OP_CLEAR:
>>>  		r = amdgpu_vm_bo_clear_mappings(adev, &fpriv->vm,
>>>  						args->va_address,
>>> -						args->map_size);
>>> +						args->map_size, sync_unmap);
>>>  		break;
>>>  	case AMDGPU_VA_OP_REPLACE:
>>>  		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
>>>  		r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
>>>  					     args->offset_in_bo, args->map_size,
>>> -					     va_flags);
>>> +					     va_flags, sync_unmap);
>>>  		break;
>>>  	default:
>>>  		break;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> index f3ee83cdf97e..28be03f1bbcf 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> @@ -67,7 +67,12 @@ struct amdgpu_bo_va_mapping {
>>>  	struct rb_node			rb;
>>>  	uint64_t			start;
>>>  	uint64_t			last;
>>> -	uint64_t			__subtree_last;
>>> +	union {
>>> +		/* BOs in interval tree only */
>>> +		uint64_t		__subtree_last;
>>> +		/* Freed BOs only */
>>> +		bool			sync_unmap;
>>> +	};
>>>  	uint64_t			offset;
>>>  	uint64_t			flags;
>>>  };
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> index 2fd1bfb35916..e71443c8c59b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> @@ -276,6 +276,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>>>  			     __field(long, last)
>>>  			     __field(u64, offset)
>>>  			     __field(u64, flags)
>>> +			     __field(bool, sync_unmap)
>>>  			     ),
>>>  
>>>  	    TP_fast_assign(
>>> @@ -284,10 +285,11 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>>>  			   __entry->last = mapping->last;
>>>  			   __entry->offset = mapping->offset;
>>>  			   __entry->flags = mapping->flags;
>>> +			   __entry->sync_unmap = mapping->sync_unmap;
>>>  			   ),
>>> -	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
>>> +	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx, sync_unmap=%d",
>>>  		      __entry->bo, __entry->start, __entry->last,
>>> -		      __entry->offset, __entry->flags)
>>> +		      __entry->offset, __entry->flags, __entry->sync_unmap)
>>>  );
>>>  
>>>  DECLARE_EVENT_CLASS(amdgpu_vm_mapping,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 7b9762f1cddd..a74472e16952 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -844,6 +844,7 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
>>>   * @immediate: immediate submission in a page fault
>>>   * @unlocked: unlocked invalidation during MM callback
>>>   * @flush_tlb: trigger tlb invalidation after update completed
>>> + * @sync_unmap: wait for BO users before unmapping
>>>   * @resv: fences we need to sync to
>>>   * @start: start of mapped range
>>>   * @last: last mapped entry
>>> @@ -861,8 +862,9 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
>>>   */
>>>  int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>  			   bool immediate, bool unlocked, bool flush_tlb,
>>> -			   struct dma_resv *resv, uint64_t start, uint64_t last,
>>> -			   uint64_t flags, uint64_t offset, uint64_t vram_base,
>>> +			   bool sync_unmap, struct dma_resv *resv,
>>> +			   uint64_t start, uint64_t last, uint64_t flags,
>>> +			   uint64_t offset, uint64_t vram_base,
>>>  			   struct ttm_resource *res, dma_addr_t *pages_addr,
>>>  			   struct dma_fence **fence)
>>>  {
>>> @@ -902,7 +904,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>  	/* Implicitly sync to command submissions in the same VM before
>>>  	 * unmapping. Sync to moving fences before mapping.
>>>  	 */
>>> -	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
>>> +	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) && sync_unmap)
>>>  		sync_mode = AMDGPU_SYNC_EQ_OWNER;
>>>  	else
>>>  		sync_mode = AMDGPU_SYNC_EXPLICIT;
>>> @@ -1145,10 +1147,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>>  		trace_amdgpu_vm_bo_update(mapping);
>>>  
>>>  		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb,
>>> -					   resv, mapping->start, mapping->last,
>>> -					   update_flags, mapping->offset,
>>> -					   vram_base, mem, pages_addr,
>>> -					   last_update);
>>> +					   true, resv, mapping->start,
>>> +					   mapping->last, update_flags,
>>> +					   mapping->offset, vram_base, mem,
>>> +					   pages_addr, last_update);
>>>  		if (r)
>>>  			return r;
>>>  	}
>>> @@ -1340,7 +1342,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>>>  		    mapping->start < AMDGPU_GMC_HOLE_START)
>>>  			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>>>  
>>> -		r = amdgpu_vm_update_range(adev, vm, false, false, true, resv,
>>> +		r = amdgpu_vm_update_range(adev, vm, false, false, true,
>>> +					   mapping->sync_unmap, resv,
>>>  					   mapping->start, mapping->last,
>>>  					   init_pte_value, 0, 0, NULL, NULL,
>>>  					   &f);
>>> @@ -1572,6 +1575,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>>>   * @offset: requested offset in the BO
>>>   * @size: BO size in bytes
>>>   * @flags: attributes of pages (read/write/valid/etc.)
>>> + * @sync_unmap: wait for BO users before replacing existing mapping
>>>   *
>>>   * Add a mapping of the BO at the specefied addr into the VM. Replace existing
>>>   * mappings as we do so.
>>> @@ -1582,9 +1586,9 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>>>   * Object has to be reserved and unreserved outside!
>>>   */
>>>  int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>> -			     struct amdgpu_bo_va *bo_va,
>>> -			     uint64_t saddr, uint64_t offset,
>>> -			     uint64_t size, uint64_t flags)
>>> +			     struct amdgpu_bo_va *bo_va, uint64_t saddr,
>>> +			     uint64_t offset, uint64_t size, uint64_t flags,
>>> +			     bool sync_unmap)
>>>  {
>>>  	struct amdgpu_bo_va_mapping *mapping;
>>>  	struct amdgpu_bo *bo = bo_va->base.bo;
>>> @@ -1608,7 +1612,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>>  	if (!mapping)
>>>  		return -ENOMEM;
>>>  
>>> -	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
>>> +	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size, sync_unmap);
>>>  	if (r) {
>>>  		kfree(mapping);
>>>  		return r;
>>> @@ -1633,6 +1637,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>>   * @adev: amdgpu_device pointer
>>>   * @bo_va: bo_va to remove the address from
>>>   * @saddr: where to the BO is mapped
>>> + * @sync_unmap: wait for BO users before unmapping
>>>   *
>>>   * Remove a mapping of the BO at the specefied addr from the VM.
>>>   *
>>> @@ -1641,9 +1646,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>>   *
>>>   * Object has to be reserved and unreserved outside!
>>>   */
>>> -int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>> -		       struct amdgpu_bo_va *bo_va,
>>> -		       uint64_t saddr)
>>> +int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>> +		       uint64_t saddr, bool sync_unmap)
>>>  {
>>>  	struct amdgpu_bo_va_mapping *mapping;
>>>  	struct amdgpu_vm *vm = bo_va->base.vm;
>>> @@ -1671,6 +1675,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>>  	list_del(&mapping->list);
>>>  	amdgpu_vm_it_remove(mapping, &vm->va);
>>>  	mapping->bo_va = NULL;
>>> +	mapping->sync_unmap = sync_unmap;
>>>  	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>>>  
>>>  	if (valid)
>>> @@ -1689,6 +1694,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>>   * @vm: VM structure to use
>>>   * @saddr: start of the range
>>>   * @size: size of the range
>>> + * @sync_unmap: wait for BO users before unmapping
>>>   *
>>>   * Remove all mappings in a range, split them as appropriate.
>>>   *
>>> @@ -1696,8 +1702,8 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>>   * 0 for success, error for failure.
>>>   */
>>>  int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>> -				struct amdgpu_vm *vm,
>>> -				uint64_t saddr, uint64_t size)
>>> +				struct amdgpu_vm *vm, uint64_t saddr,
>>> +				uint64_t size, bool sync_unmap)
>>>  {
>>>  	struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
>>>  	LIST_HEAD(removed);
>>> @@ -1761,6 +1767,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>>  			tmp->last = eaddr;
>>>  
>>>  		tmp->bo_va = NULL;
>>> +		tmp->sync_unmap = sync_unmap;
>>>  		list_add(&tmp->list, &vm->freed);
>>>  		trace_amdgpu_vm_bo_unmap(NULL, tmp);
>>>  	}
>>> @@ -1889,6 +1896,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
>>>  		list_del(&mapping->list);
>>>  		amdgpu_vm_it_remove(mapping, &vm->va);
>>>  		mapping->bo_va = NULL;
>>> +		mapping->sync_unmap = true;
>>>  		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
>>>  		list_add(&mapping->list, &vm->freed);
>>>  	}
>>> @@ -2617,8 +2625,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
>>>  		goto error_unlock;
>>>  	}
>>>  
>>> -	r = amdgpu_vm_update_range(adev, vm, true, false, false, NULL, addr,
>>> -				   addr, flags, value, 0, NULL, NULL, NULL);
>>> +	r = amdgpu_vm_update_range(adev, vm, true, false, false, true, NULL,
>>> +				   addr, addr, flags, value, 0, NULL, NULL,
>>> +				   NULL);
>>>  	if (r)
>>>  		goto error_unlock;
>>>  
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 204ab13184ed..73b7b49fdb2e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -423,12 +423,12 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>>  			    struct amdgpu_vm *vm, struct amdgpu_bo *bo);
>>>  int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>  			   bool immediate, bool unlocked, bool flush_tlb,
>>> -			   struct dma_resv *resv, uint64_t start, uint64_t last,
>>> -			   uint64_t flags, uint64_t offset, uint64_t vram_base,
>>> +			   bool sync_unmap, struct dma_resv *resv,
>>> +			   uint64_t start, uint64_t last, uint64_t flags,
>>> +			   uint64_t offset, uint64_t vram_base,
>>>  			   struct ttm_resource *res, dma_addr_t *pages_addr,
>>>  			   struct dma_fence **fence);
>>> -int amdgpu_vm_bo_update(struct amdgpu_device *adev,
>>> -			struct amdgpu_bo_va *bo_va,
>>> +int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>>  			bool clear);
>>>  bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
>>>  void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>>> @@ -444,15 +444,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
>>>  		     uint64_t addr, uint64_t offset,
>>>  		     uint64_t size, uint64_t flags);
>>>  int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
>>> -			     struct amdgpu_bo_va *bo_va,
>>> -			     uint64_t addr, uint64_t offset,
>>> -			     uint64_t size, uint64_t flags);
>>> -int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>> -		       struct amdgpu_bo_va *bo_va,
>>> -		       uint64_t addr);
>>> +			     struct amdgpu_bo_va *bo_va, uint64_t addr,
>>> +			     uint64_t offset, uint64_t size, uint64_t flags,
>>> +			     bool sync_unmap);
>>> +int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
>>> +		       uint64_t addr, bool sync_unmap);
>>>  int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
>>> -				struct amdgpu_vm *vm,
>>> -				uint64_t saddr, uint64_t size);
>>> +				struct amdgpu_vm *vm, uint64_t saddr,
>>> +				uint64_t size, bool sync_unmap);
>>>  struct amdgpu_bo_va_mapping *amdgpu_vm_bo_lookup_mapping(struct amdgpu_vm *vm,
>>>  							 uint64_t addr);
>>>  void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index bb16b795d1bc..6eb4a0a4bc84 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1291,9 +1291,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>  
>>>  	pr_debug("[0x%llx 0x%llx]\n", start, last);
>>>  
>>> -	return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start,
>>> -				      last, init_pte_value, 0, 0, NULL, NULL,
>>> -				      fence);
>>> +	return amdgpu_vm_update_range(adev, vm, false, true, true, true, NULL,
>>> +				      start, last, init_pte_value, 0, 0, NULL,
>>> +				      NULL, fence);
>>>  }
>>>  
>>>  static int
>>> @@ -1398,12 +1398,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
>>>  		 * different memory partition based on fpfn/lpfn, we should use
>>>  		 * same vm_manager.vram_base_offset regardless memory partition.
>>>  		 */
>>> -		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
>>> -					   last_start, prange->start + i,
>>> -					   pte_flags,
>>> -					   (last_start - prange->start) << PAGE_SHIFT,
>>> -					   bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
>>> -					   NULL, dma_addr, &vm->last_update);
>>> +		r = amdgpu_vm_update_range(
>>> +			adev, vm, false, false, flush_tlb, true, NULL,
>>> +			last_start, prange->start + i, pte_flags,
>>> +			(last_start - prange->start) << PAGE_SHIFT,
>>> +			bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
>>> +			NULL, dma_addr, &vm->last_update);
>>>  
>>>  		for (j = last_start - prange->start; j <= i; j++)
>>>  			dma_addr[j] |= last_domain;
>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>> index f477eda6a2b8..3cdcc299956e 100644
>>> --- a/include/uapi/drm/amdgpu_drm.h
>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>> @@ -556,6 +556,8 @@ struct drm_amdgpu_gem_op {
>>>  #define AMDGPU_VM_MTYPE_RW		(5 << 5)
>>>  /* don't allocate MALL */
>>>  #define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
>>> +/* don't sync on unmap */
>>> +#define AMDGPU_VM_EXPLICIT_SYNC		(1 << 10)
>>>  
>>>  struct drm_amdgpu_gem_va {
>>>  	/** GEM object handle */
>


end of thread, other threads:[~2023-11-06 19:14 UTC | newest]

Thread overview: 34+ messages
2023-10-31 13:40 [PATCH 0/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
2023-10-31 13:40 ` [PATCH 1/6] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
2023-10-31 13:40 ` [PATCH 2/6] drm/amdgpu: Separate eviction from VM status Tatsuyuki Ishi
2023-10-31 13:55   ` Christian König
2023-10-31 14:39     ` Tatsuyuki Ishi
2023-10-31 14:44       ` Christian König
2023-10-31 23:52   ` kernel test robot
2023-10-31 13:40 ` [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly Tatsuyuki Ishi
2023-10-31 13:57   ` Christian König
2023-10-31 13:59     ` Bas Nieuwenhuizen
2023-10-31 14:07       ` Christian König
2023-10-31 14:17         ` Bas Nieuwenhuizen
2023-10-31 14:39           ` Tatsuyuki Ishi
2023-11-02  2:36         ` Lang Yu
2023-11-02  6:41           ` Christian König
2023-11-06  7:56         ` Tatsuyuki Ishi
2023-11-06 13:33           ` Christian König
2023-11-01  1:18   ` kernel test robot
2023-10-31 13:40 ` [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation Tatsuyuki Ishi
2023-10-31 14:01   ` Christian König
2023-10-31 13:40 ` [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
2023-10-31 14:14   ` Michel Dänzer
2023-10-31 14:20     ` Bas Nieuwenhuizen
2023-10-31 14:34     ` Christian König
2023-10-31 14:56       ` Michel Dänzer
2023-11-01  2:42   ` kernel test robot
2023-10-31 13:40 ` [PATCH 6/6] drm/amdgpu: Bump amdgpu driver version Tatsuyuki Ishi
2023-11-02 14:04 ` [PATCH v2 0/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
2023-11-02 14:04   ` [PATCH v2 1/3] drm/amdgpu: Don't implicit sync PRT maps Tatsuyuki Ishi
2023-11-02 14:04   ` [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations Tatsuyuki Ishi
2023-11-06 13:44     ` Christian König
2023-11-06 15:47       ` Tatsuyuki Ishi
2023-11-06 19:14         ` Christian König
2023-11-02 14:04   ` [PATCH v2 3/3] drm/amdgpu: Bump amdgpu driver version Tatsuyuki Ishi
