* [RFC v4 00/14] drm/i915/vm_bind: Add VM_BIND functionality
@ 2022-09-21  7:09 ` Niranjana Vishwanathapura
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow UMDs to bind/unbind GEM
buffer objects (BOs), or sections of a BO, at specified GPU virtual
addresses on a specified address space (VM). Multiple mappings can map
to the same physical pages of an object (aliasing). These mappings (also
referred to as persistent mappings) will be persistent across multiple
GPU submissions (execbuf calls) issued by the UMD, without the user having
to provide a list of all required mappings during each submission (as
required by the older execbuf mode).

This patch series supports VM_BIND version 1, as described by the
I915_PARAM_VM_BIND_VERSION parameter.

Add a new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which works only in
vm_bind mode; vm_bind mode works only with this new ioctl. The new
execbuf3 ioctl has no execlist support, and all legacy features such as
relocations are removed.
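
Below is a minimal, illustrative userspace sketch (not part of this series)
of the intended bind flow, using the uapi added in patch 04 of this series.
It assumes fd is an open render node, vm_id refers to a VM created with the
vm_bind opt-in (patch 14), bo_handle is a GEM handle, and syncobj_handle is
a drm_syncobj created by the caller; all names are illustrative only.

  #include <string.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>   /* with the uapi additions from this series */

  static int bind_bo(int fd, __u32 vm_id, __u32 bo_handle,
                     __u64 gpu_va, __u64 size, __u32 syncobj_handle)
  {
          struct drm_i915_gem_vm_bind bind;

          memset(&bind, 0, sizeof(bind));
          bind.vm_id = vm_id;
          bind.handle = bo_handle;
          bind.start = gpu_va;    /* must be page aligned, see uapi doc */
          bind.offset = 0;        /* bind from the start of the object */
          bind.length = size;     /* whole-object binding in this sketch */
          bind.fence.handle = syncobj_handle;
          bind.fence.flags = I915_TIMELINE_FENCE_SIGNAL; /* out fence */
          bind.fence.value = 0;   /* binary syncobj */

          /* Mapping persists across execbuf3 submissions until VM_UNBIND. */
          return ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
  }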

TODOs:
* Support out fence for VM_UNBIND ioctl.
* Async VM_UNBIND support.
* Cleanups and optimizations.

NOTEs:
* It is based on the VM_BIND design + uapi RFC below:
  Documentation/gpu/rfc/i915_vm_bind.rst

* The IGT RFC series is posted as:
  [RFC v2 0/8] vm_bind: Add VM_BIND validation support

v4: Share code between legacy execbuf and execbuf3.
    Address review feedback from Thomas and Tvrtko.
    Reformat patches and some cleanups.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana Vishwanathapura (14):
  drm/i915/vm_bind: Expose vm lookup function
  drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
  drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  drm/i915/vm_bind: Implement bind and unbind of object
  drm/i915/vm_bind: Support for VM private BOs
  drm/i915/vm_bind: Handle persistent vmas
  drm/i915/vm_bind: Add out fence support
  drm/i915/vm_bind: Abstract out common execbuf functions
  drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
  drm/i915/vm_bind: Handle persistent vmas in execbuf3
  drm/i915/vm_bind: userptr dma-resv changes
  drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode

 drivers/gpu/drm/i915/Makefile                 |   3 +
 drivers/gpu/drm/i915/display/intel_fb_pin.c   |   2 +-
 .../drm/i915/display/intel_plane_initial.c    |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_create.c    |  60 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |   6 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 520 +----------
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 861 ++++++++++++++++++
 .../drm/i915/gem/i915_gem_execbuffer_common.c | 530 +++++++++++
 .../drm/i915/gem/i915_gem_execbuffer_common.h |  47 +
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |   2 +
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  17 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  31 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 421 +++++++++
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  16 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  |   2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |  12 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |   2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   6 +-
 .../drm/i915/gem/selftests/igt_gem_utils.c    |   2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  20 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  27 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   4 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c          |   2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |   2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   4 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  16 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |   2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |   2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |   2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |   4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   2 +-
 drivers/gpu/drm/i915/i915_driver.c            |   4 +
 drivers/gpu/drm/i915/i915_gem.c               |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  39 +
 drivers/gpu/drm/i915/i915_gem_gtt.h           |   3 +
 drivers/gpu/drm/i915/i915_getparam.c          |   3 +
 drivers/gpu/drm/i915/i915_perf.c              |   2 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          |  25 +-
 drivers/gpu/drm/i915/i915_sw_fence.h          |   7 +-
 drivers/gpu/drm/i915/i915_vma.c               |  98 +-
 drivers/gpu/drm/i915/i915_vma.h               |  51 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  42 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  44 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |   4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   2 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
 .../drm/i915/selftests/intel_memory_region.c  |   2 +-
 include/uapi/drm/i915_drm.h                   | 285 +++++-
 61 files changed, 2690 insertions(+), 604 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c

-- 
2.21.0.rc0.32.g243a4c7e27


* [RFC v4 01/14] drm/i915/vm_bind: Expose vm lookup function
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Make the i915_gem_vm_lookup() function non-static, as it will be
used by the vm_bind feature.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 11 ++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_context.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
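
For reference, a rough sketch (illustrative only, the helper name is
hypothetical) of how the now-exported lookup is meant to be consumed by the
vm_bind ioctls added later in this series; the returned VM reference must be
dropped with i915_vm_put():

  /* Mirrors the usage in the vm_bind ioctl added later in this series:
   * i915_gem_vm_lookup() returns a referenced VM or NULL. */
  static int example_vm_user(struct drm_file *file, u32 vm_id)
  {
          struct i915_address_space *vm;

          vm = i915_gem_vm_lookup(file->driver_priv, vm_id);
          if (!vm)
                  return -ENOENT;

          /* ... operate on vm ... */

          i915_vm_put(vm);
          return 0;
  }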

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 0bcde53c50c6..f4e648ec01ed 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -346,7 +346,16 @@ static int proto_context_register(struct drm_i915_file_private *fpriv,
 	return ret;
 }
 
-static struct i915_address_space *
+/**
+ * i915_gem_vm_lookup() - look up the VM reference for the given vm id
+ * @file_priv: the private data associated with the user's file
+ * @id: the VM id
+ *
+ * Finds the VM reference associated with a specific id.
+ *
+ * Returns the VM pointer on success, NULL in case of failure.
+ */
+struct i915_address_space *
 i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
 {
 	struct i915_address_space *vm;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index e5b0f66ea1fe..899fa8f1e0fe 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -139,6 +139,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
 
+struct i915_address_space *
+i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id);
+
 struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id);
 
-- 
2.21.0.rc0.32.g243a4c7e27


* [RFC v4 02/14] drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add function __i915_sw_fence_await_reservation() for
asynchronous waits on a dma-resv object with a specified
dma_resv_usage. This is required for async vma unbind
with vm_bind.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 25 ++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_sw_fence.h |  7 ++++++-
 2 files changed, 24 insertions(+), 8 deletions(-)
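
As a rough illustration (not taken from this series; the real async-unbind
user arrives in a later patch), the new helper lets a caller pick the
dma_resv_usage class explicitly instead of the read/write mapping done by
i915_sw_fence_await_reservation(). A hedged sketch:

  /* Sketch: wait on all bookkeeping-class fences of a vma's object before
   * tearing the binding down; 'fence' is an initialized i915_sw_fence.
   * The choice of DMA_RESV_USAGE_BOOKKEEP here is an assumption. */
  static int example_await_before_unbind(struct i915_sw_fence *fence,
                                         struct i915_vma *vma)
  {
          return __i915_sw_fence_await_reservation(fence, vma->obj->base.resv,
                                                   DMA_RESV_USAGE_BOOKKEEP,
                                                   MAX_SCHEDULE_TIMEOUT,
                                                   GFP_KERNEL);
  }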

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 6fc0d1b89690..0ce8f4efc1ed 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -569,12 +569,11 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	return ret;
 }
 
-int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
-				    struct dma_resv *resv,
-				    const struct dma_fence_ops *exclude,
-				    bool write,
-				    unsigned long timeout,
-				    gfp_t gfp)
+int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
+				      struct dma_resv *resv,
+				      enum dma_resv_usage usage,
+				      unsigned long timeout,
+				      gfp_t gfp)
 {
 	struct dma_resv_iter cursor;
 	struct dma_fence *f;
@@ -583,7 +582,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 	debug_fence_assert(fence);
 	might_sleep_if(gfpflags_allow_blocking(gfp));
 
-	dma_resv_iter_begin(&cursor, resv, dma_resv_usage_rw(write));
+	dma_resv_iter_begin(&cursor, resv, usage);
 	dma_resv_for_each_fence_unlocked(&cursor, f) {
 		pending = i915_sw_fence_await_dma_fence(fence, f, timeout,
 							gfp);
@@ -598,6 +597,18 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 	return ret;
 }
 
+int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
+				    struct dma_resv *resv,
+				    const struct dma_fence_ops *exclude,
+				    bool write,
+				    unsigned long timeout,
+				    gfp_t gfp)
+{
+	return __i915_sw_fence_await_reservation(fence, resv,
+						 dma_resv_usage_rw(write),
+						 timeout, gfp);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/lib_sw_fence.c"
 #include "selftests/i915_sw_fence.c"
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 619fc5a22f0c..3cf4b6e16f35 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -10,13 +10,13 @@
 #define _I915_SW_FENCE_H_
 
 #include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
 #include <linux/gfp.h>
 #include <linux/kref.h>
 #include <linux/notifier.h> /* for NOTIFY_DONE */
 #include <linux/wait.h>
 
 struct completion;
-struct dma_resv;
 struct i915_sw_fence;
 
 enum i915_sw_fence_notify {
@@ -89,6 +89,11 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 				  unsigned long timeout,
 				  gfp_t gfp);
 
+int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
+				      struct dma_resv *resv,
+				      enum dma_resv_usage usage,
+				      unsigned long timeout,
+				      gfp_t gfp);
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct dma_resv *resv,
 				    const struct dma_fence_ops *exclude,
-- 
2.21.0.rc0.32.g243a4c7e27


* [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Make the i915_gem_object_max_page_size() function non-static;
it will be used by the vm_bind feature.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
 2 files changed, 17 insertions(+), 5 deletions(-)
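
For context, the vm_bind ioctl added later in this series uses the exported
helper to validate bind granularity; roughly (a sketch mirroring that check,
the helper name is illustrative):

  /* offset and length must be aligned to the largest min_page_size among
   * the object's allowed placements. */
  static bool example_bind_range_ok(struct drm_i915_gem_object *obj,
                                    u64 offset, u64 length)
  {
          u32 align = i915_gem_object_max_page_size(obj->mm.placements,
                                                    obj->mm.n_placements);

          return length && IS_ALIGNED(offset | length, align);
  }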

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 33673fe7ee0a..3b3ab4abb0a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -11,14 +11,24 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_gem_context.h"
 #include "i915_gem_create.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
-static u32 object_max_page_size(struct intel_memory_region **placements,
-				unsigned int n_placements)
+/**
+ * i915_gem_object_max_page_size() - max of min_page_size of the regions
+ * @placements:  list of regions
+ * @n_placements: number of the placements
+ *
+ * Calculates the max of the min_page_size of a list of placements passed in.
+ *
+ * Return: max of the min_page_size
+ */
+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
+				  unsigned int n_placements)
 {
-	u32 max_page_size = 0;
+	u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
 	int i;
 
 	for (i = 0; i < n_placements; i++) {
@@ -28,7 +38,6 @@ static u32 object_max_page_size(struct intel_memory_region **placements,
 		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
 	}
 
-	GEM_BUG_ON(!max_page_size);
 	return max_page_size;
 }
 
@@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
 
 	i915_gem_flush_free_objects(i915);
 
-	size = round_up(size, object_max_page_size(placements, n_placements));
+	size = round_up(size, i915_gem_object_max_page_size(placements,
+							    n_placements));
 	if (size == 0)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 7317d4102955..8c97bddad921 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
 }
 
 void i915_gem_init__objects(struct drm_i915_private *i915);
+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
+				  unsigned int n_placements);
 
 void i915_objects_module_exit(void);
 int i915_objects_module_init(void);
-- 
2.21.0.rc0.32.g243a4c7e27


* [RFC v4 04/14] drm/i915/vm_bind: Implement bind and unbind of object
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add the uapi and implement support for binding and unbinding an
object at specified GPU virtual addresses.

The vm_bind mode is not supported in the legacy execbuf2 ioctl;
it will be supported only in the newer execbuf3 ioctl.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   5 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  27 ++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 308 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  17 +
 drivers/gpu/drm/i915/i915_driver.c            |   3 +
 drivers/gpu/drm/i915/i915_vma.c               |   3 +-
 drivers/gpu/drm/i915/i915_vma.h               |   2 -
 drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
 include/uapi/drm/i915_drm.h                   | 167 ++++++++++
 11 files changed, 554 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
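
To complement the bind sketch in the cover letter, a hedged userspace sketch
of the unbind side (same includes and assumptions as the cover-letter sketch;
per this patch, start/length must exactly match an existing binding):

  /* Synchronously unbind a previously bound VA range. Leaving the fence
   * zeroed means no out fence is requested, so the unbind completes
   * synchronously. */
  static int unbind_range(int fd, __u32 vm_id, __u64 gpu_va, __u64 size)
  {
          struct drm_i915_gem_vm_unbind unbind;

          memset(&unbind, 0, sizeof(unbind));
          unbind.vm_id = vm_id;
          unbind.start = gpu_va;
          unbind.length = size;

          return ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);
  }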

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a26edcdadc21..9bf939ef18ea 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -166,6 +166,7 @@ gem-y += \
 	gem/i915_gem_ttm_move.o \
 	gem/i915_gem_ttm_pm.o \
 	gem/i915_gem_userptr.o \
+	gem/i915_gem_vm_bind_object.o \
 	gem/i915_gem_wait.o \
 	gem/i915_gemfs.o
 i915-y += \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cd75b0ca2555..f85f10cf9c34 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -781,6 +781,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	if (unlikely(IS_ERR(ctx)))
 		return PTR_ERR(ctx);
 
+	if (ctx->vm->vm_bind_mode) {
+		i915_gem_context_put(ctx);
+		return -EOPNOTSUPP;
+	}
+
 	eb->gem_context = ctx;
 	if (i915_gem_context_has_full_ppgtt(ctx))
 		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
new file mode 100644
index 000000000000..4f3cfa1f6ef6
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __I915_GEM_VM_BIND_H
+#define __I915_GEM_VM_BIND_H
+
+#include <linux/types.h>
+
+#include <drm/drm_file.h>
+#include <drm/drm_device.h>
+
+#include "gt/intel_gtt.h"
+#include "i915_vma_types.h"
+
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
+
+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file);
+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+
+void i915_gem_vm_unbind_all(struct i915_address_space *vm);
+
+#endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
new file mode 100644
index 000000000000..c24e22657617
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -0,0 +1,308 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <uapi/drm/i915_drm.h>
+
+#include <linux/interval_tree_generic.h>
+
+#include "gem/i915_gem_context.h"
+#include "gem/i915_gem_vm_bind.h"
+
+#include "gt/intel_gpu_commands.h"
+
+#define START(node) ((node)->start)
+#define LAST(node) ((node)->last)
+
+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
+		     START, LAST, static inline, i915_vm_bind_it)
+
+#undef START
+#undef LAST
+
+/**
+ * DOC: VM_BIND/UNBIND ioctls
+ *
+ * The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow UMDs to bind/unbind GEM buffer
+ * objects (BOs) or sections of a BO at specified GPU virtual addresses on a
+ * specified address space (VM). Multiple mappings can map to the same physical
+ * pages of an object (aliasing). These mappings (also referred to as persistent
+ * mappings) will be persistent across multiple GPU submissions (execbuf calls)
+ * issued by the UMD, without the user having to provide a list of all required
+ * mappings during each submission (as required by the older execbuf mode).
+ *
+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+ * signaling the completion of bind/unbind operation.
+ *
+ * The VM_BIND feature is advertised to the user via I915_PARAM_VM_BIND_VERSION.
+ * The user has to opt in to VM_BIND mode of binding for an address space (VM)
+ * at VM creation time via the I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
+ * done asynchronously, when a valid out fence is specified.
+ *
+ * VM_BIND locking order is as below.
+ *
+ * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+ *    mapping.
+ *
+ *    In future, when GPU page faults are supported, we can potentially use a
+ *    rwsem instead, so that multiple page fault handlers can take the read
+ *    side lock to lookup the mapping and hence can run in parallel.
+ *    The older execbuf mode of binding does not need this lock.
+ *
+ * 2) The object's dma-resv lock will protect i915_vma state and needs
+ *    to be held while binding/unbinding a vma in the async worker and while
+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
+ *    will all share a dma-resv object.
+ *
+ * 3) Spinlock/s to protect some of the VM's lists like the list of
+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
+ */
+
+/**
+ * i915_gem_vm_bind_lookup_vma() - look up the vma at a given starting address
+ * @vm: virtual address space in which the vma is looked up
+ * @va: starting address of the vma
+ *
+ * Retrieves the vma with the given starting address from the vm's vma tree.
+ *
+ * Returns: the vma on success, NULL on failure.
+ */
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
+{
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	return i915_vm_bind_it_iter_first(&vm->va, va, va);
+}
+
+/**
+ * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
+ * @vma: vma that needs to be removed
+ * @release_obj: release the object
+ *
+ * Removes the vma from the vm's lists and interval tree
+ */
+static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
+{
+	lockdep_assert_held(&vma->vm->vm_bind_lock);
+
+	list_del_init(&vma->vm_bind_link);
+	i915_vm_bind_it_remove(vma, &vma->vm->va);
+
+	/* Release object */
+	if (release_obj)
+		i915_gem_object_put(vma->obj);
+}
+
+static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
+				  struct drm_i915_gem_vm_unbind *va)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
+
+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
+	if (ret)
+		return ret;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+
+	if (!vma)
+		ret = -ENOENT;
+	else if (vma->size != va->length)
+		ret = -EINVAL;
+
+	if (ret) {
+		mutex_unlock(&vm->vm_bind_lock);
+		return ret;
+	}
+
+	i915_gem_vm_bind_remove(vma, false);
+
+	mutex_unlock(&vm->vm_bind_lock);
+
+	/* Destroy vma and then release object */
+	obj = vma->obj;
+	ret = i915_gem_object_lock(obj, NULL);
+	if (ret)
+		return ret;
+
+	i915_vma_destroy(vma);
+	i915_gem_object_unlock(obj);
+
+	i915_gem_object_put(obj);
+
+	return 0;
+}
+
+/**
+ * i915_gem_vm_unbind_all() - Unbind all mappings from an address space
+ * @vm: Address space to remove mappings from
+ *
+ * Unbind all userspace requested vm_bind mappings
+ */
+void i915_gem_vm_unbind_all(struct i915_address_space *vm)
+{
+	struct i915_vma *vma, *t;
+
+	mutex_lock(&vm->vm_bind_lock);
+	list_for_each_entry_safe(vma, t, &vm->vm_bind_list, vm_bind_link)
+		i915_gem_vm_bind_remove(vma, true);
+	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
+		i915_gem_vm_bind_remove(vma, true);
+	mutex_unlock(&vm->vm_bind_lock);
+}
+
+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
+					struct drm_i915_gem_object *obj,
+					struct drm_i915_gem_vm_bind *va)
+{
+	struct i915_gtt_view view;
+	struct i915_vma *vma;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (vma)
+		return ERR_PTR(-EEXIST);
+
+	view.type = I915_GTT_VIEW_PARTIAL;
+	view.partial.offset = va->offset >> PAGE_SHIFT;
+	view.partial.size = va->length >> PAGE_SHIFT;
+	vma = i915_vma_instance(obj, vm, &view);
+	if (IS_ERR(vma))
+		return vma;
+
+	vma->start = va->start;
+	vma->last = va->start + va->length - 1;
+
+	return vma;
+}
+
+static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
+				struct drm_i915_gem_vm_bind *va,
+				struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma = NULL;
+	struct i915_gem_ww_ctx ww;
+	u64 pin_flags;
+	int ret = 0;
+
+	if (!vm->vm_bind_mode)
+		return -EOPNOTSUPP;
+
+	obj = i915_gem_object_lookup(file, va->handle);
+	if (!obj)
+		return -ENOENT;
+
+	if (!va->length ||
+	    !IS_ALIGNED(va->offset | va->length,
+			i915_gem_object_max_page_size(obj->mm.placements,
+						      obj->mm.n_placements)) ||
+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
+	if (ret)
+		goto put_obj;
+
+	vma = vm_bind_get_vma(vm, obj, va);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto unlock_vm;
+	}
+
+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
+
+	for_i915_gem_ww(&ww, ret, true) {
+		ret = i915_gem_object_lock(vma->obj, &ww);
+		if (ret)
+			continue;
+
+		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
+		if (ret)
+			continue;
+
+		/* Make it evictable */
+		__i915_vma_unpin(vma);
+
+		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+		i915_vm_bind_it_insert(vma, &vm->va);
+
+		/* Hold object reference until vm_unbind */
+		i915_gem_object_get(vma->obj);
+	}
+
+	if (ret)
+		i915_vma_destroy(vma);
+unlock_vm:
+	mutex_unlock(&vm->vm_bind_lock);
+put_obj:
+	i915_gem_object_put(obj);
+
+	return ret;
+}
+
+/**
+ * i915_gem_vm_bind_ioctl() - ioctl function for binding an object at a
+ * virtual address
+ * @dev: drm device associated with the virtual address
+ * @data: vm bind ioctl data (struct drm_i915_gem_vm_bind)
+ * @file: drm_file related to the ioctl
+ *
+ * Implements the ioctl to bind an object at the specified virtual address.
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_vm_bind *args = data;
+	struct i915_address_space *vm;
+	int ret;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	ret = i915_gem_vm_bind_obj(vm, args, file);
+
+	i915_vm_put(vm);
+	return ret;
+}
+
+/**
+ * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an object from a
+ * virtual address
+ * @dev: drm device associated with the virtual address
+ * @data: vm unbind ioctl data (struct drm_i915_gem_vm_unbind)
+ * @file: drm_file related to the ioctl
+ *
+ * Implements the ioctl to unbind an object from the specified virtual address.
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_gem_vm_unbind *args = data;
+	struct i915_address_space *vm;
+	int ret;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	ret = i915_gem_vm_unbind_vma(vm, args);
+
+	i915_vm_put(vm);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index b67831833c9a..0daa70c6ed0d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -12,6 +12,7 @@
 
 #include "gem/i915_gem_internal.h"
 #include "gem/i915_gem_lmem.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "i915_trace.h"
 #include "i915_utils.h"
 #include "intel_gt.h"
@@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
+	mutex_destroy(&vm->vm_bind_lock);
 }
 
 /**
@@ -202,6 +205,8 @@ static void __i915_vm_release(struct work_struct *work)
 	struct i915_address_space *vm =
 		container_of(work, struct i915_address_space, release_work);
 
+	i915_gem_vm_unbind_all(vm);
+
 	__i915_vm_close(vm);
 
 	/* Synchronize async unbinds. */
@@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	INIT_LIST_HEAD(&vm->bound_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+
+	vm->va = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&vm->vm_bind_list);
+	INIT_LIST_HEAD(&vm->vm_bound_list);
+	mutex_init(&vm->vm_bind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index c0ca53cba9f0..b52061858161 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -259,6 +259,23 @@ struct i915_address_space {
 	 */
 	struct list_head unbound_list;
 
+	/**
+	 * @vm_bind_mode: flag to indicate vm_bind method of binding
+	 *
+	 * True: allow only vm_bind method of binding.
+	 * False: allow only the legacy execbuf method of binding.
+	 */
+	bool vm_bind_mode:1;
+
+	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
+	struct mutex vm_bind_lock;
+	/** @vm_bind_list: List of vm_binding in process */
+	struct list_head vm_bind_list;
+	/** @vm_bound_list: List of vm_binding completed */
+	struct list_head vm_bound_list;
+	/* @va: tree of persistent vmas */
+	struct rb_root_cached va;
+
 	/* Global GTT */
 	bool is_ggtt:1;
 
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 9d1fc2477f80..f9e4a784dd0e 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -69,6 +69,7 @@
 #include "gem/i915_gem_ioctls.h"
 #include "gem/i915_gem_mman.h"
 #include "gem/i915_gem_pm.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_rc6.h"
@@ -1892,6 +1893,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND, i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index f17c09ead7d7..33cb0cbc7fb1 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -29,6 +29,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_tiling.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	spin_unlock(&obj->vma.lock);
 	mutex_unlock(&vm->mutex);
 
+	INIT_LIST_HEAD(&vma->vm_bind_link);
 	return vma;
 
 err_unlock:
@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
 	spin_lock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index aecd9c64486b..6feef0305fe1 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
 {
 	ptrdiff_t cmp;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
-
 	cmp = ptrdiff(vma->vm, vm);
 	if (cmp)
 		return cmp;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index ec0f6c9f57d0..bed7a344dcd7 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -289,6 +289,20 @@ struct i915_vma {
 	/** This object's place on the active/inactive lists */
 	struct list_head vm_link;
 
+	/** @vm_bind_link: node for the vm_bind related lists of vm */
+	struct list_head vm_bind_link;
+
+	/** Interval tree structures for persistent vma */
+
+	/** @rb: node for the interval tree of vm for persistent vmas */
+	struct rb_node rb;
+	/** @start: start endpoint of the rb node */
+	u64 start;
+	/** @last: Last endpoint of the rb node */
+	u64 last;
+	/** @__subtree_last: last in subtree */
+	u64 __subtree_last;
+
 	struct list_head obj_link; /* Link in the object's VMA list */
 	struct rb_node obj_node;
 	struct hlist_node obj_hash;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 520ad2691a99..4a4f2a77388c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_VM_CREATE		0x3a
 #define DRM_I915_GEM_VM_DESTROY		0x3b
 #define DRM_I915_GEM_CREATE_EXT		0x3c
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
 #define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1507,6 +1511,41 @@ struct drm_i915_gem_execbuffer2 {
 #define i915_execbuffer2_get_context_id(eb2) \
 	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
 
+/**
+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
+ *
+ * The operation will wait for input fence to signal.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * operation.
+ */
+struct drm_i915_gem_timeline_fence {
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
+	__u32 handle;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_TIMELINE_FENCE_WAIT:
+	 * Wait for the input fence before the operation.
+	 *
+	 * I915_TIMELINE_FENCE_SIGNAL:
+	 * Return operation completion fence as output.
+	 */
+	__u32 flags;
+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
@@ -3717,6 +3756,134 @@ struct drm_i915_gem_create_ext_protected_content {
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (i.e., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length must be 4K page aligned. However, DG2 and
+ * XEHPSDV have a 64K page size for device local memory and a compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
+ * the local memory 64K page and the system memory 4K page bindings in the same
+ * 2M range.
+ *
+ * Error code -EINVAL will be returned if @start, @offset and @length are not
+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
+ * -ENOSPC will be returned if the VA range specified can't be reserved.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
+ * asynchronously, if a valid @fence is specified.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for bind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and binding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). VM_UNBIND will force unbind the specified
+ * range from device page table without waiting for any GPU job to complete.
+ * It is the UMD's responsibility to ensure the mapping is no longer in use
+ * before calling VM_UNBIND.
+ *
+ * If the specified mapping is not found, the ioctl will simply return without
+ * any error.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
+ * asynchronously, if a valid @fence is specified.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for unbind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and unbinding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Intel-gfx] [RFC v4 04/14] drm/i915/vm_bind: Implement bind and unbind of object
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

Add uapi and implement support for bind and unbind of an
object at the specified GPU virtual addresses.

The vm_bind mode is not supported in the legacy execbuf2 ioctl.
It will be supported only in the newer execbuf3 ioctl.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   5 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  27 ++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 308 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  17 +
 drivers/gpu/drm/i915/i915_driver.c            |   3 +
 drivers/gpu/drm/i915/i915_vma.c               |   3 +-
 drivers/gpu/drm/i915/i915_vma.h               |   2 -
 drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
 include/uapi/drm/i915_drm.h                   | 167 ++++++++++
 11 files changed, 554 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a26edcdadc21..9bf939ef18ea 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -166,6 +166,7 @@ gem-y += \
 	gem/i915_gem_ttm_move.o \
 	gem/i915_gem_ttm_pm.o \
 	gem/i915_gem_userptr.o \
+	gem/i915_gem_vm_bind_object.o \
 	gem/i915_gem_wait.o \
 	gem/i915_gemfs.o
 i915-y += \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cd75b0ca2555..f85f10cf9c34 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -781,6 +781,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	if (unlikely(IS_ERR(ctx)))
 		return PTR_ERR(ctx);
 
+	if (ctx->vm->vm_bind_mode) {
+		i915_gem_context_put(ctx);
+		return -EOPNOTSUPP;
+	}
+
 	eb->gem_context = ctx;
 	if (i915_gem_context_has_full_ppgtt(ctx))
 		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
new file mode 100644
index 000000000000..4f3cfa1f6ef6
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __I915_GEM_VM_BIND_H
+#define __I915_GEM_VM_BIND_H
+
+#include <linux/types.h>
+
+#include <drm/drm_file.h>
+#include <drm/drm_device.h>
+
+#include "gt/intel_gtt.h"
+#include "i915_vma_types.h"
+
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
+
+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file);
+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+
+void i915_gem_vm_unbind_all(struct i915_address_space *vm);
+
+#endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
new file mode 100644
index 000000000000..c24e22657617
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -0,0 +1,308 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <uapi/drm/i915_drm.h>
+
+#include <linux/interval_tree_generic.h>
+
+#include "gem/i915_gem_context.h"
+#include "gem/i915_gem_vm_bind.h"
+
+#include "gt/intel_gpu_commands.h"
+
+#define START(node) ((node)->start)
+#define LAST(node) ((node)->last)
+
+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
+		     START, LAST, static inline, i915_vm_bind_it)
+
+#undef START
+#undef LAST
+
+/**
+ * DOC: VM_BIND/UNBIND ioctls
+ *
+ * The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow a UMD to bind/unbind GEM buffer
+ * objects (BOs), or sections of BOs, at specified GPU virtual addresses on a
+ * specified address space (VM). Multiple mappings can map to the same physical
+ * pages of an object (aliasing). These mappings (also referred to as persistent
+ * mappings) will be persistent across multiple GPU submissions (execbuf calls)
+ * issued by the UMD, without the user having to provide a list of all required
+ * mappings during each submission (as required by the older execbuf mode).
+ *
+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+ * signaling the completion of bind/unbind operation.
+ *
+ * The VM_BIND feature is advertised to the user via I915_PARAM_VM_BIND_VERSION.
+ * The user has to opt in to the VM_BIND mode of binding for an address space
+ * (VM) at VM creation time via the I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
+ * done asynchronously, when a valid out fence is specified.
+ *
+ * VM_BIND locking order is as below.
+ *
+ * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+ *    mapping.
+ *
+ *    In the future, when GPU page faults are supported, we can potentially use
+ *    an rwsem instead, so that multiple page fault handlers can take the read
+ *    side lock to look up the mapping and hence can run in parallel.
+ *    The older execbuf mode of binding does not need this lock.
+ *
+ * 2) The object's dma-resv lock will protect i915_vma state and needs
+ *    to be held while binding/unbinding a vma in the async worker and while
+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
+ *    will all share a dma-resv object.
+ *
+ * 3) Spinlock/s to protect some of the VM's lists like the list of
+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
+ */
+
+/**
+ * i915_gem_vm_bind_lookup_vma() - look up the vma starting at an address
+ * @vm: virtual address space in which the vma is looked up
+ * @va: starting address of the vma
+ *
+ * Retrieves the vma starting at the given address from the vm's vma tree.
+ *
+ * Returns: the vma on success, NULL on failure.
+ */
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
+{
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	return i915_vm_bind_it_iter_first(&vm->va, va, va);
+}
+
+/**
+ * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
+ * @vma: vma that needs to be removed
+ * @release_obj: whether to release the object reference
+ *
+ * Removes the vma from the vm's lists and interval tree
+ */
+static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
+{
+	lockdep_assert_held(&vma->vm->vm_bind_lock);
+
+	list_del_init(&vma->vm_bind_link);
+	i915_vm_bind_it_remove(vma, &vma->vm->va);
+
+	/* Release object */
+	if (release_obj)
+		i915_gem_object_put(vma->obj);
+}
+
+static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
+				  struct drm_i915_gem_vm_unbind *va)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
+
+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
+	if (ret)
+		return ret;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+
+	if (!vma)
+		ret = -ENOENT;
+	else if (vma->size != va->length)
+		ret = -EINVAL;
+
+	if (ret) {
+		mutex_unlock(&vm->vm_bind_lock);
+		return ret;
+	}
+
+	i915_gem_vm_bind_remove(vma, false);
+
+	mutex_unlock(&vm->vm_bind_lock);
+
+	/* Destroy vma and then release object */
+	obj = vma->obj;
+	ret = i915_gem_object_lock(obj, NULL);
+	if (ret)
+		return ret;
+
+	i915_vma_destroy(vma);
+	i915_gem_object_unlock(obj);
+
+	i915_gem_object_put(obj);
+
+	return 0;
+}
+
+/**
+ * i915_gem_vm_unbind_all() - Unbind all mappings from an address space
+ * @vm: Address space to remove mappings from
+ *
+ * Unbind all userspace-requested vm_bind mappings.
+ */
+void i915_gem_vm_unbind_all(struct i915_address_space *vm)
+{
+	struct i915_vma *vma, *t;
+
+	mutex_lock(&vm->vm_bind_lock);
+	list_for_each_entry_safe(vma, t, &vm->vm_bind_list, vm_bind_link)
+		i915_gem_vm_bind_remove(vma, true);
+	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
+		i915_gem_vm_bind_remove(vma, true);
+	mutex_unlock(&vm->vm_bind_lock);
+}
+
+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
+					struct drm_i915_gem_object *obj,
+					struct drm_i915_gem_vm_bind *va)
+{
+	struct i915_gtt_view view;
+	struct i915_vma *vma;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (vma)
+		return ERR_PTR(-EEXIST);
+
+	view.type = I915_GTT_VIEW_PARTIAL;
+	view.partial.offset = va->offset >> PAGE_SHIFT;
+	view.partial.size = va->length >> PAGE_SHIFT;
+	vma = i915_vma_instance(obj, vm, &view);
+	if (IS_ERR(vma))
+		return vma;
+
+	vma->start = va->start;
+	vma->last = va->start + va->length - 1;
+
+	return vma;
+}
+
+static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
+				struct drm_i915_gem_vm_bind *va,
+				struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma = NULL;
+	struct i915_gem_ww_ctx ww;
+	u64 pin_flags;
+	int ret = 0;
+
+	if (!vm->vm_bind_mode)
+		return -EOPNOTSUPP;
+
+	obj = i915_gem_object_lookup(file, va->handle);
+	if (!obj)
+		return -ENOENT;
+
+	if (!va->length ||
+	    !IS_ALIGNED(va->offset | va->length,
+			i915_gem_object_max_page_size(obj->mm.placements,
+						      obj->mm.n_placements)) ||
+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
+	if (ret)
+		goto put_obj;
+
+	vma = vm_bind_get_vma(vm, obj, va);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto unlock_vm;
+	}
+
+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
+
+	for_i915_gem_ww(&ww, ret, true) {
+		ret = i915_gem_object_lock(vma->obj, &ww);
+		if (ret)
+			continue;
+
+		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
+		if (ret)
+			continue;
+
+		/* Make it evictable */
+		__i915_vma_unpin(vma);
+
+		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+		i915_vm_bind_it_insert(vma, &vm->va);
+
+		/* Hold object reference until vm_unbind */
+		i915_gem_object_get(vma->obj);
+	}
+
+	if (ret)
+		i915_vma_destroy(vma);
+unlock_vm:
+	mutex_unlock(&vm->vm_bind_lock);
+put_obj:
+	i915_gem_object_put(obj);
+
+	return ret;
+}
+
+/**
+ * i915_gem_vm_bind_ioctl() - ioctl function for binding an object at a
+ * virtual address
+ * @dev: drm device on which the ioctl is issued
+ * @data: ioctl data (struct drm_i915_gem_vm_bind)
+ * @file: drm_file related to the ioctl
+ *
+ * Binds a section of the specified object at the requested virtual address.
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_vm_bind *args = data;
+	struct i915_address_space *vm;
+	int ret;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	ret = i915_gem_vm_bind_obj(vm, args, file);
+
+	i915_vm_put(vm);
+	return ret;
+}
+
+/**
+ * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an object from a
+ * virtual address
+ * @dev: drm device on which the ioctl is issued
+ * @data: ioctl data (struct drm_i915_gem_vm_unbind)
+ * @file: drm_file related to the ioctl
+ *
+ * Unbinds the mapping at the specified virtual address from the address space.
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_gem_vm_unbind *args = data;
+	struct i915_address_space *vm;
+	int ret;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	ret = i915_gem_vm_unbind_vma(vm, args);
+
+	i915_vm_put(vm);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index b67831833c9a..0daa70c6ed0d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -12,6 +12,7 @@
 
 #include "gem/i915_gem_internal.h"
 #include "gem/i915_gem_lmem.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "i915_trace.h"
 #include "i915_utils.h"
 #include "intel_gt.h"
@@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
+	mutex_destroy(&vm->vm_bind_lock);
 }
 
 /**
@@ -202,6 +205,8 @@ static void __i915_vm_release(struct work_struct *work)
 	struct i915_address_space *vm =
 		container_of(work, struct i915_address_space, release_work);
 
+	i915_gem_vm_unbind_all(vm);
+
 	__i915_vm_close(vm);
 
 	/* Synchronize async unbinds. */
@@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	INIT_LIST_HEAD(&vm->bound_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+
+	vm->va = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&vm->vm_bind_list);
+	INIT_LIST_HEAD(&vm->vm_bound_list);
+	mutex_init(&vm->vm_bind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index c0ca53cba9f0..b52061858161 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -259,6 +259,23 @@ struct i915_address_space {
 	 */
 	struct list_head unbound_list;
 
+	/**
+	 * @vm_bind_mode: flag to indicate vm_bind method of binding
+	 *
+	 * True: allow only the vm_bind method of binding.
+	 * False: allow only the legacy execbuf method of binding.
+	 */
+	bool vm_bind_mode:1;
+
+	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
+	struct mutex vm_bind_lock;
+	/** @vm_bind_list: List of vm_bind mappings in progress */
+	struct list_head vm_bind_list;
+	/** @vm_bound_list: List of completed vm_bind mappings */
+	struct list_head vm_bound_list;
+	/* @va: tree of persistent vmas */
+	struct rb_root_cached va;
+
 	/* Global GTT */
 	bool is_ggtt:1;
 
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 9d1fc2477f80..f9e4a784dd0e 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -69,6 +69,7 @@
 #include "gem/i915_gem_ioctls.h"
 #include "gem/i915_gem_mman.h"
 #include "gem/i915_gem_pm.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_rc6.h"
@@ -1892,6 +1893,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND, i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index f17c09ead7d7..33cb0cbc7fb1 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -29,6 +29,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_tiling.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	spin_unlock(&obj->vma.lock);
 	mutex_unlock(&vm->mutex);
 
+	INIT_LIST_HEAD(&vma->vm_bind_link);
 	return vma;
 
 err_unlock:
@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
 	spin_lock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index aecd9c64486b..6feef0305fe1 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
 {
 	ptrdiff_t cmp;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
-
 	cmp = ptrdiff(vma->vm, vm);
 	if (cmp)
 		return cmp;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index ec0f6c9f57d0..bed7a344dcd7 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -289,6 +289,20 @@ struct i915_vma {
 	/** This object's place on the active/inactive lists */
 	struct list_head vm_link;
 
+	/** @vm_bind_link: node for the vm_bind related lists of vm */
+	struct list_head vm_bind_link;
+
+	/** Interval tree structures for persistent vma */
+
+	/** @rb: node for the interval tree of vm for persistent vmas */
+	struct rb_node rb;
+	/** @start: start endpoint of the rb node */
+	u64 start;
+	/** @last: Last endpoint of the rb node */
+	u64 last;
+	/** @__subtree_last: last in subtree */
+	u64 __subtree_last;
+
 	struct list_head obj_link; /* Link in the object's VMA list */
 	struct rb_node obj_node;
 	struct hlist_node obj_hash;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 520ad2691a99..4a4f2a77388c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_VM_CREATE		0x3a
 #define DRM_I915_GEM_VM_DESTROY		0x3b
 #define DRM_I915_GEM_CREATE_EXT		0x3c
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
 #define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1507,6 +1511,41 @@ struct drm_i915_gem_execbuffer2 {
 #define i915_execbuffer2_get_context_id(eb2) \
 	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
 
+/**
+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
+ *
+ * The operation will wait for the input fence to signal.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * operation.
+ */
+struct drm_i915_gem_timeline_fence {
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
+	__u32 handle;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_TIMELINE_FENCE_WAIT:
+	 * Wait for the input fence before the operation.
+	 *
+	 * I915_TIMELINE_FENCE_SIGNAL:
+	 * Return operation completion fence as output.
+	 */
+	__u32 flags;
+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
@@ -3717,6 +3756,134 @@ struct drm_i915_gem_create_ext_protected_content {
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (i.e., not currently bound) and can
+ * be mapped to the whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length must be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local memory and a compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
+ * the local memory 64K page and the system memory 4K page bindings in the same
+ * 2M range.
+ *
+ * Error code -EINVAL will be returned if @start, @offset and @length are not
+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
+ * -ENOSPC will be returned if the VA range specified can't be reserved.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
+ * asynchronously, if a valid @fence is specified.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for bind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and binding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). VM_UNBIND will force unbind the specified
+ * range from device page table without waiting for any GPU job to complete.
+ * It is the UMD's responsibility to ensure the mapping is no longer in use
+ * before calling VM_UNBIND.
+ *
+ * If the specified mapping is not found, the ioctl will simply return without
+ * any error.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
+ * asynchronously, if a valid @fence is specified.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to unbind from */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for unbind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and unbinding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread
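
[Illustration, not part of the patch: a hedged userspace sketch of the VM_BIND side of the new uapi, assuming an open DRM fd, a GEM handle, a vm_id for a VM created in vm_bind mode, and a pre-created timeline drm_syncobj handle; bind_va and all parameter names are hypothetical, and the definitions come from the patched i915_drm.h.]

#include <string.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>	/* patched header providing the VM_BIND/UNBIND uapi */

/*
 * Hypothetical helper: bind 'length' bytes at 'offset' of object 'handle'
 * at GPU VA 'va', requesting an out fence on timeline point 'point'.
 */
static int bind_va(int fd, __u32 vm_id, __u32 handle,
		   __u64 va, __u64 offset, __u64 length,
		   __u32 syncobj, __u64 point)
{
	struct drm_i915_gem_vm_bind bind;

	memset(&bind, 0, sizeof(bind));
	bind.vm_id = vm_id;
	bind.handle = handle;
	bind.start = va;		/* must be at least 4K aligned */
	bind.offset = offset;
	bind.length = length;
	/* Request an out fence; I915_TIMELINE_FENCE_WAIT is invalid here. */
	bind.fence.handle = syncobj;
	bind.fence.flags = I915_TIMELINE_FENCE_SIGNAL;
	bind.fence.value = point;	/* must be non-zero for a timeline syncobj */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
}

The caller would then wait on the syncobj timeline point (or chain it into an execbuf3 in-fence) before using the mapping on the GPU.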

* [RFC v4 05/14] drm/i915/vm_bind: Support for VM private BOs
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Each VM creates a root_obj and shares it with all of its private objects
to use as their dma_resv object. This has a performance advantage as it
requires only a single dma_resv object update for all private BOs in the
execbuf path, versus updating a list of dma_resv objects for shared BOs.

VM private BOs can only be mapped on the specified VM and cannot be
dma-buf exported. Also, they are supported only in vm_bind mode.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c    | 40 ++++++++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 ++
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  3 ++
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 +++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
 drivers/gpu/drm/i915/i915_vma.c               |  1 +
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
 include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
 12 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 3b3ab4abb0a3..692d95ef5d3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -253,6 +253,7 @@ struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	u32 vm_id;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -402,9 +403,24 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_vm_private(struct i915_user_extension __user *base,
+			      void *data)
+{
+	struct drm_i915_gem_create_ext_vm_private ext;
+	struct create_ext *ext_data = data;
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	ext_data->vm_id = ext.vm_id;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
 };
 
 /**
@@ -420,6 +436,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_create_ext *args = data;
 	struct create_ext ext_data = { .i915 = i915 };
+	struct i915_address_space *vm = NULL;
 	struct drm_i915_gem_object *obj;
 	int ret;
 
@@ -433,6 +450,12 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
+	if (ext_data.vm_id) {
+		vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
+		if (unlikely(!vm))
+			return -ENOENT;
+	}
+
 	if (!ext_data.n_placements) {
 		ext_data.placements[0] =
 			intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
@@ -459,8 +482,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 						ext_data.placements,
 						ext_data.n_placements,
 						ext_data.flags);
-	if (IS_ERR(obj))
-		return PTR_ERR(obj);
+	if (IS_ERR(obj)) {
+		ret = PTR_ERR(obj);
+		goto vm_put;
+	}
+
+	if (vm) {
+		obj->base.resv = vm->root_obj->base.resv;
+		obj->priv_root = i915_gem_object_get(vm->root_obj);
+		i915_vm_put(vm);
+	}
 
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
+vm_put:
+	if (vm)
+		i915_vm_put(vm);
+
+	return ret;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..6433173c3e84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags)
 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 
+	if (obj->priv_root) {
+		drm_dbg(obj->base.dev,
+			"Exporting VM private objects is not allowed\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	exp_info.ops = &i915_dmabuf_ops;
 	exp_info.size = gem_obj->size;
 	exp_info.flags = flags;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f85f10cf9c34..33d989a20227 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -864,6 +864,10 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
 		if (unlikely(!obj))
 			return ERR_PTR(-ENOENT);
 
+		/* VM private objects are not supported here */
+		if (obj->priv_root)
+			return ERR_PTR(-EINVAL);
+
 		/*
 		 * If the user has opted-in for protected-object tracking, make
 		 * sure the object encryption can be used.
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 7ff9c7877bec..271ad62b3245 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -108,6 +108,9 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
  */
 void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
 {
+	if (obj->priv_root && !obj->ttm.created)
+		i915_gem_object_put(obj->priv_root);
+
 	mutex_destroy(&obj->mm.get_page.lock);
 	mutex_destroy(&obj->mm.get_dma_page.lock);
 	dma_resv_fini(&obj->base._resv);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 40305e2bcd49..2e79cfc0b06a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
 
 	const struct drm_i915_gem_object_ops *ops;
 
+	/* For VM private BO, points to root_obj in VM. NULL otherwise */
+	struct drm_i915_gem_object *priv_root;
+
 	struct {
 		/**
 		 * @vma.lock: protect the list/tree of vmas
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 0544b0a4a43a..0a02367fae5d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1153,6 +1153,9 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
 	mutex_destroy(&obj->ttm.get_io_page.lock);
 
 	if (obj->ttm.created) {
+		if (obj->priv_root)
+			i915_gem_object_put(obj->priv_root);
+
 		/*
 		 * We freely manage the shrinker LRU outide of the mm.pages life
 		 * cycle. As a result when destroying the object we should be
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index c24e22657617..7ca6a41fc981 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -92,6 +92,7 @@ static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 	lockdep_assert_held(&vma->vm->vm_bind_lock);
 
 	list_del_init(&vma->vm_bind_link);
+	list_del_init(&vma->non_priv_vm_bind_link);
 	i915_vm_bind_it_remove(vma, &vma->vm->va);
 
 	/* Release object */
@@ -210,6 +211,11 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (obj->priv_root && obj->priv_root != vm->root_obj) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
 	if (ret)
 		goto put_obj;
@@ -236,6 +242,9 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 
 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 		i915_vm_bind_it_insert(vma, &vm->va);
+		if (!obj->priv_root)
+			list_add_tail(&vma->non_priv_vm_bind_link,
+				      &vm->non_priv_vm_bind_list);
 
 		/* Hold object reference until vm_unbind */
 		i915_gem_object_get(vma->obj);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 0daa70c6ed0d..da4f9dee0397 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -177,6 +177,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	i915_gem_object_put(vm->root_obj);
 	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
 	mutex_destroy(&vm->vm_bind_lock);
 }
@@ -292,6 +293,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->vm_bind_list);
 	INIT_LIST_HEAD(&vm->vm_bound_list);
 	mutex_init(&vm->vm_bind_lock);
+	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
+	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
+	GEM_BUG_ON(IS_ERR(vm->root_obj));
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b52061858161..3f2e87d3bf34 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -275,6 +275,8 @@ struct i915_address_space {
 	struct list_head vm_bound_list;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
+	struct list_head non_priv_vm_bind_list;
+	struct drm_i915_gem_object *root_obj;
 
 	/* Global GTT */
 	bool is_ggtt:1;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 33cb0cbc7fb1..aa332ad69ec2 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	mutex_unlock(&vm->mutex);
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
+	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
 	return vma;
 
 err_unlock:
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index bed7a344dcd7..6d727c2d9802 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -291,6 +291,8 @@ struct i915_vma {
 
 	/** @vm_bind_link: node for the vm_bind related lists of vm */
 	struct list_head vm_bind_link;
+	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
+	struct list_head non_priv_vm_bind_link;
 
 	/** Interval tree structures for persistent vma */
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4a4f2a77388c..9f93e4afa1c8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3636,9 +3636,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
+	 * struct drm_i915_gem_create_ext_vm_private.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
 	__u64 extensions;
 };
 
@@ -3756,6 +3760,32 @@ struct drm_i915_gem_create_ext_protected_content {
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
+/**
+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
+ * private to the specified VM.
+ *
+ * See struct drm_i915_gem_create_ext.
+ *
+ * By default, BOs can be mapped on multiple VMs and can also be dma-buf
+ * exported. Hence these BOs are referred to as Shared BOs.
+ * During each execbuf3 submission, the request fence must be added to the
+ * dma-resv fence list of all shared BOs mapped on the VM.
+ *
+ * Unlike Shared BOs, these VM private BOs can only be mapped on the VM they
+ * are private to and can't be dma-buf exported. All private BOs of a VM share
+ * the dma-resv object. Hence, during each execbuf3 submission, only one
+ * dma-resv fence list needs to be updated. Thus, the fast path (where required
+ * mappings are already bound) submission latency is O(1) w.r.t. the number of
+ * VM private BOs.
+ */
+struct drm_i915_gem_create_ext_vm_private {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @vm_id: Id of the VM to which the object is private */
+	__u32 vm_id;
+};
+
 /**
  * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
  *
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread
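
[Illustration, not part of the patch: a hedged sketch of creating a VM-private BO with the I915_GEM_CREATE_EXT_VM_PRIVATE extension added above, assuming a DRM fd and a vm_id for a VM created in vm_bind mode; the helper name is hypothetical and the extension definitions come from the patched i915_drm.h.]

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>	/* patched header providing I915_GEM_CREATE_EXT_VM_PRIVATE */

/* Hypothetical helper: create a BO of 'size' bytes that is private to 'vm_id'. */
static int create_vm_private_bo(int fd, __u32 vm_id, __u64 size, __u32 *handle)
{
	struct drm_i915_gem_create_ext_vm_private vm_priv;
	struct drm_i915_gem_create_ext create;
	int ret;

	memset(&vm_priv, 0, sizeof(vm_priv));
	vm_priv.base.name = I915_GEM_CREATE_EXT_VM_PRIVATE;
	vm_priv.vm_id = vm_id;

	memset(&create, 0, sizeof(create));
	create.size = size;
	/* Chain the VM-private extension; next_extension stays 0 (end of chain). */
	create.extensions = (__u64)(uintptr_t)&vm_priv;

	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
	if (ret == 0)
		*handle = create.handle;
	return ret;
}

Such a BO can subsequently only be vm_bound into the VM identified by vm_id; binding it elsewhere or exporting it as a dma-buf is rejected, per the patch above.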

* [RFC v4 06/14] drm/i915/vm_bind: Handle persistent vmas
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
them during request submission in the execbuf path.

Support eviction by maintaining a list of evicted persistent vmas
for rebinding during the next submission.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  7 +++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 39 ++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 ++
 drivers/gpu/drm/i915/i915_vma.c               | 46 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_vma.h               | 45 +++++++++++++-----
 drivers/gpu/drm/i915/i915_vma_types.h         | 17 +++++++
 8 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 7ca6a41fc981..236f901b8b9c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -91,6 +91,12 @@ static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 {
 	lockdep_assert_held(&vma->vm->vm_bind_lock);
 
+	spin_lock(&vma->vm->vm_rebind_lock);
+	if (!list_empty(&vma->vm_rebind_link))
+		list_del_init(&vma->vm_rebind_link);
+	i915_vma_set_purged(vma);
+	spin_unlock(&vma->vm->vm_rebind_lock);
+
 	list_del_init(&vma->vm_bind_link);
 	list_del_init(&vma->non_priv_vm_bind_link);
 	i915_vm_bind_it_remove(vma, &vma->vm->va);
@@ -181,6 +187,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 
 	vma->start = va->start;
 	vma->last = va->start + va->length - 1;
+	i915_vma_set_persistent(vma);
 
 	return vma;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index da4f9dee0397..6db31197fa87 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
 	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
+	INIT_LIST_HEAD(&vm->vm_rebind_list);
+	spin_lock_init(&vm->vm_rebind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 3f2e87d3bf34..b73d35b4e05d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -273,6 +273,10 @@ struct i915_address_space {
 	struct list_head vm_bind_list;
 	/** @vm_bound_list: List of vm_binding completed */
 	struct list_head vm_bound_list;
+	/* @vm_rebind_list: list of vmas to be rebound */
+	struct list_head vm_rebind_list;
+	/* @vm_rebind_lock: protects @vm_rebind_list */
+	spinlock_t vm_rebind_lock;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 329ff75b80b9..b7d0844de561 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -25,6 +25,45 @@
 #include "i915_trace.h"
 #include "i915_vgpu.h"
 
+/**
+ * i915_vm_sync() - Wait until address space is not in use
+ * @vm: address space
+ *
+ * Waits until all requests using the address space are complete.
+ *
+ * Returns: 0 on success, negative error code on failure
+ */
+int i915_vm_sync(struct i915_address_space *vm)
+{
+	int ret;
+
+	/* Wait for all requests under this vm to finish */
+	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
+				    DMA_RESV_USAGE_BOOKKEEP, false,
+				    MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+	else if (ret > 0)
+		return 0;
+	else
+		return -ETIMEDOUT;
+}
+
+/**
+ * i915_vm_is_active() - Check if address space is being used
+ * @vm: address space
+ *
+ * Check if any request using the specified address space is
+ * active.
+ *
+ * Returns: true if address space is active, false otherwise.
+ */
+bool i915_vm_is_active(const struct i915_address_space *vm)
+{
+	return !dma_resv_test_signaled(vm->root_obj->base.resv,
+				       DMA_RESV_USAGE_BOOKKEEP);
+}
+
 int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
 			       struct sg_table *pages)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8c2f57eb5dda..a5bbdc59d9df 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 
 #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
 
+int i915_vm_sync(struct i915_address_space *vm);
+bool i915_vm_is_active(const struct i915_address_space *vm);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index aa332ad69ec2..ff216e9a2c8d 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
 	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
+	INIT_LIST_HEAD(&vma->vm_rebind_link);
 	return vma;
 
 err_unlock:
@@ -387,6 +388,24 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
 	return err;
 }
 
+/**
+ * i915_vma_sync() - Wait for the vma to be idle
+ * @vma: vma to be tested
+ *
+ * Returns 0 on success and error code on failure
+ */
+int i915_vma_sync(struct i915_vma *vma)
+{
+	int ret;
+
+	/* Wait for the asynchronous bindings and pending GPU reads */
+	ret = i915_active_wait(&vma->active);
+	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
+		return ret;
+
+	return i915_vm_sync(vma->vm);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 static int i915_vma_verify_bind_complete(struct i915_vma *vma)
 {
@@ -1654,6 +1673,13 @@ static void force_unbind(struct i915_vma *vma)
 	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
+	/*
+	 * Mark persistent vma as purged to avoid it waiting
+	 * for VM to be released.
+	 */
+	if (i915_vma_is_persistent(vma))
+		i915_vma_set_purged(vma);
+
 	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
 	WARN_ON(__i915_vma_unbind(vma));
 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
@@ -1846,6 +1872,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
 	int err;
 
 	assert_object_held(obj);
+	if (i915_vma_is_persistent(vma))
+		return -EINVAL;
 
 	GEM_BUG_ON(!vma->pages);
 
@@ -2015,6 +2043,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
 	__i915_vma_evict(vma, false);
 
 	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
+
+	if (i915_vma_is_persistent(vma)) {
+		spin_lock(&vma->vm->vm_rebind_lock);
+		if (list_empty(&vma->vm_rebind_link) &&
+		    !i915_vma_is_purged(vma))
+			list_add_tail(&vma->vm_rebind_link,
+				      &vma->vm->vm_rebind_list);
+		spin_unlock(&vma->vm->vm_rebind_lock);
+	}
+
 	return 0;
 }
 
@@ -2046,6 +2084,14 @@ static struct dma_fence *__i915_vma_unbind_async(struct i915_vma *vma)
 		return ERR_PTR(-EBUSY);
 	}
 
+	if (__i915_sw_fence_await_reservation(&vma->resource->chain,
+					      vma->obj->base.resv,
+					      DMA_RESV_USAGE_BOOKKEEP,
+					      i915_fence_timeout(vma->vm->i915),
+					      I915_FENCE_GFP) < 0) {
+		return ERR_PTR(-EBUSY);
+	}
+
 	fence = __i915_vma_evict(vma, true);
 
 	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 6feef0305fe1..aa536c9ce472 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
-
-static inline bool i915_vma_is_active(const struct i915_vma *vma)
-{
-	return !i915_active_is_idle(&vma->active);
-}
-
 /* do not reserve memory to prevent deadlocks */
 #define __EXEC_OBJECT_NO_RESERVE BIT(31)
 
@@ -138,6 +132,38 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 	return i915_vm_to_ggtt(vma->vm)->pin_bias;
 }
 
+static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_persistent(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_purged(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_purged(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
+{
+	if (i915_vma_is_persistent(vma)) {
+		if (i915_vma_is_purged(vma))
+			return false;
+
+		return i915_vm_is_active(vma->vm);
+	}
+
+	return !i915_active_is_idle(&vma->active);
+}
+
 static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
 {
 	i915_gem_object_get(vma->obj);
@@ -406,12 +432,7 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
 void i915_vma_make_purgeable(struct i915_vma *vma);
 
 int i915_vma_wait_for_bind(struct i915_vma *vma);
-
-static inline int i915_vma_sync(struct i915_vma *vma)
-{
-	/* Wait for the asynchronous bindings and pending GPU reads */
-	return i915_active_wait(&vma->active);
-}
+int i915_vma_sync(struct i915_vma *vma);
 
 /**
  * i915_vma_get_current_resource - Get the current resource of the vma
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 6d727c2d9802..d21bf97febaa 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -264,6 +264,21 @@ struct i915_vma {
 #define I915_VMA_SCANOUT_BIT	17
 #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
 
+  /**
+   * I915_VMA_PERSISTENT_BIT:
+   * The vma is persistent (created with VM_BIND call).
+   *
+   * I915_VMA_PURGED_BIT:
+   * The persistent vma has been force unbound, either due to a VM_UNBIND
+   * call from the UMD or because the VM is being released. Do not check or
+   * wait for VM activeness in i915_vma_is_active() and i915_vma_sync().
+   */
+#define I915_VMA_PERSISTENT_BIT	19
+#define I915_VMA_PURGED_BIT	20
+
+#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
+#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
+
 	struct i915_active active;
 
 #define I915_VMA_PAGES_BIAS 24
@@ -293,6 +308,8 @@ struct i915_vma {
 	struct list_head vm_bind_link;
 	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
 	struct list_head non_priv_vm_bind_link;
+	/* @vm_rebind_link: link in vm_rebind_list, protected by vm_rebind_lock */
+	struct list_head vm_rebind_link;
 
 	/** Interval tree structures for persistent vma */
 
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread
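
A note on the rebind handling in the patch above: the vm_rebind_list is only
populated at this point; the code that drains it before the next submission
comes later in the series. The following is a minimal, illustrative sketch of
what that consumer could look like. The helper name i915_vm_rebind_vmas() and
the exact locking and ww handling are assumptions for illustration, not code
from the posted patches.

/* Illustrative sketch only: rebind evicted persistent vmas before submission. */
static int i915_vm_rebind_vmas(struct i915_address_space *vm,
			       struct i915_gem_ww_ctx *ww)
{
	struct i915_vma *vma;
	int err;

	spin_lock(&vm->vm_rebind_lock);
	while (!list_empty(&vm->vm_rebind_list)) {
		vma = list_first_entry(&vm->vm_rebind_list,
				       struct i915_vma, vm_rebind_link);
		list_del_init(&vma->vm_rebind_link);
		spin_unlock(&vm->vm_rebind_lock);

		/* Re-pin at the fixed GPU VA recorded at VM_BIND time */
		err = i915_vma_pin_ww(vma, ww, 0, 0,
				      vma->start | PIN_OFFSET_FIXED | PIN_USER);
		if (err)
			return err;

		spin_lock(&vm->vm_rebind_lock);
	}
	spin_unlock(&vm->vm_rebind_lock);

	return 0;
}

In the posted series this rebinding is expected to be driven from the execbuf3
submission path before any request is created, so that evicted persistent
mappings are valid again by the time the batch runs.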

* [RFC v4 07/14] drm/i915/vm_bind: Add out fence support
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add support for handling an out fence for the vm_bind call, signaled once the binding completes.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  4 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 81 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c               |  6 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  7 ++
 4 files changed, 97 insertions(+), 1 deletion(-)
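
For context, here is a rough userspace sketch of how a UMD could request and
wait on this out fence with a timeline syncobj. The fence.handle, fence.value
and fence.flags fields and the I915_TIMELINE_FENCE_SIGNAL flag are taken from
the diff below; the remaining drm_i915_gem_vm_bind field names and the ioctl
macro follow the VM_BIND uapi RFC and should be read as assumptions here, not
as definitions made by this patch.

/* Illustrative userspace sketch only; error handling trimmed. */
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "i915_drm.h"	/* header carrying the VM_BIND uapi RFC (assumed) */

static int bind_with_out_fence(int fd, uint32_t vm_id, uint32_t bo_handle,
			       uint64_t gpu_va, uint64_t size)
{
	struct drm_i915_gem_vm_bind bind;	/* layout per the VM_BIND uapi RFC */
	uint32_t syncobj;
	uint64_t point = 1;
	int ret;

	ret = drmSyncobjCreate(fd, 0, &syncobj);
	if (ret)
		return ret;

	memset(&bind, 0, sizeof(bind));
	bind.vm_id = vm_id;		/* assumed field name */
	bind.handle = bo_handle;	/* assumed field name */
	bind.start = gpu_va;
	bind.offset = 0;		/* assumed field name */
	bind.length = size;
	bind.fence.handle = syncobj;
	bind.fence.value = point;
	bind.fence.flags = I915_TIMELINE_FENCE_SIGNAL;

	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
	if (ret)
		return ret;

	/* Block until the binding has actually completed */
	return drmSyncobjTimelineWait(fd, &syncobj, &point, 1, INT64_MAX,
				      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
}

A binary syncobj (fence.value == 0) works the same way, except the kernel
replaces the syncobj fence instead of adding a timeline point, as the helpers
in the diff below show.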

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
index 4f3cfa1f6ef6..facba29ead04 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -6,6 +6,7 @@
 #ifndef __I915_GEM_VM_BIND_H
 #define __I915_GEM_VM_BIND_H
 
+#include <linux/dma-fence.h>
 #include <linux/types.h>
 
 #include <drm/drm_file.h>
@@ -24,4 +25,7 @@ int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
 
 void i915_gem_vm_unbind_all(struct i915_address_space *vm);
 
+void i915_vm_bind_signal_fence(struct i915_vma *vma,
+			       struct dma_fence * const fence);
+
 #endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 236f901b8b9c..5cd788404ee7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -7,6 +7,8 @@
 
 #include <linux/interval_tree_generic.h>
 
+#include <drm/drm_syncobj.h>
+
 #include "gem/i915_gem_context.h"
 #include "gem/i915_gem_vm_bind.h"
 
@@ -106,6 +108,75 @@ static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 		i915_gem_object_put(vma->obj);
 }
 
+static int i915_vm_bind_add_fence(struct drm_file *file, struct i915_vma *vma,
+				  u32 handle, u64 point)
+{
+	struct drm_syncobj *syncobj;
+
+	syncobj = drm_syncobj_find(file, handle);
+	if (!syncobj) {
+		DRM_DEBUG("Invalid syncobj handle provided\n");
+		return -ENOENT;
+	}
+
+	/*
+	 * For timeline syncobjs we need to preallocate chains for
+	 * later signaling.
+	 */
+	if (point) {
+		vma->vm_bind_fence.chain_fence = dma_fence_chain_alloc();
+		if (!vma->vm_bind_fence.chain_fence) {
+			drm_syncobj_put(syncobj);
+			return -ENOMEM;
+		}
+	} else {
+		vma->vm_bind_fence.chain_fence = NULL;
+	}
+	vma->vm_bind_fence.syncobj = syncobj;
+	vma->vm_bind_fence.value = point;
+
+	return 0;
+}
+
+static void i915_vm_bind_put_fence(struct i915_vma *vma)
+{
+	if (!vma->vm_bind_fence.syncobj)
+		return;
+
+	drm_syncobj_put(vma->vm_bind_fence.syncobj);
+	dma_fence_chain_free(vma->vm_bind_fence.chain_fence);
+}
+
+/**
+ * i915_vm_bind_signal_fence() - Add fence to vm_bind syncobj
+ * @vma: vma mapping requiring signaling
+ * @fence: fence to be added
+ *
+ * Associate the specified @fence with the @vma's syncobj so that the
+ * syncobj is signaled once the work behind @fence completes.
+ */
+void i915_vm_bind_signal_fence(struct i915_vma *vma,
+			       struct dma_fence * const fence)
+{
+	struct drm_syncobj *syncobj = vma->vm_bind_fence.syncobj;
+
+	if (!syncobj)
+		return;
+
+	if (vma->vm_bind_fence.chain_fence) {
+		drm_syncobj_add_point(syncobj,
+				      vma->vm_bind_fence.chain_fence,
+				      fence, vma->vm_bind_fence.value);
+		/*
+		 * The chain's ownership is transferred to the
+		 * timeline.
+		 */
+		vma->vm_bind_fence.chain_fence = NULL;
+	} else {
+		drm_syncobj_replace_fence(syncobj, fence);
+	}
+}
+
 static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
 				  struct drm_i915_gem_vm_unbind *va)
 {
@@ -233,6 +304,13 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto unlock_vm;
 	}
 
+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
+		ret = i915_vm_bind_add_fence(file, vma, va->fence.handle,
+					     va->fence.value);
+		if (ret)
+			goto put_vma;
+	}
+
 	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
 
 	for_i915_gem_ww(&ww, ret, true) {
@@ -257,6 +335,9 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		i915_gem_object_get(vma->obj);
 	}
 
+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL)
+		i915_vm_bind_put_fence(vma);
+put_vma:
 	if (ret)
 		i915_vma_destroy(vma);
 unlock_vm:
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ff216e9a2c8d..f7d711e675d6 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1540,8 +1540,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 err_vma_res:
 	i915_vma_resource_free(vma_res);
 err_fence:
-	if (work)
+	if (work) {
+		if (i915_vma_is_persistent(vma))
+			i915_vm_bind_signal_fence(vma, &work->base.dma);
+
 		dma_fence_work_commit_imm(&work->base);
+	}
 err_rpm:
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index d21bf97febaa..7fdbf73666e9 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -311,6 +311,13 @@ struct i915_vma {
 	/* @vm_rebind_link: link in vm_rebind_list, protected by vm_rebind_lock */
 	struct list_head vm_rebind_link;
 
+	/** Timeline fence for vm_bind completion notification */
+	struct {
+		struct dma_fence_chain *chain_fence;
+		struct drm_syncobj *syncobj;
+		u64 value;
+	} vm_bind_fence;
+
 	/** Interval tree structures for persistent vma */
 
 	/** @rb: node for the interval tree of vm for persistent vmas */
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

The new execbuf3 ioctl path and the legacy execbuf ioctl path
share a lot of common functionality. Where possible, abstract
that common code out into a separate file so that both paths
can use it.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 507 ++---------------
 .../drm/i915/gem/i915_gem_execbuffer_common.c | 530 ++++++++++++++++++
 .../drm/i915/gem/i915_gem_execbuffer_common.h |  47 ++
 4 files changed, 612 insertions(+), 473 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
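
To make the intended reuse concrete, below is a condensed sketch of the
submission flow a caller of these helpers would follow (the legacy path in
this patch, execbuf3 in a later one). It only chains together functions
defined in i915_gem_execbuffer_common.c below; the argument plumbing and
error unwinding are simplified and should be read as an illustration, not
as the execbuf3 implementation.

/* Illustrative sketch only: the common-helper call sequence. */
static int submit_with_common_helpers(struct intel_context *ce,
				      struct i915_gem_ww_ctx *ww,
				      struct i915_sched_attr sched,
				      struct eb_fence *fences, u64 num_fences,
				      struct i915_request **requests,
				      unsigned int num_batches)
{
	unsigned int i;
	int err;

	/* Helpers skip NULL slots, so start from a zeroed request array */
	memset(requests, 0, sizeof(*requests) * num_batches);

	/* Pin the (possibly parallel) context, throttling if needed */
	err = __eb_pin_engine(ce, ww, true, false);
	if (err)
		return err;

	/* Create one request per batch, parent first */
	for (i = 0; i < num_batches; i++) {
		requests[i] = i915_request_create(eb_find_context(ce, i));
		if (IS_ERR(requests[i])) {
			err = PTR_ERR(requests[i]);
			requests[i] = NULL;
			goto add;
		}

		/* Wait on any user supplied in fences */
		err = await_fence_array(fences, num_fences, requests[i]);
		if (err)
			goto add;
	}

add:
	/* Queue in reverse order (releasing timeline locks), then signal out fences */
	eb_requests_get(requests, num_batches);
	err = eb_requests_add(requests, num_batches, ce, sched, err);
	if (!err && num_fences)
		signal_fence_array(fences, num_fences, &requests[0]->fence);
	eb_requests_put(requests, num_batches);

	__eb_unpin_engine(ce);
	return err;
}

For parallel submissions the real paths additionally build a composite fence
with __eb_composite_fence_create() and signal that instead of the first
request's fence.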

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 9bf939ef18ea..bf952f478555 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -148,6 +148,7 @@ gem-y += \
 	gem/i915_gem_create.o \
 	gem/i915_gem_dmabuf.o \
 	gem/i915_gem_domain.o \
+	gem/i915_gem_execbuffer_common.o \
 	gem/i915_gem_execbuffer.o \
 	gem/i915_gem_internal.o \
 	gem/i915_gem_object.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 33d989a20227..363b2a788cdf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -9,8 +9,6 @@
 #include <linux/sync_file.h>
 #include <linux/uaccess.h>
 
-#include <drm/drm_syncobj.h>
-
 #include "display/intel_frontbuffer.h"
 
 #include "gem/i915_gem_ioctls.h"
@@ -28,6 +26,7 @@
 #include "i915_file_private.h"
 #include "i915_gem_clflush.h"
 #include "i915_gem_context.h"
+#include "i915_gem_execbuffer_common.h"
 #include "i915_gem_evict.h"
 #include "i915_gem_ioctls.h"
 #include "i915_trace.h"
@@ -235,13 +234,6 @@ enum {
  * the batchbuffer in trusted mode, otherwise the ioctl is rejected.
  */
 
-struct eb_fence {
-	struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
-	struct dma_fence *dma_fence;
-	u64 value;
-	struct dma_fence_chain *chain_fence;
-};
-
 struct i915_execbuffer {
 	struct drm_i915_private *i915; /** i915 backpointer */
 	struct drm_file *file; /** per-file lookup tables and limits */
@@ -2446,164 +2438,29 @@ static const enum intel_engine_id user_ring_map[] = {
 	[I915_EXEC_VEBOX]	= VECS0
 };
 
-static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
-{
-	struct intel_ring *ring = ce->ring;
-	struct intel_timeline *tl = ce->timeline;
-	struct i915_request *rq;
-
-	/*
-	 * Completely unscientific finger-in-the-air estimates for suitable
-	 * maximum user request size (to avoid blocking) and then backoff.
-	 */
-	if (intel_ring_update_space(ring) >= PAGE_SIZE)
-		return NULL;
-
-	/*
-	 * Find a request that after waiting upon, there will be at least half
-	 * the ring available. The hysteresis allows us to compete for the
-	 * shared ring and should mean that we sleep less often prior to
-	 * claiming our resources, but not so long that the ring completely
-	 * drains before we can submit our next request.
-	 */
-	list_for_each_entry(rq, &tl->requests, link) {
-		if (rq->ring != ring)
-			continue;
-
-		if (__intel_ring_space(rq->postfix,
-				       ring->emit, ring->size) > ring->size / 2)
-			break;
-	}
-	if (&rq->link == &tl->requests)
-		return NULL; /* weird, we will check again later for real */
-
-	return i915_request_get(rq);
-}
-
-static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
-			   bool throttle)
-{
-	struct intel_timeline *tl;
-	struct i915_request *rq = NULL;
-
-	/*
-	 * Take a local wakeref for preparing to dispatch the execbuf as
-	 * we expect to access the hardware fairly frequently in the
-	 * process, and require the engine to be kept awake between accesses.
-	 * Upon dispatch, we acquire another prolonged wakeref that we hold
-	 * until the timeline is idle, which in turn releases the wakeref
-	 * taken on the engine, and the parent device.
-	 */
-	tl = intel_context_timeline_lock(ce);
-	if (IS_ERR(tl))
-		return PTR_ERR(tl);
-
-	intel_context_enter(ce);
-	if (throttle)
-		rq = eb_throttle(eb, ce);
-	intel_context_timeline_unlock(tl);
-
-	if (rq) {
-		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
-		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
-
-		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
-				      timeout) < 0) {
-			i915_request_put(rq);
-
-			/*
-			 * Error path, cannot use intel_context_timeline_lock as
-			 * that is user interruptable and this clean up step
-			 * must be done.
-			 */
-			mutex_lock(&ce->timeline->mutex);
-			intel_context_exit(ce);
-			mutex_unlock(&ce->timeline->mutex);
-
-			if (nonblock)
-				return -EWOULDBLOCK;
-			else
-				return -EINTR;
-		}
-		i915_request_put(rq);
-	}
-
-	return 0;
-}
-
 static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
 {
-	struct intel_context *ce = eb->context, *child;
 	int err;
-	int i = 0, j = 0;
 
 	GEM_BUG_ON(eb->args->flags & __EXEC_ENGINE_PINNED);
 
-	if (unlikely(intel_context_is_banned(ce)))
-		return -EIO;
-
-	/*
-	 * Pinning the contexts may generate requests in order to acquire
-	 * GGTT space, so do this first before we reserve a seqno for
-	 * ourselves.
-	 */
-	err = intel_context_pin_ww(ce, &eb->ww);
+	err = __eb_pin_engine(eb->context, &eb->ww, throttle,
+			      eb->file->filp->f_flags & O_NONBLOCK);
 	if (err)
 		return err;
-	for_each_child(ce, child) {
-		err = intel_context_pin_ww(child, &eb->ww);
-		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
-	}
-
-	for_each_child(ce, child) {
-		err = eb_pin_timeline(eb, child, throttle);
-		if (err)
-			goto unwind;
-		++i;
-	}
-	err = eb_pin_timeline(eb, ce, throttle);
-	if (err)
-		goto unwind;
 
 	eb->args->flags |= __EXEC_ENGINE_PINNED;
 	return 0;
-
-unwind:
-	for_each_child(ce, child) {
-		if (j++ < i) {
-			mutex_lock(&child->timeline->mutex);
-			intel_context_exit(child);
-			mutex_unlock(&child->timeline->mutex);
-		}
-	}
-	for_each_child(ce, child)
-		intel_context_unpin(child);
-	intel_context_unpin(ce);
-	return err;
 }
 
 static void eb_unpin_engine(struct i915_execbuffer *eb)
 {
-	struct intel_context *ce = eb->context, *child;
-
 	if (!(eb->args->flags & __EXEC_ENGINE_PINNED))
 		return;
 
 	eb->args->flags &= ~__EXEC_ENGINE_PINNED;
 
-	for_each_child(ce, child) {
-		mutex_lock(&child->timeline->mutex);
-		intel_context_exit(child);
-		mutex_unlock(&child->timeline->mutex);
-
-		intel_context_unpin(child);
-	}
-
-	mutex_lock(&ce->timeline->mutex);
-	intel_context_exit(ce);
-	mutex_unlock(&ce->timeline->mutex);
-
-	intel_context_unpin(ce);
+	__eb_unpin_engine(eb->context);
 }
 
 static unsigned int
@@ -2652,7 +2509,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
 static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
-	struct intel_context *ce, *child;
+	struct intel_context *ce;
 	unsigned int idx;
 	int err;
 
@@ -2677,36 +2534,10 @@ eb_select_engine(struct i915_execbuffer *eb)
 	}
 	eb->num_batches = ce->parallel.number_children + 1;
 
-	for_each_child(ce, child)
-		intel_context_get(child);
-	intel_gt_pm_get(ce->engine->gt);
-
-	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
-		err = intel_context_alloc_state(ce);
-		if (err)
-			goto err;
-	}
-	for_each_child(ce, child) {
-		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
-			err = intel_context_alloc_state(child);
-			if (err)
-				goto err;
-		}
-	}
-
-	/*
-	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
-	 * EIO if the GPU is already wedged.
-	 */
-	err = intel_gt_terminally_wedged(ce->engine->gt);
+	err = __eb_select_engine(ce);
 	if (err)
 		goto err;
 
-	if (!i915_vm_tryget(ce->vm)) {
-		err = -ENOENT;
-		goto err;
-	}
-
 	eb->context = ce;
 	eb->gt = ce->engine->gt;
 
@@ -2715,12 +2546,9 @@ eb_select_engine(struct i915_execbuffer *eb)
 	 * during ww handling. The pool is destroyed when last pm reference
 	 * is dropped, which breaks our -EDEADLK handling.
 	 */
-	return err;
+	return 0;
 
 err:
-	intel_gt_pm_put(ce->engine->gt);
-	for_each_child(ce, child)
-		intel_context_put(child);
 	intel_context_put(ce);
 	return err;
 }
@@ -2728,24 +2556,7 @@ eb_select_engine(struct i915_execbuffer *eb)
 static void
 eb_put_engine(struct i915_execbuffer *eb)
 {
-	struct intel_context *child;
-
-	i915_vm_put(eb->context->vm);
-	intel_gt_pm_put(eb->gt);
-	for_each_child(eb->context, child)
-		intel_context_put(child);
-	intel_context_put(eb->context);
-}
-
-static void
-__free_fence_array(struct eb_fence *fences, unsigned int n)
-{
-	while (n--) {
-		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
-		dma_fence_put(fences[n].dma_fence);
-		dma_fence_chain_free(fences[n].chain_fence);
-	}
-	kvfree(fences);
+	__eb_put_engine(eb->context, eb->gt);
 }
 
 static int
@@ -2756,7 +2567,6 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
 	u64 __user *user_values;
 	struct eb_fence *f;
 	u64 nfences;
-	int err = 0;
 
 	nfences = timeline_fences->fence_count;
 	if (!nfences)
@@ -2791,9 +2601,9 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
 
 	while (nfences--) {
 		struct drm_i915_gem_exec_fence user_fence;
-		struct drm_syncobj *syncobj;
-		struct dma_fence *fence = NULL;
+		bool wait, signal;
 		u64 point;
+		int ret;
 
 		if (__copy_from_user(&user_fence,
 				     user_fences++,
@@ -2806,70 +2616,15 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
 		if (__get_user(point, user_values++))
 			return -EFAULT;
 
-		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
-		if (!syncobj) {
-			DRM_DEBUG("Invalid syncobj handle provided\n");
-			return -ENOENT;
-		}
-
-		fence = drm_syncobj_fence_get(syncobj);
-
-		if (!fence && user_fence.flags &&
-		    !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
-			DRM_DEBUG("Syncobj handle has no fence\n");
-			drm_syncobj_put(syncobj);
-			return -EINVAL;
-		}
-
-		if (fence)
-			err = dma_fence_chain_find_seqno(&fence, point);
-
-		if (err && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
-			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
-			dma_fence_put(fence);
-			drm_syncobj_put(syncobj);
-			return err;
-		}
-
-		/*
-		 * A point might have been signaled already and
-		 * garbage collected from the timeline. In this case
-		 * just ignore the point and carry on.
-		 */
-		if (!fence && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
-			drm_syncobj_put(syncobj);
+		wait = user_fence.flags & I915_EXEC_FENCE_WAIT;
+		signal = user_fence.flags & I915_EXEC_FENCE_SIGNAL;
+		ret = add_timeline_fence(eb->file, user_fence.handle, point,
+					 f, wait, signal);
+		if (ret < 0)
+			return ret;
+		else if (!ret)
 			continue;
-		}
 
-		/*
-		 * For timeline syncobjs we need to preallocate chains for
-		 * later signaling.
-		 */
-		if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
-			/*
-			 * Waiting and signaling the same point (when point !=
-			 * 0) would break the timeline.
-			 */
-			if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
-				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
-				dma_fence_put(fence);
-				drm_syncobj_put(syncobj);
-				return -EINVAL;
-			}
-
-			f->chain_fence = dma_fence_chain_alloc();
-			if (!f->chain_fence) {
-				drm_syncobj_put(syncobj);
-				dma_fence_put(fence);
-				return -ENOMEM;
-			}
-		} else {
-			f->chain_fence = NULL;
-		}
-
-		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
-		f->dma_fence = fence;
-		f->value = point;
 		f++;
 		eb->num_fences++;
 	}
@@ -2949,65 +2704,6 @@ static int add_fence_array(struct i915_execbuffer *eb)
 	return 0;
 }
 
-static void put_fence_array(struct eb_fence *fences, int num_fences)
-{
-	if (fences)
-		__free_fence_array(fences, num_fences);
-}
-
-static int
-await_fence_array(struct i915_execbuffer *eb,
-		  struct i915_request *rq)
-{
-	unsigned int n;
-	int err;
-
-	for (n = 0; n < eb->num_fences; n++) {
-		struct drm_syncobj *syncobj;
-		unsigned int flags;
-
-		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
-
-		if (!eb->fences[n].dma_fence)
-			continue;
-
-		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
-		if (err < 0)
-			return err;
-	}
-
-	return 0;
-}
-
-static void signal_fence_array(const struct i915_execbuffer *eb,
-			       struct dma_fence * const fence)
-{
-	unsigned int n;
-
-	for (n = 0; n < eb->num_fences; n++) {
-		struct drm_syncobj *syncobj;
-		unsigned int flags;
-
-		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
-		if (!(flags & I915_EXEC_FENCE_SIGNAL))
-			continue;
-
-		if (eb->fences[n].chain_fence) {
-			drm_syncobj_add_point(syncobj,
-					      eb->fences[n].chain_fence,
-					      fence,
-					      eb->fences[n].value);
-			/*
-			 * The chain's ownership is transferred to the
-			 * timeline.
-			 */
-			eb->fences[n].chain_fence = NULL;
-		} else {
-			drm_syncobj_replace_fence(syncobj, fence);
-		}
-	}
-}
-
 static int
 parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
 {
@@ -3020,80 +2716,6 @@ parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
 	return add_timeline_fence_array(eb, &timeline_fences);
 }
 
-static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
-{
-	struct i915_request *rq, *rn;
-
-	list_for_each_entry_safe(rq, rn, &tl->requests, link)
-		if (rq == end || !i915_request_retire(rq))
-			break;
-}
-
-static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
-			  int err, bool last_parallel)
-{
-	struct intel_timeline * const tl = i915_request_timeline(rq);
-	struct i915_sched_attr attr = {};
-	struct i915_request *prev;
-
-	lockdep_assert_held(&tl->mutex);
-	lockdep_unpin_lock(&tl->mutex, rq->cookie);
-
-	trace_i915_request_add(rq);
-
-	prev = __i915_request_commit(rq);
-
-	/* Check that the context wasn't destroyed before submission */
-	if (likely(!intel_context_is_closed(eb->context))) {
-		attr = eb->gem_context->sched;
-	} else {
-		/* Serialise with context_close via the add_to_timeline */
-		i915_request_set_error_once(rq, -ENOENT);
-		__i915_request_skip(rq);
-		err = -ENOENT; /* override any transient errors */
-	}
-
-	if (intel_context_is_parallel(eb->context)) {
-		if (err) {
-			__i915_request_skip(rq);
-			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
-				&rq->fence.flags);
-		}
-		if (last_parallel)
-			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
-				&rq->fence.flags);
-	}
-
-	__i915_request_queue(rq, &attr);
-
-	/* Try to clean up the client's timeline after submitting the request */
-	if (prev)
-		retire_requests(tl, prev);
-
-	mutex_unlock(&tl->mutex);
-
-	return err;
-}
-
-static int eb_requests_add(struct i915_execbuffer *eb, int err)
-{
-	int i;
-
-	/*
-	 * We iterate in reverse order of creation to release timeline mutexes in
-	 * same order.
-	 */
-	for_each_batch_add_order(eb, i) {
-		struct i915_request *rq = eb->requests[i];
-
-		if (!rq)
-			continue;
-		err |= eb_request_add(eb, rq, err, i == 0);
-	}
-
-	return err;
-}
-
 static const i915_user_extension_fn execbuf_extensions[] = {
 	[DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,
 };
@@ -3120,73 +2742,26 @@ parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
 				    eb);
 }
 
-static void eb_requests_get(struct i915_execbuffer *eb)
-{
-	unsigned int i;
-
-	for_each_batch_create_order(eb, i) {
-		if (!eb->requests[i])
-			break;
-
-		i915_request_get(eb->requests[i]);
-	}
-}
-
-static void eb_requests_put(struct i915_execbuffer *eb)
-{
-	unsigned int i;
-
-	for_each_batch_create_order(eb, i) {
-		if (!eb->requests[i])
-			break;
-
-		i915_request_put(eb->requests[i]);
-	}
-}
-
 static struct sync_file *
 eb_composite_fence_create(struct i915_execbuffer *eb, int out_fence_fd)
 {
 	struct sync_file *out_fence = NULL;
-	struct dma_fence_array *fence_array;
-	struct dma_fence **fences;
-	unsigned int i;
-
-	GEM_BUG_ON(!intel_context_is_parent(eb->context));
+	struct dma_fence *fence;
 
-	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
-	if (!fences)
-		return ERR_PTR(-ENOMEM);
-
-	for_each_batch_create_order(eb, i) {
-		fences[i] = &eb->requests[i]->fence;
-		__set_bit(I915_FENCE_FLAG_COMPOSITE,
-			  &eb->requests[i]->fence.flags);
-	}
-
-	fence_array = dma_fence_array_create(eb->num_batches,
-					     fences,
-					     eb->context->parallel.fence_context,
-					     eb->context->parallel.seqno++,
-					     false);
-	if (!fence_array) {
-		kfree(fences);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	/* Move ownership to the dma_fence_array created above */
-	for_each_batch_create_order(eb, i)
-		dma_fence_get(fences[i]);
+	fence = __eb_composite_fence_create(eb->requests, eb->num_batches,
+					    eb->context);
+	if (IS_ERR(fence))
+		return ERR_CAST(fence);
 
 	if (out_fence_fd != -1) {
-		out_fence = sync_file_create(&fence_array->base);
+		out_fence = sync_file_create(fence);
 		/* sync_file now owns fence_arry, drop creation ref */
-		dma_fence_put(&fence_array->base);
+		dma_fence_put(fence);
 		if (!out_fence)
 			return ERR_PTR(-ENOMEM);
 	}
 
-	eb->composite_fence = &fence_array->base;
+	eb->composite_fence = fence;
 
 	return out_fence;
 }
@@ -3218,7 +2793,7 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
 	}
 
 	if (eb->fences) {
-		err = await_fence_array(eb, rq);
+		err = await_fence_array(eb->fences, eb->num_fences, rq);
 		if (err)
 			return ERR_PTR(err);
 	}
@@ -3236,23 +2811,6 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
 	return out_fence;
 }
 
-static struct intel_context *
-eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
-{
-	struct intel_context *child;
-
-	if (likely(context_number == 0))
-		return eb->context;
-
-	for_each_child(eb->context, child)
-		if (!--context_number)
-			return child;
-
-	GEM_BUG_ON("Context not found");
-
-	return NULL;
-}
-
 static struct sync_file *
 eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence,
 		   int out_fence_fd)
@@ -3262,7 +2820,8 @@ eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence,
 
 	for_each_batch_create_order(eb, i) {
 		/* Allocate a request for this batch buffer nice and early. */
-		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
+		eb->requests[i] =
+			i915_request_create(eb_find_context(eb->context, i));
 		if (IS_ERR(eb->requests[i])) {
 			out_fence = ERR_CAST(eb->requests[i]);
 			eb->requests[i] = NULL;
@@ -3442,11 +3001,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	err = eb_submit(&eb);
 
 err_request:
-	eb_requests_get(&eb);
-	err = eb_requests_add(&eb, err);
+	eb_requests_get(eb.requests, eb.num_batches);
+	err = eb_requests_add(eb.requests, eb.num_batches, eb.context,
+			      eb.gem_context->sched, err);
 
 	if (eb.fences)
-		signal_fence_array(&eb, eb.composite_fence ?
+		signal_fence_array(eb.fences, eb.num_fences,
+				   eb.composite_fence ?
 				   eb.composite_fence :
 				   &eb.requests[0]->fence);
 
@@ -3471,7 +3032,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (!out_fence && eb.composite_fence)
 		dma_fence_put(eb.composite_fence);
 
-	eb_requests_put(&eb);
+	eb_requests_put(eb.requests, eb.num_batches);
 
 err_vma:
 	eb_release_vmas(&eb, true);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
new file mode 100644
index 000000000000..167268dfd930
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
@@ -0,0 +1,530 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/dma-fence-array.h>
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+#include "gt/intel_ring.h"
+
+#include "i915_gem_execbuffer_common.h"
+
+#define __EXEC_COMMON_FENCE_WAIT	BIT(0)
+#define __EXEC_COMMON_FENCE_SIGNAL	BIT(1)
+
+static struct i915_request *eb_throttle(struct intel_context *ce)
+{
+	struct intel_ring *ring = ce->ring;
+	struct intel_timeline *tl = ce->timeline;
+	struct i915_request *rq;
+
+	/*
+	 * Completely unscientific finger-in-the-air estimates for suitable
+	 * maximum user request size (to avoid blocking) and then backoff.
+	 */
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request that after waiting upon, there will be at least half
+	 * the ring available. The hysteresis allows us to compete for the
+	 * shared ring and should mean that we sleep less often prior to
+	 * claiming our resources, but not so long that the ring completely
+	 * drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &tl->requests, link) {
+		if (rq->ring != ring)
+			continue;
+
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->link == &tl->requests)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int eb_pin_timeline(struct intel_context *ce, bool throttle,
+			   bool nonblock)
+{
+	struct intel_timeline *tl;
+	struct i915_request *rq = NULL;
+
+	/*
+	 * Take a local wakeref for preparing to dispatch the execbuf as
+	 * we expect to access the hardware fairly frequently in the
+	 * process, and require the engine to be kept awake between accesses.
+	 * Upon dispatch, we acquire another prolonged wakeref that we hold
+	 * until the timeline is idle, which in turn releases the wakeref
+	 * taken on the engine, and the parent device.
+	 */
+	tl = intel_context_timeline_lock(ce);
+	if (IS_ERR(tl))
+		return PTR_ERR(tl);
+
+	intel_context_enter(ce);
+	if (throttle)
+		rq = eb_throttle(ce);
+	intel_context_timeline_unlock(tl);
+
+	if (rq) {
+		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
+
+		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
+				      timeout) < 0) {
+			i915_request_put(rq);
+
+			/*
+			 * Error path, cannot use intel_context_timeline_lock as
+			 * that is user interruptable and this clean up step
+			 * must be done.
+			 */
+			mutex_lock(&ce->timeline->mutex);
+			intel_context_exit(ce);
+			mutex_unlock(&ce->timeline->mutex);
+
+			if (nonblock)
+				return -EWOULDBLOCK;
+			else
+				return -EINTR;
+		}
+		i915_request_put(rq);
+	}
+
+	return 0;
+}
+
+int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
+		    bool throttle, bool nonblock)
+{
+	struct intel_context *child;
+	int err;
+	int i = 0, j = 0;
+
+	if (unlikely(intel_context_is_banned(ce)))
+		return -EIO;
+
+	/*
+	 * Pinning the contexts may generate requests in order to acquire
+	 * GGTT space, so do this first before we reserve a seqno for
+	 * ourselves.
+	 */
+	err = intel_context_pin_ww(ce, ww);
+	if (err)
+		return err;
+
+	for_each_child(ce, child) {
+		err = intel_context_pin_ww(child, ww);
+		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
+	}
+
+	for_each_child(ce, child) {
+		err = eb_pin_timeline(child, throttle, nonblock);
+		if (err)
+			goto unwind;
+		++i;
+	}
+	err = eb_pin_timeline(ce, throttle, nonblock);
+	if (err)
+		goto unwind;
+
+	return 0;
+
+unwind:
+	for_each_child(ce, child) {
+		if (j++ < i) {
+			mutex_lock(&child->timeline->mutex);
+			intel_context_exit(child);
+			mutex_unlock(&child->timeline->mutex);
+		}
+	}
+	for_each_child(ce, child)
+		intel_context_unpin(child);
+	intel_context_unpin(ce);
+	return err;
+}
+
+void __eb_unpin_engine(struct intel_context *ce)
+{
+	struct intel_context *child;
+
+	for_each_child(ce, child) {
+		mutex_lock(&child->timeline->mutex);
+		intel_context_exit(child);
+		mutex_unlock(&child->timeline->mutex);
+
+		intel_context_unpin(child);
+	}
+
+	mutex_lock(&ce->timeline->mutex);
+	intel_context_exit(ce);
+	mutex_unlock(&ce->timeline->mutex);
+
+	intel_context_unpin(ce);
+}
+
+struct intel_context *
+eb_find_context(struct intel_context *context, unsigned int context_number)
+{
+	struct intel_context *child;
+
+	if (likely(context_number == 0))
+		return context;
+
+	for_each_child(context, child)
+		if (!--context_number)
+			return child;
+
+	GEM_BUG_ON("Context not found");
+
+	return NULL;
+}
+
+static void __free_fence_array(struct eb_fence *fences, u64 n)
+{
+	while (n--) {
+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
+		dma_fence_put(fences[n].dma_fence);
+		dma_fence_chain_free(fences[n].chain_fence);
+	}
+	kvfree(fences);
+}
+
+void put_fence_array(struct eb_fence *fences, u64 num_fences)
+{
+	if (fences)
+		__free_fence_array(fences, num_fences);
+}
+
+int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
+		       struct eb_fence *f, bool wait, bool signal)
+{
+	struct drm_syncobj *syncobj;
+	struct dma_fence *fence = NULL;
+	u32 flags = 0;
+	int err = 0;
+
+	syncobj = drm_syncobj_find(file, handle);
+	if (!syncobj) {
+		DRM_DEBUG("Invalid syncobj handle provided\n");
+		return -ENOENT;
+	}
+
+	fence = drm_syncobj_fence_get(syncobj);
+
+	if (!fence && wait && !signal) {
+		DRM_DEBUG("Syncobj handle has no fence\n");
+		drm_syncobj_put(syncobj);
+		return -EINVAL;
+	}
+
+	if (fence)
+		err = dma_fence_chain_find_seqno(&fence, point);
+
+	if (err && !signal) {
+		DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
+		dma_fence_put(fence);
+		drm_syncobj_put(syncobj);
+		return err;
+	}
+
+	/*
+	 * A point might have been signaled already and
+	 * garbage collected from the timeline. In this case
+	 * just ignore the point and carry on.
+	 */
+	if (!fence && !signal) {
+		drm_syncobj_put(syncobj);
+		return 0;
+	}
+
+	/*
+	 * For timeline syncobjs we need to preallocate chains for
+	 * later signaling.
+	 */
+	if (point != 0 && signal) {
+		/*
+		 * Waiting and signaling the same point (when point !=
+		 * 0) would break the timeline.
+		 */
+		if (wait) {
+			DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
+			dma_fence_put(fence);
+			drm_syncobj_put(syncobj);
+			return -EINVAL;
+		}
+
+		f->chain_fence = dma_fence_chain_alloc();
+		if (!f->chain_fence) {
+			drm_syncobj_put(syncobj);
+			dma_fence_put(fence);
+			return -ENOMEM;
+		}
+	} else {
+		f->chain_fence = NULL;
+	}
+
+	flags |= wait ? __EXEC_COMMON_FENCE_WAIT : 0;
+	flags |= signal ? __EXEC_COMMON_FENCE_SIGNAL : 0;
+
+	f->syncobj = ptr_pack_bits(syncobj, flags, 2);
+	f->dma_fence = fence;
+	f->value = point;
+	return 1;
+}
+
+int await_fence_array(struct eb_fence *fences, u64 num_fences,
+		      struct i915_request *rq)
+{
+	unsigned int n;
+
+	for (n = 0; n < num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+		int err;
+
+		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
+
+		if (!fences[n].dma_fence)
+			continue;
+
+		err = i915_request_await_dma_fence(rq, fences[n].dma_fence);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+void signal_fence_array(struct eb_fence *fences, u64 num_fences,
+			struct dma_fence * const fence)
+{
+	unsigned int n;
+
+	for (n = 0; n < num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
+		if (!(flags & __EXEC_COMMON_FENCE_SIGNAL))
+			continue;
+
+		if (fences[n].chain_fence) {
+			drm_syncobj_add_point(syncobj,
+					      fences[n].chain_fence,
+					      fence,
+					      fences[n].value);
+			/*
+			 * The chain's ownership is transferred to the
+			 * timeline.
+			 */
+			fences[n].chain_fence = NULL;
+		} else {
+			drm_syncobj_replace_fence(syncobj, fence);
+		}
+	}
+}
+
+/*
+ * Two helper loops define the order in which requests / batches are created
+ * and added to the backend. Requests are created in order from the parent to
+ * the last child, and added in the reverse order, from the last child to the
+ * parent. This is done for locking reasons: the timeline lock is acquired
+ * during request creation and released when the request is added to the
+ * backend. To keep lockdep happy (see intel_context_timeline_lock()) this
+ * ordering must be maintained.
+ */
+#define for_each_batch_create_order(_num_batches) \
+	for (unsigned int i = 0; i < (_num_batches); ++i)
+#define for_each_batch_add_order(_num_batches) \
+	for (int i = (_num_batches) - 1; i >= 0; --i)
+
+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
+{
+	struct i915_request *rq, *rn;
+
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
+		if (rq == end || !i915_request_retire(rq))
+			break;
+}
+
+static int eb_request_add(struct intel_context *context,
+			  struct i915_request *rq,
+			  struct i915_sched_attr sched,
+			  int err, bool last_parallel)
+{
+	struct intel_timeline * const tl = i915_request_timeline(rq);
+	struct i915_sched_attr attr = {};
+	struct i915_request *prev;
+
+	lockdep_assert_held(&tl->mutex);
+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
+
+	trace_i915_request_add(rq);
+
+	prev = __i915_request_commit(rq);
+
+	/* Check that the context wasn't destroyed before submission */
+	if (likely(!intel_context_is_closed(context))) {
+		attr = sched;
+	} else {
+		/* Serialise with context_close via the add_to_timeline */
+		i915_request_set_error_once(rq, -ENOENT);
+		__i915_request_skip(rq);
+		err = -ENOENT; /* override any transient errors */
+	}
+
+	if (intel_context_is_parallel(context)) {
+		if (err) {
+			__i915_request_skip(rq);
+			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
+				&rq->fence.flags);
+		}
+		if (last_parallel)
+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+				&rq->fence.flags);
+	}
+
+	__i915_request_queue(rq, &attr);
+
+	/* Try to clean up the client's timeline after submitting the request */
+	if (prev)
+		retire_requests(tl, prev);
+
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
+int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
+		    struct intel_context *context, struct i915_sched_attr sched,
+		    int err)
+{
+	/*
+	 * We iterate in reverse order of creation to release timeline mutexes
+	 * in the same order.
+	 */
+	for_each_batch_add_order(num_batches) {
+		struct i915_request *rq = requests[i];
+
+		if (!rq)
+			continue;
+
+		err = eb_request_add(context, rq, sched, err, i == 0);
+	}
+
+	return err;
+}
+
+void eb_requests_get(struct i915_request **requests, unsigned int num_batches)
+{
+	for_each_batch_create_order(num_batches) {
+		if (!requests[i])
+			break;
+
+		i915_request_get(requests[i]);
+	}
+}
+
+void eb_requests_put(struct i915_request **requests, unsigned int num_batches)
+{
+	for_each_batch_create_order(num_batches) {
+		if (!requests[i])
+			break;
+
+		i915_request_put(requests[i]);
+	}
+}
+
+struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
+					      unsigned int num_batches,
+					      struct intel_context *context)
+{
+	struct dma_fence_array *fence_array;
+	struct dma_fence **fences;
+
+	GEM_BUG_ON(!intel_context_is_parent(context));
+
+	fences = kmalloc_array(num_batches, sizeof(*fences), GFP_KERNEL);
+	if (!fences)
+		return ERR_PTR(-ENOMEM);
+
+	for_each_batch_create_order(num_batches) {
+		fences[i] = &requests[i]->fence;
+		__set_bit(I915_FENCE_FLAG_COMPOSITE,
+			  &requests[i]->fence.flags);
+	}
+
+	fence_array = dma_fence_array_create(num_batches,
+					     fences,
+					     context->parallel.fence_context,
+					     context->parallel.seqno++,
+					     false);
+	if (!fence_array) {
+		kfree(fences);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/* Move ownership to the dma_fence_array created above */
+	for_each_batch_create_order(num_batches)
+		dma_fence_get(fences[i]);
+
+	return &fence_array->base;
+}
+
+int __eb_select_engine(struct intel_context *ce)
+{
+	struct intel_context *child;
+	int err;
+
+	for_each_child(ce, child)
+		intel_context_get(child);
+	intel_gt_pm_get(ce->engine->gt);
+
+	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
+		err = intel_context_alloc_state(ce);
+		if (err)
+			goto err;
+	}
+	for_each_child(ce, child) {
+		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
+			err = intel_context_alloc_state(child);
+			if (err)
+				goto err;
+		}
+	}
+
+	/*
+	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged.
+	 */
+	err = intel_gt_terminally_wedged(ce->engine->gt);
+	if (err)
+		goto err;
+
+	if (!i915_vm_tryget(ce->vm)) {
+		err = -ENOENT;
+		goto err;
+	}
+
+	return 0;
+err:
+	intel_gt_pm_put(ce->engine->gt);
+	for_each_child(ce, child)
+		intel_context_put(child);
+	return err;
+}
+
+void __eb_put_engine(struct intel_context *context, struct intel_gt *gt)
+{
+	struct intel_context *child;
+
+	i915_vm_put(context->vm);
+	intel_gt_pm_put(gt);
+	for_each_child(context, child)
+		intel_context_put(child);
+	intel_context_put(context);
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
new file mode 100644
index 000000000000..725febfd6a53
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __I915_GEM_EXECBUFFER_COMMON_H
+#define __I915_GEM_EXECBUFFER_COMMON_H
+
+#include <drm/drm_syncobj.h>
+
+#include "gt/intel_context.h"
+
+struct eb_fence {
+	struct drm_syncobj *syncobj;
+	struct dma_fence *dma_fence;
+	u64 value;
+	struct dma_fence_chain *chain_fence;
+};
+
+int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
+		    bool throttle, bool nonblock);
+void __eb_unpin_engine(struct intel_context *ce);
+int __eb_select_engine(struct intel_context *ce);
+void __eb_put_engine(struct intel_context *context, struct intel_gt *gt);
+
+struct intel_context *
+eb_find_context(struct intel_context *context, unsigned int context_number);
+
+int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
+		       struct eb_fence *f, bool wait, bool signal);
+void put_fence_array(struct eb_fence *fences, u64 num_fences);
+int await_fence_array(struct eb_fence *fences, u64 num_fences,
+		      struct i915_request *rq);
+void signal_fence_array(struct eb_fence *fences, u64 num_fences,
+			struct dma_fence * const fence);
+
+int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
+		    struct intel_context *context, struct i915_sched_attr sched,
+		    int err);
+void eb_requests_get(struct i915_request **requests, unsigned int num_batches);
+void eb_requests_put(struct i915_request **requests, unsigned int num_batches);
+
+struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
+					      unsigned int num_batches,
+					      struct intel_context *context);
+
+#endif /* __I915_GEM_EXECBUFFER_COMMON_H */
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC v4 09/14] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
works in vm_bind mode. The vm_bind mode only works with
this new execbuf3 ioctl.

The new execbuf3 ioctl does not take a list of objects to validate or
bind, as all required object bindings are expected to have been
requested by userspace (through the VM_BIND ioctl) before the execbuf3
is submitted.

Legacy features such as relocations are not supported by execbuf3.
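
For illustration, a minimal userspace sketch of a single-batch
submission through the new ioctl could look as below. This is only a
sketch based on the uapi added by this patch: the helper name is
hypothetical, the ioctl define and field names are assumed from the
uapi header changes, the batch GPU virtual address must already be
bound through the VM_BIND ioctl, the context's VM must have vm_bind
mode enabled and the context must have a user defined engine map;
error handling is omitted.

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>

  /* Hypothetical helper: submit one batch at a GPU VA that userspace
   * has already bound with the VM_BIND ioctl.
   */
  static int submit_execbuf3(int fd, uint32_t ctx_id, uint64_t batch_va)
  {
          struct drm_i915_gem_execbuffer3 execbuf3 = {
                  .ctx_id        = ctx_id,  /* context whose VM is in vm_bind mode */
                  .engine_idx    = 0,       /* index into the context's engine map */
                  /* user pointer to an array of num_batches GPU VAs (one here) */
                  .batch_address = (uintptr_t)&batch_va,
                  .fence_count   = 0,       /* no timeline fences in this sketch */
          };

          return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf3);
  }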

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 575 ++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |   2 +
 drivers/gpu/drm/i915/i915_driver.c            |   1 +
 include/uapi/drm/i915_drm.h                   |  64 ++
 5 files changed, 643 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index bf952f478555..3473ee5825bb 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -150,6 +150,7 @@ gem-y += \
 	gem/i915_gem_domain.o \
 	gem/i915_gem_execbuffer_common.o \
 	gem/i915_gem_execbuffer.o \
+	gem/i915_gem_execbuffer3.o \
 	gem/i915_gem_internal.o \
 	gem/i915_gem_object.o \
 	gem/i915_gem_lmem.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
new file mode 100644
index 000000000000..b6229b955e62
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/dma-resv.h>
+#include <linux/sync_file.h>
+#include <linux/uaccess.h>
+
+#include "gt/intel_context.h"
+#include "gt/intel_gpu_commands.h"
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+#include "gt/intel_ring.h"
+
+#include "i915_drv.h"
+#include "i915_file_private.h"
+#include "i915_gem_context.h"
+#include "i915_gem_execbuffer_common.h"
+#include "i915_gem_ioctls.h"
+#include "i915_gem_vm_bind.h"
+#include "i915_trace.h"
+
+#define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
+#define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
+
+/* Catch emission of unexpected errors for CI! */
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
+#undef EINVAL
+#define EINVAL ({ \
+	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
+	22; \
+})
+#endif
+
+/**
+ * DOC: User command execution with execbuf3 ioctl
+ *
+ * A VM in VM_BIND mode will not support the older execbuf mode of binding.
+ * The execbuf ioctl handling in VM_BIND mode differs significantly from the
+ * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+ * Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+ * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+ * execlist. Hence, no support for implicit sync.
+ *
+ * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
+ * works with execbuf3 ioctl for submission.
+ *
+ * The execbuf3 ioctl directly specifies the batch addresses instead of as
+ * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
+ * support many of the older features like in/out/submit fences, fence array,
+ * default gem context etc. (See struct drm_i915_gem_execbuffer3).
+ *
+ * In VM_BIND mode, VA allocation is completely managed by the user instead of
+ * the i915 driver. Hence, VA assignment and eviction are not applicable in
+ * VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+ * be using the i915_vma active reference tracking. It will instead check the
+ * dma-resv object's fence list for that.
+ *
+ * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
+ * vma lookup table, implicit sync, vma active reference tracking etc., are not
+ * applicable for execbuf3 ioctl.
+ */
+
+/**
+ * struct i915_execbuffer - execbuf struct for execbuf3
+ * @i915: reference to the i915 instance we run on
+ * @file: drm file reference
+ * @args: execbuf3 ioctl structure
+ * @gt: reference to the gt instance the ioctl is submitted for
+ * @context: logical state for the request
+ * @gem_context: caller's context
+ * @requests: requests to be built
+ * @composite_fence: used for excl fence in dma_resv objects when > 1 BB submitted
+ * @ww: i915_gem_ww_ctx instance
+ * @num_batches: number of batches submitted
+ * @batch_addresses: addresses corresponding to the submitted batches
+ * @batches: references to the i915_vmas corresponding to the batches
+ */
+struct i915_execbuffer {
+	struct drm_i915_private *i915;
+	struct drm_file *file;
+	struct drm_i915_gem_execbuffer3 *args;
+
+	struct intel_gt *gt;
+	struct intel_context *context;
+	struct i915_gem_context *gem_context;
+
+	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+	struct dma_fence *composite_fence;
+
+	struct i915_gem_ww_ctx ww;
+
+	unsigned int num_batches;
+	u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
+	struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
+
+	struct eb_fence *fences;
+	u64 num_fences;
+};
+
+static void eb_unpin_engine(struct i915_execbuffer *eb);
+
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->ctx_id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	if (!ctx->vm->vm_bind_mode) {
+		i915_gem_context_put(ctx);
+		return -EOPNOTSUPP;
+	}
+
+	eb->gem_context = ctx;
+	return 0;
+}
+
+static struct i915_vma *
+eb_find_vma(struct i915_address_space *vm, u64 addr)
+{
+	u64 va;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
+	return i915_gem_vm_bind_lookup_vma(vm, va);
+}
+
+static int eb_lookup_vma_all(struct i915_execbuffer *eb)
+{
+	unsigned int i, current_batch = 0;
+	struct i915_vma *vma;
+
+	for (i = 0; i < eb->num_batches; i++) {
+		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
+		if (!vma)
+			return -EINVAL;
+
+		eb->batches[current_batch] = vma;
+		++current_batch;
+	}
+
+	return 0;
+}
+
+static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
+{
+	eb_unpin_engine(eb);
+}
+
+/*
+ * Using two helper loops for the order in which requests / batches are created
+ * and added to the backend. Requests are created in order from the parent to
+ * the last child. Requests are added in the reverse order, from the last child
+ * to parent. This is done for locking reasons as the timeline lock is acquired
+ * during request creation and released when the request is added to the
+ * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
+ * the ordering.
+ */
+#define for_each_batch_create_order(_eb) \
+	for (unsigned int i = 0; i < (_eb)->num_batches; ++i)
+
+static int eb_move_to_gpu(struct i915_execbuffer *eb)
+{
+	/* Unconditionally flush any chipset caches (for streaming writes). */
+	intel_gt_chipset_flush(eb->gt);
+
+	return 0;
+}
+
+static int eb_request_submit(struct i915_execbuffer *eb,
+			     struct i915_request *rq,
+			     struct i915_vma *batch,
+			     u64 batch_len)
+{
+	struct intel_engine_cs *engine = rq->context->engine;
+	int err;
+
+	if (intel_context_nopreempt(rq->context))
+		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);
+
+	/*
+	 * After we completed waiting for other engines (using HW semaphores)
+	 * then we can signal that this request/batch is ready to run. This
+	 * allows us to determine if the batch is still waiting on the GPU
+	 * or actually running by checking the breadcrumb.
+	 */
+	if (engine->emit_init_breadcrumb) {
+		err = engine->emit_init_breadcrumb(rq);
+		if (err)
+			return err;
+	}
+
+	return engine->emit_bb_start(rq, batch->node.start, batch_len, 0);
+}
+
+static int eb_submit(struct i915_execbuffer *eb)
+{
+	int err;
+
+	err = eb_move_to_gpu(eb);
+
+	for_each_batch_create_order(eb) {
+		if (!eb->requests[i])
+			break;
+
+		trace_i915_request_queue(eb->requests[i], 0);
+		if (!err)
+			err = eb_request_submit(eb, eb->requests[i],
+						eb->batches[i],
+						eb->batches[i]->size);
+	}
+
+	return err;
+}
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
+{
+	int err;
+
+	GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
+
+	err = __eb_pin_engine(eb->context, &eb->ww, throttle,
+			      eb->file->filp->f_flags & O_NONBLOCK);
+	if (err)
+		return err;
+
+	eb->args->flags |= __EXEC3_ENGINE_PINNED;
+	return 0;
+}
+
+static void eb_unpin_engine(struct i915_execbuffer *eb)
+{
+	if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
+		return;
+
+	eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
+
+	__eb_unpin_engine(eb->context);
+}
+
+static int
+eb_select_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce;
+	unsigned int idx;
+	int err;
+
+	if (!i915_gem_context_user_engines(eb->gem_context))
+		return -EINVAL;
+
+	idx = eb->args->engine_idx;
+	ce = i915_gem_context_get_engine(eb->gem_context, idx);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	eb->num_batches = ce->parallel.number_children + 1;
+
+	err = __eb_select_engine(ce);
+	if (err)
+		goto err;
+
+	eb->context = ce;
+	eb->gt = ce->engine->gt;
+
+	/*
+	 * Make sure engine pool stays alive even if we call intel_context_put
+	 * during ww handling. The pool is destroyed when last pm reference
+	 * is dropped, which breaks our -EDEADLK handling.
+	 */
+	return 0;
+
+err:
+	intel_context_put(ce);
+	return err;
+}
+
+static void
+eb_put_engine(struct i915_execbuffer *eb)
+{
+	__eb_put_engine(eb->context, eb->gt);
+}
+
+static int add_timeline_fence_array(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_timeline_fence __user *user_fences;
+	struct eb_fence *f;
+	u64 nfences;
+
+	nfences = eb->args->fence_count;
+	if (!nfences)
+		return 0;
+
+	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
+	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
+	if (nfences > min_t(unsigned long,
+			    ULONG_MAX / sizeof(*user_fences),
+			    SIZE_MAX / sizeof(*f)) - eb->num_fences)
+		return -EINVAL;
+
+	user_fences = u64_to_user_ptr(eb->args->timeline_fences);
+	if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
+		return -EFAULT;
+
+	f = krealloc(eb->fences,
+		     (eb->num_fences + nfences) * sizeof(*f),
+		     __GFP_NOWARN | GFP_KERNEL);
+	if (!f)
+		return -ENOMEM;
+
+	eb->fences = f;
+	f += eb->num_fences;
+
+	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
+		     ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
+
+	while (nfences--) {
+		struct drm_i915_gem_timeline_fence user_fence;
+		bool wait, signal;
+		int ret;
+
+		if (__copy_from_user(&user_fence,
+				     user_fences++,
+				     sizeof(user_fence)))
+			return -EFAULT;
+
+		if (user_fence.flags & __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
+			return -EINVAL;
+
+		wait = user_fence.flags & I915_EXEC_FENCE_WAIT;
+		signal = user_fence.flags & I915_EXEC_FENCE_SIGNAL;
+		ret = add_timeline_fence(eb->file, user_fence.handle,
+					 user_fence.value, f, wait, signal);
+		if (ret < 0)
+			return ret;
+		else if (!ret)
+			continue;
+
+		f++;
+		eb->num_fences++;
+	}
+
+	return 0;
+}
+
+static int parse_timeline_fences(struct i915_execbuffer *eb)
+{
+	return add_timeline_fence_array(eb);
+}
+
+static int parse_batch_addresses(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_execbuffer3 *args = eb->args;
+	u64 __user *batch_addr = u64_to_user_ptr(args->batch_address);
+
+	if (copy_from_user(eb->batch_addresses, batch_addr,
+			   sizeof(batch_addr[0]) * eb->num_batches))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int
+eb_composite_fence_create(struct i915_execbuffer *eb)
+{
+	struct dma_fence *fence;
+
+	fence = __eb_composite_fence_create(eb->requests, eb->num_batches,
+					    eb->context);
+	if (IS_ERR(fence))
+		return PTR_ERR(fence);
+
+	eb->composite_fence = fence;
+
+	return 0;
+}
+
+static int
+eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
+{
+	int err;
+
+	if (unlikely(eb->gem_context->syncobj)) {
+		struct dma_fence *fence;
+
+		fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
+		err = i915_request_await_dma_fence(rq, fence);
+		dma_fence_put(fence);
+		if (err)
+			return err;
+	}
+
+	if (eb->fences) {
+		err = await_fence_array(eb->fences, eb->num_fences, rq);
+		if (err)
+			return err;
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		err = eb_composite_fence_create(eb);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int eb_requests_create(struct i915_execbuffer *eb)
+{
+	int err;
+
+	for_each_batch_create_order(eb) {
+		/* Allocate a request for this batch buffer nice and early. */
+		eb->requests[i] =
+			i915_request_create(eb_find_context(eb->context, i));
+		if (IS_ERR(eb->requests[i])) {
+			err = PTR_ERR(eb->requests[i]);
+			eb->requests[i] = NULL;
+			return err;
+		}
+
+		/*
+		 * Only the first request added (committed to backend) has to
+		 * take the in fences into account as all subsequent requests
+		 * will have fences inserted in between them.
+		 */
+		if (i + 1 == eb->num_batches) {
+			err = eb_fences_add(eb, eb->requests[i]);
+			if (err)
+				return err;
+		}
+
+		if (eb->batches[i])
+			eb->requests[i]->batch_res =
+				i915_vma_resource_get(eb->batches[i]->resource);
+	}
+
+	return 0;
+}
+
+static int
+i915_gem_do_execbuffer(struct drm_device *dev,
+		       struct drm_file *file,
+		       struct drm_i915_gem_execbuffer3 *args)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct i915_execbuffer eb;
+	bool throttle = true;
+	int err;
+
+	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
+
+	eb.i915 = i915;
+	eb.file = file;
+	eb.args = args;
+
+	eb.fences = NULL;
+	eb.num_fences = 0;
+
+	memset(eb.requests, 0, sizeof(struct i915_request *) *
+	       ARRAY_SIZE(eb.requests));
+	eb.composite_fence = NULL;
+
+	err = parse_timeline_fences(&eb);
+	if (err)
+		return err;
+
+	err = eb_select_context(&eb);
+	if (unlikely(err))
+		goto err_fences;
+
+	err = eb_select_engine(&eb);
+	if (unlikely(err))
+		goto err_context;
+
+	err = parse_batch_addresses(&eb);
+	if (unlikely(err))
+		goto err_engine;
+
+	mutex_lock(&eb.context->vm->vm_bind_lock);
+
+	err = eb_lookup_vma_all(&eb);
+	if (err) {
+		eb_release_vma_all(&eb, true);
+		goto err_vm_bind_lock;
+	}
+
+	i915_gem_ww_ctx_init(&eb.ww, true);
+
+retry_validate:
+	err = eb_pin_engine(&eb, throttle);
+	if (err)
+		goto err_validate;
+
+	/* only throttle once, even if we didn't need to throttle */
+	throttle = false;
+
+err_validate:
+	if (err == -EDEADLK) {
+		eb_release_vma_all(&eb, false);
+		err = i915_gem_ww_ctx_backoff(&eb.ww);
+		if (!err)
+			goto retry_validate;
+	}
+	if (err)
+		goto err_vma;
+
+	ww_acquire_done(&eb.ww.ctx);
+
+	err = eb_requests_create(&eb);
+	if (err) {
+		if (eb.requests[0])
+			goto err_request;
+		else
+			goto err_vma;
+	}
+
+	err = eb_submit(&eb);
+
+err_request:
+	eb_requests_get(eb.requests, eb.num_batches);
+	err = eb_requests_add(eb.requests, eb.num_batches, eb.context,
+			      eb.gem_context->sched, err);
+
+	if (eb.fences)
+		signal_fence_array(eb.fences, eb.num_fences,
+				   eb.composite_fence ?
+				   eb.composite_fence :
+				   &eb.requests[0]->fence);
+
+	if (unlikely(eb.gem_context->syncobj)) {
+		drm_syncobj_replace_fence(eb.gem_context->syncobj,
+					  eb.composite_fence ?
+					  eb.composite_fence :
+					  &eb.requests[0]->fence);
+	}
+
+	if (eb.composite_fence)
+		dma_fence_put(eb.composite_fence);
+
+	eb_requests_put(eb.requests, eb.num_batches);
+
+err_vma:
+	eb_release_vma_all(&eb, true);
+	WARN_ON(err == -EDEADLK);
+	i915_gem_ww_ctx_fini(&eb.ww);
+err_vm_bind_lock:
+	mutex_unlock(&eb.context->vm->vm_bind_lock);
+err_engine:
+	eb_put_engine(&eb);
+err_context:
+	i915_gem_context_put(eb.gem_context);
+err_fences:
+	put_fence_array(eb.fences, eb.num_fences);
+	return err;
+}
+
+int
+i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_execbuffer3 *args = data;
+	int err;
+
+	if (args->flags & __I915_EXEC3_UNKNOWN_FLAGS)
+		return -EINVAL;
+
+	err = i915_gem_do_execbuffer(dev, file, args);
+
+	args->flags &= ~__I915_EXEC3_UNKNOWN_FLAGS;
+	return err;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
index 28d6526e32ab..b7a1e9725a84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
@@ -18,6 +18,8 @@ int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file);
 int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
+int i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *file);
 int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index f9e4a784dd0e..697dc39c744c 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1854,6 +1854,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER, drm_invalid_op, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER3, i915_gem_execbuffer3_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_RENDER_ALLOW),
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9f93e4afa1c8..eaeb80a3ede1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -472,6 +472,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_CREATE_EXT		0x3c
 #define DRM_I915_GEM_VM_BIND		0x3d
 #define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -538,6 +539,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
 #define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1546,6 +1548,68 @@ struct drm_i915_gem_timeline_fence {
 	__u64 value;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * The DRM_I915_GEM_EXECBUFFER3 ioctl works only in VM_BIND mode, and VM_BIND
+ * mode supports submission only through this ioctl.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with a user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/**
+	 * @batch_address: Batch gpu virtual address(es).
+	 *
+	 * For normal submission, it is the gpu virtual address of the batch
+	 * buffer. For parallel submission, it is a pointer to an array of
+	 * batch buffer gpu virtual addresses with array size equal to the
+	 * number of (parallel) engines involved in that submission (See
+	 * struct i915_context_engines_parallel_submit).
+	 */
+	__u64 batch_address;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+#define __I915_EXEC3_UNKNOWN_FLAGS (~0)
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/** @fence_count: Number of fences in @timeline_fences array. */
+	__u32 fence_count;
+
+	/**
+	 * @timeline_fences: Pointer to an array of timeline fences.
+	 *
+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
+	 */
+	__u64 timeline_fences;
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
-- 
2.21.0.rc0.32.g243a4c7e27
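
For illustration, a minimal userspace sketch of a single-batch submission
through the uapi above. The drm fd, the context (created with a user engine
map on a VM_BIND VM), the gpu virtual address of the batch (bound earlier with
DRM_IOCTL_I915_GEM_VM_BIND) and the out drm_syncobj are all assumed to exist
already; the signal flag name follows the fence-parsing code in this patch.
For parallel submission, batch_address would instead point to an array with
one VA per engine of the parallel engine map.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>	/* the header with the execbuf3 uapi from this series */

static int submit_execbuf3(int drm_fd, uint32_t ctx_id, uint64_t batch_va,
			   uint32_t out_syncobj)
{
	/* Request an out fence on a timeline syncobj point (handle assumed). */
	struct drm_i915_gem_timeline_fence out_fence = {
		.handle = out_syncobj,
		.flags = I915_EXEC_FENCE_SIGNAL,
		.value = 1,
	};
	struct drm_i915_gem_execbuffer3 execbuf = {
		.ctx_id = ctx_id,
		.engine_idx = 0,	/* index into the context's engine map */
		.batch_address = batch_va,
		.fence_count = 1,
		.timeline_fences = (uint64_t)(uintptr_t)&out_fence,
	};

	/* Returns 0 on success, -1 with errno set otherwise. */
	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf);
}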

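The retry_validate/err_validate dance in i915_gem_do_execbuffer() above is the
usual i915 ww-mutex backoff idiom; stripped of the execbuf specifics it is
roughly the following sketch, where example_state and example_lock_and_pin()
are hypothetical stand-ins:

/* Uses the i915-internal i915_gem_ww_ctx helpers (i915_gem_ww.h). */
struct example_state;
int example_lock_and_pin(struct example_state *st, struct i915_gem_ww_ctx *ww); /* hypothetical */

static int ww_retry_sketch(struct example_state *st)
{
	struct i915_gem_ww_ctx ww;
	int err;

	i915_gem_ww_ctx_init(&ww, true);	/* interruptible */
retry:
	err = example_lock_and_pin(st, &ww);	/* may return -EDEADLK */
	if (err == -EDEADLK) {
		/* Drop all held ww locks, sleep on the contended one, retry. */
		err = i915_gem_ww_ctx_backoff(&ww);
		if (!err)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);
	return err;
}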

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC v4 10/14] drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Ensure i915_vma_verify_bind_complete() handles the case where the bind
has not been initiated. Also make it non-static, add documentation
and move it out of CONFIG_DRM_I915_DEBUG_GEM.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 16 +++++++++++-----
 drivers/gpu/drm/i915/i915_vma.h |  1 +
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index f7d711e675d6..24f171588f56 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -406,12 +406,21 @@ int i915_vma_sync(struct i915_vma *vma)
 	return i915_vm_sync(vma->vm);
 }
 
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
-static int i915_vma_verify_bind_complete(struct i915_vma *vma)
+/**
+ * i915_vma_verify_bind_complete() - Check for the bind completion of the vma
+ * @vma: vma to check for bind completion
+ *
+ * Returns: 0 if the vma bind is completed. Error code otherwise.
+ */
+int i915_vma_verify_bind_complete(struct i915_vma *vma)
 {
 	struct dma_fence *fence = i915_active_fence_get(&vma->active.excl);
 	int err;
 
+	/* Ensure vma bind is initiated */
+	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
+		return -EINVAL;
+
 	if (!fence)
 		return 0;
 
@@ -424,9 +433,6 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma)
 
 	return err;
 }
-#else
-#define i915_vma_verify_bind_complete(_vma) 0
-#endif
 
 I915_SELFTEST_EXPORT void
 i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index aa536c9ce472..3a47db2d85f5 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -433,6 +433,7 @@ void i915_vma_make_purgeable(struct i915_vma *vma);
 
 int i915_vma_wait_for_bind(struct i915_vma *vma);
 int i915_vma_sync(struct i915_vma *vma);
+int i915_vma_verify_bind_complete(struct i915_vma *vma);
 
 /**
  * i915_vma_get_current_resource - Get the current resource of the vma
-- 
2.21.0.rc0.32.g243a4c7e27
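
For illustration, a sketch of how a caller can use the new return convention
(0 means the bind was initiated and has completed). The list names follow the
VM_BIND patches later in this series and move_completed_binds() is a
hypothetical helper, not part of this patch:

static void move_completed_binds(struct i915_address_space *vm)
{
	struct i915_vma *vma, *vn;

	/* Keep a vma on the to-be-(re)validated list until its bind has
	 * actually completed without error; only then treat it as bound.
	 */
	list_for_each_entry_safe(vma, vn, &vm->vm_bind_list, vm_bind_link)
		if (!i915_vma_verify_bind_complete(vma))
			list_move_tail(&vma->vm_bind_link, &vm->vm_bound_list);
}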


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC v4 11/14] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Handle persistent (VM_BIND) mappings during the request submission
in the execbuf3 path.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 189 +++++++++++++++++-
 1 file changed, 188 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index b6229b955e62..82a068d03440 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/dma-resv.h>
+#include <linux/lockdep.h>
 #include <linux/sync_file.h>
 #include <linux/uaccess.h>
 
@@ -21,6 +22,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
 
@@ -44,7 +46,9 @@
  * execlist. Hence, no support for implicit sync.
  *
  * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
- * works with execbuf3 ioctl for submission.
+ * works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
+ * a VM_BIND call) at the time of the execbuf3 call are deemed required for that
+ * submission.
  *
  * The execbuf3 ioctl directly specifies the batch addresses instead of as
  * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
@@ -60,6 +64,13 @@
  * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
  * vma lookup table, implicit sync, vma active reference tracking etc., are not
  * applicable for execbuf3 ioctl.
+ *
+ * During each execbuf submission, the request fence is added to all VM_BIND
+ * mapped objects with DMA_RESV_USAGE_BOOKKEEP. The DMA_RESV_USAGE_BOOKKEEP
+ * usage prevents over-synchronization (see enum dma_resv_usage). Note that the
+ * DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
+ * DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for an
+ * end-of-batch check. The execbuf3 timeline out fence should be used instead.
  */
 
 /**
@@ -129,6 +140,23 @@ eb_find_vma(struct i915_address_space *vm, u64 addr)
 	return i915_gem_vm_bind_lookup_vma(vm, va);
 }
 
+static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
+{
+	struct i915_vma *vma, *vn;
+
+	/**
+	 * Move all unbound vmas back into vm_bind_list so that they are
+	 * revalidated.
+	 */
+	spin_lock(&vm->vm_rebind_lock);
+	list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list, vm_rebind_link) {
+		list_del_init(&vma->vm_rebind_link);
+		if (!list_empty(&vma->vm_bind_link))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bind_list);
+	}
+	spin_unlock(&vm->vm_rebind_lock);
+}
+
 static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
@@ -143,14 +171,121 @@ static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 		++current_batch;
 	}
 
+	eb_scoop_unbound_vma_all(eb->context->vm);
+
+	return 0;
+}
+
+static int eb_lock_vma_all(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma;
+	int err;
+
+	err = i915_gem_object_lock(eb->context->vm->root_obj, &eb->ww);
+	if (err)
+		return err;
+
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link) {
+		err = i915_gem_object_lock(vma->obj, &eb->ww);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
+static void eb_release_persistent_vma_all(struct i915_execbuffer *eb,
+					  bool final)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma, *vn;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	if (!(eb->args->flags & __EXEC3_HAS_PIN))
+		return;
+
+	assert_object_held(vm->root_obj);
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
+		__i915_vma_unpin(vma);
+
+	eb->args->flags &= ~__EXEC3_HAS_PIN;
+	if (!final)
+		return;
+
+	list_for_each_entry_safe(vma, vn, &vm->vm_bind_list, vm_bind_link)
+		if (!i915_vma_verify_bind_complete(vma))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+}
+
 static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
 {
+	eb_release_persistent_vma_all(eb, final);
 	eb_unpin_engine(eb);
 }
 
+static int eb_reserve_fence_for_persistent_vma_all(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma;
+	int ret;
+
+	ret = dma_resv_reserve_fences(vm->root_obj->base.resv, 1);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link) {
+		ret = dma_resv_reserve_fences(vma->obj->base.resv, 1);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int eb_validate_persistent_vma_all(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma, *last_pinned_vma = NULL;
+	int ret = 0;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+	assert_object_held(vm->root_obj);
+
+	ret = eb_reserve_fence_for_persistent_vma_all(eb);
+	if (ret)
+		return ret;
+
+	if (list_empty(&vm->vm_bind_list))
+		return 0;
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+		u64 pin_flags = vma->start | PIN_OFFSET_FIXED | PIN_USER;
+
+		ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
+		if (ret)
+			break;
+
+		last_pinned_vma = vma;
+	}
+
+	if (ret && last_pinned_vma) {
+		list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+			__i915_vma_unpin(vma);
+			if (vma == last_pinned_vma)
+				break;
+		}
+	} else if (last_pinned_vma) {
+		eb->args->flags |= __EXEC3_HAS_PIN;
+	}
+
+	return ret;
+}
+
 /*
  * Using two helper loops for the order of which requests / batches are created
  * and added the to backend. Requests are created in order from the parent to
@@ -163,8 +298,43 @@ static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
 #define for_each_batch_create_order(_eb) \
 	for (unsigned int i = 0; i < (_eb)->num_batches; ++i)
 
+static void __eb_persistent_add_shared_fence(struct drm_i915_gem_object *obj,
+					     struct dma_fence *fence)
+{
+	dma_resv_add_fence(obj->base.resv, fence, DMA_RESV_USAGE_BOOKKEEP);
+	obj->write_domain = 0;
+	obj->read_domains |= I915_GEM_GPU_DOMAINS;
+	obj->mm.dirty = true;
+}
+
+static void eb_persistent_add_shared_fence(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct dma_fence *fence;
+	struct i915_vma *vma;
+
+	fence = eb->composite_fence ? eb->composite_fence :
+		&eb->requests[0]->fence;
+
+	__eb_persistent_add_shared_fence(vm->root_obj, fence);
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link)
+		__eb_persistent_add_shared_fence(vma->obj, fence);
+}
+
+static void eb_move_all_persistent_vma_to_active(struct i915_execbuffer *eb)
+{
+	/* Add fence to BOs dma-resv fence list */
+	eb_persistent_add_shared_fence(eb);
+}
+
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	lockdep_assert_held(&eb->context->vm->vm_bind_lock);
+	assert_object_held(eb->context->vm->root_obj);
+
+	eb_move_all_persistent_vma_to_active(eb);
+
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->gt);
 
@@ -482,6 +652,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 	mutex_lock(&eb.context->vm->vm_bind_lock);
 
+lookup_vmas:
 	err = eb_lookup_vma_all(&eb);
 	if (err) {
 		eb_release_vma_all(&eb, true);
@@ -498,6 +669,22 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	/* only throttle once, even if we didn't need to throttle */
 	throttle = false;
 
+	err = eb_lock_vma_all(&eb);
+	if (err)
+		goto err_validate;
+
+	/**
+	 * No object unbinds are possible once the objects are locked. So,
+	 * check here for any unbinds, which need to be scooped up.
+	 */
+	if (!list_empty(&eb.context->vm->vm_rebind_list)) {
+		eb_release_vma_all(&eb, true);
+		i915_gem_ww_ctx_fini(&eb.ww);
+		goto lookup_vmas;
+	}
+
+	err = eb_validate_persistent_vma_all(&eb);
+
 err_validate:
 	if (err == -EDEADLK) {
 		eb_release_vma_all(&eb, false);
-- 
2.21.0.rc0.32.g243a4c7e27
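
Given the DMA_RESV_USAGE_BOOKKEEP note above, the end-of-batch check belongs
on the execbuf3 timeline out fence rather than on DRM_IOCTL_I915_GEM_WAIT. A
userspace sketch using libdrm, assuming out_syncobj and point are the syncobj
handle and timeline value that were passed in @timeline_fences with the signal
flag:

#include <stdint.h>
#include <xf86drm.h>

static int wait_batch_done(int drm_fd, uint32_t out_syncobj, uint64_t point)
{
	/* Block until the execbuf3 out fence for this point has signalled. */
	return drmSyncobjTimelineWait(drm_fd, &out_syncobj, &point, 1,
				      INT64_MAX,
				      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
				      NULL);
}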


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC v4 12/14] drm/i915/vm_bind: userptr dma-resv changes
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

For persistent (vm_bind) vmas of userptr BOs, handle the user page
pinning by using the i915_gem_object_userptr_submit_init()/done()
functions.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 99 +++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 17 ++++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 +
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
 6 files changed, 140 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index 82a068d03440..7467e3daac5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -22,6 +22,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_USERPTR_USED		BIT_ULL(34)
 #define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
@@ -144,6 +145,21 @@ static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
 {
 	struct i915_vma *vma, *vn;
 
+#ifdef CONFIG_MMU_NOTIFIER
+	/**
+	 * Move all invalidated userptr vmas back into vm_bind_list so that
+	 * they are looked up and revalidated.
+	 */
+	spin_lock(&vm->userptr_invalidated_lock);
+	list_for_each_entry_safe(vma, vn, &vm->userptr_invalidated_list,
+				 userptr_invalidated_link) {
+		list_del_init(&vma->userptr_invalidated_link);
+		if (!list_empty(&vma->vm_bind_link))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bind_list);
+	}
+	spin_unlock(&vm->userptr_invalidated_lock);
+#endif
+
 	/**
 	 * Move all unbound vmas back into vm_bind_list so that they are
 	 * revalidated.
@@ -157,10 +173,47 @@ static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
 	spin_unlock(&vm->vm_rebind_lock);
 }
 
+static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *last_vma = NULL;
+	struct i915_vma *vma;
+	int err;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+		if (!i915_gem_object_is_userptr(vma->obj))
+			continue;
+
+		err = i915_gem_object_userptr_submit_init(vma->obj);
+		if (err)
+			return err;
+
+		/**
+		 * The above submit_init() call does the object unbind and
+		 * hence adds the vma to vm_rebind_list. Remove it from that
+		 * list as it is already scooped for revalidation.
+		 */
+		spin_lock(&vm->vm_rebind_lock);
+		if (!list_empty(&vma->vm_rebind_link))
+			list_del_init(&vma->vm_rebind_link);
+		spin_unlock(&vm->vm_rebind_lock);
+
+		last_vma = vma;
+	}
+
+	if (last_vma)
+		eb->args->flags |= __EXEC3_USERPTR_USED;
+
+	return 0;
+}
+
 static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
 	struct i915_vma *vma;
+	int err = 0;
 
 	for (i = 0; i < eb->num_batches; i++) {
 		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
@@ -173,6 +226,10 @@ static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 
 	eb_scoop_unbound_vma_all(eb->context->vm);
 
+	err = eb_lookup_persistent_userptr_vmas(eb);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -330,15 +387,57 @@ static void eb_move_all_persistent_vma_to_active(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	int err = 0;
+
 	lockdep_assert_held(&eb->context->vm->vm_bind_lock);
 	assert_object_held(eb->context->vm->root_obj);
 
 	eb_move_all_persistent_vma_to_active(eb);
 
+#ifdef CONFIG_MMU_NOTIFIER
+	/* Check for further userptr invalidations */
+	spin_lock(&eb->context->vm->userptr_invalidated_lock);
+	if (!list_empty(&eb->context->vm->userptr_invalidated_list))
+		err = -EAGAIN;
+	spin_unlock(&eb->context->vm->userptr_invalidated_lock);
+
+	if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
+		struct i915_vma *vma;
+
+		lockdep_assert_held(&eb->context->vm->vm_bind_lock);
+		assert_object_held(eb->context->vm->root_obj);
+
+		read_lock(&eb->i915->mm.notifier_lock);
+		list_for_each_entry(vma, &eb->context->vm->vm_bind_list,
+				    vm_bind_link) {
+			if (!i915_gem_object_is_userptr(vma->obj))
+				continue;
+
+			err = i915_gem_object_userptr_submit_done(vma->obj);
+			if (err)
+				break;
+		}
+
+		read_unlock(&eb->i915->mm.notifier_lock);
+	}
+#endif
+
+	if (unlikely(err))
+		goto err_skip;
+
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->gt);
 
 	return 0;
+
+err_skip:
+	for_each_batch_create_order(eb) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_set_error_once(eb->requests[i], err);
+	}
+	return err;
 }
 
 static int eb_request_submit(struct i915_execbuffer *eb,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 8423df021b71..c0869f102f28 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -63,6 +63,7 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
 {
 	struct drm_i915_gem_object *obj = container_of(mni, struct drm_i915_gem_object, userptr.notifier);
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct i915_vma *vma;
 	long r;
 
 	if (!mmu_notifier_range_blockable(range))
@@ -85,6 +86,22 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
 	if (current->flags & PF_EXITING)
 		return true;
 
+	/**
+	 * Add persistent vmas into userptr_invalidated list for relookup
+	 * and revalidation.
+	 */
+	spin_lock(&obj->vma.lock);
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
+		if (!i915_vma_is_persistent(vma))
+			continue;
+
+		spin_lock(&vma->vm->userptr_invalidated_lock);
+		list_add_tail(&vma->userptr_invalidated_link,
+			      &vma->vm->userptr_invalidated_list);
+		spin_unlock(&vma->vm->userptr_invalidated_lock);
+	}
+	spin_unlock(&obj->vma.lock);
+
 	/* we will unbind on next submission, still have userptr pins */
 	r = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP, false,
 				  MAX_SCHEDULE_TIMEOUT);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 5cd788404ee7..3087731cc0c0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -294,6 +294,12 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (i915_gem_object_is_userptr(obj)) {
+		ret = i915_gem_object_userptr_submit_init(obj);
+		if (ret)
+			goto put_obj;
+	}
+
 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
 	if (ret)
 		goto put_obj;
@@ -325,6 +331,16 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		/* Make it evictable */
 		__i915_vma_unpin(vma);
 
+#ifdef CONFIG_MMU_NOTIFIER
+		if (i915_gem_object_is_userptr(obj)) {
+			read_lock(&vm->i915->mm.notifier_lock);
+			ret = i915_gem_object_userptr_submit_done(obj);
+			read_unlock(&vm->i915->mm.notifier_lock);
+			if (ret)
+				continue;
+		}
+#endif
+
 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 		i915_vm_bind_it_insert(vma, &vm->va);
 		if (!obj->priv_root)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 6db31197fa87..401202391649 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -298,6 +298,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
 	INIT_LIST_HEAD(&vm->vm_rebind_list);
 	spin_lock_init(&vm->vm_rebind_lock);
+	spin_lock_init(&vm->userptr_invalidated_lock);
+	INIT_LIST_HEAD(&vm->userptr_invalidated_list);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b73d35b4e05d..c3069ee42b5a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -277,6 +277,10 @@ struct i915_address_space {
 	struct list_head vm_rebind_list;
 	/* @vm_rebind_lock: protects vm_rebound_list */
 	spinlock_t vm_rebind_lock;
+	/* @userptr_invalidated_list: list of invalidated userptr vmas */
+	struct list_head userptr_invalidated_list;
+	/* @userptr_invalidated_lock: protects userptr_invalidated_list */
+	spinlock_t userptr_invalidated_lock;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 7fdbf73666e9..636520a83e4f 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -310,6 +310,8 @@ struct i915_vma {
 	struct list_head non_priv_vm_bind_link;
 	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
 	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
+	/*@userptr_invalidated_link: link to the vm->userptr_invalidated_list */
+	struct list_head userptr_invalidated_link;
 
 	/** Timeline fence for vm_bind completion notification */
 	struct {
-- 
2.21.0.rc0.32.g243a4c7e27



* [Intel-gfx] [RFC v4 12/14] drm/i915/vm_bind: userptr dma-resv changes
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

For persistent (vm_bind) vmas of userptr BOs, handle the user page
pinning by using the i915_gem_object_userptr_submit_init()/done()
functions, roughly as sketched below.
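
A condensed sketch of the intended two-phase flow for a single persistent
userptr vma (not the literal driver code; the i915 pointer to the
drm_i915_private is assumed, and the full locking context is in the
execbuffer3 hunks below):

	/* Phase 1: outside the notifier lock, (re)pin the user pages */
	err = i915_gem_object_userptr_submit_init(vma->obj);
	if (err)
		return err;

	/* Phase 2: just before submission, commit under the notifier lock
	 * so that a racing invalidation is detected before the request is
	 * submitted.
	 */
	read_lock(&i915->mm.notifier_lock);
	err = i915_gem_object_userptr_submit_done(vma->obj);
	read_unlock(&i915->mm.notifier_lock);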

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 99 +++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 17 ++++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 +
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
 6 files changed, 140 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index 82a068d03440..7467e3daac5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -22,6 +22,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_USERPTR_USED		BIT_ULL(34)
 #define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
@@ -144,6 +145,21 @@ static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
 {
 	struct i915_vma *vma, *vn;
 
+#ifdef CONFIG_MMU_NOTIFIER
+	/**
+	 * Move all invalidated userptr vmas back into vm_bind_list so that
+	 * they are looked up and revalidated.
+	 */
+	spin_lock(&vm->userptr_invalidated_lock);
+	list_for_each_entry_safe(vma, vn, &vm->userptr_invalidated_list,
+				 userptr_invalidated_link) {
+		list_del_init(&vma->userptr_invalidated_link);
+		if (!list_empty(&vma->vm_bind_link))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bind_list);
+	}
+	spin_unlock(&vm->userptr_invalidated_lock);
+#endif
+
 	/**
 	 * Move all unbound vmas back into vm_bind_list so that they are
 	 * revalidated.
@@ -157,10 +173,47 @@ static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
 	spin_unlock(&vm->vm_rebind_lock);
 }
 
+static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *last_vma = NULL;
+	struct i915_vma *vma;
+	int err;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+		if (!i915_gem_object_is_userptr(vma->obj))
+			continue;
+
+		err = i915_gem_object_userptr_submit_init(vma->obj);
+		if (err)
+			return err;
+
+		/**
+		 * The above submit_init() call does the object unbind and
+		 * hence adds vma into vm_rebind_list. Remove it from that
+		 * list as it is already scooped for revalidation.
+		 */
+		spin_lock(&vm->vm_rebind_lock);
+		if (!list_empty(&vma->vm_rebind_link))
+			list_del_init(&vma->vm_rebind_link);
+		spin_unlock(&vm->vm_rebind_lock);
+
+		last_vma = vma;
+	}
+
+	if (last_vma)
+		eb->args->flags |= __EXEC3_USERPTR_USED;
+
+	return 0;
+}
+
 static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
 	struct i915_vma *vma;
+	int err = 0;
 
 	for (i = 0; i < eb->num_batches; i++) {
 		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
@@ -173,6 +226,10 @@ static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 
 	eb_scoop_unbound_vma_all(eb->context->vm);
 
+	err = eb_lookup_persistent_userptr_vmas(eb);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -330,15 +387,57 @@ static void eb_move_all_persistent_vma_to_active(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	int err = 0;
+
 	lockdep_assert_held(&eb->context->vm->vm_bind_lock);
 	assert_object_held(eb->context->vm->root_obj);
 
 	eb_move_all_persistent_vma_to_active(eb);
 
+#ifdef CONFIG_MMU_NOTIFIER
+	/* Check for further userptr invalidations */
+	spin_lock(&eb->context->vm->userptr_invalidated_lock);
+	if (!list_empty(&eb->context->vm->userptr_invalidated_list))
+		err = -EAGAIN;
+	spin_unlock(&eb->context->vm->userptr_invalidated_lock);
+
+	if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
+		struct i915_vma *vma;
+
+		lockdep_assert_held(&eb->context->vm->vm_bind_lock);
+		assert_object_held(eb->context->vm->root_obj);
+
+		read_lock(&eb->i915->mm.notifier_lock);
+		list_for_each_entry(vma, &eb->context->vm->vm_bind_list,
+				    vm_bind_link) {
+			if (!i915_gem_object_is_userptr(vma->obj))
+				continue;
+
+			err = i915_gem_object_userptr_submit_done(vma->obj);
+			if (err)
+				break;
+		}
+
+		read_unlock(&eb->i915->mm.notifier_lock);
+	}
+#endif
+
+	if (unlikely(err))
+		goto err_skip;
+
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->gt);
 
 	return 0;
+
+err_skip:
+	for_each_batch_create_order(eb) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_set_error_once(eb->requests[i], err);
+	}
+	return err;
 }
 
 static int eb_request_submit(struct i915_execbuffer *eb,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 8423df021b71..c0869f102f28 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -63,6 +63,7 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
 {
 	struct drm_i915_gem_object *obj = container_of(mni, struct drm_i915_gem_object, userptr.notifier);
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct i915_vma *vma;
 	long r;
 
 	if (!mmu_notifier_range_blockable(range))
@@ -85,6 +86,22 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
 	if (current->flags & PF_EXITING)
 		return true;
 
+	/**
+	 * Add persistent vmas into userptr_invalidated list for relookup
+	 * and revalidation.
+	 */
+	spin_lock(&obj->vma.lock);
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
+		if (!i915_vma_is_persistent(vma))
+			continue;
+
+		spin_lock(&vma->vm->userptr_invalidated_lock);
+		list_add_tail(&vma->userptr_invalidated_link,
+			      &vma->vm->userptr_invalidated_list);
+		spin_unlock(&vma->vm->userptr_invalidated_lock);
+	}
+	spin_unlock(&obj->vma.lock);
+
 	/* we will unbind on next submission, still have userptr pins */
 	r = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP, false,
 				  MAX_SCHEDULE_TIMEOUT);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 5cd788404ee7..3087731cc0c0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -294,6 +294,12 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (i915_gem_object_is_userptr(obj)) {
+		ret = i915_gem_object_userptr_submit_init(obj);
+		if (ret)
+			goto put_obj;
+	}
+
 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
 	if (ret)
 		goto put_obj;
@@ -325,6 +331,16 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		/* Make it evictable */
 		__i915_vma_unpin(vma);
 
+#ifdef CONFIG_MMU_NOTIFIER
+		if (i915_gem_object_is_userptr(obj)) {
+			read_lock(&vm->i915->mm.notifier_lock);
+			ret = i915_gem_object_userptr_submit_done(obj);
+			read_unlock(&vm->i915->mm.notifier_lock);
+			if (ret)
+				continue;
+		}
+#endif
+
 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 		i915_vm_bind_it_insert(vma, &vm->va);
 		if (!obj->priv_root)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 6db31197fa87..401202391649 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -298,6 +298,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
 	INIT_LIST_HEAD(&vm->vm_rebind_list);
 	spin_lock_init(&vm->vm_rebind_lock);
+	spin_lock_init(&vm->userptr_invalidated_lock);
+	INIT_LIST_HEAD(&vm->userptr_invalidated_list);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b73d35b4e05d..c3069ee42b5a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -277,6 +277,10 @@ struct i915_address_space {
 	struct list_head vm_rebind_list;
 	/* @vm_rebind_lock: protects vm_rebound_list */
 	spinlock_t vm_rebind_lock;
+	/* @userptr_invalidated_list: list of invalidated userptr vmas */
+	struct list_head userptr_invalidated_list;
+	/* @userptr_invalidated_lock: protects userptr_invalidated_list */
+	spinlock_t userptr_invalidated_lock;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 7fdbf73666e9..636520a83e4f 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -310,6 +310,8 @@ struct i915_vma {
 	struct list_head non_priv_vm_bind_link;
 	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
 	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
+	/*@userptr_invalidated_link: link to the vm->userptr_invalidated_list */
+	struct list_head userptr_invalidated_link;
 
 	/** Timeline fence for vm_bind completion notification */
 	struct {
-- 
2.21.0.rc0.32.g243a4c7e27



* [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

vma_lookup is tied to a segment of the object rather than a section
of the VA space. Hence, it does not support aliasing (i.e., multiple
bindings to the same section of the object). Skip vma_lookup for
persistent vmas, since persistent vmas support aliasing; see the
sketch below.
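
With this change the caller selects the behaviour through a new boolean
argument to i915_vma_instance(). A minimal sketch based on the hunks
below, assuming obj, vm and a partial view have already been set up:

	/* legacy/internal users: look up (or create) via the object's vma tree */
	vma = i915_vma_instance(obj, vm, NULL, false);

	/* persistent vm_bind mapping: always create a new vma and skip the
	 * per-object rb-tree lookup/insert, allowing multiple bindings
	 * (aliases) of the same object range.
	 */
	vma = i915_vma_instance(obj, vm, &view, true);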

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_fb_pin.c   |  2 +-
 .../drm/i915/display/intel_plane_initial.c    |  2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   | 16 +++----
 .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c | 12 ++---
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  6 ++-
 .../drm/i915/gem/selftests/igt_gem_utils.c    |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c          |  2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 16 +++----
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c               |  2 +-
 drivers/gpu/drm/i915/i915_perf.c              |  2 +-
 drivers/gpu/drm/i915/i915_vma.c               | 26 +++++++----
 drivers/gpu/drm/i915/i915_vma.h               |  3 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 44 +++++++++----------
 drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  2 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
 .../drm/i915/selftests/intel_memory_region.c  |  2 +-
 37 files changed, 106 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c
index c86e5d4ee016..5a718b247bb3 100644
--- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
+++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
@@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,
 		goto err;
 	}
 
-	vma = i915_vma_instance(obj, vm, view);
+	vma = i915_vma_instance(obj, vm, view, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index 76be796df255..7667e2faa3fb 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -136,7 +136,7 @@ initial_plane_vma(struct drm_i915_private *i915,
 		goto err_obj;
 	}
 
-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err_obj;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 363b2a788cdf..0ee43cb601b5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -876,7 +876,7 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
 			}
 		}
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			i915_gem_object_put(obj);
 			return vma;
@@ -2208,7 +2208,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
 	struct i915_vma *vma;
 	int err;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 3087731cc0c0..4468603af6f1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -252,7 +252,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 	view.type = I915_GTT_VIEW_PARTIAL;
 	view.partial.offset = va->offset >> PAGE_SHIFT;
 	view.partial.size = va->length >> PAGE_SHIFT;
-	vma = i915_vma_instance(obj, vm, &view);
+	vma = i915_vma_instance(obj, vm, &view, true);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index c570cf780079..6e13a83d0e36 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -454,7 +454,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 				goto out_put;
 			}
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_put;
@@ -522,7 +522,7 @@ static int igt_mock_memory_region_huge_pages(void *arg)
 				goto out_region;
 			}
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_put;
@@ -614,7 +614,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 		/* Force the page size for this object */
 		obj->mm.page_sizes.sg = page_size;
 
-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_unpin;
@@ -746,7 +746,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			break;
@@ -924,7 +924,7 @@ static int igt_mock_ppgtt_64K(void *arg)
 			 */
 			obj->mm.page_sizes.sg &= ~I915_GTT_PAGE_SIZE_2M;
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_object_unpin;
@@ -1092,7 +1092,7 @@ static int __igt_write_huge(struct intel_context *ce,
 	struct i915_vma *vma;
 	int err;
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -1587,7 +1587,7 @@ static int igt_tmpfs_fallback(void *arg)
 	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put;
@@ -1654,7 +1654,7 @@ static int igt_shrink_thp(void *arg)
 		goto out_vm;
 	}
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index 9a6a6b5b722b..e6c6c73bf80e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -282,7 +282,7 @@ __create_vma(struct tiled_blits *t, size_t size, bool lmem)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, t->ce->vm, NULL);
+	vma = i915_vma_instance(obj, t->ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index c6ad67b90e8a..570f74df9bef 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -426,7 +426,7 @@ static int gpu_fill(struct intel_context *ce,
 	GEM_BUG_ON(obj->base.size > ce->vm->total);
 	GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine));
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -930,7 +930,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (GRAPHICS_VER(i915) < 8)
 		return -EINVAL;
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -938,7 +938,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (IS_ERR(rpcs))
 		return PTR_ERR(rpcs);
 
-	batch = i915_vma_instance(rpcs, ce->vm, NULL);
+	batch = i915_vma_instance(rpcs, ce->vm, NULL, false);
 	if (IS_ERR(batch)) {
 		err = PTR_ERR(batch);
 		goto err_put;
@@ -1522,7 +1522,7 @@ static int write_to_scratch(struct i915_gem_context *ctx,
 	intel_gt_chipset_flush(engine->gt);
 
 	vm = i915_gem_context_get_eb_vm(ctx);
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_vm;
@@ -1599,7 +1599,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 		const u32 GPR0 = engine->mmio_base + 0x600;
 
 		vm = i915_gem_context_get_eb_vm(ctx);
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_vm;
@@ -1635,7 +1635,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 
 		/* hsw: register access even to 3DPRIM! is protected */
 		vm = i915_vm_get(&engine->gt->ggtt->vm);
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_vm;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index fe6c37fd7859..fc235e1e6c12 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -201,7 +201,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 		return PTR_ERR(obj);
 
 	if (vm) {
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_put;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index b73c91aa5450..e07c91dc33ba 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -546,7 +546,8 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
 		struct i915_gem_ww_ctx ww;
 		int err;
 
-		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm,
+					NULL, false);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 
@@ -1587,7 +1588,8 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
 		struct i915_vma *vma;
 		struct i915_gem_ww_ctx ww;
 
-		vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
+		vma = i915_vma_instance(obj, engine->kernel_context->vm,
+					NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_unmap;
diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
index 3c55e77b0f1b..4184e198c824 100644
--- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
+++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
@@ -91,7 +91,7 @@ igt_emit_store_dw(struct i915_vma *vma,
 
 	intel_gt_chipset_flush(vma->vm->gt);
 
-	vma = i915_vma_instance(obj, vma->vm, NULL);
+	vma = i915_vma_instance(obj, vma->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 1bb766c79dcb..a0af2aa50533 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -395,7 +395,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
 	pd->pt.base->base.resv = i915_vm_resv_get(&ppgtt->base.vm);
 	pd->pt.base->shares_resv_from = &ppgtt->base.vm;
 
-	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL);
+	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL, false);
 	if (IS_ERR(ppgtt->vma)) {
 		err = PTR_ERR(ppgtt->vma);
 		ppgtt->vma = NULL;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 2ddcad497fa3..8146bf811d0f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1001,7 +1001,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_put;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index b367cfff48d5..8a78c6cec7b4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -441,7 +441,7 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
 		return PTR_ERR(obj);
 	}
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unref;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 401202391649..c9bc33149ad7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -628,7 +628,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHING_CACHED);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3955292483a6..570d097a2492 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1029,7 +1029,7 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -1685,7 +1685,7 @@ static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
index 5121e6dc2fa5..bc7a2d4421db 100644
--- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
@@ -157,7 +157,7 @@ int intel_renderstate_init(struct intel_renderstate *so,
 		if (IS_ERR(obj))
 			return PTR_ERR(obj);
 
-		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 		if (IS_ERR(so->vma)) {
 			err = PTR_ERR(so->vma);
 			goto err_obj;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c4..24c8b738a394 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -130,7 +130,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	if (vm->has_read_only)
 		i915_gem_object_set_readonly(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index d5d6f1fadcae..5e93a4052140 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -551,7 +551,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	if (IS_IVYBRIDGE(i915))
 		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1291,7 +1291,7 @@ static struct i915_vma *gen7_ctx_vma(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return ERR_CAST(vma);
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index b9640212d659..31f56996f100 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -28,7 +28,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index c0637bf799a3..6f3578308395 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -46,7 +46,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 1b75f478d1b8..16fcaba7c980 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -85,7 +85,7 @@ static struct i915_vma *create_empty_batch(struct intel_context *ce)
 
 	i915_gem_object_flush_map(obj);
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_unpin;
@@ -222,7 +222,7 @@ static struct i915_vma *create_nop_batch(struct intel_context *ce)
 
 	i915_gem_object_flush_map(obj);
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1e08b2473b99..643ffcb3964a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -1000,7 +1000,7 @@ static int live_timeslice_preempt(void *arg)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1307,7 +1307,7 @@ static int live_timeslice_queue(void *arg)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg)
 		goto err_obj;
 	}
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_map;
@@ -2716,7 +2716,7 @@ static int create_gang(struct intel_engine_cs *engine,
 		goto err_ce;
 	}
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -3060,7 +3060,7 @@ create_gpr_user(struct intel_engine_cs *engine,
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, result->vm, NULL);
+	vma = i915_vma_instance(obj, result->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -3130,7 +3130,7 @@ static struct i915_vma *create_global(struct intel_gt *gt, size_t sz)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -3159,7 +3159,7 @@ create_gpr_client(struct intel_engine_cs *engine,
 	if (IS_ERR(ce))
 		return ERR_CAST(ce);
 
-	vma = i915_vma_instance(global->obj, ce->vm, NULL);
+	vma = i915_vma_instance(global->obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_ce;
@@ -3501,7 +3501,7 @@ static int smoke_submit(struct preempt_smoke *smoke,
 		struct i915_address_space *vm;
 
 		vm = i915_gem_context_get_eb_vm(ctx);
-		vma = i915_vma_instance(batch, vm, NULL);
+		vma = i915_vma_instance(batch, vm, NULL, false);
 		i915_vm_put(vm);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7f3bb1d34dfb..0b021a32d0e0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -147,13 +147,13 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 	h->obj = obj;
 	h->batch = vaddr;
 
-	vma = i915_vma_instance(h->obj, vm, NULL);
+	vma = i915_vma_instance(h->obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_vm_put(vm);
 		return ERR_CAST(vma);
 	}
 
-	hws = i915_vma_instance(h->hws, vm, NULL);
+	hws = i915_vma_instance(h->hws, vm, NULL, false);
 	if (IS_ERR(hws)) {
 		i915_vm_put(vm);
 		return ERR_CAST(hws);
@@ -1474,7 +1474,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 		}
 	}
 
-	arg.vma = i915_vma_instance(obj, vm, NULL);
+	arg.vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(arg.vma)) {
 		err = PTR_ERR(arg.vma);
 		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 82d3f8058995..32867049b3bf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -938,7 +938,7 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
index 70f9ac1ec2c7..7e9361104620 100644
--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
@@ -17,7 +17,7 @@ static struct i915_vma *create_wally(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index cfb4708dd62e..327558828bef 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -78,7 +78,7 @@ create_spin_counter(struct intel_engine_cs *engine,
 
 	end = obj->base.size / sizeof(u32) - 1;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_put;
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 67a9aab801dd..d893ea763ac6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -122,7 +122,7 @@ read_nonprivs(struct intel_context *ce)
 	i915_gem_object_flush_map(result);
 	i915_gem_object_unpin_map(result);
 
-	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -389,7 +389,7 @@ static struct i915_vma *create_batch(struct i915_address_space *vm)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index bac06e3d6f2c..d56b1f82250c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -737,7 +737,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 88df9a35e0fe..bb6b1f56836f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -934,7 +934,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
 	}
 
 new_vma:
-	vma = i915_vma_instance(obj, &ggtt->vm, view);
+	vma = i915_vma_instance(obj, &ggtt->vm, view, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0defbb43ceea..d8f5ef9fd00f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1920,7 +1920,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
 
 	oa_bo->vma = i915_vma_instance(obj,
 				       &stream->engine->gt->ggtt->vm,
-				       NULL);
+				       NULL, false);
 	if (IS_ERR(oa_bo->vma)) {
 		err = PTR_ERR(oa_bo->vma);
 		goto out_ww;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 24f171588f56..ef709a61fd54 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -110,7 +110,8 @@ static void __i915_vma_retire(struct i915_active *ref)
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
-	   const struct i915_gtt_view *view)
+	   const struct i915_gtt_view *view,
+	   bool persistent)
 {
 	struct i915_vma *pos = ERR_PTR(-E2BIG);
 	struct i915_vma *vma;
@@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
 		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
 	}
 
+	if (persistent)
+		goto skip_rb_insert;
+
 	rb = NULL;
 	p = &obj->vma.tree.rb_node;
 	while (*p) {
@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma.tree);
 
+skip_rb_insert:
 	if (i915_vma_is_ggtt(vma))
 		/*
 		 * We put the GGTT vma at the start of the vma-list, followed
@@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
  * @obj: parent &struct drm_i915_gem_object to be mapped
  * @vm: address space in which the mapping is located
  * @view: additional mapping requirements
+ * @persistent: Whether the vma is persistent
  *
  * i915_vma_instance() looks up an existing VMA of the @obj in the @vm with
  * the same @view characteristics. If a match is not found, one is created.
@@ -290,19 +296,22 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
 struct i915_vma *
 i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
-		  const struct i915_gtt_view *view)
+		  const struct i915_gtt_view *view,
+		  bool persistent)
 {
-	struct i915_vma *vma;
+	struct i915_vma *vma = NULL;
 
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
-	spin_lock(&obj->vma.lock);
-	vma = i915_vma_lookup(obj, vm, view);
-	spin_unlock(&obj->vma.lock);
+	if (!persistent) {
+		spin_lock(&obj->vma.lock);
+		vma = i915_vma_lookup(obj, vm, view);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	/* vma_create() will resolve the race if another creates the vma */
 	if (unlikely(!vma))
-		vma = vma_create(obj, vm, view);
+		vma = vma_create(obj, vm, view, persistent);
 
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
 	return vma;
@@ -1704,7 +1713,8 @@ static void release_references(struct i915_vma *vma, struct intel_gt *gt,
 
 	spin_lock(&obj->vma.lock);
 	list_del(&vma->obj_link);
-	if (!RB_EMPTY_NODE(&vma->obj_node))
+	if (!i915_vma_is_persistent(vma) &&
+	    !RB_EMPTY_NODE(&vma->obj_node))
 		rb_erase(&vma->obj_node, &obj->vma.tree);
 
 	spin_unlock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 3a47db2d85f5..b8e805c6532f 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -43,7 +43,8 @@
 struct i915_vma *
 i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
-		  const struct i915_gtt_view *view);
+		  const struct i915_gtt_view *view,
+		  bool persistent);
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index e050a2de5fd1..d8ffbdf91498 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -390,7 +390,7 @@ static void close_object_list(struct list_head *objects,
 	list_for_each_entry_safe(obj, on, objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (!IS_ERR(vma))
 			ignored = i915_vma_unbind_unlocked(vma);
 
@@ -452,7 +452,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -492,7 +492,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -531,7 +531,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -571,7 +571,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -653,7 +653,7 @@ static int walk_hole(struct i915_address_space *vm,
 		if (IS_ERR(obj))
 			break;
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_put;
@@ -728,7 +728,7 @@ static int pot_hole(struct i915_address_space *vm,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -837,7 +837,7 @@ static int drunk_hole(struct i915_address_space *vm,
 			break;
 		}
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_obj;
@@ -920,7 +920,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			break;
@@ -1018,7 +1018,7 @@ static int shrink_boom(struct i915_address_space *vm,
 		if (IS_ERR(purge))
 			return PTR_ERR(purge);
 
-		vma = i915_vma_instance(purge, vm, NULL);
+		vma = i915_vma_instance(purge, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_purge;
@@ -1041,7 +1041,7 @@ static int shrink_boom(struct i915_address_space *vm,
 		vm->fault_attr.interval = 1;
 		atomic_set(&vm->fault_attr.times, -1);
 
-		vma = i915_vma_instance(explode, vm, NULL);
+		vma = i915_vma_instance(explode, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_explode;
@@ -1088,7 +1088,7 @@ static int misaligned_case(struct i915_address_space *vm, struct intel_memory_re
 		return PTR_ERR(obj);
 	}
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_put;
@@ -1560,7 +1560,7 @@ static int igt_gtt_reserve(void *arg)
 		}
 
 		list_add(&obj->st_link, &objects);
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1606,7 +1606,7 @@ static int igt_gtt_reserve(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1636,7 +1636,7 @@ static int igt_gtt_reserve(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1783,7 +1783,7 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1809,7 +1809,7 @@ static int igt_gtt_insert(void *arg)
 	list_for_each_entry(obj, &objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1829,7 +1829,7 @@ static int igt_gtt_insert(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1882,7 +1882,7 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -2091,7 +2091,7 @@ static int igt_cs_tlb(void *arg)
 	}
 	i915_gem_object_set_cache_coherency(out, I915_CACHING_CACHED);
 
-	vma = i915_vma_instance(out, vm, NULL);
+	vma = i915_vma_instance(out, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put_out;
@@ -2131,7 +2131,7 @@ static int igt_cs_tlb(void *arg)
 
 			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
 
-			vma = i915_vma_instance(bbe, vm, NULL);
+			vma = i915_vma_instance(bbe, vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto end;
@@ -2203,7 +2203,7 @@ static int igt_cs_tlb(void *arg)
 				goto end;
 			}
 
-			vma = i915_vma_instance(act, vm, NULL);
+			vma = i915_vma_instance(act, vm, NULL, false);
 			if (IS_ERR(vma)) {
 				kfree(vma_res);
 				err = PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 818a4909c1f3..297c1d4ebf44 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -961,7 +961,7 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915)
 
 	intel_gt_chipset_flush(to_gt(i915));
 
-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
@@ -1100,7 +1100,7 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL);
+	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 71b52d5efef4..3899c2252de3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -68,7 +68,7 @@ checked_vma_instance(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	bool ok = true;
 
-	vma = i915_vma_instance(obj, vm, view);
+	vma = i915_vma_instance(obj, vm, view, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 0c22594ae274..6901f94ff076 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -47,7 +47,7 @@ static void *igt_spinner_pin_obj(struct intel_context *ce,
 	void *vaddr;
 	int ret;
 
-	*vma = i915_vma_instance(obj, ce->vm, NULL);
+	*vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(*vma))
 		return ERR_CAST(*vma);
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 3b18e5905c86..551d0c958a3b 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -745,7 +745,7 @@ static int igt_gpu_write(struct i915_gem_context *ctx,
 	if (!order)
 		return -ENOMEM;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_free;
-- 
2.21.0.rc0.32.g243a4c7e27



* [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

vma_lookup is tied to a segment of the object rather than a section
of the VA space. Hence, it does not support aliasing (i.e., multiple
bindings to the same section of the object). Skip vma_lookup for
persistent vmas, since persistent vmas support aliasing.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_fb_pin.c   |  2 +-
 .../drm/i915/display/intel_plane_initial.c    |  2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   | 16 +++----
 .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c | 12 ++---
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  6 ++-
 .../drm/i915/gem/selftests/igt_gem_utils.c    |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c          |  2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 16 +++----
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c               |  2 +-
 drivers/gpu/drm/i915/i915_perf.c              |  2 +-
 drivers/gpu/drm/i915/i915_vma.c               | 26 +++++++----
 drivers/gpu/drm/i915/i915_vma.h               |  3 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 44 +++++++++----------
 drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  2 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
 .../drm/i915/selftests/intel_memory_region.c  |  2 +-
 37 files changed, 106 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c
index c86e5d4ee016..5a718b247bb3 100644
--- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
+++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
@@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,
 		goto err;
 	}
 
-	vma = i915_vma_instance(obj, vm, view);
+	vma = i915_vma_instance(obj, vm, view, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index 76be796df255..7667e2faa3fb 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -136,7 +136,7 @@ initial_plane_vma(struct drm_i915_private *i915,
 		goto err_obj;
 	}
 
-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err_obj;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 363b2a788cdf..0ee43cb601b5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -876,7 +876,7 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
 			}
 		}
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			i915_gem_object_put(obj);
 			return vma;
@@ -2208,7 +2208,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
 	struct i915_vma *vma;
 	int err;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 3087731cc0c0..4468603af6f1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -252,7 +252,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 	view.type = I915_GTT_VIEW_PARTIAL;
 	view.partial.offset = va->offset >> PAGE_SHIFT;
 	view.partial.size = va->length >> PAGE_SHIFT;
-	vma = i915_vma_instance(obj, vm, &view);
+	vma = i915_vma_instance(obj, vm, &view, true);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index c570cf780079..6e13a83d0e36 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -454,7 +454,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 				goto out_put;
 			}
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_put;
@@ -522,7 +522,7 @@ static int igt_mock_memory_region_huge_pages(void *arg)
 				goto out_region;
 			}
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_put;
@@ -614,7 +614,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 		/* Force the page size for this object */
 		obj->mm.page_sizes.sg = page_size;
 
-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_unpin;
@@ -746,7 +746,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			break;
@@ -924,7 +924,7 @@ static int igt_mock_ppgtt_64K(void *arg)
 			 */
 			obj->mm.page_sizes.sg &= ~I915_GTT_PAGE_SIZE_2M;
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_object_unpin;
@@ -1092,7 +1092,7 @@ static int __igt_write_huge(struct intel_context *ce,
 	struct i915_vma *vma;
 	int err;
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -1587,7 +1587,7 @@ static int igt_tmpfs_fallback(void *arg)
 	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put;
@@ -1654,7 +1654,7 @@ static int igt_shrink_thp(void *arg)
 		goto out_vm;
 	}
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index 9a6a6b5b722b..e6c6c73bf80e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -282,7 +282,7 @@ __create_vma(struct tiled_blits *t, size_t size, bool lmem)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, t->ce->vm, NULL);
+	vma = i915_vma_instance(obj, t->ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index c6ad67b90e8a..570f74df9bef 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -426,7 +426,7 @@ static int gpu_fill(struct intel_context *ce,
 	GEM_BUG_ON(obj->base.size > ce->vm->total);
 	GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine));
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -930,7 +930,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (GRAPHICS_VER(i915) < 8)
 		return -EINVAL;
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -938,7 +938,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (IS_ERR(rpcs))
 		return PTR_ERR(rpcs);
 
-	batch = i915_vma_instance(rpcs, ce->vm, NULL);
+	batch = i915_vma_instance(rpcs, ce->vm, NULL, false);
 	if (IS_ERR(batch)) {
 		err = PTR_ERR(batch);
 		goto err_put;
@@ -1522,7 +1522,7 @@ static int write_to_scratch(struct i915_gem_context *ctx,
 	intel_gt_chipset_flush(engine->gt);
 
 	vm = i915_gem_context_get_eb_vm(ctx);
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_vm;
@@ -1599,7 +1599,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 		const u32 GPR0 = engine->mmio_base + 0x600;
 
 		vm = i915_gem_context_get_eb_vm(ctx);
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_vm;
@@ -1635,7 +1635,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 
 		/* hsw: register access even to 3DPRIM! is protected */
 		vm = i915_vm_get(&engine->gt->ggtt->vm);
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_vm;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index fe6c37fd7859..fc235e1e6c12 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -201,7 +201,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 		return PTR_ERR(obj);
 
 	if (vm) {
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_put;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index b73c91aa5450..e07c91dc33ba 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -546,7 +546,8 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
 		struct i915_gem_ww_ctx ww;
 		int err;
 
-		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm,
+					NULL, false);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 
@@ -1587,7 +1588,8 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
 		struct i915_vma *vma;
 		struct i915_gem_ww_ctx ww;
 
-		vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
+		vma = i915_vma_instance(obj, engine->kernel_context->vm,
+					NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_unmap;
diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
index 3c55e77b0f1b..4184e198c824 100644
--- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
+++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
@@ -91,7 +91,7 @@ igt_emit_store_dw(struct i915_vma *vma,
 
 	intel_gt_chipset_flush(vma->vm->gt);
 
-	vma = i915_vma_instance(obj, vma->vm, NULL);
+	vma = i915_vma_instance(obj, vma->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 1bb766c79dcb..a0af2aa50533 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -395,7 +395,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
 	pd->pt.base->base.resv = i915_vm_resv_get(&ppgtt->base.vm);
 	pd->pt.base->shares_resv_from = &ppgtt->base.vm;
 
-	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL);
+	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL, false);
 	if (IS_ERR(ppgtt->vma)) {
 		err = PTR_ERR(ppgtt->vma);
 		ppgtt->vma = NULL;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 2ddcad497fa3..8146bf811d0f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1001,7 +1001,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_put;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index b367cfff48d5..8a78c6cec7b4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -441,7 +441,7 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
 		return PTR_ERR(obj);
 	}
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unref;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 401202391649..c9bc33149ad7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -628,7 +628,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHING_CACHED);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3955292483a6..570d097a2492 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1029,7 +1029,7 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -1685,7 +1685,7 @@ static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
index 5121e6dc2fa5..bc7a2d4421db 100644
--- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
@@ -157,7 +157,7 @@ int intel_renderstate_init(struct intel_renderstate *so,
 		if (IS_ERR(obj))
 			return PTR_ERR(obj);
 
-		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 		if (IS_ERR(so->vma)) {
 			err = PTR_ERR(so->vma);
 			goto err_obj;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c4..24c8b738a394 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -130,7 +130,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	if (vm->has_read_only)
 		i915_gem_object_set_readonly(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index d5d6f1fadcae..5e93a4052140 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -551,7 +551,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	if (IS_IVYBRIDGE(i915))
 		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1291,7 +1291,7 @@ static struct i915_vma *gen7_ctx_vma(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return ERR_CAST(vma);
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index b9640212d659..31f56996f100 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -28,7 +28,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index c0637bf799a3..6f3578308395 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -46,7 +46,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 1b75f478d1b8..16fcaba7c980 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -85,7 +85,7 @@ static struct i915_vma *create_empty_batch(struct intel_context *ce)
 
 	i915_gem_object_flush_map(obj);
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_unpin;
@@ -222,7 +222,7 @@ static struct i915_vma *create_nop_batch(struct intel_context *ce)
 
 	i915_gem_object_flush_map(obj);
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1e08b2473b99..643ffcb3964a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -1000,7 +1000,7 @@ static int live_timeslice_preempt(void *arg)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1307,7 +1307,7 @@ static int live_timeslice_queue(void *arg)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg)
 		goto err_obj;
 	}
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_map;
@@ -2716,7 +2716,7 @@ static int create_gang(struct intel_engine_cs *engine,
 		goto err_ce;
 	}
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -3060,7 +3060,7 @@ create_gpr_user(struct intel_engine_cs *engine,
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, result->vm, NULL);
+	vma = i915_vma_instance(obj, result->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -3130,7 +3130,7 @@ static struct i915_vma *create_global(struct intel_gt *gt, size_t sz)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -3159,7 +3159,7 @@ create_gpr_client(struct intel_engine_cs *engine,
 	if (IS_ERR(ce))
 		return ERR_CAST(ce);
 
-	vma = i915_vma_instance(global->obj, ce->vm, NULL);
+	vma = i915_vma_instance(global->obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_ce;
@@ -3501,7 +3501,7 @@ static int smoke_submit(struct preempt_smoke *smoke,
 		struct i915_address_space *vm;
 
 		vm = i915_gem_context_get_eb_vm(ctx);
-		vma = i915_vma_instance(batch, vm, NULL);
+		vma = i915_vma_instance(batch, vm, NULL, false);
 		i915_vm_put(vm);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7f3bb1d34dfb..0b021a32d0e0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -147,13 +147,13 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 	h->obj = obj;
 	h->batch = vaddr;
 
-	vma = i915_vma_instance(h->obj, vm, NULL);
+	vma = i915_vma_instance(h->obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_vm_put(vm);
 		return ERR_CAST(vma);
 	}
 
-	hws = i915_vma_instance(h->hws, vm, NULL);
+	hws = i915_vma_instance(h->hws, vm, NULL, false);
 	if (IS_ERR(hws)) {
 		i915_vm_put(vm);
 		return ERR_CAST(hws);
@@ -1474,7 +1474,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 		}
 	}
 
-	arg.vma = i915_vma_instance(obj, vm, NULL);
+	arg.vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(arg.vma)) {
 		err = PTR_ERR(arg.vma);
 		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 82d3f8058995..32867049b3bf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -938,7 +938,7 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
index 70f9ac1ec2c7..7e9361104620 100644
--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
@@ -17,7 +17,7 @@ static struct i915_vma *create_wally(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index cfb4708dd62e..327558828bef 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -78,7 +78,7 @@ create_spin_counter(struct intel_engine_cs *engine,
 
 	end = obj->base.size / sizeof(u32) - 1;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_put;
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 67a9aab801dd..d893ea763ac6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -122,7 +122,7 @@ read_nonprivs(struct intel_context *ce)
 	i915_gem_object_flush_map(result);
 	i915_gem_object_unpin_map(result);
 
-	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -389,7 +389,7 @@ static struct i915_vma *create_batch(struct i915_address_space *vm)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index bac06e3d6f2c..d56b1f82250c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -737,7 +737,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 88df9a35e0fe..bb6b1f56836f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -934,7 +934,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
 	}
 
 new_vma:
-	vma = i915_vma_instance(obj, &ggtt->vm, view);
+	vma = i915_vma_instance(obj, &ggtt->vm, view, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0defbb43ceea..d8f5ef9fd00f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1920,7 +1920,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
 
 	oa_bo->vma = i915_vma_instance(obj,
 				       &stream->engine->gt->ggtt->vm,
-				       NULL);
+				       NULL, false);
 	if (IS_ERR(oa_bo->vma)) {
 		err = PTR_ERR(oa_bo->vma);
 		goto out_ww;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 24f171588f56..ef709a61fd54 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -110,7 +110,8 @@ static void __i915_vma_retire(struct i915_active *ref)
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
-	   const struct i915_gtt_view *view)
+	   const struct i915_gtt_view *view,
+	   bool persistent)
 {
 	struct i915_vma *pos = ERR_PTR(-E2BIG);
 	struct i915_vma *vma;
@@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
 		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
 	}
 
+	if (persistent)
+		goto skip_rb_insert;
+
 	rb = NULL;
 	p = &obj->vma.tree.rb_node;
 	while (*p) {
@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma.tree);
 
+skip_rb_insert:
 	if (i915_vma_is_ggtt(vma))
 		/*
 		 * We put the GGTT vma at the start of the vma-list, followed
@@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
  * @obj: parent &struct drm_i915_gem_object to be mapped
  * @vm: address space in which the mapping is located
  * @view: additional mapping requirements
+ * @persistent: Whether the vma is persistent
  *
  * i915_vma_instance() looks up an existing VMA of the @obj in the @vm with
  * the same @view characteristics. If a match is not found, one is created.
@@ -290,19 +296,22 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
 struct i915_vma *
 i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
-		  const struct i915_gtt_view *view)
+		  const struct i915_gtt_view *view,
+		  bool persistent)
 {
-	struct i915_vma *vma;
+	struct i915_vma *vma = NULL;
 
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
-	spin_lock(&obj->vma.lock);
-	vma = i915_vma_lookup(obj, vm, view);
-	spin_unlock(&obj->vma.lock);
+	if (!persistent) {
+		spin_lock(&obj->vma.lock);
+		vma = i915_vma_lookup(obj, vm, view);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	/* vma_create() will resolve the race if another creates the vma */
 	if (unlikely(!vma))
-		vma = vma_create(obj, vm, view);
+		vma = vma_create(obj, vm, view, persistent);
 
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
 	return vma;
@@ -1704,7 +1713,8 @@ static void release_references(struct i915_vma *vma, struct intel_gt *gt,
 
 	spin_lock(&obj->vma.lock);
 	list_del(&vma->obj_link);
-	if (!RB_EMPTY_NODE(&vma->obj_node))
+	if (!i915_vma_is_persistent(vma) &&
+	    !RB_EMPTY_NODE(&vma->obj_node))
 		rb_erase(&vma->obj_node, &obj->vma.tree);
 
 	spin_unlock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 3a47db2d85f5..b8e805c6532f 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -43,7 +43,8 @@
 struct i915_vma *
 i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
-		  const struct i915_gtt_view *view);
+		  const struct i915_gtt_view *view,
+		  bool persistent);
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index e050a2de5fd1..d8ffbdf91498 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -390,7 +390,7 @@ static void close_object_list(struct list_head *objects,
 	list_for_each_entry_safe(obj, on, objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (!IS_ERR(vma))
 			ignored = i915_vma_unbind_unlocked(vma);
 
@@ -452,7 +452,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -492,7 +492,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -531,7 +531,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -571,7 +571,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -653,7 +653,7 @@ static int walk_hole(struct i915_address_space *vm,
 		if (IS_ERR(obj))
 			break;
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_put;
@@ -728,7 +728,7 @@ static int pot_hole(struct i915_address_space *vm,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -837,7 +837,7 @@ static int drunk_hole(struct i915_address_space *vm,
 			break;
 		}
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_obj;
@@ -920,7 +920,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			break;
@@ -1018,7 +1018,7 @@ static int shrink_boom(struct i915_address_space *vm,
 		if (IS_ERR(purge))
 			return PTR_ERR(purge);
 
-		vma = i915_vma_instance(purge, vm, NULL);
+		vma = i915_vma_instance(purge, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_purge;
@@ -1041,7 +1041,7 @@ static int shrink_boom(struct i915_address_space *vm,
 		vm->fault_attr.interval = 1;
 		atomic_set(&vm->fault_attr.times, -1);
 
-		vma = i915_vma_instance(explode, vm, NULL);
+		vma = i915_vma_instance(explode, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_explode;
@@ -1088,7 +1088,7 @@ static int misaligned_case(struct i915_address_space *vm, struct intel_memory_re
 		return PTR_ERR(obj);
 	}
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_put;
@@ -1560,7 +1560,7 @@ static int igt_gtt_reserve(void *arg)
 		}
 
 		list_add(&obj->st_link, &objects);
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1606,7 +1606,7 @@ static int igt_gtt_reserve(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1636,7 +1636,7 @@ static int igt_gtt_reserve(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1783,7 +1783,7 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1809,7 +1809,7 @@ static int igt_gtt_insert(void *arg)
 	list_for_each_entry(obj, &objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1829,7 +1829,7 @@ static int igt_gtt_insert(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1882,7 +1882,7 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -2091,7 +2091,7 @@ static int igt_cs_tlb(void *arg)
 	}
 	i915_gem_object_set_cache_coherency(out, I915_CACHING_CACHED);
 
-	vma = i915_vma_instance(out, vm, NULL);
+	vma = i915_vma_instance(out, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put_out;
@@ -2131,7 +2131,7 @@ static int igt_cs_tlb(void *arg)
 
 			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
 
-			vma = i915_vma_instance(bbe, vm, NULL);
+			vma = i915_vma_instance(bbe, vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto end;
@@ -2203,7 +2203,7 @@ static int igt_cs_tlb(void *arg)
 				goto end;
 			}
 
-			vma = i915_vma_instance(act, vm, NULL);
+			vma = i915_vma_instance(act, vm, NULL, false);
 			if (IS_ERR(vma)) {
 				kfree(vma_res);
 				err = PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 818a4909c1f3..297c1d4ebf44 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -961,7 +961,7 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915)
 
 	intel_gt_chipset_flush(to_gt(i915));
 
-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
@@ -1100,7 +1100,7 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL);
+	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 71b52d5efef4..3899c2252de3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -68,7 +68,7 @@ checked_vma_instance(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	bool ok = true;
 
-	vma = i915_vma_instance(obj, vm, view);
+	vma = i915_vma_instance(obj, vm, view, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 0c22594ae274..6901f94ff076 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -47,7 +47,7 @@ static void *igt_spinner_pin_obj(struct intel_context *ce,
 	void *vaddr;
 	int ret;
 
-	*vma = i915_vma_instance(obj, ce->vm, NULL);
+	*vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(*vma))
 		return ERR_CAST(*vma);
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 3b18e5905c86..551d0c958a3b 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -745,7 +745,7 @@ static int igt_gpu_write(struct i915_gem_context *ctx,
 	if (!order)
 		return -ENOMEM;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_free;
-- 
2.21.0.rc0.32.g243a4c7e27
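
For illustration, a minimal kernel-internal sketch of how callers pick between
the two paths after this change (get_vma_sketch() is a hypothetical helper
based on the hunks above; error handling is elided):

static struct i915_vma *
get_vma_sketch(struct drm_i915_gem_object *obj,
	       struct i915_address_space *vm,
	       u64 offset, u64 length, bool persistent)
{
	struct i915_gtt_view view;

	if (!persistent)
		/* execbuf/legacy path: reuse an existing vma if one matches */
		return i915_vma_instance(obj, vm, NULL, false);

	/*
	 * VM_BIND path: always create a fresh vma; it is never inserted
	 * into obj->vma.tree, so it can never be returned by a lookup.
	 */
	view.type = I915_GTT_VIEW_PARTIAL;
	view.partial.offset = offset >> PAGE_SHIFT;
	view.partial.size = length >> PAGE_SHIFT;
	return i915_vma_instance(obj, vm, &view, true);
}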


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC v4 14/14] drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add getparam support for the VM_BIND capability version.
Add a VM creation-time flag to enable vm_bind_mode for the VM.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  5 ++++-
 drivers/gpu/drm/i915/i915_getparam.c        |  3 +++
 include/uapi/drm/i915_drm.h                 | 24 ++++++++++++++++++++-
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f4e648ec01ed..e0ebf47d4d57 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1808,7 +1808,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (!HAS_FULL_PPGTT(i915))
 		return -ENODEV;
 
-	if (args->flags)
+	if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
 		return -EINVAL;
 
 	ppgtt = i915_ppgtt_create(to_gt(i915), 0);
@@ -1828,6 +1828,9 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		goto err_put;
 
+	if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
+		ppgtt->vm.vm_bind_mode = true;
+
 	GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt */
 	args->vm_id = id;
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 342c8ca6414e..56ff5118d017 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_VM_BIND_VERSION:
+		value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index eaeb80a3ede1..df0fb875276c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -755,6 +755,27 @@ typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * VM_BIND feature version supported.
+ *
+ * The following versions of VM_BIND have been defined:
+ *
+ * 0: No VM_BIND support.
+ *
+ * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
+ *    previously with VM_BIND; the ioctl will not support unbinding multiple
+ *    mappings or splitting them. Similarly, VM_BIND calls will not replace
+ *    any existing mappings.
+ *
+ * 2: The restrictions on unbinding partial or multiple mappings are
+ *    lifted. Similarly, binding will replace any mappings in the given range.
+ *
+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
+ *
+ * vm_bind versions are backward compatible.
+ */
+#define I915_PARAM_VM_BIND_VERSION	57
+
 /* Must be kept compact -- no holes and well documented */
 
 /**
@@ -2625,7 +2646,8 @@ struct drm_i915_gem_vm_control {
 	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
 
-	/** @flags: reserved for future usage, currently MBZ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1u << 0)
+#define I915_VM_CREATE_FLAGS_UNKNOWN	(-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
 	__u32 flags;
 
 	/** @vm_id: Id of the VM created or to be destroyed */
-- 
2.21.0.rc0.32.g243a4c7e27
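
For illustration, a minimal userspace sketch of the new uapi (hypothetical
helper functions; assumes the I915_PARAM_VM_BIND_VERSION and
I915_VM_CREATE_FLAGS_USE_VM_BIND additions above are present in the installed
uapi header, and that the header include path matches the build setup):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int vm_bind_version(int drm_fd)
{
	int value = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_VM_BIND_VERSION,
		.value = &value,
	};

	/* Treat failure (e.g. -EINVAL from older kernels) as no support. */
	if (ioctl(drm_fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return 0;
	return value;
}

static int vm_create_in_bind_mode(int drm_fd, uint32_t *vm_id)
{
	struct drm_i915_gem_vm_control ctl = {
		.flags = I915_VM_CREATE_FLAGS_USE_VM_BIND,
	};

	if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl))
		return -1;

	*vm_id = ctl.vm_id;
	return 0;
}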


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Intel-gfx] [RFC v4 14/14] drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode
@ 2022-09-21  7:09   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:09 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

Add getparam support for the VM_BIND capability version.
Add a VM creation-time flag to enable vm_bind_mode for the VM.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  5 ++++-
 drivers/gpu/drm/i915/i915_getparam.c        |  3 +++
 include/uapi/drm/i915_drm.h                 | 24 ++++++++++++++++++++-
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f4e648ec01ed..e0ebf47d4d57 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1808,7 +1808,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (!HAS_FULL_PPGTT(i915))
 		return -ENODEV;
 
-	if (args->flags)
+	if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
 		return -EINVAL;
 
 	ppgtt = i915_ppgtt_create(to_gt(i915), 0);
@@ -1828,6 +1828,9 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		goto err_put;
 
+	if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
+		ppgtt->vm.vm_bind_mode = true;
+
 	GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt */
 	args->vm_id = id;
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 342c8ca6414e..56ff5118d017 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_VM_BIND_VERSION:
+		value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index eaeb80a3ede1..df0fb875276c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -755,6 +755,27 @@ typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * VM_BIND feature version supported.
+ *
+ * The following versions of VM_BIND have been defined:
+ *
+ * 0: No VM_BIND support.
+ *
+ * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
+ *    previously with VM_BIND; the ioctl will not support unbinding multiple
+ *    mappings or splitting them. Similarly, VM_BIND calls will not replace
+ *    any existing mappings.
+ *
+ * 2: The restrictions on unbinding partial or multiple mappings are
+ *    lifted. Similarly, binding will replace any mappings in the given range.
+ *
+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
+ *
+ * vm_bind versions are backward compatible.
+ */
+#define I915_PARAM_VM_BIND_VERSION	57
+
 /* Must be kept compact -- no holes and well documented */
 
 /**
@@ -2625,7 +2646,8 @@ struct drm_i915_gem_vm_control {
 	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
 
-	/** @flags: reserved for future usage, currently MBZ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1u << 0)
+#define I915_VM_CREATE_FLAGS_UNKNOWN	(-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
 	__u32 flags;
 
 	/** @vm_id: Id of the VM created or to be destroyed */
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/vm_bind: Add VM_BIND functionality (rev3)
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (14 preceding siblings ...)
  (?)
@ 2022-09-21  8:33 ` Patchwork
  -1 siblings, 0 replies; 62+ messages in thread
From: Patchwork @ 2022-09-21  8:33 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/vm_bind: Add VM_BIND functionality (rev3)
URL   : https://patchwork.freedesktop.org/series/105879/
State : warning

== Summary ==

Error: dim checkpatch failed
1faa6bedf8da drm/i915/vm_bind: Expose vm lookup function
2474615410c3 drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
585b299126d4 drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
671058542882 drm/i915/vm_bind: Implement bind and unbind of object
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 11, in <module>
    import git
ModuleNotFoundError: No module named 'git'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 11, in <module>
    import git
ModuleNotFoundError: No module named 'git'
-:45: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#45: 
new file mode 100644

-:565: WARNING:LONG_LINE: line length of 118 exceeds 100 columns
#565: FILE: include/uapi/drm/i915_drm.h:539:
+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)

-:566: WARNING:LONG_LINE: line length of 122 exceeds 100 columns
#566: FILE: include/uapi/drm/i915_drm.h:540:
+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)

total: 0 errors, 3 warnings, 0 checks, 665 lines checked
97e2068b921a drm/i915/vm_bind: Support for VM private BOs
b45f5fd59970 drm/i915/vm_bind: Handle persistent vmas
52581ec4212a drm/i915/vm_bind: Add out fence support
a9d8b10d97c9 drm/i915/vm_bind: Abstract out common execbuf functions
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 11, in <module>
    import git
ModuleNotFoundError: No module named 'git'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 11, in <module>
    import git
ModuleNotFoundError: No module named 'git'
-:712: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#712: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 1247 lines checked
f2a78068e648 drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 11, in <module>
    import git
ModuleNotFoundError: No module named 'git'
-:32: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#32: 
new file mode 100644

-:653: WARNING:LONG_LINE: line length of 126 exceeds 100 columns
#653: FILE: include/uapi/drm/i915_drm.h:542:
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

total: 0 errors, 2 warnings, 0 checks, 679 lines checked
f60faf478ccc drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
5795b486d78c drm/i915/vm_bind: Handle persistent vmas in execbuf3
93f405eeeab3 drm/i915/vm_bind: userptr dma-resv changes
52ee0357651e drm/i915/vm_bind: Skip vma_lookup for persistent vmas
80e8d1ec7e16 drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode



^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/vm_bind: Add VM_BIND functionality (rev3)
  2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (15 preceding siblings ...)
  (?)
@ 2022-09-21  8:55 ` Patchwork
  -1 siblings, 0 replies; 62+ messages in thread
From: Patchwork @ 2022-09-21  8:55 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx


== Series Details ==

Series: drm/i915/vm_bind: Add VM_BIND functionality (rev3)
URL   : https://patchwork.freedesktop.org/series/105879/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12163 -> Patchwork_105879v3
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_105879v3 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_105879v3, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/index.html

Participating hosts (42 -> 40)
------------------------------

  Additional (1): fi-tgl-u2 
  Missing    (3): fi-hsw-4770 fi-rkl-11600 fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105879v3:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_module_load@load:
    - fi-ilk-650:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-ilk-650/igt@i915_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-ilk-650/igt@i915_module_load@load.html
    - fi-blb-e6850:       [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-blb-e6850/igt@i915_module_load@load.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-blb-e6850/igt@i915_module_load@load.html
    - fi-pnv-d510:        [PASS][5] -> [INCOMPLETE][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-pnv-d510/igt@i915_module_load@load.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-pnv-d510/igt@i915_module_load@load.html
    - fi-snb-2520m:       [PASS][7] -> [INCOMPLETE][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-snb-2520m/igt@i915_module_load@load.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-snb-2520m/igt@i915_module_load@load.html
    - fi-hsw-g3258:       [PASS][9] -> [INCOMPLETE][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-hsw-g3258/igt@i915_module_load@load.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-hsw-g3258/igt@i915_module_load@load.html
    - fi-ivb-3770:        [PASS][11] -> [INCOMPLETE][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-ivb-3770/igt@i915_module_load@load.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-ivb-3770/igt@i915_module_load@load.html
    - fi-snb-2600:        [PASS][13] -> [INCOMPLETE][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-snb-2600/igt@i915_module_load@load.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-snb-2600/igt@i915_module_load@load.html

  * igt@i915_selftest@live@gem_migrate:
    - bat-dg1-5:          [PASS][15] -> [DMESG-WARN][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/bat-dg1-5/igt@i915_selftest@live@gem_migrate.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/bat-dg1-5/igt@i915_selftest@live@gem_migrate.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@gem_migrate:
    - {bat-dg2-8}:        [PASS][17] -> [DMESG-WARN][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/bat-dg2-8/igt@i915_selftest@live@gem_migrate.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/bat-dg2-8/igt@i915_selftest@live@gem_migrate.html
    - {bat-dg2-9}:        [PASS][19] -> [DMESG-WARN][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/bat-dg2-9/igt@i915_selftest@live@gem_migrate.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/bat-dg2-9/igt@i915_selftest@live@gem_migrate.html

  
Known issues
------------

  Here are the changes found in Patchwork_105879v3 that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - fi-skl-6700k2:      [PASS][21] -> [FAIL][22] ([i915#5032])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-skl-6700k2/boot.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-skl-6700k2/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-tgl-u2:          NOTRUN -> [SKIP][23] ([i915#2190])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-tgl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@i915_module_load@load:
    - fi-elk-e7500:       [PASS][24] -> [INCOMPLETE][25] ([i915#6836])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/fi-elk-e7500/igt@i915_module_load@load.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-elk-e7500/igt@i915_module_load@load.html

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-tgl-u2:          NOTRUN -> [SKIP][26] ([fdo#109284] / [fdo#111827]) +7 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-tgl-u2/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-tgl-u2:          NOTRUN -> [SKIP][27] ([i915#4103])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-tgl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-tgl-u2:          NOTRUN -> [SKIP][28] ([fdo#109285])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-tgl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-tgl-u2:          NOTRUN -> [SKIP][29] ([i915#3555])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-tgl-u2/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@runner@aborted:
    - fi-ivb-3770:        NOTRUN -> [FAIL][30] ([i915#4312] / [i915#6219])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-ivb-3770/igt@runner@aborted.html
    - fi-elk-e7500:       NOTRUN -> [FAIL][31] ([i915#4312] / [i915#6836])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-elk-e7500/igt@runner@aborted.html
    - fi-snb-2600:        NOTRUN -> [FAIL][32] ([i915#4312])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-snb-2600/igt@runner@aborted.html
    - fi-ilk-650:         NOTRUN -> [FAIL][33] ([i915#4312] / [i915#4991])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-ilk-650/igt@runner@aborted.html
    - fi-blb-e6850:       NOTRUN -> [FAIL][34] ([i915#2403] / [i915#4312])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-blb-e6850/igt@runner@aborted.html
    - bat-dg1-5:          NOTRUN -> [FAIL][35] ([i915#4312])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/bat-dg1-5/igt@runner@aborted.html
    - fi-pnv-d510:        NOTRUN -> [FAIL][36] ([i915#2403] / [i915#4312])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-pnv-d510/igt@runner@aborted.html
    - fi-snb-2520m:       NOTRUN -> [FAIL][37] ([i915#4312])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-snb-2520m/igt@runner@aborted.html
    - fi-hsw-g3258:       NOTRUN -> [FAIL][38] ([i915#4312] / [i915#6246])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/fi-hsw-g3258/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - {bat-adlm-1}:       [DMESG-WARN][39] ([i915#2867]) -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/bat-adlm-1/igt@gem_exec_suspend@basic-s3@smem.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/bat-adlm-1/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@i915_selftest@live@reset:
    - {bat-rpls-1}:       [DMESG-FAIL][41] ([i915#4983] / [i915#5828]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12163/bat-rpls-1/igt@i915_selftest@live@reset.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/bat-rpls-1/igt@i915_selftest@live@reset.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991
  [i915#5032]: https://gitlab.freedesktop.org/drm/intel/issues/5032
  [i915#5122]: https://gitlab.freedesktop.org/drm/intel/issues/5122
  [i915#5257]: https://gitlab.freedesktop.org/drm/intel/issues/5257
  [i915#5828]: https://gitlab.freedesktop.org/drm/intel/issues/5828
  [i915#6219]: https://gitlab.freedesktop.org/drm/intel/issues/6219
  [i915#6246]: https://gitlab.freedesktop.org/drm/intel/issues/6246
  [i915#6434]: https://gitlab.freedesktop.org/drm/intel/issues/6434
  [i915#6836]: https://gitlab.freedesktop.org/drm/intel/issues/6836


Build changes
-------------

  * Linux: CI_DRM_12163 -> Patchwork_105879v3

  CI-20190529: 20190529
  CI_DRM_12163: 8a052348946d9ec1b368ddcc1d3db5f2fc486f75 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6659: 1becf700a737a7a98555a0cfbe8566355377afb2 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105879v3: 8a052348946d9ec1b368ddcc1d3db5f2fc486f75 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

bfa82522715e drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode
9be69aec5a6b drm/i915/vm_bind: Skip vma_lookup for persistent vmas
43ec58a56ea6 drm/i915/vm_bind: userptr dma-resv changes
fa391ae135ab drm/i915/vm_bind: Handle persistent vmas in execbuf3
ca7e10c22b47 drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
76c0a2dc950a drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
7a078b2c65a3 drm/i915/vm_bind: Abstract out common execbuf functions
59f16bc4df63 drm/i915/vm_bind: Add out fence support
8dc247ec22e0 drm/i915/vm_bind: Handle persistent vmas
c0e5a22deac3 drm/i915/vm_bind: Support for VM private BOs
e60dcd038470 drm/i915/vm_bind: Implement bind and unbind of object
be9b4b742db7 drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
720b5cbfeaad drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
b8b76a29b94b drm/i915/vm_bind: Expose vm lookup function

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v3/index.html


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 02/14] drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-09-21  9:06   ` Tvrtko Ursulin
  2022-09-21 17:47     ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-21  9:06 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld


On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
> Add function __i915_sw_fence_await_reservation() for
> asynchronous wait on a dma-resv object with specified
> dma_resv_usage. This is required for async vma unbind
> with vm_bind.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_sw_fence.c | 25 ++++++++++++++++++-------
>   drivers/gpu/drm/i915/i915_sw_fence.h |  7 ++++++-
>   2 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 6fc0d1b89690..0ce8f4efc1ed 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -569,12 +569,11 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   	return ret;
>   }
>   
> -int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> -				    struct dma_resv *resv,
> -				    const struct dma_fence_ops *exclude,
> -				    bool write,
> -				    unsigned long timeout,
> -				    gfp_t gfp)
> +int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> +				      struct dma_resv *resv,
> +				      enum dma_resv_usage usage,
> +				      unsigned long timeout,
> +				      gfp_t gfp)
>   {
>   	struct dma_resv_iter cursor;
>   	struct dma_fence *f;
> @@ -583,7 +582,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   	debug_fence_assert(fence);
>   	might_sleep_if(gfpflags_allow_blocking(gfp));
>   
> -	dma_resv_iter_begin(&cursor, resv, dma_resv_usage_rw(write));
> +	dma_resv_iter_begin(&cursor, resv, usage);
>   	dma_resv_for_each_fence_unlocked(&cursor, f) {
>   		pending = i915_sw_fence_await_dma_fence(fence, f, timeout,
>   							gfp);
> @@ -598,6 +597,18 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   	return ret;
>   }
>   
> +int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> +				    struct dma_resv *resv,
> +				    const struct dma_fence_ops *exclude,
> +				    bool write,
> +				    unsigned long timeout,
> +				    gfp_t gfp)
> +{
> +	return __i915_sw_fence_await_reservation(fence, resv,
> +						 dma_resv_usage_rw(write),
> +						 timeout, gfp);
> +}

Drive by observation - it looked dodgy that you create a wrapper here 
which ignores one function parameter.

On a more detailed look it seems no callers actually use exclude and 
it's even unused inside this function since 1b5bdf071e62 ("drm/i915: use 
the new iterator in i915_sw_fence_await_reservation v3").

So a cleanup patch before this one?
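
Roughly, as an untested sketch of that cleanup, the unused parameter would
simply disappear from the declaration and from the handful of callers:

-int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
-				    struct dma_resv *resv,
-				    const struct dma_fence_ops *exclude,
-				    bool write,
-				    unsigned long timeout,
-				    gfp_t gfp);
+int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
+				    struct dma_resv *resv,
+				    bool write,
+				    unsigned long timeout,
+				    gfp_t gfp);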

Regards,

Tvrtko


> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/lib_sw_fence.c"
>   #include "selftests/i915_sw_fence.c"
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 619fc5a22f0c..3cf4b6e16f35 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -10,13 +10,13 @@
>   #define _I915_SW_FENCE_H_
>   
>   #include <linux/dma-fence.h>
> +#include <linux/dma-resv.h>
>   #include <linux/gfp.h>
>   #include <linux/kref.h>
>   #include <linux/notifier.h> /* for NOTIFY_DONE */
>   #include <linux/wait.h>
>   
>   struct completion;
> -struct dma_resv;
>   struct i915_sw_fence;
>   
>   enum i915_sw_fence_notify {
> @@ -89,6 +89,11 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   				  unsigned long timeout,
>   				  gfp_t gfp);
>   
> +int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> +				      struct dma_resv *resv,
> +				      enum dma_resv_usage usage,
> +				      unsigned long timeout,
> +				      gfp_t gfp);
>   int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    struct dma_resv *resv,
>   				    const struct dma_fence_ops *exclude,

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-09-21  9:13   ` Tvrtko Ursulin
  2022-09-21 18:00     ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-21  9:13 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld


On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
> Expose i915_gem_object_max_page_size() function non-static
> which will be used by the vm_bind feature.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
>   drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
>   2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 33673fe7ee0a..3b3ab4abb0a3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -11,14 +11,24 @@
>   #include "pxp/intel_pxp.h"
>   
>   #include "i915_drv.h"
> +#include "i915_gem_context.h"

I can't spot that you are adding any code which would need this? 
I915_GTT_PAGE_SIZE_4K? It is in intel_gtt.h.

>   #include "i915_gem_create.h"
>   #include "i915_trace.h"
>   #include "i915_user_extensions.h"
>   
> -static u32 object_max_page_size(struct intel_memory_region **placements,
> -				unsigned int n_placements)
> +/**
> + * i915_gem_object_max_page_size() - max of min_page_size of the regions
> + * @placements:  list of regions
> + * @n_placements: number of the placements
> + *
> + * Calculates the max of the min_page_size of a list of placements passed in.
> + *
> + * Return: max of the min_page_size
> + */
> +u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
> +				  unsigned int n_placements)
>   {
> -	u32 max_page_size = 0;
> +	u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>   	int i;
>   
>   	for (i = 0; i < n_placements; i++) {
> @@ -28,7 +38,6 @@ static u32 object_max_page_size(struct intel_memory_region **placements,
>   		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>   	}
>   
> -	GEM_BUG_ON(!max_page_size);
>   	return max_page_size;
>   }
>   
> @@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
>   
>   	i915_gem_flush_free_objects(i915);
>   
> -	size = round_up(size, object_max_page_size(placements, n_placements));
> +	size = round_up(size, i915_gem_object_max_page_size(placements,
> +							    n_placements));
>   	if (size == 0)
>   		return ERR_PTR(-EINVAL);

Because of the changes above this path is now unreachable. I suppose it 
was meant to tell the user "you have supplied no placements"? But then 
GEM_BUG_ON (which you remove) used to be wrong.

Regards,

Tvrtko

>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 7317d4102955..8c97bddad921 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>   }
>   
>   void i915_gem_init__objects(struct drm_i915_private *i915);
> +u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
> +				  unsigned int n_placements);
>   
>   void i915_objects_module_exit(void);
>   int i915_objects_module_init(void);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-09-21 10:18   ` Tvrtko Ursulin
  2022-09-21 18:17     ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-21 10:18 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld


On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
> The new execbuf3 ioctl path and the legacy execbuf ioctl
> paths have many common functionalities.
> Share code between these two paths by abstracting out the
> common functionalities into a separate file where possible.

Looks like a good start to me. A couple comments/questions below.

> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 507 ++---------------
>   .../drm/i915/gem/i915_gem_execbuffer_common.c | 530 ++++++++++++++++++
>   .../drm/i915/gem/i915_gem_execbuffer_common.h |  47 ++
>   4 files changed, 612 insertions(+), 473 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 9bf939ef18ea..bf952f478555 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -148,6 +148,7 @@ gem-y += \
>   	gem/i915_gem_create.o \
>   	gem/i915_gem_dmabuf.o \
>   	gem/i915_gem_domain.o \
> +	gem/i915_gem_execbuffer_common.o \
>   	gem/i915_gem_execbuffer.o \
>   	gem/i915_gem_internal.o \
>   	gem/i915_gem_object.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 33d989a20227..363b2a788cdf 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -9,8 +9,6 @@
>   #include <linux/sync_file.h>
>   #include <linux/uaccess.h>
>   
> -#include <drm/drm_syncobj.h>
> -
>   #include "display/intel_frontbuffer.h"
>   
>   #include "gem/i915_gem_ioctls.h"
> @@ -28,6 +26,7 @@
>   #include "i915_file_private.h"
>   #include "i915_gem_clflush.h"
>   #include "i915_gem_context.h"
> +#include "i915_gem_execbuffer_common.h"
>   #include "i915_gem_evict.h"
>   #include "i915_gem_ioctls.h"
>   #include "i915_trace.h"
> @@ -235,13 +234,6 @@ enum {
>    * the batchbuffer in trusted mode, otherwise the ioctl is rejected.
>    */
>   
> -struct eb_fence {
> -	struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
> -	struct dma_fence *dma_fence;
> -	u64 value;
> -	struct dma_fence_chain *chain_fence;
> -};
> -
>   struct i915_execbuffer {
>   	struct drm_i915_private *i915; /** i915 backpointer */
>   	struct drm_file *file; /** per-file lookup tables and limits */
> @@ -2446,164 +2438,29 @@ static const enum intel_engine_id user_ring_map[] = {
>   	[I915_EXEC_VEBOX]	= VECS0
>   };
>   
> -static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
> -{
> -	struct intel_ring *ring = ce->ring;
> -	struct intel_timeline *tl = ce->timeline;
> -	struct i915_request *rq;
> -
> -	/*
> -	 * Completely unscientific finger-in-the-air estimates for suitable
> -	 * maximum user request size (to avoid blocking) and then backoff.
> -	 */
> -	if (intel_ring_update_space(ring) >= PAGE_SIZE)
> -		return NULL;
> -
> -	/*
> -	 * Find a request that after waiting upon, there will be at least half
> -	 * the ring available. The hysteresis allows us to compete for the
> -	 * shared ring and should mean that we sleep less often prior to
> -	 * claiming our resources, but not so long that the ring completely
> -	 * drains before we can submit our next request.
> -	 */
> -	list_for_each_entry(rq, &tl->requests, link) {
> -		if (rq->ring != ring)
> -			continue;
> -
> -		if (__intel_ring_space(rq->postfix,
> -				       ring->emit, ring->size) > ring->size / 2)
> -			break;
> -	}
> -	if (&rq->link == &tl->requests)
> -		return NULL; /* weird, we will check again later for real */
> -
> -	return i915_request_get(rq);
> -}
> -
> -static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
> -			   bool throttle)
> -{
> -	struct intel_timeline *tl;
> -	struct i915_request *rq = NULL;
> -
> -	/*
> -	 * Take a local wakeref for preparing to dispatch the execbuf as
> -	 * we expect to access the hardware fairly frequently in the
> -	 * process, and require the engine to be kept awake between accesses.
> -	 * Upon dispatch, we acquire another prolonged wakeref that we hold
> -	 * until the timeline is idle, which in turn releases the wakeref
> -	 * taken on the engine, and the parent device.
> -	 */
> -	tl = intel_context_timeline_lock(ce);
> -	if (IS_ERR(tl))
> -		return PTR_ERR(tl);
> -
> -	intel_context_enter(ce);
> -	if (throttle)
> -		rq = eb_throttle(eb, ce);
> -	intel_context_timeline_unlock(tl);
> -
> -	if (rq) {
> -		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
> -		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
> -
> -		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
> -				      timeout) < 0) {
> -			i915_request_put(rq);
> -
> -			/*
> -			 * Error path, cannot use intel_context_timeline_lock as
> -			 * that is user interruptable and this clean up step
> -			 * must be done.
> -			 */
> -			mutex_lock(&ce->timeline->mutex);
> -			intel_context_exit(ce);
> -			mutex_unlock(&ce->timeline->mutex);
> -
> -			if (nonblock)
> -				return -EWOULDBLOCK;
> -			else
> -				return -EINTR;
> -		}
> -		i915_request_put(rq);
> -	}
> -
> -	return 0;
> -}
> -
>   static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
>   {
> -	struct intel_context *ce = eb->context, *child;
>   	int err;
> -	int i = 0, j = 0;
>   
>   	GEM_BUG_ON(eb->args->flags & __EXEC_ENGINE_PINNED);

You could avoid the duplication by putting the common flags into the common 
header and then having eb2 and eb3 add their own flags relative to the end of 
the last common entry.
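
Purely as an illustration of what I mean (the flag names and bit values here
are made up for the example, not taken from the patch):

/* i915_gem_execbuffer_common.h */
#define __EXEC_COMMON_ENGINE_PINNED	BIT(0)
#define __EXEC_COMMON_LAST_BIT		0

/* i915_gem_execbuffer.c - legacy-only flags continue after the common ones */
#define __EXEC2_SOMETHING		BIT(__EXEC_COMMON_LAST_BIT + 1)

/* i915_gem_execbuffer3.c - likewise for the execbuf3-only flags */
#define __EXEC3_SOMETHING		BIT(__EXEC_COMMON_LAST_BIT + 1)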

>   
> -	if (unlikely(intel_context_is_banned(ce)))
> -		return -EIO;
> -
> -	/*
> -	 * Pinning the contexts may generate requests in order to acquire
> -	 * GGTT space, so do this first before we reserve a seqno for
> -	 * ourselves.
> -	 */
> -	err = intel_context_pin_ww(ce, &eb->ww);
> +	err = __eb_pin_engine(eb->context, &eb->ww, throttle,
> +			      eb->file->filp->f_flags & O_NONBLOCK);
>   	if (err)
>   		return err;
> -	for_each_child(ce, child) {
> -		err = intel_context_pin_ww(child, &eb->ww);
> -		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
> -	}
> -
> -	for_each_child(ce, child) {
> -		err = eb_pin_timeline(eb, child, throttle);
> -		if (err)
> -			goto unwind;
> -		++i;
> -	}
> -	err = eb_pin_timeline(eb, ce, throttle);
> -	if (err)
> -		goto unwind;
>   
>   	eb->args->flags |= __EXEC_ENGINE_PINNED;
>   	return 0;
> -
> -unwind:
> -	for_each_child(ce, child) {
> -		if (j++ < i) {
> -			mutex_lock(&child->timeline->mutex);
> -			intel_context_exit(child);
> -			mutex_unlock(&child->timeline->mutex);
> -		}
> -	}
> -	for_each_child(ce, child)
> -		intel_context_unpin(child);
> -	intel_context_unpin(ce);
> -	return err;
>   }
>   
>   static void eb_unpin_engine(struct i915_execbuffer *eb)
>   {
> -	struct intel_context *ce = eb->context, *child;
> -
>   	if (!(eb->args->flags & __EXEC_ENGINE_PINNED))
>   		return;
>   
>   	eb->args->flags &= ~__EXEC_ENGINE_PINNED;
>   
> -	for_each_child(ce, child) {
> -		mutex_lock(&child->timeline->mutex);
> -		intel_context_exit(child);
> -		mutex_unlock(&child->timeline->mutex);
> -
> -		intel_context_unpin(child);
> -	}
> -
> -	mutex_lock(&ce->timeline->mutex);
> -	intel_context_exit(ce);
> -	mutex_unlock(&ce->timeline->mutex);
> -
> -	intel_context_unpin(ce);
> +	__eb_unpin_engine(eb->context);
>   }
>   
>   static unsigned int
> @@ -2652,7 +2509,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
>   static int
>   eb_select_engine(struct i915_execbuffer *eb)
>   {
> -	struct intel_context *ce, *child;
> +	struct intel_context *ce;
>   	unsigned int idx;
>   	int err;
>   
> @@ -2677,36 +2534,10 @@ eb_select_engine(struct i915_execbuffer *eb)
>   	}
>   	eb->num_batches = ce->parallel.number_children + 1;
>   
> -	for_each_child(ce, child)
> -		intel_context_get(child);
> -	intel_gt_pm_get(ce->engine->gt);
> -
> -	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
> -		err = intel_context_alloc_state(ce);
> -		if (err)
> -			goto err;
> -	}
> -	for_each_child(ce, child) {
> -		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
> -			err = intel_context_alloc_state(child);
> -			if (err)
> -				goto err;
> -		}
> -	}
> -
> -	/*
> -	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
> -	 * EIO if the GPU is already wedged.
> -	 */
> -	err = intel_gt_terminally_wedged(ce->engine->gt);
> +	err = __eb_select_engine(ce);
>   	if (err)
>   		goto err;
>   
> -	if (!i915_vm_tryget(ce->vm)) {
> -		err = -ENOENT;
> -		goto err;
> -	}
> -
>   	eb->context = ce;
>   	eb->gt = ce->engine->gt;
>   
> @@ -2715,12 +2546,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>   	 * during ww handling. The pool is destroyed when last pm reference
>   	 * is dropped, which breaks our -EDEADLK handling.
>   	 */
> -	return err;
> +	return 0;
>   
>   err:
> -	intel_gt_pm_put(ce->engine->gt);
> -	for_each_child(ce, child)
> -		intel_context_put(child);
>   	intel_context_put(ce);
>   	return err;
>   }
> @@ -2728,24 +2556,7 @@ eb_select_engine(struct i915_execbuffer *eb)
>   static void
>   eb_put_engine(struct i915_execbuffer *eb)
>   {
> -	struct intel_context *child;
> -
> -	i915_vm_put(eb->context->vm);
> -	intel_gt_pm_put(eb->gt);
> -	for_each_child(eb->context, child)
> -		intel_context_put(child);
> -	intel_context_put(eb->context);
> -}
> -
> -static void
> -__free_fence_array(struct eb_fence *fences, unsigned int n)
> -{
> -	while (n--) {
> -		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
> -		dma_fence_put(fences[n].dma_fence);
> -		dma_fence_chain_free(fences[n].chain_fence);
> -	}
> -	kvfree(fences);
> +	__eb_put_engine(eb->context, eb->gt);
>   }
>   
>   static int
> @@ -2756,7 +2567,6 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>   	u64 __user *user_values;
>   	struct eb_fence *f;
>   	u64 nfences;
> -	int err = 0;
>   
>   	nfences = timeline_fences->fence_count;
>   	if (!nfences)
> @@ -2791,9 +2601,9 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>   
>   	while (nfences--) {
>   		struct drm_i915_gem_exec_fence user_fence;
> -		struct drm_syncobj *syncobj;
> -		struct dma_fence *fence = NULL;
> +		bool wait, signal;
>   		u64 point;
> +		int ret;
>   
>   		if (__copy_from_user(&user_fence,
>   				     user_fences++,
> @@ -2806,70 +2616,15 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>   		if (__get_user(point, user_values++))
>   			return -EFAULT;
>   
> -		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
> -		if (!syncobj) {
> -			DRM_DEBUG("Invalid syncobj handle provided\n");
> -			return -ENOENT;
> -		}
> -
> -		fence = drm_syncobj_fence_get(syncobj);
> -
> -		if (!fence && user_fence.flags &&
> -		    !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
> -			DRM_DEBUG("Syncobj handle has no fence\n");
> -			drm_syncobj_put(syncobj);
> -			return -EINVAL;
> -		}
> -
> -		if (fence)
> -			err = dma_fence_chain_find_seqno(&fence, point);
> -
> -		if (err && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
> -			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
> -			dma_fence_put(fence);
> -			drm_syncobj_put(syncobj);
> -			return err;
> -		}
> -
> -		/*
> -		 * A point might have been signaled already and
> -		 * garbage collected from the timeline. In this case
> -		 * just ignore the point and carry on.
> -		 */
> -		if (!fence && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
> -			drm_syncobj_put(syncobj);
> +		wait = user_fence.flags & I915_EXEC_FENCE_WAIT;
> +		signal = user_fence.flags & I915_EXEC_FENCE_SIGNAL;
> +		ret = add_timeline_fence(eb->file, user_fence.handle, point,
> +					 f, wait, signal);
> +		if (ret < 0)
> +			return ret;
> +		else if (!ret)
>   			continue;
> -		}
>   
> -		/*
> -		 * For timeline syncobjs we need to preallocate chains for
> -		 * later signaling.
> -		 */
> -		if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
> -			/*
> -			 * Waiting and signaling the same point (when point !=
> -			 * 0) would break the timeline.
> -			 */
> -			if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
> -				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
> -				dma_fence_put(fence);
> -				drm_syncobj_put(syncobj);
> -				return -EINVAL;
> -			}
> -
> -			f->chain_fence = dma_fence_chain_alloc();
> -			if (!f->chain_fence) {
> -				drm_syncobj_put(syncobj);
> -				dma_fence_put(fence);
> -				return -ENOMEM;
> -			}
> -		} else {
> -			f->chain_fence = NULL;
> -		}
> -
> -		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
> -		f->dma_fence = fence;
> -		f->value = point;
>   		f++;
>   		eb->num_fences++;
>   	}
> @@ -2949,65 +2704,6 @@ static int add_fence_array(struct i915_execbuffer *eb)
>   	return 0;
>   }
>   
> -static void put_fence_array(struct eb_fence *fences, int num_fences)
> -{
> -	if (fences)
> -		__free_fence_array(fences, num_fences);
> -}
> -
> -static int
> -await_fence_array(struct i915_execbuffer *eb,
> -		  struct i915_request *rq)
> -{
> -	unsigned int n;
> -	int err;
> -
> -	for (n = 0; n < eb->num_fences; n++) {
> -		struct drm_syncobj *syncobj;
> -		unsigned int flags;
> -
> -		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
> -
> -		if (!eb->fences[n].dma_fence)
> -			continue;
> -
> -		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
> -		if (err < 0)
> -			return err;
> -	}
> -
> -	return 0;
> -}
> -
> -static void signal_fence_array(const struct i915_execbuffer *eb,
> -			       struct dma_fence * const fence)
> -{
> -	unsigned int n;
> -
> -	for (n = 0; n < eb->num_fences; n++) {
> -		struct drm_syncobj *syncobj;
> -		unsigned int flags;
> -
> -		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
> -		if (!(flags & I915_EXEC_FENCE_SIGNAL))
> -			continue;
> -
> -		if (eb->fences[n].chain_fence) {
> -			drm_syncobj_add_point(syncobj,
> -					      eb->fences[n].chain_fence,
> -					      fence,
> -					      eb->fences[n].value);
> -			/*
> -			 * The chain's ownership is transferred to the
> -			 * timeline.
> -			 */
> -			eb->fences[n].chain_fence = NULL;
> -		} else {
> -			drm_syncobj_replace_fence(syncobj, fence);
> -		}
> -	}
> -}
> -
>   static int
>   parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
>   {
> @@ -3020,80 +2716,6 @@ parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
>   	return add_timeline_fence_array(eb, &timeline_fences);
>   }
>   
> -static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
> -{
> -	struct i915_request *rq, *rn;
> -
> -	list_for_each_entry_safe(rq, rn, &tl->requests, link)
> -		if (rq == end || !i915_request_retire(rq))
> -			break;
> -}
> -
> -static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
> -			  int err, bool last_parallel)
> -{
> -	struct intel_timeline * const tl = i915_request_timeline(rq);
> -	struct i915_sched_attr attr = {};
> -	struct i915_request *prev;
> -
> -	lockdep_assert_held(&tl->mutex);
> -	lockdep_unpin_lock(&tl->mutex, rq->cookie);
> -
> -	trace_i915_request_add(rq);
> -
> -	prev = __i915_request_commit(rq);
> -
> -	/* Check that the context wasn't destroyed before submission */
> -	if (likely(!intel_context_is_closed(eb->context))) {
> -		attr = eb->gem_context->sched;
> -	} else {
> -		/* Serialise with context_close via the add_to_timeline */
> -		i915_request_set_error_once(rq, -ENOENT);
> -		__i915_request_skip(rq);
> -		err = -ENOENT; /* override any transient errors */
> -	}
> -
> -	if (intel_context_is_parallel(eb->context)) {
> -		if (err) {
> -			__i915_request_skip(rq);
> -			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
> -				&rq->fence.flags);
> -		}
> -		if (last_parallel)
> -			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
> -				&rq->fence.flags);
> -	}
> -
> -	__i915_request_queue(rq, &attr);
> -
> -	/* Try to clean up the client's timeline after submitting the request */
> -	if (prev)
> -		retire_requests(tl, prev);
> -
> -	mutex_unlock(&tl->mutex);
> -
> -	return err;
> -}
> -
> -static int eb_requests_add(struct i915_execbuffer *eb, int err)
> -{
> -	int i;
> -
> -	/*
> -	 * We iterate in reverse order of creation to release timeline mutexes in
> -	 * same order.
> -	 */
> -	for_each_batch_add_order(eb, i) {
> -		struct i915_request *rq = eb->requests[i];
> -
> -		if (!rq)
> -			continue;
> -		err |= eb_request_add(eb, rq, err, i == 0);
> -	}
> -
> -	return err;
> -}
> -
>   static const i915_user_extension_fn execbuf_extensions[] = {
>   	[DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,
>   };
> @@ -3120,73 +2742,26 @@ parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
>   				    eb);
>   }
>   
> -static void eb_requests_get(struct i915_execbuffer *eb)
> -{
> -	unsigned int i;
> -
> -	for_each_batch_create_order(eb, i) {
> -		if (!eb->requests[i])
> -			break;
> -
> -		i915_request_get(eb->requests[i]);
> -	}
> -}
> -
> -static void eb_requests_put(struct i915_execbuffer *eb)
> -{
> -	unsigned int i;
> -
> -	for_each_batch_create_order(eb, i) {
> -		if (!eb->requests[i])
> -			break;
> -
> -		i915_request_put(eb->requests[i]);
> -	}
> -}
> -
>   static struct sync_file *
>   eb_composite_fence_create(struct i915_execbuffer *eb, int out_fence_fd)
>   {
>   	struct sync_file *out_fence = NULL;
> -	struct dma_fence_array *fence_array;
> -	struct dma_fence **fences;
> -	unsigned int i;
> -
> -	GEM_BUG_ON(!intel_context_is_parent(eb->context));
> +	struct dma_fence *fence;
>   
> -	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
> -	if (!fences)
> -		return ERR_PTR(-ENOMEM);
> -
> -	for_each_batch_create_order(eb, i) {
> -		fences[i] = &eb->requests[i]->fence;
> -		__set_bit(I915_FENCE_FLAG_COMPOSITE,
> -			  &eb->requests[i]->fence.flags);
> -	}
> -
> -	fence_array = dma_fence_array_create(eb->num_batches,
> -					     fences,
> -					     eb->context->parallel.fence_context,
> -					     eb->context->parallel.seqno++,
> -					     false);
> -	if (!fence_array) {
> -		kfree(fences);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
> -	/* Move ownership to the dma_fence_array created above */
> -	for_each_batch_create_order(eb, i)
> -		dma_fence_get(fences[i]);
> +	fence = __eb_composite_fence_create(eb->requests, eb->num_batches,
> +					    eb->context);
> +	if (IS_ERR(fence))
> +		return ERR_CAST(fence);
>   
>   	if (out_fence_fd != -1) {
> -		out_fence = sync_file_create(&fence_array->base);
> +		out_fence = sync_file_create(fence);
>   		/* sync_file now owns fence_arry, drop creation ref */
> -		dma_fence_put(&fence_array->base);
> +		dma_fence_put(fence);
>   		if (!out_fence)
>   			return ERR_PTR(-ENOMEM);
>   	}
>   
> -	eb->composite_fence = &fence_array->base;
> +	eb->composite_fence = fence;
>   
>   	return out_fence;
>   }
> @@ -3218,7 +2793,7 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
>   	}
>   
>   	if (eb->fences) {
> -		err = await_fence_array(eb, rq);
> +		err = await_fence_array(eb->fences, eb->num_fences, rq);
>   		if (err)
>   			return ERR_PTR(err);
>   	}
> @@ -3236,23 +2811,6 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
>   	return out_fence;
>   }
>   
> -static struct intel_context *
> -eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
> -{
> -	struct intel_context *child;
> -
> -	if (likely(context_number == 0))
> -		return eb->context;
> -
> -	for_each_child(eb->context, child)
> -		if (!--context_number)
> -			return child;
> -
> -	GEM_BUG_ON("Context not found");
> -
> -	return NULL;
> -}
> -
>   static struct sync_file *
>   eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence,
>   		   int out_fence_fd)
> @@ -3262,7 +2820,8 @@ eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence,
>   
>   	for_each_batch_create_order(eb, i) {
>   		/* Allocate a request for this batch buffer nice and early. */
> -		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
> +		eb->requests[i] =
> +			i915_request_create(eb_find_context(eb->context, i));
>   		if (IS_ERR(eb->requests[i])) {
>   			out_fence = ERR_CAST(eb->requests[i]);
>   			eb->requests[i] = NULL;
> @@ -3442,11 +3001,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	err = eb_submit(&eb);
>   
>   err_request:
> -	eb_requests_get(&eb);
> -	err = eb_requests_add(&eb, err);
> +	eb_requests_get(eb.requests, eb.num_batches);
> +	err = eb_requests_add(eb.requests, eb.num_batches, eb.context,
> +			      eb.gem_context->sched, err);
>   
>   	if (eb.fences)
> -		signal_fence_array(&eb, eb.composite_fence ?
> +		signal_fence_array(eb.fences, eb.num_fences,
> +				   eb.composite_fence ?
>   				   eb.composite_fence :
>   				   &eb.requests[0]->fence);
>   
> @@ -3471,7 +3032,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	if (!out_fence && eb.composite_fence)
>   		dma_fence_put(eb.composite_fence);
>   
> -	eb_requests_put(&eb);
> +	eb_requests_put(eb.requests, eb.num_batches);
>   
>   err_vma:
>   	eb_release_vmas(&eb, true);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
> new file mode 100644
> index 000000000000..167268dfd930
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
> @@ -0,0 +1,530 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/dma-fence-array.h>
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +#include "gt/intel_ring.h"
> +
> +#include "i915_gem_execbuffer_common.h"
> +
> +#define __EXEC_COMMON_FENCE_WAIT	BIT(0)
> +#define __EXEC_COMMON_FENCE_SIGNAL	BIT(1)
> +
> +static struct i915_request *eb_throttle(struct intel_context *ce)
> +{
> +	struct intel_ring *ring = ce->ring;
> +	struct intel_timeline *tl = ce->timeline;
> +	struct i915_request *rq;
> +
> +	/*
> +	 * Completely unscientific finger-in-the-air estimates for suitable
> +	 * maximum user request size (to avoid blocking) and then backoff.
> +	 */
> +	if (intel_ring_update_space(ring) >= PAGE_SIZE)
> +		return NULL;
> +
> +	/*
> +	 * Find a request that after waiting upon, there will be at least half
> +	 * the ring available. The hysteresis allows us to compete for the
> +	 * shared ring and should mean that we sleep less often prior to
> +	 * claiming our resources, but not so long that the ring completely
> +	 * drains before we can submit our next request.
> +	 */
> +	list_for_each_entry(rq, &tl->requests, link) {
> +		if (rq->ring != ring)
> +			continue;
> +
> +		if (__intel_ring_space(rq->postfix,
> +				       ring->emit, ring->size) > ring->size / 2)
> +			break;
> +	}
> +	if (&rq->link == &tl->requests)
> +		return NULL; /* weird, we will check again later for real */
> +
> +	return i915_request_get(rq);
> +}
> +
> +static int eb_pin_timeline(struct intel_context *ce, bool throttle,
> +			   bool nonblock)
> +{
> +	struct intel_timeline *tl;
> +	struct i915_request *rq = NULL;
> +
> +	/*
> +	 * Take a local wakeref for preparing to dispatch the execbuf as
> +	 * we expect to access the hardware fairly frequently in the
> +	 * process, and require the engine to be kept awake between accesses.
> +	 * Upon dispatch, we acquire another prolonged wakeref that we hold
> +	 * until the timeline is idle, which in turn releases the wakeref
> +	 * taken on the engine, and the parent device.
> +	 */
> +	tl = intel_context_timeline_lock(ce);
> +	if (IS_ERR(tl))
> +		return PTR_ERR(tl);
> +
> +	intel_context_enter(ce);
> +	if (throttle)
> +		rq = eb_throttle(ce);
> +	intel_context_timeline_unlock(tl);
> +
> +	if (rq) {
> +		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
> +
> +		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
> +				      timeout) < 0) {
> +			i915_request_put(rq);
> +
> +			/*
> +			 * Error path, cannot use intel_context_timeline_lock as
> +			 * that is user interruptable and this clean up step
> +			 * must be done.
> +			 */
> +			mutex_lock(&ce->timeline->mutex);
> +			intel_context_exit(ce);
> +			mutex_unlock(&ce->timeline->mutex);
> +
> +			if (nonblock)
> +				return -EWOULDBLOCK;
> +			else
> +				return -EINTR;
> +		}
> +		i915_request_put(rq);
> +	}
> +
> +	return 0;
> +}
> +
> +int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
> +		    bool throttle, bool nonblock)
> +{
> +	struct intel_context *child;
> +	int err;
> +	int i = 0, j = 0;
> +
> +	if (unlikely(intel_context_is_banned(ce)))
> +		return -EIO;
> +
> +	/*
> +	 * Pinning the contexts may generate requests in order to acquire
> +	 * GGTT space, so do this first before we reserve a seqno for
> +	 * ourselves.
> +	 */
> +	err = intel_context_pin_ww(ce, ww);
> +	if (err)
> +		return err;
> +
> +	for_each_child(ce, child) {
> +		err = intel_context_pin_ww(child, ww);
> +		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
> +	}
> +
> +	for_each_child(ce, child) {
> +		err = eb_pin_timeline(child, throttle, nonblock);
> +		if (err)
> +			goto unwind;
> +		++i;
> +	}
> +	err = eb_pin_timeline(ce, throttle, nonblock);
> +	if (err)
> +		goto unwind;
> +
> +	return 0;
> +
> +unwind:
> +	for_each_child(ce, child) {
> +		if (j++ < i) {
> +			mutex_lock(&child->timeline->mutex);
> +			intel_context_exit(child);
> +			mutex_unlock(&child->timeline->mutex);
> +		}
> +	}
> +	for_each_child(ce, child)
> +		intel_context_unpin(child);
> +	intel_context_unpin(ce);
> +	return err;
> +}
> +
> +void __eb_unpin_engine(struct intel_context *ce)
> +{
> +	struct intel_context *child;
> +
> +	for_each_child(ce, child) {
> +		mutex_lock(&child->timeline->mutex);
> +		intel_context_exit(child);
> +		mutex_unlock(&child->timeline->mutex);
> +
> +		intel_context_unpin(child);
> +	}
> +
> +	mutex_lock(&ce->timeline->mutex);
> +	intel_context_exit(ce);
> +	mutex_unlock(&ce->timeline->mutex);
> +
> +	intel_context_unpin(ce);
> +}
> +
> +struct intel_context *
> +eb_find_context(struct intel_context *context, unsigned int context_number)
> +{
> +	struct intel_context *child;
> +
> +	if (likely(context_number == 0))
> +		return context;
> +
> +	for_each_child(context, child)
> +		if (!--context_number)
> +			return child;
> +
> +	GEM_BUG_ON("Context not found");
> +
> +	return NULL;
> +}
> +
> +static void __free_fence_array(struct eb_fence *fences, u64 n)
> +{
> +	while (n--) {
> +		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
> +		dma_fence_put(fences[n].dma_fence);
> +		dma_fence_chain_free(fences[n].chain_fence);
> +	}
> +	kvfree(fences);
> +}
> +
> +void put_fence_array(struct eb_fence *fences, u64 num_fences)
> +{
> +	if (fences)
> +		__free_fence_array(fences, num_fences);
> +}
> +
> +int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
> +		       struct eb_fence *f, bool wait, bool signal)
> +{
> +	struct drm_syncobj *syncobj;
> +	struct dma_fence *fence = NULL;
> +	u32 flags = 0;
> +	int err = 0;
> +
> +	syncobj = drm_syncobj_find(file, handle);
> +	if (!syncobj) {
> +		DRM_DEBUG("Invalid syncobj handle provided\n");
> +		return -ENOENT;
> +	}
> +
> +	fence = drm_syncobj_fence_get(syncobj);
> +
> +	if (!fence && wait && !signal) {
> +		DRM_DEBUG("Syncobj handle has no fence\n");
> +		drm_syncobj_put(syncobj);
> +		return -EINVAL;
> +	}
> +
> +	if (fence)
> +		err = dma_fence_chain_find_seqno(&fence, point);
> +
> +	if (err && !signal) {
> +		DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
> +		dma_fence_put(fence);
> +		drm_syncobj_put(syncobj);
> +		return err;
> +	}
> +
> +	/*
> +	 * A point might have been signaled already and
> +	 * garbage collected from the timeline. In this case
> +	 * just ignore the point and carry on.
> +	 */
> +	if (!fence && !signal) {
> +		drm_syncobj_put(syncobj);
> +		return 0;
> +	}
> +
> +	/*
> +	 * For timeline syncobjs we need to preallocate chains for
> +	 * later signaling.
> +	 */
> +	if (point != 0 && signal) {
> +		/*
> +		 * Waiting and signaling the same point (when point !=
> +		 * 0) would break the timeline.
> +		 */
> +		if (wait) {
> +			DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
> +			dma_fence_put(fence);
> +			drm_syncobj_put(syncobj);
> +			return -EINVAL;
> +		}
> +
> +		f->chain_fence = dma_fence_chain_alloc();
> +		if (!f->chain_fence) {
> +			drm_syncobj_put(syncobj);
> +			dma_fence_put(fence);
> +			return -ENOMEM;
> +		}
> +	} else {
> +		f->chain_fence = NULL;
> +	}
> +
> +	flags |= wait ? __EXEC_COMMON_FENCE_WAIT : 0;
> +	flags |= signal ? __EXEC_COMMON_FENCE_SIGNAL : 0;
> +
> +	f->syncobj = ptr_pack_bits(syncobj, flags, 2);
> +	f->dma_fence = fence;
> +	f->value = point;
> +	return 1;
> +}
> +
> +int await_fence_array(struct eb_fence *fences, u64 num_fences,
> +		      struct i915_request *rq)
> +{
> +	unsigned int n;
> +
> +	for (n = 0; n < num_fences; n++) {
> +		struct drm_syncobj *syncobj;
> +		unsigned int flags;
> +		int err;
> +
> +		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
> +
> +		if (!fences[n].dma_fence)
> +			continue;
> +
> +		err = i915_request_await_dma_fence(rq, fences[n].dma_fence);
> +		if (err < 0)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +void signal_fence_array(struct eb_fence *fences, u64 num_fences,
> +			struct dma_fence * const fence)
> +{
> +	unsigned int n;
> +
> +	for (n = 0; n < num_fences; n++) {
> +		struct drm_syncobj *syncobj;
> +		unsigned int flags;
> +
> +		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
> +		if (!(flags & __EXEC_COMMON_FENCE_SIGNAL))
> +			continue;
> +
> +		if (fences[n].chain_fence) {
> +			drm_syncobj_add_point(syncobj,
> +					      fences[n].chain_fence,
> +					      fence,
> +					      fences[n].value);
> +			/*
> +			 * The chain's ownership is transferred to the
> +			 * timeline.
> +			 */
> +			fences[n].chain_fence = NULL;
> +		} else {
> +			drm_syncobj_replace_fence(syncobj, fence);
> +		}
> +	}
> +}
> +
> +/*
> + * Using two helper loops for the order of which requests / batches are created
> + * and added the to backend. Requests are created in order from the parent to
> + * the last child. Requests are added in the reverse order, from the last child
> + * to parent. This is done for locking reasons as the timeline lock is acquired
> + * during request creation and released when the request is added to the
> + * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
> + * the ordering.
> + */
> +#define for_each_batch_create_order(_num_batches) \
> +	for (unsigned int i = 0; i < (_num_batches); ++i)
> +#define for_each_batch_add_order(_num_batches) \
> +	for (int i = (_num_batches) - 1; i >= 0; --i)
> +
> +static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
> +{
> +	struct i915_request *rq, *rn;
> +
> +	list_for_each_entry_safe(rq, rn, &tl->requests, link)
> +		if (rq == end || !i915_request_retire(rq))
> +			break;
> +}
> +
> +static int eb_request_add(struct intel_context *context,
> +			  struct i915_request *rq,
> +			  struct i915_sched_attr sched,
> +			  int err, bool last_parallel)
> +{
> +	struct intel_timeline * const tl = i915_request_timeline(rq);
> +	struct i915_sched_attr attr = {};
> +	struct i915_request *prev;
> +
> +	lockdep_assert_held(&tl->mutex);
> +	lockdep_unpin_lock(&tl->mutex, rq->cookie);
> +
> +	trace_i915_request_add(rq);
> +
> +	prev = __i915_request_commit(rq);
> +
> +	/* Check that the context wasn't destroyed before submission */
> +	if (likely(!intel_context_is_closed(context))) {
> +		attr = sched;
> +	} else {
> +		/* Serialise with context_close via the add_to_timeline */
> +		i915_request_set_error_once(rq, -ENOENT);
> +		__i915_request_skip(rq);
> +		err = -ENOENT; /* override any transient errors */
> +	}
> +
> +	if (intel_context_is_parallel(context)) {
> +		if (err) {
> +			__i915_request_skip(rq);
> +			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
> +				&rq->fence.flags);
> +		}
> +		if (last_parallel)
> +			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
> +				&rq->fence.flags);
> +	}
> +
> +	__i915_request_queue(rq, &attr);
> +
> +	/* Try to clean up the client's timeline after submitting the request */
> +	if (prev)
> +		retire_requests(tl, prev);
> +
> +	mutex_unlock(&tl->mutex);
> +
> +	return err;
> +}
> +
> +int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
> +		    struct intel_context *context, struct i915_sched_attr sched,
> +		    int err)
> +{
> +	/*
> +	 * We iterate in reverse order of creation to release timeline mutexes
> +	 * in same order.
> +	 */
> +	for_each_batch_add_order(num_batches) {
> +		struct i915_request *rq = requests[i];
> +
> +		if (!rq)
> +			continue;
> +
> +		err = eb_request_add(context, rq, sched, err, i == 0);
> +	}
> +
> +	return err;
> +}
> +
> +void eb_requests_get(struct i915_request **requests, unsigned int num_batches)
> +{
> +	for_each_batch_create_order(num_batches) {
> +		if (!requests[i])
> +			break;
> +
> +		i915_request_get(requests[i]);
> +	}
> +}
> +
> +void eb_requests_put(struct i915_request **requests, unsigned int num_batches)
> +{
> +	for_each_batch_create_order(num_batches) {
> +		if (!requests[i])
> +			break;
> +
> +		i915_request_put(requests[i]);
> +	}
> +}
> +
> +struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
> +					      unsigned int num_batches,
> +					      struct intel_context *context)
> +{
> +	struct dma_fence_array *fence_array;
> +	struct dma_fence **fences;
> +
> +	GEM_BUG_ON(!intel_context_is_parent(context));
> +
> +	fences = kmalloc_array(num_batches, sizeof(*fences), GFP_KERNEL);
> +	if (!fences)
> +		return ERR_PTR(-ENOMEM);
> +
> +	for_each_batch_create_order(num_batches) {
> +		fences[i] = &requests[i]->fence;
> +		__set_bit(I915_FENCE_FLAG_COMPOSITE,
> +			  &requests[i]->fence.flags);
> +	}
> +
> +	fence_array = dma_fence_array_create(num_batches,
> +					     fences,
> +					     context->parallel.fence_context,
> +					     context->parallel.seqno++,
> +					     false);
> +	if (!fence_array) {
> +		kfree(fences);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	/* Move ownership to the dma_fence_array created above */
> +	for_each_batch_create_order(num_batches)
> +		dma_fence_get(fences[i]);
> +
> +	return &fence_array->base;
> +}
> +
> +int __eb_select_engine(struct intel_context *ce)
> +{
> +	struct intel_context *child;
> +	int err;
> +
> +	for_each_child(ce, child)
> +		intel_context_get(child);
> +	intel_gt_pm_get(ce->engine->gt);
> +
> +	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
> +		err = intel_context_alloc_state(ce);
> +		if (err)
> +			goto err;
> +	}
> +	for_each_child(ce, child) {
> +		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
> +			err = intel_context_alloc_state(child);
> +			if (err)
> +				goto err;
> +		}
> +	}
> +
> +	/*
> +	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
> +	 * EIO if the GPU is already wedged.
> +	 */
> +	err = intel_gt_terminally_wedged(ce->engine->gt);
> +	if (err)
> +		goto err;
> +
> +	if (!i915_vm_tryget(ce->vm)) {
> +		err = -ENOENT;
> +		goto err;
> +	}
> +
> +	return 0;
> +err:
> +	intel_gt_pm_put(ce->engine->gt);
> +	for_each_child(ce, child)
> +		intel_context_put(child);
> +	return err;
> +}
> +
> +void __eb_put_engine(struct intel_context *context, struct intel_gt *gt)
> +{
> +	struct intel_context *child;
> +
> +	i915_vm_put(context->vm);
> +	intel_gt_pm_put(gt);
> +	for_each_child(context, child)
> +		intel_context_put(child);
> +	intel_context_put(context);
> +}
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
> new file mode 100644
> index 000000000000..725febfd6a53
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_EXECBUFFER_COMMON_H
> +#define __I915_GEM_EXECBUFFER_COMMON_H
> +
> +#include <drm/drm_syncobj.h>
> +
> +#include "gt/intel_context.h"
> +
> +struct eb_fence {
> +	struct drm_syncobj *syncobj;
> +	struct dma_fence *dma_fence;
> +	u64 value;
> +	struct dma_fence_chain *chain_fence;
> +};
> +
> +int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
> +		    bool throttle, bool nonblock);
> +void __eb_unpin_engine(struct intel_context *ce);
> +int __eb_select_engine(struct intel_context *ce);
> +void __eb_put_engine(struct intel_context *context, struct intel_gt *gt);

Two things:

1)

Is there enough commonality to maybe avoid multiple arguments and have
something like:

struct i915_execbuffer {

};

struct i915_execbuffer2 {
	struct i915_execbuffer eb;
	.. eb2 specific fields ..
};

struct i915_execbuffer3 {
	struct i915_execbuffer eb;
	.. eb3 specific fields ..
};

And then have the common helpers take the pointer to the common struct?
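
E.g. (hand-wavy sketch, field layout just for illustration) a common helper
could then become:

int __eb_pin_engine(struct i915_execbuffer *eb, bool throttle);

with eb->context, eb->ww and the O_NONBLOCK check all reachable through the
common struct rather than being passed as separate arguments.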

2)

Should we prefix with i915_ everything that is now no longer static?
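
(E.g. eb_find_context() -> i915_eb_find_context(), add_timeline_fence() ->
i915_add_timeline_fence(), and so on.)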

Regards,

Tvrtko

> +
> +struct intel_context *
> +eb_find_context(struct intel_context *context, unsigned int context_number);
> +
> +int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
> +		       struct eb_fence *f, bool wait, bool signal);
> +void put_fence_array(struct eb_fence *fences, u64 num_fences);
> +int await_fence_array(struct eb_fence *fences, u64 num_fences,
> +		      struct i915_request *rq);
> +void signal_fence_array(struct eb_fence *fences, u64 num_fences,
> +			struct dma_fence * const fence);
> +
> +int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
> +		    struct intel_context *context, struct i915_sched_attr sched,
> +		    int err);
> +void eb_requests_get(struct i915_request **requests, unsigned int num_batches);
> +void eb_requests_put(struct i915_request **requests, unsigned int num_batches);
> +
> +struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
> +					      unsigned int num_batches,
> +					      struct intel_context *context);
> +
> +#endif /* __I915_GEM_EXECBUFFER_COMMON_H */

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 02/14] drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
  2022-09-21  9:06   ` Tvrtko Ursulin
@ 2022-09-21 17:47     ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21 17:47 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Wed, Sep 21, 2022 at 10:06:48AM +0100, Tvrtko Ursulin wrote:
>
>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>Add function __i915_sw_fence_await_reservation() for
>>asynchronous wait on a dma-resv object with specified
>>dma_resv_usage. This is required for async vma unbind
>>with vm_bind.
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>---
>>  drivers/gpu/drm/i915/i915_sw_fence.c | 25 ++++++++++++++++++-------
>>  drivers/gpu/drm/i915/i915_sw_fence.h |  7 ++++++-
>>  2 files changed, 24 insertions(+), 8 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
>>index 6fc0d1b89690..0ce8f4efc1ed 100644
>>--- a/drivers/gpu/drm/i915/i915_sw_fence.c
>>+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
>>@@ -569,12 +569,11 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>>  	return ret;
>>  }
>>-int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>-				    struct dma_resv *resv,
>>-				    const struct dma_fence_ops *exclude,
>>-				    bool write,
>>-				    unsigned long timeout,
>>-				    gfp_t gfp)
>>+int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>+				      struct dma_resv *resv,
>>+				      enum dma_resv_usage usage,
>>+				      unsigned long timeout,
>>+				      gfp_t gfp)
>>  {
>>  	struct dma_resv_iter cursor;
>>  	struct dma_fence *f;
>>@@ -583,7 +582,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>  	debug_fence_assert(fence);
>>  	might_sleep_if(gfpflags_allow_blocking(gfp));
>>-	dma_resv_iter_begin(&cursor, resv, dma_resv_usage_rw(write));
>>+	dma_resv_iter_begin(&cursor, resv, usage);
>>  	dma_resv_for_each_fence_unlocked(&cursor, f) {
>>  		pending = i915_sw_fence_await_dma_fence(fence, f, timeout,
>>  							gfp);
>>@@ -598,6 +597,18 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>  	return ret;
>>  }
>>+int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>+				    struct dma_resv *resv,
>>+				    const struct dma_fence_ops *exclude,
>>+				    bool write,
>>+				    unsigned long timeout,
>>+				    gfp_t gfp)
>>+{
>>+	return __i915_sw_fence_await_reservation(fence, resv,
>>+						 dma_resv_usage_rw(write),
>>+						 timeout, gfp);
>>+}
>
>Drive by observation - it looked dodgy that you create a wrapper here 
>which ignores one function parameter.
>
>On a more detailed look it seems no callers actually use exclude and 
>it's even unused inside this function since 1b5bdf071e62 ("drm/i915: 
>use the new iterator in i915_sw_fence_await_reservation v3").
>
>So a cleanup patch before this one?
>

Thanks Tvrtko.
Yah, I noticed it, but did not want to fix that here.
Sure, will post a patch beforehand to fix that.

Niranjana

>Regards,
>
>Tvrtko
>
>
>>+
>>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>  #include "selftests/lib_sw_fence.c"
>>  #include "selftests/i915_sw_fence.c"
>>diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
>>index 619fc5a22f0c..3cf4b6e16f35 100644
>>--- a/drivers/gpu/drm/i915/i915_sw_fence.h
>>+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
>>@@ -10,13 +10,13 @@
>>  #define _I915_SW_FENCE_H_
>>  #include <linux/dma-fence.h>
>>+#include <linux/dma-resv.h>
>>  #include <linux/gfp.h>
>>  #include <linux/kref.h>
>>  #include <linux/notifier.h> /* for NOTIFY_DONE */
>>  #include <linux/wait.h>
>>  struct completion;
>>-struct dma_resv;
>>  struct i915_sw_fence;
>>  enum i915_sw_fence_notify {
>>@@ -89,6 +89,11 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>>  				  unsigned long timeout,
>>  				  gfp_t gfp);
>>+int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>+				      struct dma_resv *resv,
>>+				      enum dma_resv_usage usage,
>>+				      unsigned long timeout,
>>+				      gfp_t gfp);
>>  int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>>  				    struct dma_resv *resv,
>>  				    const struct dma_fence_ops *exclude,

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-21  9:13   ` Tvrtko Ursulin
@ 2022-09-21 18:00     ` Niranjana Vishwanathapura
  2022-09-22  8:09       ` Tvrtko Ursulin
  0 siblings, 1 reply; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21 18:00 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Wed, Sep 21, 2022 at 10:13:12AM +0100, Tvrtko Ursulin wrote:
>
>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>Expose i915_gem_object_max_page_size() function non-static
>>which will be used by the vm_bind feature.
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>---
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
>>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
>>  2 files changed, 17 insertions(+), 5 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>index 33673fe7ee0a..3b3ab4abb0a3 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>@@ -11,14 +11,24 @@
>>  #include "pxp/intel_pxp.h"
>>  #include "i915_drv.h"
>>+#include "i915_gem_context.h"
>
>I can't spot that you are adding any code which would need this? 
>I915_GTT_PAGE_SIZE_4K? It is in intel_gtt.h.

This include should have been added in a later patch, for calling
i915_gem_vm_lookup(), but it ended up here while refactoring the patches.
Will fix.

>
>>  #include "i915_gem_create.h"
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>>-static u32 object_max_page_size(struct intel_memory_region **placements,
>>-				unsigned int n_placements)
>>+/**
>>+ * i915_gem_object_max_page_size() - max of min_page_size of the regions
>>+ * @placements:  list of regions
>>+ * @n_placements: number of the placements
>>+ *
>>+ * Calculates the max of the min_page_size of a list of placements passed in.
>>+ *
>>+ * Return: max of the min_page_size
>>+ */
>>+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
>>+				  unsigned int n_placements)
>>  {
>>-	u32 max_page_size = 0;
>>+	u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>  	int i;
>>  	for (i = 0; i < n_placements; i++) {
>>@@ -28,7 +38,6 @@ static u32 object_max_page_size(struct intel_memory_region **placements,
>>  		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>  	}
>>-	GEM_BUG_ON(!max_page_size);
>>  	return max_page_size;
>>  }
>>@@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
>>  	i915_gem_flush_free_objects(i915);
>>-	size = round_up(size, object_max_page_size(placements, n_placements));
>>+	size = round_up(size, i915_gem_object_max_page_size(placements,
>>+							    n_placements));
>>  	if (size == 0)
>>  		return ERR_PTR(-EINVAL);
>
>Because of the changes above this path is now unreachable. I suppose 
>it was meant to tell the user "you have supplied no placements"? But 
>then GEM_BUG_ON (which you remove) used to be wrong.
>

Yah, looks like an existing problem. Maybe this "size == 0" check should
have been made before we do the round_up()? I.e., check that the input
'size' parameter is not 0?
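
As a rough, untested sketch of that ordering (not what the patch currently
does, just to illustrate the idea):

	if (size == 0)
		return ERR_PTR(-EINVAL);

	size = round_up(size, i915_gem_object_max_page_size(placements,
							     n_placements));
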
I think for now, I will remove this check as it was unreachable anyhow.

Niranjana

>Regards,
>
>Tvrtko
>
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>index 7317d4102955..8c97bddad921 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>@@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>  }
>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>>+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
>>+				  unsigned int n_placements);
>>  void i915_objects_module_exit(void);
>>  int i915_objects_module_init(void);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-21 10:18   ` Tvrtko Ursulin
@ 2022-09-21 18:17     ` Niranjana Vishwanathapura
  2022-09-22  9:05       ` Tvrtko Ursulin
  0 siblings, 1 reply; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21 18:17 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Wed, Sep 21, 2022 at 11:18:53AM +0100, Tvrtko Ursulin wrote:
>
>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>The new execbuf3 ioctl path and the legacy execbuf ioctl
>>paths have many common functionalities.
>>Share code between these two paths by abstracting out the
>>common functionalities into a separate file where possible.
>
>Looks like a good start to me. A couple comments/questions below.
>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 507 ++---------------
>>  .../drm/i915/gem/i915_gem_execbuffer_common.c | 530 ++++++++++++++++++
>>  .../drm/i915/gem/i915_gem_execbuffer_common.h |  47 ++
>>  4 files changed, 612 insertions(+), 473 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>>
>>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>index 9bf939ef18ea..bf952f478555 100644
>>--- a/drivers/gpu/drm/i915/Makefile
>>+++ b/drivers/gpu/drm/i915/Makefile
>>@@ -148,6 +148,7 @@ gem-y += \
>>  	gem/i915_gem_create.o \
>>  	gem/i915_gem_dmabuf.o \
>>  	gem/i915_gem_domain.o \
>>+	gem/i915_gem_execbuffer_common.o \
>>  	gem/i915_gem_execbuffer.o \
>>  	gem/i915_gem_internal.o \
>>  	gem/i915_gem_object.o \
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>index 33d989a20227..363b2a788cdf 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>@@ -9,8 +9,6 @@
>>  #include <linux/sync_file.h>
>>  #include <linux/uaccess.h>
>>-#include <drm/drm_syncobj.h>
>>-
>>  #include "display/intel_frontbuffer.h"
>>  #include "gem/i915_gem_ioctls.h"
>>@@ -28,6 +26,7 @@
>>  #include "i915_file_private.h"
>>  #include "i915_gem_clflush.h"
>>  #include "i915_gem_context.h"
>>+#include "i915_gem_execbuffer_common.h"
>>  #include "i915_gem_evict.h"
>>  #include "i915_gem_ioctls.h"
>>  #include "i915_trace.h"
>>@@ -235,13 +234,6 @@ enum {
>>   * the batchbuffer in trusted mode, otherwise the ioctl is rejected.
>>   */
>>-struct eb_fence {
>>-	struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
>>-	struct dma_fence *dma_fence;
>>-	u64 value;
>>-	struct dma_fence_chain *chain_fence;
>>-};
>>-
>>  struct i915_execbuffer {
>>  	struct drm_i915_private *i915; /** i915 backpointer */
>>  	struct drm_file *file; /** per-file lookup tables and limits */
>>@@ -2446,164 +2438,29 @@ static const enum intel_engine_id user_ring_map[] = {
>>  	[I915_EXEC_VEBOX]	= VECS0
>>  };
>>-static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
>>-{
>>-	struct intel_ring *ring = ce->ring;
>>-	struct intel_timeline *tl = ce->timeline;
>>-	struct i915_request *rq;
>>-
>>-	/*
>>-	 * Completely unscientific finger-in-the-air estimates for suitable
>>-	 * maximum user request size (to avoid blocking) and then backoff.
>>-	 */
>>-	if (intel_ring_update_space(ring) >= PAGE_SIZE)
>>-		return NULL;
>>-
>>-	/*
>>-	 * Find a request that after waiting upon, there will be at least half
>>-	 * the ring available. The hysteresis allows us to compete for the
>>-	 * shared ring and should mean that we sleep less often prior to
>>-	 * claiming our resources, but not so long that the ring completely
>>-	 * drains before we can submit our next request.
>>-	 */
>>-	list_for_each_entry(rq, &tl->requests, link) {
>>-		if (rq->ring != ring)
>>-			continue;
>>-
>>-		if (__intel_ring_space(rq->postfix,
>>-				       ring->emit, ring->size) > ring->size / 2)
>>-			break;
>>-	}
>>-	if (&rq->link == &tl->requests)
>>-		return NULL; /* weird, we will check again later for real */
>>-
>>-	return i915_request_get(rq);
>>-}
>>-
>>-static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
>>-			   bool throttle)
>>-{
>>-	struct intel_timeline *tl;
>>-	struct i915_request *rq = NULL;
>>-
>>-	/*
>>-	 * Take a local wakeref for preparing to dispatch the execbuf as
>>-	 * we expect to access the hardware fairly frequently in the
>>-	 * process, and require the engine to be kept awake between accesses.
>>-	 * Upon dispatch, we acquire another prolonged wakeref that we hold
>>-	 * until the timeline is idle, which in turn releases the wakeref
>>-	 * taken on the engine, and the parent device.
>>-	 */
>>-	tl = intel_context_timeline_lock(ce);
>>-	if (IS_ERR(tl))
>>-		return PTR_ERR(tl);
>>-
>>-	intel_context_enter(ce);
>>-	if (throttle)
>>-		rq = eb_throttle(eb, ce);
>>-	intel_context_timeline_unlock(tl);
>>-
>>-	if (rq) {
>>-		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
>>-		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>>-
>>-		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>>-				      timeout) < 0) {
>>-			i915_request_put(rq);
>>-
>>-			/*
>>-			 * Error path, cannot use intel_context_timeline_lock as
>>-			 * that is user interruptable and this clean up step
>>-			 * must be done.
>>-			 */
>>-			mutex_lock(&ce->timeline->mutex);
>>-			intel_context_exit(ce);
>>-			mutex_unlock(&ce->timeline->mutex);
>>-
>>-			if (nonblock)
>>-				return -EWOULDBLOCK;
>>-			else
>>-				return -EINTR;
>>-		}
>>-		i915_request_put(rq);
>>-	}
>>-
>>-	return 0;
>>-}
>>-
>>  static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
>>  {
>>-	struct intel_context *ce = eb->context, *child;
>>  	int err;
>>-	int i = 0, j = 0;
>>  	GEM_BUG_ON(eb->args->flags & __EXEC_ENGINE_PINNED);
>
>You could avoid duplication by putting the common flags into the 
>common header and then eb2 and eb3 add their own flags relative to the 
>end of the last common entry.
>

I intentionally avoided that. I think we should avoid creating
dependencies between the legacy execbuf and execbuf3 paths; they would
only get more and more intertwined if we went that route.
So I have added some helper functions here which both paths can share,
but the main flow is strictly kept separate.
More on this below...
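(For reference, a rough sketch of the flag layering suggested above -- the
names and bit assignments below are illustrative only and not part of this
series:)

/* i915_gem_execbuffer_common.h: shared flag bits */
#define __EXEC_COMMON_ENGINE_PINNED	BIT(0)
#define __EXEC_COMMON_LAST_BIT		0

/* i915_gem_execbuffer.c: legacy flags continue from the last common bit */
#define __EXEC_HAS_RELOC		BIT(__EXEC_COMMON_LAST_BIT + 1)

/* i915_gem_execbuffer3.c: execbuf3 flags do the same, independently */
#define __EXEC3_ENGINE_PINNED		BIT(__EXEC_COMMON_LAST_BIT + 1)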

>>-	if (unlikely(intel_context_is_banned(ce)))
>>-		return -EIO;
>>-
>>-	/*
>>-	 * Pinning the contexts may generate requests in order to acquire
>>-	 * GGTT space, so do this first before we reserve a seqno for
>>-	 * ourselves.
>>-	 */
>>-	err = intel_context_pin_ww(ce, &eb->ww);
>>+	err = __eb_pin_engine(eb->context, &eb->ww, throttle,
>>+			      eb->file->filp->f_flags & O_NONBLOCK);
>>  	if (err)
>>  		return err;
>>-	for_each_child(ce, child) {
>>-		err = intel_context_pin_ww(child, &eb->ww);
>>-		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
>>-	}
>>-
>>-	for_each_child(ce, child) {
>>-		err = eb_pin_timeline(eb, child, throttle);
>>-		if (err)
>>-			goto unwind;
>>-		++i;
>>-	}
>>-	err = eb_pin_timeline(eb, ce, throttle);
>>-	if (err)
>>-		goto unwind;
>>  	eb->args->flags |= __EXEC_ENGINE_PINNED;
>>  	return 0;
>>-
>>-unwind:
>>-	for_each_child(ce, child) {
>>-		if (j++ < i) {
>>-			mutex_lock(&child->timeline->mutex);
>>-			intel_context_exit(child);
>>-			mutex_unlock(&child->timeline->mutex);
>>-		}
>>-	}
>>-	for_each_child(ce, child)
>>-		intel_context_unpin(child);
>>-	intel_context_unpin(ce);
>>-	return err;
>>  }
>>  static void eb_unpin_engine(struct i915_execbuffer *eb)
>>  {
>>-	struct intel_context *ce = eb->context, *child;
>>-
>>  	if (!(eb->args->flags & __EXEC_ENGINE_PINNED))
>>  		return;
>>  	eb->args->flags &= ~__EXEC_ENGINE_PINNED;
>>-	for_each_child(ce, child) {
>>-		mutex_lock(&child->timeline->mutex);
>>-		intel_context_exit(child);
>>-		mutex_unlock(&child->timeline->mutex);
>>-
>>-		intel_context_unpin(child);
>>-	}
>>-
>>-	mutex_lock(&ce->timeline->mutex);
>>-	intel_context_exit(ce);
>>-	mutex_unlock(&ce->timeline->mutex);
>>-
>>-	intel_context_unpin(ce);
>>+	__eb_unpin_engine(eb->context);
>>  }
>>  static unsigned int
>>@@ -2652,7 +2509,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
>>  static int
>>  eb_select_engine(struct i915_execbuffer *eb)
>>  {
>>-	struct intel_context *ce, *child;
>>+	struct intel_context *ce;
>>  	unsigned int idx;
>>  	int err;
>>@@ -2677,36 +2534,10 @@ eb_select_engine(struct i915_execbuffer *eb)
>>  	}
>>  	eb->num_batches = ce->parallel.number_children + 1;
>>-	for_each_child(ce, child)
>>-		intel_context_get(child);
>>-	intel_gt_pm_get(ce->engine->gt);
>>-
>>-	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>-		err = intel_context_alloc_state(ce);
>>-		if (err)
>>-			goto err;
>>-	}
>>-	for_each_child(ce, child) {
>>-		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>>-			err = intel_context_alloc_state(child);
>>-			if (err)
>>-				goto err;
>>-		}
>>-	}
>>-
>>-	/*
>>-	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
>>-	 * EIO if the GPU is already wedged.
>>-	 */
>>-	err = intel_gt_terminally_wedged(ce->engine->gt);
>>+	err = __eb_select_engine(ce);
>>  	if (err)
>>  		goto err;
>>-	if (!i915_vm_tryget(ce->vm)) {
>>-		err = -ENOENT;
>>-		goto err;
>>-	}
>>-
>>  	eb->context = ce;
>>  	eb->gt = ce->engine->gt;
>>@@ -2715,12 +2546,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>>  	 * during ww handling. The pool is destroyed when last pm reference
>>  	 * is dropped, which breaks our -EDEADLK handling.
>>  	 */
>>-	return err;
>>+	return 0;
>>  err:
>>-	intel_gt_pm_put(ce->engine->gt);
>>-	for_each_child(ce, child)
>>-		intel_context_put(child);
>>  	intel_context_put(ce);
>>  	return err;
>>  }
>>@@ -2728,24 +2556,7 @@ eb_select_engine(struct i915_execbuffer *eb)
>>  static void
>>  eb_put_engine(struct i915_execbuffer *eb)
>>  {
>>-	struct intel_context *child;
>>-
>>-	i915_vm_put(eb->context->vm);
>>-	intel_gt_pm_put(eb->gt);
>>-	for_each_child(eb->context, child)
>>-		intel_context_put(child);
>>-	intel_context_put(eb->context);
>>-}
>>-
>>-static void
>>-__free_fence_array(struct eb_fence *fences, unsigned int n)
>>-{
>>-	while (n--) {
>>-		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>-		dma_fence_put(fences[n].dma_fence);
>>-		dma_fence_chain_free(fences[n].chain_fence);
>>-	}
>>-	kvfree(fences);
>>+	__eb_put_engine(eb->context, eb->gt);
>>  }
>>  static int
>>@@ -2756,7 +2567,6 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>>  	u64 __user *user_values;
>>  	struct eb_fence *f;
>>  	u64 nfences;
>>-	int err = 0;
>>  	nfences = timeline_fences->fence_count;
>>  	if (!nfences)
>>@@ -2791,9 +2601,9 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>>  	while (nfences--) {
>>  		struct drm_i915_gem_exec_fence user_fence;
>>-		struct drm_syncobj *syncobj;
>>-		struct dma_fence *fence = NULL;
>>+		bool wait, signal;
>>  		u64 point;
>>+		int ret;
>>  		if (__copy_from_user(&user_fence,
>>  				     user_fences++,
>>@@ -2806,70 +2616,15 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>>  		if (__get_user(point, user_values++))
>>  			return -EFAULT;
>>-		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
>>-		if (!syncobj) {
>>-			DRM_DEBUG("Invalid syncobj handle provided\n");
>>-			return -ENOENT;
>>-		}
>>-
>>-		fence = drm_syncobj_fence_get(syncobj);
>>-
>>-		if (!fence && user_fence.flags &&
>>-		    !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
>>-			DRM_DEBUG("Syncobj handle has no fence\n");
>>-			drm_syncobj_put(syncobj);
>>-			return -EINVAL;
>>-		}
>>-
>>-		if (fence)
>>-			err = dma_fence_chain_find_seqno(&fence, point);
>>-
>>-		if (err && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
>>-			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
>>-			dma_fence_put(fence);
>>-			drm_syncobj_put(syncobj);
>>-			return err;
>>-		}
>>-
>>-		/*
>>-		 * A point might have been signaled already and
>>-		 * garbage collected from the timeline. In this case
>>-		 * just ignore the point and carry on.
>>-		 */
>>-		if (!fence && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
>>-			drm_syncobj_put(syncobj);
>>+		wait = user_fence.flags & I915_EXEC_FENCE_WAIT;
>>+		signal = user_fence.flags & I915_EXEC_FENCE_SIGNAL;
>>+		ret = add_timeline_fence(eb->file, user_fence.handle, point,
>>+					 f, wait, signal);
>>+		if (ret < 0)
>>+			return ret;
>>+		else if (!ret)
>>  			continue;
>>-		}
>>-		/*
>>-		 * For timeline syncobjs we need to preallocate chains for
>>-		 * later signaling.
>>-		 */
>>-		if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
>>-			/*
>>-			 * Waiting and signaling the same point (when point !=
>>-			 * 0) would break the timeline.
>>-			 */
>>-			if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
>>-				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
>>-				dma_fence_put(fence);
>>-				drm_syncobj_put(syncobj);
>>-				return -EINVAL;
>>-			}
>>-
>>-			f->chain_fence = dma_fence_chain_alloc();
>>-			if (!f->chain_fence) {
>>-				drm_syncobj_put(syncobj);
>>-				dma_fence_put(fence);
>>-				return -ENOMEM;
>>-			}
>>-		} else {
>>-			f->chain_fence = NULL;
>>-		}
>>-
>>-		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
>>-		f->dma_fence = fence;
>>-		f->value = point;
>>  		f++;
>>  		eb->num_fences++;
>>  	}
>>@@ -2949,65 +2704,6 @@ static int add_fence_array(struct i915_execbuffer *eb)
>>  	return 0;
>>  }
>>-static void put_fence_array(struct eb_fence *fences, int num_fences)
>>-{
>>-	if (fences)
>>-		__free_fence_array(fences, num_fences);
>>-}
>>-
>>-static int
>>-await_fence_array(struct i915_execbuffer *eb,
>>-		  struct i915_request *rq)
>>-{
>>-	unsigned int n;
>>-	int err;
>>-
>>-	for (n = 0; n < eb->num_fences; n++) {
>>-		struct drm_syncobj *syncobj;
>>-		unsigned int flags;
>>-
>>-		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>-
>>-		if (!eb->fences[n].dma_fence)
>>-			continue;
>>-
>>-		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
>>-		if (err < 0)
>>-			return err;
>>-	}
>>-
>>-	return 0;
>>-}
>>-
>>-static void signal_fence_array(const struct i915_execbuffer *eb,
>>-			       struct dma_fence * const fence)
>>-{
>>-	unsigned int n;
>>-
>>-	for (n = 0; n < eb->num_fences; n++) {
>>-		struct drm_syncobj *syncobj;
>>-		unsigned int flags;
>>-
>>-		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>-		if (!(flags & I915_EXEC_FENCE_SIGNAL))
>>-			continue;
>>-
>>-		if (eb->fences[n].chain_fence) {
>>-			drm_syncobj_add_point(syncobj,
>>-					      eb->fences[n].chain_fence,
>>-					      fence,
>>-					      eb->fences[n].value);
>>-			/*
>>-			 * The chain's ownership is transferred to the
>>-			 * timeline.
>>-			 */
>>-			eb->fences[n].chain_fence = NULL;
>>-		} else {
>>-			drm_syncobj_replace_fence(syncobj, fence);
>>-		}
>>-	}
>>-}
>>-
>>  static int
>>  parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
>>  {
>>@@ -3020,80 +2716,6 @@ parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
>>  	return add_timeline_fence_array(eb, &timeline_fences);
>>  }
>>-static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
>>-{
>>-	struct i915_request *rq, *rn;
>>-
>>-	list_for_each_entry_safe(rq, rn, &tl->requests, link)
>>-		if (rq == end || !i915_request_retire(rq))
>>-			break;
>>-}
>>-
>>-static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
>>-			  int err, bool last_parallel)
>>-{
>>-	struct intel_timeline * const tl = i915_request_timeline(rq);
>>-	struct i915_sched_attr attr = {};
>>-	struct i915_request *prev;
>>-
>>-	lockdep_assert_held(&tl->mutex);
>>-	lockdep_unpin_lock(&tl->mutex, rq->cookie);
>>-
>>-	trace_i915_request_add(rq);
>>-
>>-	prev = __i915_request_commit(rq);
>>-
>>-	/* Check that the context wasn't destroyed before submission */
>>-	if (likely(!intel_context_is_closed(eb->context))) {
>>-		attr = eb->gem_context->sched;
>>-	} else {
>>-		/* Serialise with context_close via the add_to_timeline */
>>-		i915_request_set_error_once(rq, -ENOENT);
>>-		__i915_request_skip(rq);
>>-		err = -ENOENT; /* override any transient errors */
>>-	}
>>-
>>-	if (intel_context_is_parallel(eb->context)) {
>>-		if (err) {
>>-			__i915_request_skip(rq);
>>-			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>>-				&rq->fence.flags);
>>-		}
>>-		if (last_parallel)
>>-			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>>-				&rq->fence.flags);
>>-	}
>>-
>>-	__i915_request_queue(rq, &attr);
>>-
>>-	/* Try to clean up the client's timeline after submitting the request */
>>-	if (prev)
>>-		retire_requests(tl, prev);
>>-
>>-	mutex_unlock(&tl->mutex);
>>-
>>-	return err;
>>-}
>>-
>>-static int eb_requests_add(struct i915_execbuffer *eb, int err)
>>-{
>>-	int i;
>>-
>>-	/*
>>-	 * We iterate in reverse order of creation to release timeline mutexes in
>>-	 * same order.
>>-	 */
>>-	for_each_batch_add_order(eb, i) {
>>-		struct i915_request *rq = eb->requests[i];
>>-
>>-		if (!rq)
>>-			continue;
>>-		err |= eb_request_add(eb, rq, err, i == 0);
>>-	}
>>-
>>-	return err;
>>-}
>>-
>>  static const i915_user_extension_fn execbuf_extensions[] = {
>>  	[DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,
>>  };
>>@@ -3120,73 +2742,26 @@ parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
>>  				    eb);
>>  }
>>-static void eb_requests_get(struct i915_execbuffer *eb)
>>-{
>>-	unsigned int i;
>>-
>>-	for_each_batch_create_order(eb, i) {
>>-		if (!eb->requests[i])
>>-			break;
>>-
>>-		i915_request_get(eb->requests[i]);
>>-	}
>>-}
>>-
>>-static void eb_requests_put(struct i915_execbuffer *eb)
>>-{
>>-	unsigned int i;
>>-
>>-	for_each_batch_create_order(eb, i) {
>>-		if (!eb->requests[i])
>>-			break;
>>-
>>-		i915_request_put(eb->requests[i]);
>>-	}
>>-}
>>-
>>  static struct sync_file *
>>  eb_composite_fence_create(struct i915_execbuffer *eb, int out_fence_fd)
>>  {
>>  	struct sync_file *out_fence = NULL;
>>-	struct dma_fence_array *fence_array;
>>-	struct dma_fence **fences;
>>-	unsigned int i;
>>-
>>-	GEM_BUG_ON(!intel_context_is_parent(eb->context));
>>+	struct dma_fence *fence;
>>-	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
>>-	if (!fences)
>>-		return ERR_PTR(-ENOMEM);
>>-
>>-	for_each_batch_create_order(eb, i) {
>>-		fences[i] = &eb->requests[i]->fence;
>>-		__set_bit(I915_FENCE_FLAG_COMPOSITE,
>>-			  &eb->requests[i]->fence.flags);
>>-	}
>>-
>>-	fence_array = dma_fence_array_create(eb->num_batches,
>>-					     fences,
>>-					     eb->context->parallel.fence_context,
>>-					     eb->context->parallel.seqno++,
>>-					     false);
>>-	if (!fence_array) {
>>-		kfree(fences);
>>-		return ERR_PTR(-ENOMEM);
>>-	}
>>-
>>-	/* Move ownership to the dma_fence_array created above */
>>-	for_each_batch_create_order(eb, i)
>>-		dma_fence_get(fences[i]);
>>+	fence = __eb_composite_fence_create(eb->requests, eb->num_batches,
>>+					    eb->context);
>>+	if (IS_ERR(fence))
>>+		return ERR_CAST(fence);
>>  	if (out_fence_fd != -1) {
>>-		out_fence = sync_file_create(&fence_array->base);
>>+		out_fence = sync_file_create(fence);
>>  		/* sync_file now owns fence_arry, drop creation ref */
>>-		dma_fence_put(&fence_array->base);
>>+		dma_fence_put(fence);
>>  		if (!out_fence)
>>  			return ERR_PTR(-ENOMEM);
>>  	}
>>-	eb->composite_fence = &fence_array->base;
>>+	eb->composite_fence = fence;
>>  	return out_fence;
>>  }
>>@@ -3218,7 +2793,7 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
>>  	}
>>  	if (eb->fences) {
>>-		err = await_fence_array(eb, rq);
>>+		err = await_fence_array(eb->fences, eb->num_fences, rq);
>>  		if (err)
>>  			return ERR_PTR(err);
>>  	}
>>@@ -3236,23 +2811,6 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
>>  	return out_fence;
>>  }
>>-static struct intel_context *
>>-eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
>>-{
>>-	struct intel_context *child;
>>-
>>-	if (likely(context_number == 0))
>>-		return eb->context;
>>-
>>-	for_each_child(eb->context, child)
>>-		if (!--context_number)
>>-			return child;
>>-
>>-	GEM_BUG_ON("Context not found");
>>-
>>-	return NULL;
>>-}
>>-
>>  static struct sync_file *
>>  eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence,
>>  		   int out_fence_fd)
>>@@ -3262,7 +2820,8 @@ eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence,
>>  	for_each_batch_create_order(eb, i) {
>>  		/* Allocate a request for this batch buffer nice and early. */
>>-		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
>>+		eb->requests[i] =
>>+			i915_request_create(eb_find_context(eb->context, i));
>>  		if (IS_ERR(eb->requests[i])) {
>>  			out_fence = ERR_CAST(eb->requests[i]);
>>  			eb->requests[i] = NULL;
>>@@ -3442,11 +3001,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>>  	err = eb_submit(&eb);
>>  err_request:
>>-	eb_requests_get(&eb);
>>-	err = eb_requests_add(&eb, err);
>>+	eb_requests_get(eb.requests, eb.num_batches);
>>+	err = eb_requests_add(eb.requests, eb.num_batches, eb.context,
>>+			      eb.gem_context->sched, err);
>>  	if (eb.fences)
>>-		signal_fence_array(&eb, eb.composite_fence ?
>>+		signal_fence_array(eb.fences, eb.num_fences,
>>+				   eb.composite_fence ?
>>  				   eb.composite_fence :
>>  				   &eb.requests[0]->fence);
>>@@ -3471,7 +3032,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>>  	if (!out_fence && eb.composite_fence)
>>  		dma_fence_put(eb.composite_fence);
>>-	eb_requests_put(&eb);
>>+	eb_requests_put(eb.requests, eb.num_batches);
>>  err_vma:
>>  	eb_release_vmas(&eb, true);
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>>new file mode 100644
>>index 000000000000..167268dfd930
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>>@@ -0,0 +1,530 @@
>>+// SPDX-License-Identifier: MIT
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#include <linux/dma-fence-array.h>
>>+#include "gt/intel_gt.h"
>>+#include "gt/intel_gt_pm.h"
>>+#include "gt/intel_ring.h"
>>+
>>+#include "i915_gem_execbuffer_common.h"
>>+
>>+#define __EXEC_COMMON_FENCE_WAIT	BIT(0)
>>+#define __EXEC_COMMON_FENCE_SIGNAL	BIT(1)
>>+
>>+static struct i915_request *eb_throttle(struct intel_context *ce)
>>+{
>>+	struct intel_ring *ring = ce->ring;
>>+	struct intel_timeline *tl = ce->timeline;
>>+	struct i915_request *rq;
>>+
>>+	/*
>>+	 * Completely unscientific finger-in-the-air estimates for suitable
>>+	 * maximum user request size (to avoid blocking) and then backoff.
>>+	 */
>>+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
>>+		return NULL;
>>+
>>+	/*
>>+	 * Find a request that after waiting upon, there will be at least half
>>+	 * the ring available. The hysteresis allows us to compete for the
>>+	 * shared ring and should mean that we sleep less often prior to
>>+	 * claiming our resources, but not so long that the ring completely
>>+	 * drains before we can submit our next request.
>>+	 */
>>+	list_for_each_entry(rq, &tl->requests, link) {
>>+		if (rq->ring != ring)
>>+			continue;
>>+
>>+		if (__intel_ring_space(rq->postfix,
>>+				       ring->emit, ring->size) > ring->size / 2)
>>+			break;
>>+	}
>>+	if (&rq->link == &tl->requests)
>>+		return NULL; /* weird, we will check again later for real */
>>+
>>+	return i915_request_get(rq);
>>+}
>>+
>>+static int eb_pin_timeline(struct intel_context *ce, bool throttle,
>>+			   bool nonblock)
>>+{
>>+	struct intel_timeline *tl;
>>+	struct i915_request *rq = NULL;
>>+
>>+	/*
>>+	 * Take a local wakeref for preparing to dispatch the execbuf as
>>+	 * we expect to access the hardware fairly frequently in the
>>+	 * process, and require the engine to be kept awake between accesses.
>>+	 * Upon dispatch, we acquire another prolonged wakeref that we hold
>>+	 * until the timeline is idle, which in turn releases the wakeref
>>+	 * taken on the engine, and the parent device.
>>+	 */
>>+	tl = intel_context_timeline_lock(ce);
>>+	if (IS_ERR(tl))
>>+		return PTR_ERR(tl);
>>+
>>+	intel_context_enter(ce);
>>+	if (throttle)
>>+		rq = eb_throttle(ce);
>>+	intel_context_timeline_unlock(tl);
>>+
>>+	if (rq) {
>>+		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>>+
>>+		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>>+				      timeout) < 0) {
>>+			i915_request_put(rq);
>>+
>>+			/*
>>+			 * Error path, cannot use intel_context_timeline_lock as
>>+			 * that is user interruptable and this clean up step
>>+			 * must be done.
>>+			 */
>>+			mutex_lock(&ce->timeline->mutex);
>>+			intel_context_exit(ce);
>>+			mutex_unlock(&ce->timeline->mutex);
>>+
>>+			if (nonblock)
>>+				return -EWOULDBLOCK;
>>+			else
>>+				return -EINTR;
>>+		}
>>+		i915_request_put(rq);
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
>>+		    bool throttle, bool nonblock)
>>+{
>>+	struct intel_context *child;
>>+	int err;
>>+	int i = 0, j = 0;
>>+
>>+	if (unlikely(intel_context_is_banned(ce)))
>>+		return -EIO;
>>+
>>+	/*
>>+	 * Pinning the contexts may generate requests in order to acquire
>>+	 * GGTT space, so do this first before we reserve a seqno for
>>+	 * ourselves.
>>+	 */
>>+	err = intel_context_pin_ww(ce, ww);
>>+	if (err)
>>+		return err;
>>+
>>+	for_each_child(ce, child) {
>>+		err = intel_context_pin_ww(child, ww);
>>+		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
>>+	}
>>+
>>+	for_each_child(ce, child) {
>>+		err = eb_pin_timeline(child, throttle, nonblock);
>>+		if (err)
>>+			goto unwind;
>>+		++i;
>>+	}
>>+	err = eb_pin_timeline(ce, throttle, nonblock);
>>+	if (err)
>>+		goto unwind;
>>+
>>+	return 0;
>>+
>>+unwind:
>>+	for_each_child(ce, child) {
>>+		if (j++ < i) {
>>+			mutex_lock(&child->timeline->mutex);
>>+			intel_context_exit(child);
>>+			mutex_unlock(&child->timeline->mutex);
>>+		}
>>+	}
>>+	for_each_child(ce, child)
>>+		intel_context_unpin(child);
>>+	intel_context_unpin(ce);
>>+	return err;
>>+}
>>+
>>+void __eb_unpin_engine(struct intel_context *ce)
>>+{
>>+	struct intel_context *child;
>>+
>>+	for_each_child(ce, child) {
>>+		mutex_lock(&child->timeline->mutex);
>>+		intel_context_exit(child);
>>+		mutex_unlock(&child->timeline->mutex);
>>+
>>+		intel_context_unpin(child);
>>+	}
>>+
>>+	mutex_lock(&ce->timeline->mutex);
>>+	intel_context_exit(ce);
>>+	mutex_unlock(&ce->timeline->mutex);
>>+
>>+	intel_context_unpin(ce);
>>+}
>>+
>>+struct intel_context *
>>+eb_find_context(struct intel_context *context, unsigned int context_number)
>>+{
>>+	struct intel_context *child;
>>+
>>+	if (likely(context_number == 0))
>>+		return context;
>>+
>>+	for_each_child(context, child)
>>+		if (!--context_number)
>>+			return child;
>>+
>>+	GEM_BUG_ON("Context not found");
>>+
>>+	return NULL;
>>+}
>>+
>>+static void __free_fence_array(struct eb_fence *fences, u64 n)
>>+{
>>+	while (n--) {
>>+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>+		dma_fence_put(fences[n].dma_fence);
>>+		dma_fence_chain_free(fences[n].chain_fence);
>>+	}
>>+	kvfree(fences);
>>+}
>>+
>>+void put_fence_array(struct eb_fence *fences, u64 num_fences)
>>+{
>>+	if (fences)
>>+		__free_fence_array(fences, num_fences);
>>+}
>>+
>>+int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
>>+		       struct eb_fence *f, bool wait, bool signal)
>>+{
>>+	struct drm_syncobj *syncobj;
>>+	struct dma_fence *fence = NULL;
>>+	u32 flags = 0;
>>+	int err = 0;
>>+
>>+	syncobj = drm_syncobj_find(file, handle);
>>+	if (!syncobj) {
>>+		DRM_DEBUG("Invalid syncobj handle provided\n");
>>+		return -ENOENT;
>>+	}
>>+
>>+	fence = drm_syncobj_fence_get(syncobj);
>>+
>>+	if (!fence && wait && !signal) {
>>+		DRM_DEBUG("Syncobj handle has no fence\n");
>>+		drm_syncobj_put(syncobj);
>>+		return -EINVAL;
>>+	}
>>+
>>+	if (fence)
>>+		err = dma_fence_chain_find_seqno(&fence, point);
>>+
>>+	if (err && !signal) {
>>+		DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
>>+		dma_fence_put(fence);
>>+		drm_syncobj_put(syncobj);
>>+		return err;
>>+	}
>>+
>>+	/*
>>+	 * A point might have been signaled already and
>>+	 * garbage collected from the timeline. In this case
>>+	 * just ignore the point and carry on.
>>+	 */
>>+	if (!fence && !signal) {
>>+		drm_syncobj_put(syncobj);
>>+		return 0;
>>+	}
>>+
>>+	/*
>>+	 * For timeline syncobjs we need to preallocate chains for
>>+	 * later signaling.
>>+	 */
>>+	if (point != 0 && signal) {
>>+		/*
>>+		 * Waiting and signaling the same point (when point !=
>>+		 * 0) would break the timeline.
>>+		 */
>>+		if (wait) {
>>+			DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
>>+			dma_fence_put(fence);
>>+			drm_syncobj_put(syncobj);
>>+			return -EINVAL;
>>+		}
>>+
>>+		f->chain_fence = dma_fence_chain_alloc();
>>+		if (!f->chain_fence) {
>>+			drm_syncobj_put(syncobj);
>>+			dma_fence_put(fence);
>>+			return -ENOMEM;
>>+		}
>>+	} else {
>>+		f->chain_fence = NULL;
>>+	}
>>+
>>+	flags |= wait ? __EXEC_COMMON_FENCE_WAIT : 0;
>>+	flags |= signal ? __EXEC_COMMON_FENCE_SIGNAL : 0;
>>+
>>+	f->syncobj = ptr_pack_bits(syncobj, flags, 2);
>>+	f->dma_fence = fence;
>>+	f->value = point;
>>+	return 1;
>>+}
>>+
>>+int await_fence_array(struct eb_fence *fences, u64 num_fences,
>>+		      struct i915_request *rq)
>>+{
>>+	unsigned int n;
>>+
>>+	for (n = 0; n < num_fences; n++) {
>>+		struct drm_syncobj *syncobj;
>>+		unsigned int flags;
>>+		int err;
>>+
>>+		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
>>+
>>+		if (!fences[n].dma_fence)
>>+			continue;
>>+
>>+		err = i915_request_await_dma_fence(rq, fences[n].dma_fence);
>>+		if (err < 0)
>>+			return err;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+void signal_fence_array(struct eb_fence *fences, u64 num_fences,
>>+			struct dma_fence * const fence)
>>+{
>>+	unsigned int n;
>>+
>>+	for (n = 0; n < num_fences; n++) {
>>+		struct drm_syncobj *syncobj;
>>+		unsigned int flags;
>>+
>>+		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
>>+		if (!(flags & __EXEC_COMMON_FENCE_SIGNAL))
>>+			continue;
>>+
>>+		if (fences[n].chain_fence) {
>>+			drm_syncobj_add_point(syncobj,
>>+					      fences[n].chain_fence,
>>+					      fence,
>>+					      fences[n].value);
>>+			/*
>>+			 * The chain's ownership is transferred to the
>>+			 * timeline.
>>+			 */
>>+			fences[n].chain_fence = NULL;
>>+		} else {
>>+			drm_syncobj_replace_fence(syncobj, fence);
>>+		}
>>+	}
>>+}
>>+
>>+/*
>>+ * Using two helper loops for the order of which requests / batches are created
>>+ * and added the to backend. Requests are created in order from the parent to
>>+ * the last child. Requests are added in the reverse order, from the last child
>>+ * to parent. This is done for locking reasons as the timeline lock is acquired
>>+ * during request creation and released when the request is added to the
>>+ * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
>>+ * the ordering.
>>+ */
>>+#define for_each_batch_create_order(_num_batches) \
>>+	for (unsigned int i = 0; i < (_num_batches); ++i)
>>+#define for_each_batch_add_order(_num_batches) \
>>+	for (int i = (_num_batches) - 1; i >= 0; --i)
>>+
>>+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
>>+{
>>+	struct i915_request *rq, *rn;
>>+
>>+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
>>+		if (rq == end || !i915_request_retire(rq))
>>+			break;
>>+}
>>+
>>+static int eb_request_add(struct intel_context *context,
>>+			  struct i915_request *rq,
>>+			  struct i915_sched_attr sched,
>>+			  int err, bool last_parallel)
>>+{
>>+	struct intel_timeline * const tl = i915_request_timeline(rq);
>>+	struct i915_sched_attr attr = {};
>>+	struct i915_request *prev;
>>+
>>+	lockdep_assert_held(&tl->mutex);
>>+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
>>+
>>+	trace_i915_request_add(rq);
>>+
>>+	prev = __i915_request_commit(rq);
>>+
>>+	/* Check that the context wasn't destroyed before submission */
>>+	if (likely(!intel_context_is_closed(context))) {
>>+		attr = sched;
>>+	} else {
>>+		/* Serialise with context_close via the add_to_timeline */
>>+		i915_request_set_error_once(rq, -ENOENT);
>>+		__i915_request_skip(rq);
>>+		err = -ENOENT; /* override any transient errors */
>>+	}
>>+
>>+	if (intel_context_is_parallel(context)) {
>>+		if (err) {
>>+			__i915_request_skip(rq);
>>+			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>>+				&rq->fence.flags);
>>+		}
>>+		if (last_parallel)
>>+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>>+				&rq->fence.flags);
>>+	}
>>+
>>+	__i915_request_queue(rq, &attr);
>>+
>>+	/* Try to clean up the client's timeline after submitting the request */
>>+	if (prev)
>>+		retire_requests(tl, prev);
>>+
>>+	mutex_unlock(&tl->mutex);
>>+
>>+	return err;
>>+}
>>+
>>+int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
>>+		    struct intel_context *context, struct i915_sched_attr sched,
>>+		    int err)
>>+{
>>+	/*
>>+	 * We iterate in reverse order of creation to release timeline mutexes
>>+	 * in same order.
>>+	 */
>>+	for_each_batch_add_order(num_batches) {
>>+		struct i915_request *rq = requests[i];
>>+
>>+		if (!rq)
>>+			continue;
>>+
>>+		err = eb_request_add(context, rq, sched, err, i == 0);
>>+	}
>>+
>>+	return err;
>>+}
>>+
>>+void eb_requests_get(struct i915_request **requests, unsigned int num_batches)
>>+{
>>+	for_each_batch_create_order(num_batches) {
>>+		if (!requests[i])
>>+			break;
>>+
>>+		i915_request_get(requests[i]);
>>+	}
>>+}
>>+
>>+void eb_requests_put(struct i915_request **requests, unsigned int num_batches)
>>+{
>>+	for_each_batch_create_order(num_batches) {
>>+		if (!requests[i])
>>+			break;
>>+
>>+		i915_request_put(requests[i]);
>>+	}
>>+}
>>+
>>+struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
>>+					      unsigned int num_batches,
>>+					      struct intel_context *context)
>>+{
>>+	struct dma_fence_array *fence_array;
>>+	struct dma_fence **fences;
>>+
>>+	GEM_BUG_ON(!intel_context_is_parent(context));
>>+
>>+	fences = kmalloc_array(num_batches, sizeof(*fences), GFP_KERNEL);
>>+	if (!fences)
>>+		return ERR_PTR(-ENOMEM);
>>+
>>+	for_each_batch_create_order(num_batches) {
>>+		fences[i] = &requests[i]->fence;
>>+		__set_bit(I915_FENCE_FLAG_COMPOSITE,
>>+			  &requests[i]->fence.flags);
>>+	}
>>+
>>+	fence_array = dma_fence_array_create(num_batches,
>>+					     fences,
>>+					     context->parallel.fence_context,
>>+					     context->parallel.seqno++,
>>+					     false);
>>+	if (!fence_array) {
>>+		kfree(fences);
>>+		return ERR_PTR(-ENOMEM);
>>+	}
>>+
>>+	/* Move ownership to the dma_fence_array created above */
>>+	for_each_batch_create_order(num_batches)
>>+		dma_fence_get(fences[i]);
>>+
>>+	return &fence_array->base;
>>+}
>>+
>>+int __eb_select_engine(struct intel_context *ce)
>>+{
>>+	struct intel_context *child;
>>+	int err;
>>+
>>+	for_each_child(ce, child)
>>+		intel_context_get(child);
>>+	intel_gt_pm_get(ce->engine->gt);
>>+
>>+	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>+		err = intel_context_alloc_state(ce);
>>+		if (err)
>>+			goto err;
>>+	}
>>+	for_each_child(ce, child) {
>>+		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>>+			err = intel_context_alloc_state(child);
>>+			if (err)
>>+				goto err;
>>+		}
>>+	}
>>+
>>+	/*
>>+	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
>>+	 * EIO if the GPU is already wedged.
>>+	 */
>>+	err = intel_gt_terminally_wedged(ce->engine->gt);
>>+	if (err)
>>+		goto err;
>>+
>>+	if (!i915_vm_tryget(ce->vm)) {
>>+		err = -ENOENT;
>>+		goto err;
>>+	}
>>+
>>+	return 0;
>>+err:
>>+	intel_gt_pm_put(ce->engine->gt);
>>+	for_each_child(ce, child)
>>+		intel_context_put(child);
>>+	return err;
>>+}
>>+
>>+void __eb_put_engine(struct intel_context *context, struct intel_gt *gt)
>>+{
>>+	struct intel_context *child;
>>+
>>+	i915_vm_put(context->vm);
>>+	intel_gt_pm_put(gt);
>>+	for_each_child(context, child)
>>+		intel_context_put(child);
>>+	intel_context_put(context);
>>+}
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>>new file mode 100644
>>index 000000000000..725febfd6a53
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>>@@ -0,0 +1,47 @@
>>+/* SPDX-License-Identifier: MIT */
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#ifndef __I915_GEM_EXECBUFFER_COMMON_H
>>+#define __I915_GEM_EXECBUFFER_COMMON_H
>>+
>>+#include <drm/drm_syncobj.h>
>>+
>>+#include "gt/intel_context.h"
>>+
>>+struct eb_fence {
>>+	struct drm_syncobj *syncobj;
>>+	struct dma_fence *dma_fence;
>>+	u64 value;
>>+	struct dma_fence_chain *chain_fence;
>>+};
>>+
>>+int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
>>+		    bool throttle, bool nonblock);
>>+void __eb_unpin_engine(struct intel_context *ce);
>>+int __eb_select_engine(struct intel_context *ce);
>>+void __eb_put_engine(struct intel_context *context, struct intel_gt *gt);
>
>Two things:
>
>1)
>
>Is there enough commonality to maybe avoid multiple arguments and have like
>
>struct i915_execbuffer {
>
>};
>
>struct i915_execbuffer2 {
>	struct i915_execbuffer eb;
>	.. eb2 specific fields ..
>};
>
>struct i915_execbuffer3 {
>	struct i915_execbuffer eb;
>	.. eb3 specific fields ..
>};
>
>And then have the common helpers take the pointer to the common struct?
>

...
This would require updating the legacy execbuf path everywhere, which
doesn't look like a good idea to me. As discussed during the VM_BIND RFC,
I think it is better to keep execbuf3 to itself and keep it leaner.
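(Sketch of the embedding suggested above, purely illustrative -- not what
this series implements, and the field selection is hypothetical:)

struct i915_execbuffer {			/* common state */
	struct intel_context *context;
	struct i915_gem_ww_ctx ww;
	struct eb_fence *fences;
	u64 num_fences;
};

struct i915_execbuffer2 {			/* legacy ioctl */
	struct i915_execbuffer eb;
	/* relocations, execlist handling, ... */
};

struct i915_execbuffer3 {			/* new ioctl */
	struct i915_execbuffer eb;
	/* vm_bind specific state ... */
};

/* The common helpers would then take the base pointer, e.g.: */
int __eb_pin_engine(struct i915_execbuffer *eb, bool throttle, bool nonblock);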

>2)
>
>Should we prefix with i915_ everything that is now no longer static?
>

Yah, makes sense, will update.
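(E.g., hypothetical renames along those lines, just to illustrate the
convention:)

int i915_eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
		       bool throttle, bool nonblock);
void i915_eb_unpin_engine(struct intel_context *ce);
int i915_eb_select_engine(struct intel_context *ce);
void i915_eb_put_engine(struct intel_context *context, struct intel_gt *gt);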

Niranjana

>Regards,
>
>Tvrtko
>
>>+
>>+struct intel_context *
>>+eb_find_context(struct intel_context *context, unsigned int context_number);
>>+
>>+int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
>>+		       struct eb_fence *f, bool wait, bool signal);
>>+void put_fence_array(struct eb_fence *fences, u64 num_fences);
>>+int await_fence_array(struct eb_fence *fences, u64 num_fences,
>>+		      struct i915_request *rq);
>>+void signal_fence_array(struct eb_fence *fences, u64 num_fences,
>>+			struct dma_fence * const fence);
>>+
>>+int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
>>+		    struct intel_context *context, struct i915_sched_attr sched,
>>+		    int err);
>>+void eb_requests_get(struct i915_request **requests, unsigned int num_batches);
>>+void eb_requests_put(struct i915_request **requests, unsigned int num_batches);
>>+
>>+struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
>>+					      unsigned int num_batches,
>>+					      struct intel_context *context);
>>+
>>+#endif /* __I915_GEM_EXECBUFFER_COMMON_H */


* Re: [Intel-gfx] [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-21 18:00     ` Niranjana Vishwanathapura
@ 2022-09-22  8:09       ` Tvrtko Ursulin
  2022-09-22 16:18         ` Matthew Auld
  0 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-22  8:09 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, Matthew Auld
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig


On 21/09/2022 19:00, Niranjana Vishwanathapura wrote:
> On Wed, Sep 21, 2022 at 10:13:12AM +0100, Tvrtko Ursulin wrote:
>>
>> On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>> Expose i915_gem_object_max_page_size() function non-static
>>> which will be used by the vm_bind feature.
>>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
>>>  2 files changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> index 33673fe7ee0a..3b3ab4abb0a3 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> @@ -11,14 +11,24 @@
>>>  #include "pxp/intel_pxp.h"
>>>  #include "i915_drv.h"
>>> +#include "i915_gem_context.h"
>>
>> I can't spot that you are adding any code which would need this? 
>> I915_GTT_PAGE_SIZE_4K? It is in intel_gtt.h.
> 
> This include should have been added in a later patch for calling
> i915_gem_vm_lookup(). But got added here while patch refactoring.
> Will fix.
> 
>>
>>>  #include "i915_gem_create.h"
>>>  #include "i915_trace.h"
>>>  #include "i915_user_extensions.h"
>>> -static u32 object_max_page_size(struct intel_memory_region 
>>> **placements,
>>> -                unsigned int n_placements)
>>> +/**
>>> + * i915_gem_object_max_page_size() - max of min_page_size of the 
>>> regions
>>> + * @placements:  list of regions
>>> + * @n_placements: number of the placements
>>> + *
>>> + * Calculates the max of the min_page_size of a list of placements 
>>> passed in.
>>> + *
>>> + * Return: max of the min_page_size
>>> + */
>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>> **placements,
>>> +                  unsigned int n_placements)
>>>  {
>>> -    u32 max_page_size = 0;
>>> +    u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>>      int i;
>>>      for (i = 0; i < n_placements; i++) {
>>> @@ -28,7 +38,6 @@ static u32 object_max_page_size(struct 
>>> intel_memory_region **placements,
>>>          max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>>      }
>>> -    GEM_BUG_ON(!max_page_size);
>>>      return max_page_size;
>>>  }
>>> @@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct 
>>> drm_i915_private *i915, u64 size,
>>>      i915_gem_flush_free_objects(i915);
>>> -    size = round_up(size, object_max_page_size(placements, 
>>> n_placements));
>>> +    size = round_up(size, i915_gem_object_max_page_size(placements,
>>> +                                n_placements));
>>>      if (size == 0)
>>>          return ERR_PTR(-EINVAL);
>>
>> Because of the changes above this path is now unreachable. I suppose 
>> it was meant to tell the user "you have supplied no placements"? But 
>> then GEM_BUG_ON (which you remove) used to be wrong.
>>
> 
> Yah, looks like an existing problem. Maybe this "size == 0" check
> should have been made before we do the round_up()? I.e., check that the
> input 'size' parameter is not 0?
> I think for now I will remove this check, as it was unreachable anyhow.

Hm, that's true as well. i915_gem_create_ext_ioctl ensures at least one
placement, and the internal callers do as well.

To be safe, instead of removing the check, maybe move it to before the
"size = " line and change it to "if (GEM_WARN_ON(n_placements == 0))"?
Not sure... Matt, any thoughts here given the changes in this patch?
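(Roughly, the variant being floated here -- illustrative only:)

        /* Guard against a zero-placement call before rounding up. */
        if (GEM_WARN_ON(n_placements == 0))
                return ERR_PTR(-EINVAL);

        size = round_up(size, i915_gem_object_max_page_size(placements,
                                                            n_placements));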

Regards,

Tvrtko

> 
> Niranjana
> 
>> Regards,
>>
>> Tvrtko
>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> index 7317d4102955..8c97bddad921 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>>  }
>>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>> **placements,
>>> +                  unsigned int n_placements);
>>>  void i915_objects_module_exit(void);
>>>  int i915_objects_module_init(void);


* Re: [Intel-gfx] [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-21 18:17     ` Niranjana Vishwanathapura
@ 2022-09-22  9:05       ` Tvrtko Ursulin
  2022-09-22 14:12         ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-22  9:05 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig


On 21/09/2022 19:17, Niranjana Vishwanathapura wrote:
> On Wed, Sep 21, 2022 at 11:18:53AM +0100, Tvrtko Ursulin wrote:
>>
>> On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>> The new execbuf3 ioctl path and the legacy execbuf ioctl
>>> paths have many common functionalities.
>>> Share code between these two paths by abstracting out the
>>> common functionalities into a separate file where possible.
>>
>> Looks like a good start to me. A couple comments/questions below.
>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 507 ++---------------
>>>  .../drm/i915/gem/i915_gem_execbuffer_common.c | 530 ++++++++++++++++++
>>>  .../drm/i915/gem/i915_gem_execbuffer_common.h |  47 ++
>>>  4 files changed, 612 insertions(+), 473 deletions(-)
>>>  create mode 100644 
>>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>>>  create mode 100644 
>>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>>>
>>> diff --git a/drivers/gpu/drm/i915/Makefile 
>>> b/drivers/gpu/drm/i915/Makefile
>>> index 9bf939ef18ea..bf952f478555 100644
>>> --- a/drivers/gpu/drm/i915/Makefile
>>> +++ b/drivers/gpu/drm/i915/Makefile
>>> @@ -148,6 +148,7 @@ gem-y += \
>>>      gem/i915_gem_create.o \
>>>      gem/i915_gem_dmabuf.o \
>>>      gem/i915_gem_domain.o \
>>> +    gem/i915_gem_execbuffer_common.o \
>>>      gem/i915_gem_execbuffer.o \
>>>      gem/i915_gem_internal.o \
>>>      gem/i915_gem_object.o \
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index 33d989a20227..363b2a788cdf 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -9,8 +9,6 @@
>>>  #include <linux/sync_file.h>
>>>  #include <linux/uaccess.h>
>>> -#include <drm/drm_syncobj.h>
>>> -
>>>  #include "display/intel_frontbuffer.h"
>>>  #include "gem/i915_gem_ioctls.h"
>>> @@ -28,6 +26,7 @@
>>>  #include "i915_file_private.h"
>>>  #include "i915_gem_clflush.h"
>>>  #include "i915_gem_context.h"
>>> +#include "i915_gem_execbuffer_common.h"
>>>  #include "i915_gem_evict.h"
>>>  #include "i915_gem_ioctls.h"
>>>  #include "i915_trace.h"
>>> @@ -235,13 +234,6 @@ enum {
>>>   * the batchbuffer in trusted mode, otherwise the ioctl is rejected.
>>>   */
>>> -struct eb_fence {
>>> -    struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
>>> -    struct dma_fence *dma_fence;
>>> -    u64 value;
>>> -    struct dma_fence_chain *chain_fence;
>>> -};
>>> -
>>>  struct i915_execbuffer {
>>>      struct drm_i915_private *i915; /** i915 backpointer */
>>>      struct drm_file *file; /** per-file lookup tables and limits */
>>> @@ -2446,164 +2438,29 @@ static const enum intel_engine_id 
>>> user_ring_map[] = {
>>>      [I915_EXEC_VEBOX]    = VECS0
>>>  };
>>> -static struct i915_request *eb_throttle(struct i915_execbuffer *eb, 
>>> struct intel_context *ce)
>>> -{
>>> -    struct intel_ring *ring = ce->ring;
>>> -    struct intel_timeline *tl = ce->timeline;
>>> -    struct i915_request *rq;
>>> -
>>> -    /*
>>> -     * Completely unscientific finger-in-the-air estimates for suitable
>>> -     * maximum user request size (to avoid blocking) and then backoff.
>>> -     */
>>> -    if (intel_ring_update_space(ring) >= PAGE_SIZE)
>>> -        return NULL;
>>> -
>>> -    /*
>>> -     * Find a request that after waiting upon, there will be at 
>>> least half
>>> -     * the ring available. The hysteresis allows us to compete for the
>>> -     * shared ring and should mean that we sleep less often prior to
>>> -     * claiming our resources, but not so long that the ring completely
>>> -     * drains before we can submit our next request.
>>> -     */
>>> -    list_for_each_entry(rq, &tl->requests, link) {
>>> -        if (rq->ring != ring)
>>> -            continue;
>>> -
>>> -        if (__intel_ring_space(rq->postfix,
>>> -                       ring->emit, ring->size) > ring->size / 2)
>>> -            break;
>>> -    }
>>> -    if (&rq->link == &tl->requests)
>>> -        return NULL; /* weird, we will check again later for real */
>>> -
>>> -    return i915_request_get(rq);
>>> -}
>>> -
>>> -static int eb_pin_timeline(struct i915_execbuffer *eb, struct 
>>> intel_context *ce,
>>> -               bool throttle)
>>> -{
>>> -    struct intel_timeline *tl;
>>> -    struct i915_request *rq = NULL;
>>> -
>>> -    /*
>>> -     * Take a local wakeref for preparing to dispatch the execbuf as
>>> -     * we expect to access the hardware fairly frequently in the
>>> -     * process, and require the engine to be kept awake between 
>>> accesses.
>>> -     * Upon dispatch, we acquire another prolonged wakeref that we hold
>>> -     * until the timeline is idle, which in turn releases the wakeref
>>> -     * taken on the engine, and the parent device.
>>> -     */
>>> -    tl = intel_context_timeline_lock(ce);
>>> -    if (IS_ERR(tl))
>>> -        return PTR_ERR(tl);
>>> -
>>> -    intel_context_enter(ce);
>>> -    if (throttle)
>>> -        rq = eb_throttle(eb, ce);
>>> -    intel_context_timeline_unlock(tl);
>>> -
>>> -    if (rq) {
>>> -        bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
>>> -        long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>>> -
>>> -        if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>>> -                      timeout) < 0) {
>>> -            i915_request_put(rq);
>>> -
>>> -            /*
>>> -             * Error path, cannot use intel_context_timeline_lock as
>>> -             * that is user interruptable and this clean up step
>>> -             * must be done.
>>> -             */
>>> -            mutex_lock(&ce->timeline->mutex);
>>> -            intel_context_exit(ce);
>>> -            mutex_unlock(&ce->timeline->mutex);
>>> -
>>> -            if (nonblock)
>>> -                return -EWOULDBLOCK;
>>> -            else
>>> -                return -EINTR;
>>> -        }
>>> -        i915_request_put(rq);
>>> -    }
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>  static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
>>>  {
>>> -    struct intel_context *ce = eb->context, *child;
>>>      int err;
>>> -    int i = 0, j = 0;
>>>      GEM_BUG_ON(eb->args->flags & __EXEC_ENGINE_PINNED);
>>
>> You could avoid duplication by putting the common flags into the 
>> common header and then eb2 and eb3 add their own flags relative to the 
>> end of the last common entry.
>>
> 
> I intentionally avoided that. I think we should avoid creating
> dependencies between legacy execbuf and execbuf3 paths. They only
> get more and more intertwined if we go that route.
> So I have added some helper functions here which both paths
> can share, but the main flow is strictly kept separate.
> More on this below...
> 
>>> -    if (unlikely(intel_context_is_banned(ce)))
>>> -        return -EIO;
>>> -
>>> -    /*
>>> -     * Pinning the contexts may generate requests in order to acquire
>>> -     * GGTT space, so do this first before we reserve a seqno for
>>> -     * ourselves.
>>> -     */
>>> -    err = intel_context_pin_ww(ce, &eb->ww);
>>> +    err = __eb_pin_engine(eb->context, &eb->ww, throttle,
>>> +                  eb->file->filp->f_flags & O_NONBLOCK);
>>>      if (err)
>>>          return err;
>>> -    for_each_child(ce, child) {
>>> -        err = intel_context_pin_ww(child, &eb->ww);
>>> -        GEM_BUG_ON(err);    /* perma-pinned should incr a counter */
>>> -    }
>>> -
>>> -    for_each_child(ce, child) {
>>> -        err = eb_pin_timeline(eb, child, throttle);
>>> -        if (err)
>>> -            goto unwind;
>>> -        ++i;
>>> -    }
>>> -    err = eb_pin_timeline(eb, ce, throttle);
>>> -    if (err)
>>> -        goto unwind;
>>>      eb->args->flags |= __EXEC_ENGINE_PINNED;
>>>      return 0;
>>> -
>>> -unwind:
>>> -    for_each_child(ce, child) {
>>> -        if (j++ < i) {
>>> -            mutex_lock(&child->timeline->mutex);
>>> -            intel_context_exit(child);
>>> -            mutex_unlock(&child->timeline->mutex);
>>> -        }
>>> -    }
>>> -    for_each_child(ce, child)
>>> -        intel_context_unpin(child);
>>> -    intel_context_unpin(ce);
>>> -    return err;
>>>  }
>>>  static void eb_unpin_engine(struct i915_execbuffer *eb)
>>>  {
>>> -    struct intel_context *ce = eb->context, *child;
>>> -
>>>      if (!(eb->args->flags & __EXEC_ENGINE_PINNED))
>>>          return;
>>>      eb->args->flags &= ~__EXEC_ENGINE_PINNED;
>>> -    for_each_child(ce, child) {
>>> -        mutex_lock(&child->timeline->mutex);
>>> -        intel_context_exit(child);
>>> -        mutex_unlock(&child->timeline->mutex);
>>> -
>>> -        intel_context_unpin(child);
>>> -    }
>>> -
>>> -    mutex_lock(&ce->timeline->mutex);
>>> -    intel_context_exit(ce);
>>> -    mutex_unlock(&ce->timeline->mutex);
>>> -
>>> -    intel_context_unpin(ce);
>>> +    __eb_unpin_engine(eb->context);
>>>  }
>>>  static unsigned int
>>> @@ -2652,7 +2509,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
>>>  static int
>>>  eb_select_engine(struct i915_execbuffer *eb)
>>>  {
>>> -    struct intel_context *ce, *child;
>>> +    struct intel_context *ce;
>>>      unsigned int idx;
>>>      int err;
>>> @@ -2677,36 +2534,10 @@ eb_select_engine(struct i915_execbuffer *eb)
>>>      }
>>>      eb->num_batches = ce->parallel.number_children + 1;
>>> -    for_each_child(ce, child)
>>> -        intel_context_get(child);
>>> -    intel_gt_pm_get(ce->engine->gt);
>>> -
>>> -    if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>> -        err = intel_context_alloc_state(ce);
>>> -        if (err)
>>> -            goto err;
>>> -    }
>>> -    for_each_child(ce, child) {
>>> -        if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>>> -            err = intel_context_alloc_state(child);
>>> -            if (err)
>>> -                goto err;
>>> -        }
>>> -    }
>>> -
>>> -    /*
>>> -     * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
>>> -     * EIO if the GPU is already wedged.
>>> -     */
>>> -    err = intel_gt_terminally_wedged(ce->engine->gt);
>>> +    err = __eb_select_engine(ce);
>>>      if (err)
>>>          goto err;
>>> -    if (!i915_vm_tryget(ce->vm)) {
>>> -        err = -ENOENT;
>>> -        goto err;
>>> -    }
>>> -
>>>      eb->context = ce;
>>>      eb->gt = ce->engine->gt;
>>> @@ -2715,12 +2546,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>>>       * during ww handling. The pool is destroyed when last pm reference
>>>       * is dropped, which breaks our -EDEADLK handling.
>>>       */
>>> -    return err;
>>> +    return 0;
>>>  err:
>>> -    intel_gt_pm_put(ce->engine->gt);
>>> -    for_each_child(ce, child)
>>> -        intel_context_put(child);
>>>      intel_context_put(ce);
>>>      return err;
>>>  }
>>> @@ -2728,24 +2556,7 @@ eb_select_engine(struct i915_execbuffer *eb)
>>>  static void
>>>  eb_put_engine(struct i915_execbuffer *eb)
>>>  {
>>> -    struct intel_context *child;
>>> -
>>> -    i915_vm_put(eb->context->vm);
>>> -    intel_gt_pm_put(eb->gt);
>>> -    for_each_child(eb->context, child)
>>> -        intel_context_put(child);
>>> -    intel_context_put(eb->context);
>>> -}
>>> -
>>> -static void
>>> -__free_fence_array(struct eb_fence *fences, unsigned int n)
>>> -{
>>> -    while (n--) {
>>> -        drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>> -        dma_fence_put(fences[n].dma_fence);
>>> -        dma_fence_chain_free(fences[n].chain_fence);
>>> -    }
>>> -    kvfree(fences);
>>> +    __eb_put_engine(eb->context, eb->gt);
>>>  }
>>>  static int
>>> @@ -2756,7 +2567,6 @@ add_timeline_fence_array(struct i915_execbuffer 
>>> *eb,
>>>      u64 __user *user_values;
>>>      struct eb_fence *f;
>>>      u64 nfences;
>>> -    int err = 0;
>>>      nfences = timeline_fences->fence_count;
>>>      if (!nfences)
>>> @@ -2791,9 +2601,9 @@ add_timeline_fence_array(struct i915_execbuffer 
>>> *eb,
>>>      while (nfences--) {
>>>          struct drm_i915_gem_exec_fence user_fence;
>>> -        struct drm_syncobj *syncobj;
>>> -        struct dma_fence *fence = NULL;
>>> +        bool wait, signal;
>>>          u64 point;
>>> +        int ret;
>>>          if (__copy_from_user(&user_fence,
>>>                       user_fences++,
>>> @@ -2806,70 +2616,15 @@ add_timeline_fence_array(struct 
>>> i915_execbuffer *eb,
>>>          if (__get_user(point, user_values++))
>>>              return -EFAULT;
>>> -        syncobj = drm_syncobj_find(eb->file, user_fence.handle);
>>> -        if (!syncobj) {
>>> -            DRM_DEBUG("Invalid syncobj handle provided\n");
>>> -            return -ENOENT;
>>> -        }
>>> -
>>> -        fence = drm_syncobj_fence_get(syncobj);
>>> -
>>> -        if (!fence && user_fence.flags &&
>>> -            !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
>>> -            DRM_DEBUG("Syncobj handle has no fence\n");
>>> -            drm_syncobj_put(syncobj);
>>> -            return -EINVAL;
>>> -        }
>>> -
>>> -        if (fence)
>>> -            err = dma_fence_chain_find_seqno(&fence, point);
>>> -
>>> -        if (err && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
>>> -            DRM_DEBUG("Syncobj handle missing requested point 
>>> %llu\n", point);
>>> -            dma_fence_put(fence);
>>> -            drm_syncobj_put(syncobj);
>>> -            return err;
>>> -        }
>>> -
>>> -        /*
>>> -         * A point might have been signaled already and
>>> -         * garbage collected from the timeline. In this case
>>> -         * just ignore the point and carry on.
>>> -         */
>>> -        if (!fence && !(user_fence.flags & I915_EXEC_FENCE_SIGNAL)) {
>>> -            drm_syncobj_put(syncobj);
>>> +        wait = user_fence.flags & I915_EXEC_FENCE_WAIT;
>>> +        signal = user_fence.flags & I915_EXEC_FENCE_SIGNAL;
>>> +        ret = add_timeline_fence(eb->file, user_fence.handle, point,
>>> +                     f, wait, signal);
>>> +        if (ret < 0)
>>> +            return ret;
>>> +        else if (!ret)
>>>              continue;
>>> -        }
>>> -        /*
>>> -         * For timeline syncobjs we need to preallocate chains for
>>> -         * later signaling.
>>> -         */
>>> -        if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
>>> -            /*
>>> -             * Waiting and signaling the same point (when point !=
>>> -             * 0) would break the timeline.
>>> -             */
>>> -            if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
>>> -                DRM_DEBUG("Trying to wait & signal the same timeline 
>>> point.\n");
>>> -                dma_fence_put(fence);
>>> -                drm_syncobj_put(syncobj);
>>> -                return -EINVAL;
>>> -            }
>>> -
>>> -            f->chain_fence = dma_fence_chain_alloc();
>>> -            if (!f->chain_fence) {
>>> -                drm_syncobj_put(syncobj);
>>> -                dma_fence_put(fence);
>>> -                return -ENOMEM;
>>> -            }
>>> -        } else {
>>> -            f->chain_fence = NULL;
>>> -        }
>>> -
>>> -        f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
>>> -        f->dma_fence = fence;
>>> -        f->value = point;
>>>          f++;
>>>          eb->num_fences++;
>>>      }
>>> @@ -2949,65 +2704,6 @@ static int add_fence_array(struct 
>>> i915_execbuffer *eb)
>>>      return 0;
>>>  }
>>> -static void put_fence_array(struct eb_fence *fences, int num_fences)
>>> -{
>>> -    if (fences)
>>> -        __free_fence_array(fences, num_fences);
>>> -}
>>> -
>>> -static int
>>> -await_fence_array(struct i915_execbuffer *eb,
>>> -          struct i915_request *rq)
>>> -{
>>> -    unsigned int n;
>>> -    int err;
>>> -
>>> -    for (n = 0; n < eb->num_fences; n++) {
>>> -        struct drm_syncobj *syncobj;
>>> -        unsigned int flags;
>>> -
>>> -        syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>> -
>>> -        if (!eb->fences[n].dma_fence)
>>> -            continue;
>>> -
>>> -        err = i915_request_await_dma_fence(rq, 
>>> eb->fences[n].dma_fence);
>>> -        if (err < 0)
>>> -            return err;
>>> -    }
>>> -
>>> -    return 0;
>>> -}
>>> -
>>> -static void signal_fence_array(const struct i915_execbuffer *eb,
>>> -                   struct dma_fence * const fence)
>>> -{
>>> -    unsigned int n;
>>> -
>>> -    for (n = 0; n < eb->num_fences; n++) {
>>> -        struct drm_syncobj *syncobj;
>>> -        unsigned int flags;
>>> -
>>> -        syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>> -        if (!(flags & I915_EXEC_FENCE_SIGNAL))
>>> -            continue;
>>> -
>>> -        if (eb->fences[n].chain_fence) {
>>> -            drm_syncobj_add_point(syncobj,
>>> -                          eb->fences[n].chain_fence,
>>> -                          fence,
>>> -                          eb->fences[n].value);
>>> -            /*
>>> -             * The chain's ownership is transferred to the
>>> -             * timeline.
>>> -             */
>>> -            eb->fences[n].chain_fence = NULL;
>>> -        } else {
>>> -            drm_syncobj_replace_fence(syncobj, fence);
>>> -        }
>>> -    }
>>> -}
>>> -
>>>  static int
>>>  parse_timeline_fences(struct i915_user_extension __user *ext, void 
>>> *data)
>>>  {
>>> @@ -3020,80 +2716,6 @@ parse_timeline_fences(struct 
>>> i915_user_extension __user *ext, void *data)
>>>      return add_timeline_fence_array(eb, &timeline_fences);
>>>  }
>>> -static void retire_requests(struct intel_timeline *tl, struct 
>>> i915_request *end)
>>> -{
>>> -    struct i915_request *rq, *rn;
>>> -
>>> -    list_for_each_entry_safe(rq, rn, &tl->requests, link)
>>> -        if (rq == end || !i915_request_retire(rq))
>>> -            break;
>>> -}
>>> -
>>> -static int eb_request_add(struct i915_execbuffer *eb, struct 
>>> i915_request *rq,
>>> -              int err, bool last_parallel)
>>> -{
>>> -    struct intel_timeline * const tl = i915_request_timeline(rq);
>>> -    struct i915_sched_attr attr = {};
>>> -    struct i915_request *prev;
>>> -
>>> -    lockdep_assert_held(&tl->mutex);
>>> -    lockdep_unpin_lock(&tl->mutex, rq->cookie);
>>> -
>>> -    trace_i915_request_add(rq);
>>> -
>>> -    prev = __i915_request_commit(rq);
>>> -
>>> -    /* Check that the context wasn't destroyed before submission */
>>> -    if (likely(!intel_context_is_closed(eb->context))) {
>>> -        attr = eb->gem_context->sched;
>>> -    } else {
>>> -        /* Serialise with context_close via the add_to_timeline */
>>> -        i915_request_set_error_once(rq, -ENOENT);
>>> -        __i915_request_skip(rq);
>>> -        err = -ENOENT; /* override any transient errors */
>>> -    }
>>> -
>>> -    if (intel_context_is_parallel(eb->context)) {
>>> -        if (err) {
>>> -            __i915_request_skip(rq);
>>> -            set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>>> -                &rq->fence.flags);
>>> -        }
>>> -        if (last_parallel)
>>> -            set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>>> -                &rq->fence.flags);
>>> -    }
>>> -
>>> -    __i915_request_queue(rq, &attr);
>>> -
>>> -    /* Try to clean up the client's timeline after submitting the 
>>> request */
>>> -    if (prev)
>>> -        retire_requests(tl, prev);
>>> -
>>> -    mutex_unlock(&tl->mutex);
>>> -
>>> -    return err;
>>> -}
>>> -
>>> -static int eb_requests_add(struct i915_execbuffer *eb, int err)
>>> -{
>>> -    int i;
>>> -
>>> -    /*
>>> -     * We iterate in reverse order of creation to release timeline 
>>> mutexes in
>>> -     * same order.
>>> -     */
>>> -    for_each_batch_add_order(eb, i) {
>>> -        struct i915_request *rq = eb->requests[i];
>>> -
>>> -        if (!rq)
>>> -            continue;
>>> -        err |= eb_request_add(eb, rq, err, i == 0);
>>> -    }
>>> -
>>> -    return err;
>>> -}
>>> -
>>>  static const i915_user_extension_fn execbuf_extensions[] = {
>>>      [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = 
>>> parse_timeline_fences,
>>>  };
>>> @@ -3120,73 +2742,26 @@ parse_execbuf2_extensions(struct 
>>> drm_i915_gem_execbuffer2 *args,
>>>                      eb);
>>>  }
>>> -static void eb_requests_get(struct i915_execbuffer *eb)
>>> -{
>>> -    unsigned int i;
>>> -
>>> -    for_each_batch_create_order(eb, i) {
>>> -        if (!eb->requests[i])
>>> -            break;
>>> -
>>> -        i915_request_get(eb->requests[i]);
>>> -    }
>>> -}
>>> -
>>> -static void eb_requests_put(struct i915_execbuffer *eb)
>>> -{
>>> -    unsigned int i;
>>> -
>>> -    for_each_batch_create_order(eb, i) {
>>> -        if (!eb->requests[i])
>>> -            break;
>>> -
>>> -        i915_request_put(eb->requests[i]);
>>> -    }
>>> -}
>>> -
>>>  static struct sync_file *
>>>  eb_composite_fence_create(struct i915_execbuffer *eb, int out_fence_fd)
>>>  {
>>>      struct sync_file *out_fence = NULL;
>>> -    struct dma_fence_array *fence_array;
>>> -    struct dma_fence **fences;
>>> -    unsigned int i;
>>> -
>>> -    GEM_BUG_ON(!intel_context_is_parent(eb->context));
>>> +    struct dma_fence *fence;
>>> -    fences = kmalloc_array(eb->num_batches, sizeof(*fences), 
>>> GFP_KERNEL);
>>> -    if (!fences)
>>> -        return ERR_PTR(-ENOMEM);
>>> -
>>> -    for_each_batch_create_order(eb, i) {
>>> -        fences[i] = &eb->requests[i]->fence;
>>> -        __set_bit(I915_FENCE_FLAG_COMPOSITE,
>>> -              &eb->requests[i]->fence.flags);
>>> -    }
>>> -
>>> -    fence_array = dma_fence_array_create(eb->num_batches,
>>> -                         fences,
>>> -                         eb->context->parallel.fence_context,
>>> -                         eb->context->parallel.seqno++,
>>> -                         false);
>>> -    if (!fence_array) {
>>> -        kfree(fences);
>>> -        return ERR_PTR(-ENOMEM);
>>> -    }
>>> -
>>> -    /* Move ownership to the dma_fence_array created above */
>>> -    for_each_batch_create_order(eb, i)
>>> -        dma_fence_get(fences[i]);
>>> +    fence = __eb_composite_fence_create(eb->requests, eb->num_batches,
>>> +                        eb->context);
>>> +    if (IS_ERR(fence))
>>> +        return ERR_CAST(fence);
>>>      if (out_fence_fd != -1) {
>>> -        out_fence = sync_file_create(&fence_array->base);
>>> +        out_fence = sync_file_create(fence);
>>>          /* sync_file now owns fence_array, drop creation ref */
>>> -        dma_fence_put(&fence_array->base);
>>> +        dma_fence_put(fence);
>>>          if (!out_fence)
>>>              return ERR_PTR(-ENOMEM);
>>>      }
>>> -    eb->composite_fence = &fence_array->base;
>>> +    eb->composite_fence = fence;
>>>      return out_fence;
>>>  }
>>> @@ -3218,7 +2793,7 @@ eb_fences_add(struct i915_execbuffer *eb, 
>>> struct i915_request *rq,
>>>      }
>>>      if (eb->fences) {
>>> -        err = await_fence_array(eb, rq);
>>> +        err = await_fence_array(eb->fences, eb->num_fences, rq);
>>>          if (err)
>>>              return ERR_PTR(err);
>>>      }
>>> @@ -3236,23 +2811,6 @@ eb_fences_add(struct i915_execbuffer *eb, 
>>> struct i915_request *rq,
>>>      return out_fence;
>>>  }
>>> -static struct intel_context *
>>> -eb_find_context(struct i915_execbuffer *eb, unsigned int 
>>> context_number)
>>> -{
>>> -    struct intel_context *child;
>>> -
>>> -    if (likely(context_number == 0))
>>> -        return eb->context;
>>> -
>>> -    for_each_child(eb->context, child)
>>> -        if (!--context_number)
>>> -            return child;
>>> -
>>> -    GEM_BUG_ON("Context not found");
>>> -
>>> -    return NULL;
>>> -}
>>> -
>>>  static struct sync_file *
>>>  eb_requests_create(struct i915_execbuffer *eb, struct dma_fence 
>>> *in_fence,
>>>             int out_fence_fd)
>>> @@ -3262,7 +2820,8 @@ eb_requests_create(struct i915_execbuffer *eb, 
>>> struct dma_fence *in_fence,
>>>      for_each_batch_create_order(eb, i) {
>>>          /* Allocate a request for this batch buffer nice and early. */
>>> -        eb->requests[i] = i915_request_create(eb_find_context(eb, i));
>>> +        eb->requests[i] =
>>> +            i915_request_create(eb_find_context(eb->context, i));
>>>          if (IS_ERR(eb->requests[i])) {
>>>              out_fence = ERR_CAST(eb->requests[i]);
>>>              eb->requests[i] = NULL;
>>> @@ -3442,11 +3001,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>>>      err = eb_submit(&eb);
>>>  err_request:
>>> -    eb_requests_get(&eb);
>>> -    err = eb_requests_add(&eb, err);
>>> +    eb_requests_get(eb.requests, eb.num_batches);
>>> +    err = eb_requests_add(eb.requests, eb.num_batches, eb.context,
>>> +                  eb.gem_context->sched, err);
>>>      if (eb.fences)
>>> -        signal_fence_array(&eb, eb.composite_fence ?
>>> +        signal_fence_array(eb.fences, eb.num_fences,
>>> +                   eb.composite_fence ?
>>>                     eb.composite_fence :
>>>                     &eb.requests[0]->fence);
>>> @@ -3471,7 +3032,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>>>      if (!out_fence && eb.composite_fence)
>>>          dma_fence_put(eb.composite_fence);
>>> -    eb_requests_put(&eb);
>>> +    eb_requests_put(eb.requests, eb.num_batches);
>>>  err_vma:
>>>      eb_release_vmas(&eb, true);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>>> new file mode 100644
>>> index 000000000000..167268dfd930
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>>> @@ -0,0 +1,530 @@
>>> +// SPDX-License-Identifier: MIT
>>> +/*
>>> + * Copyright © 2022 Intel Corporation
>>> + */
>>> +
>>> +#include <linux/dma-fence-array.h>
>>> +#include "gt/intel_gt.h"
>>> +#include "gt/intel_gt_pm.h"
>>> +#include "gt/intel_ring.h"
>>> +
>>> +#include "i915_gem_execbuffer_common.h"
>>> +
>>> +#define __EXEC_COMMON_FENCE_WAIT    BIT(0)
>>> +#define __EXEC_COMMON_FENCE_SIGNAL    BIT(1)
>>> +
>>> +static struct i915_request *eb_throttle(struct intel_context *ce)
>>> +{
>>> +    struct intel_ring *ring = ce->ring;
>>> +    struct intel_timeline *tl = ce->timeline;
>>> +    struct i915_request *rq;
>>> +
>>> +    /*
>>> +     * Completely unscientific finger-in-the-air estimates for suitable
>>> +     * maximum user request size (to avoid blocking) and then backoff.
>>> +     */
>>> +    if (intel_ring_update_space(ring) >= PAGE_SIZE)
>>> +        return NULL;
>>> +
>>> +    /*
>>> +     * Find a request that after waiting upon, there will be at 
>>> least half
>>> +     * the ring available. The hysteresis allows us to compete for the
>>> +     * shared ring and should mean that we sleep less often prior to
>>> +     * claiming our resources, but not so long that the ring completely
>>> +     * drains before we can submit our next request.
>>> +     */
>>> +    list_for_each_entry(rq, &tl->requests, link) {
>>> +        if (rq->ring != ring)
>>> +            continue;
>>> +
>>> +        if (__intel_ring_space(rq->postfix,
>>> +                       ring->emit, ring->size) > ring->size / 2)
>>> +            break;
>>> +    }
>>> +    if (&rq->link == &tl->requests)
>>> +        return NULL; /* weird, we will check again later for real */
>>> +
>>> +    return i915_request_get(rq);
>>> +}
>>> +
>>> +static int eb_pin_timeline(struct intel_context *ce, bool throttle,
>>> +               bool nonblock)
>>> +{
>>> +    struct intel_timeline *tl;
>>> +    struct i915_request *rq = NULL;
>>> +
>>> +    /*
>>> +     * Take a local wakeref for preparing to dispatch the execbuf as
>>> +     * we expect to access the hardware fairly frequently in the
>>> +     * process, and require the engine to be kept awake between 
>>> accesses.
>>> +     * Upon dispatch, we acquire another prolonged wakeref that we hold
>>> +     * until the timeline is idle, which in turn releases the wakeref
>>> +     * taken on the engine, and the parent device.
>>> +     */
>>> +    tl = intel_context_timeline_lock(ce);
>>> +    if (IS_ERR(tl))
>>> +        return PTR_ERR(tl);
>>> +
>>> +    intel_context_enter(ce);
>>> +    if (throttle)
>>> +        rq = eb_throttle(ce);
>>> +    intel_context_timeline_unlock(tl);
>>> +
>>> +    if (rq) {
>>> +        long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>>> +
>>> +        if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>>> +                      timeout) < 0) {
>>> +            i915_request_put(rq);
>>> +
>>> +            /*
>>> +             * Error path, cannot use intel_context_timeline_lock as
>>> +             * that is user interruptible and this clean up step
>>> +             * must be done.
>>> +             */
>>> +            mutex_lock(&ce->timeline->mutex);
>>> +            intel_context_exit(ce);
>>> +            mutex_unlock(&ce->timeline->mutex);
>>> +
>>> +            if (nonblock)
>>> +                return -EWOULDBLOCK;
>>> +            else
>>> +                return -EINTR;
>>> +        }
>>> +        i915_request_put(rq);
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx 
>>> *ww,
>>> +            bool throttle, bool nonblock)
>>> +{
>>> +    struct intel_context *child;
>>> +    int err;
>>> +    int i = 0, j = 0;
>>> +
>>> +    if (unlikely(intel_context_is_banned(ce)))
>>> +        return -EIO;
>>> +
>>> +    /*
>>> +     * Pinning the contexts may generate requests in order to acquire
>>> +     * GGTT space, so do this first before we reserve a seqno for
>>> +     * ourselves.
>>> +     */
>>> +    err = intel_context_pin_ww(ce, ww);
>>> +    if (err)
>>> +        return err;
>>> +
>>> +    for_each_child(ce, child) {
>>> +        err = intel_context_pin_ww(child, ww);
>>> +        GEM_BUG_ON(err);    /* perma-pinned should incr a counter */
>>> +    }
>>> +
>>> +    for_each_child(ce, child) {
>>> +        err = eb_pin_timeline(child, throttle, nonblock);
>>> +        if (err)
>>> +            goto unwind;
>>> +        ++i;
>>> +    }
>>> +    err = eb_pin_timeline(ce, throttle, nonblock);
>>> +    if (err)
>>> +        goto unwind;
>>> +
>>> +    return 0;
>>> +
>>> +unwind:
>>> +    for_each_child(ce, child) {
>>> +        if (j++ < i) {
>>> +            mutex_lock(&child->timeline->mutex);
>>> +            intel_context_exit(child);
>>> +            mutex_unlock(&child->timeline->mutex);
>>> +        }
>>> +    }
>>> +    for_each_child(ce, child)
>>> +        intel_context_unpin(child);
>>> +    intel_context_unpin(ce);
>>> +    return err;
>>> +}
>>> +
>>> +void __eb_unpin_engine(struct intel_context *ce)
>>> +{
>>> +    struct intel_context *child;
>>> +
>>> +    for_each_child(ce, child) {
>>> +        mutex_lock(&child->timeline->mutex);
>>> +        intel_context_exit(child);
>>> +        mutex_unlock(&child->timeline->mutex);
>>> +
>>> +        intel_context_unpin(child);
>>> +    }
>>> +
>>> +    mutex_lock(&ce->timeline->mutex);
>>> +    intel_context_exit(ce);
>>> +    mutex_unlock(&ce->timeline->mutex);
>>> +
>>> +    intel_context_unpin(ce);
>>> +}
>>> +
>>> +struct intel_context *
>>> +eb_find_context(struct intel_context *context, unsigned int 
>>> context_number)
>>> +{
>>> +    struct intel_context *child;
>>> +
>>> +    if (likely(context_number == 0))
>>> +        return context;
>>> +
>>> +    for_each_child(context, child)
>>> +        if (!--context_number)
>>> +            return child;
>>> +
>>> +    GEM_BUG_ON("Context not found");
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static void __free_fence_array(struct eb_fence *fences, u64 n)
>>> +{
>>> +    while (n--) {
>>> +        drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>> +        dma_fence_put(fences[n].dma_fence);
>>> +        dma_fence_chain_free(fences[n].chain_fence);
>>> +    }
>>> +    kvfree(fences);
>>> +}
>>> +
>>> +void put_fence_array(struct eb_fence *fences, u64 num_fences)
>>> +{
>>> +    if (fences)
>>> +        __free_fence_array(fences, num_fences);
>>> +}
>>> +
>>> +int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
>>> +               struct eb_fence *f, bool wait, bool signal)
>>> +{
>>> +    struct drm_syncobj *syncobj;
>>> +    struct dma_fence *fence = NULL;
>>> +    u32 flags = 0;
>>> +    int err = 0;
>>> +
>>> +    syncobj = drm_syncobj_find(file, handle);
>>> +    if (!syncobj) {
>>> +        DRM_DEBUG("Invalid syncobj handle provided\n");
>>> +        return -ENOENT;
>>> +    }
>>> +
>>> +    fence = drm_syncobj_fence_get(syncobj);
>>> +
>>> +    if (!fence && wait && !signal) {
>>> +        DRM_DEBUG("Syncobj handle has no fence\n");
>>> +        drm_syncobj_put(syncobj);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    if (fence)
>>> +        err = dma_fence_chain_find_seqno(&fence, point);
>>> +
>>> +    if (err && !signal) {
>>> +        DRM_DEBUG("Syncobj handle missing requested point %llu\n", 
>>> point);
>>> +        dma_fence_put(fence);
>>> +        drm_syncobj_put(syncobj);
>>> +        return err;
>>> +    }
>>> +
>>> +    /*
>>> +     * A point might have been signaled already and
>>> +     * garbage collected from the timeline. In this case
>>> +     * just ignore the point and carry on.
>>> +     */
>>> +    if (!fence && !signal) {
>>> +        drm_syncobj_put(syncobj);
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * For timeline syncobjs we need to preallocate chains for
>>> +     * later signaling.
>>> +     */
>>> +    if (point != 0 && signal) {
>>> +        /*
>>> +         * Waiting and signaling the same point (when point !=
>>> +         * 0) would break the timeline.
>>> +         */
>>> +        if (wait) {
>>> +            DRM_DEBUG("Trying to wait & signal the same timeline 
>>> point.\n");
>>> +            dma_fence_put(fence);
>>> +            drm_syncobj_put(syncobj);
>>> +            return -EINVAL;
>>> +        }
>>> +
>>> +        f->chain_fence = dma_fence_chain_alloc();
>>> +        if (!f->chain_fence) {
>>> +            drm_syncobj_put(syncobj);
>>> +            dma_fence_put(fence);
>>> +            return -ENOMEM;
>>> +        }
>>> +    } else {
>>> +        f->chain_fence = NULL;
>>> +    }
>>> +
>>> +    flags |= wait ? __EXEC_COMMON_FENCE_WAIT : 0;
>>> +    flags |= signal ? __EXEC_COMMON_FENCE_SIGNAL : 0;
>>> +
>>> +    f->syncobj = ptr_pack_bits(syncobj, flags, 2);
>>> +    f->dma_fence = fence;
>>> +    f->value = point;
>>> +    return 1;
>>> +}
>>> +
>>> +int await_fence_array(struct eb_fence *fences, u64 num_fences,
>>> +              struct i915_request *rq)
>>> +{
>>> +    unsigned int n;
>>> +
>>> +    for (n = 0; n < num_fences; n++) {
>>> +        struct drm_syncobj *syncobj;
>>> +        unsigned int flags;
>>> +        int err;
>>> +
>>> +        syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
>>> +
>>> +        if (!fences[n].dma_fence)
>>> +            continue;
>>> +
>>> +        err = i915_request_await_dma_fence(rq, fences[n].dma_fence);
>>> +        if (err < 0)
>>> +            return err;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +void signal_fence_array(struct eb_fence *fences, u64 num_fences,
>>> +            struct dma_fence * const fence)
>>> +{
>>> +    unsigned int n;
>>> +
>>> +    for (n = 0; n < num_fences; n++) {
>>> +        struct drm_syncobj *syncobj;
>>> +        unsigned int flags;
>>> +
>>> +        syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
>>> +        if (!(flags & __EXEC_COMMON_FENCE_SIGNAL))
>>> +            continue;
>>> +
>>> +        if (fences[n].chain_fence) {
>>> +            drm_syncobj_add_point(syncobj,
>>> +                          fences[n].chain_fence,
>>> +                          fence,
>>> +                          fences[n].value);
>>> +            /*
>>> +             * The chain's ownership is transferred to the
>>> +             * timeline.
>>> +             */
>>> +            fences[n].chain_fence = NULL;
>>> +        } else {
>>> +            drm_syncobj_replace_fence(syncobj, fence);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * Using two helper loops for the order of which requests / batches 
>>> are created
>>> + * and added to the backend. Requests are created in order from the 
>>> parent to
>>> + * the last child. Requests are added in the reverse order, from the 
>>> last child
>>> + * to parent. This is done for locking reasons as the timeline lock 
>>> is acquired
>>> + * during request creation and released when the request is added to 
>>> the
>>> + * backend. To make lockdep happy (see intel_context_timeline_lock) 
>>> this must be
>>> + * the ordering.
>>> + */
>>> +#define for_each_batch_create_order(_num_batches) \
>>> +    for (unsigned int i = 0; i < (_num_batches); ++i)
>>> +#define for_each_batch_add_order(_num_batches) \
>>> +    for (int i = (_num_batches) - 1; i >= 0; --i)
>>> +
>>> +static void retire_requests(struct intel_timeline *tl, struct 
>>> i915_request *end)
>>> +{
>>> +    struct i915_request *rq, *rn;
>>> +
>>> +    list_for_each_entry_safe(rq, rn, &tl->requests, link)
>>> +        if (rq == end || !i915_request_retire(rq))
>>> +            break;
>>> +}
>>> +
>>> +static int eb_request_add(struct intel_context *context,
>>> +              struct i915_request *rq,
>>> +              struct i915_sched_attr sched,
>>> +              int err, bool last_parallel)
>>> +{
>>> +    struct intel_timeline * const tl = i915_request_timeline(rq);
>>> +    struct i915_sched_attr attr = {};
>>> +    struct i915_request *prev;
>>> +
>>> +    lockdep_assert_held(&tl->mutex);
>>> +    lockdep_unpin_lock(&tl->mutex, rq->cookie);
>>> +
>>> +    trace_i915_request_add(rq);
>>> +
>>> +    prev = __i915_request_commit(rq);
>>> +
>>> +    /* Check that the context wasn't destroyed before submission */
>>> +    if (likely(!intel_context_is_closed(context))) {
>>> +        attr = sched;
>>> +    } else {
>>> +        /* Serialise with context_close via the add_to_timeline */
>>> +        i915_request_set_error_once(rq, -ENOENT);
>>> +        __i915_request_skip(rq);
>>> +        err = -ENOENT; /* override any transient errors */
>>> +    }
>>> +
>>> +    if (intel_context_is_parallel(context)) {
>>> +        if (err) {
>>> +            __i915_request_skip(rq);
>>> +            set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>>> +                &rq->fence.flags);
>>> +        }
>>> +        if (last_parallel)
>>> +            set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>>> +                &rq->fence.flags);
>>> +    }
>>> +
>>> +    __i915_request_queue(rq, &attr);
>>> +
>>> +    /* Try to clean up the client's timeline after submitting the 
>>> request */
>>> +    if (prev)
>>> +        retire_requests(tl, prev);
>>> +
>>> +    mutex_unlock(&tl->mutex);
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +int eb_requests_add(struct i915_request **requests, unsigned int 
>>> num_batches,
>>> +            struct intel_context *context, struct i915_sched_attr 
>>> sched,
>>> +            int err)
>>> +{
>>> +    /*
>>> +     * We iterate in reverse order of creation to release timeline 
>>> mutexes
>>> +     * in same order.
>>> +     */
>>> +    for_each_batch_add_order(num_batches) {
>>> +        struct i915_request *rq = requests[i];
>>> +
>>> +        if (!rq)
>>> +            continue;
>>> +
>>> +        err = eb_request_add(context, rq, sched, err, i == 0);
>>> +    }
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +void eb_requests_get(struct i915_request **requests, unsigned int 
>>> num_batches)
>>> +{
>>> +    for_each_batch_create_order(num_batches) {
>>> +        if (!requests[i])
>>> +            break;
>>> +
>>> +        i915_request_get(requests[i]);
>>> +    }
>>> +}
>>> +
>>> +void eb_requests_put(struct i915_request **requests, unsigned int 
>>> num_batches)
>>> +{
>>> +    for_each_batch_create_order(num_batches) {
>>> +        if (!requests[i])
>>> +            break;
>>> +
>>> +        i915_request_put(requests[i]);
>>> +    }
>>> +}
>>> +
>>> +struct dma_fence *__eb_composite_fence_create(struct i915_request 
>>> **requests,
>>> +                          unsigned int num_batches,
>>> +                          struct intel_context *context)
>>> +{
>>> +    struct dma_fence_array *fence_array;
>>> +    struct dma_fence **fences;
>>> +
>>> +    GEM_BUG_ON(!intel_context_is_parent(context));
>>> +
>>> +    fences = kmalloc_array(num_batches, sizeof(*fences), GFP_KERNEL);
>>> +    if (!fences)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    for_each_batch_create_order(num_batches) {
>>> +        fences[i] = &requests[i]->fence;
>>> +        __set_bit(I915_FENCE_FLAG_COMPOSITE,
>>> +              &requests[i]->fence.flags);
>>> +    }
>>> +
>>> +    fence_array = dma_fence_array_create(num_batches,
>>> +                         fences,
>>> +                         context->parallel.fence_context,
>>> +                         context->parallel.seqno++,
>>> +                         false);
>>> +    if (!fence_array) {
>>> +        kfree(fences);
>>> +        return ERR_PTR(-ENOMEM);
>>> +    }
>>> +
>>> +    /* Move ownership to the dma_fence_array created above */
>>> +    for_each_batch_create_order(num_batches)
>>> +        dma_fence_get(fences[i]);
>>> +
>>> +    return &fence_array->base;
>>> +}
>>> +
>>> +int __eb_select_engine(struct intel_context *ce)
>>> +{
>>> +    struct intel_context *child;
>>> +    int err;
>>> +
>>> +    for_each_child(ce, child)
>>> +        intel_context_get(child);
>>> +    intel_gt_pm_get(ce->engine->gt);
>>> +
>>> +    if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>> +        err = intel_context_alloc_state(ce);
>>> +        if (err)
>>> +            goto err;
>>> +    }
>>> +    for_each_child(ce, child) {
>>> +        if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>>> +            err = intel_context_alloc_state(child);
>>> +            if (err)
>>> +                goto err;
>>> +        }
>>> +    }
>>> +
>>> +    /*
>>> +     * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
>>> +     * EIO if the GPU is already wedged.
>>> +     */
>>> +    err = intel_gt_terminally_wedged(ce->engine->gt);
>>> +    if (err)
>>> +        goto err;
>>> +
>>> +    if (!i915_vm_tryget(ce->vm)) {
>>> +        err = -ENOENT;
>>> +        goto err;
>>> +    }
>>> +
>>> +    return 0;
>>> +err:
>>> +    intel_gt_pm_put(ce->engine->gt);
>>> +    for_each_child(ce, child)
>>> +        intel_context_put(child);
>>> +    return err;
>>> +}
>>> +
>>> +void __eb_put_engine(struct intel_context *context, struct intel_gt 
>>> *gt)
>>> +{
>>> +    struct intel_context *child;
>>> +
>>> +    i915_vm_put(context->vm);
>>> +    intel_gt_pm_put(gt);
>>> +    for_each_child(context, child)
>>> +        intel_context_put(child);
>>> +    intel_context_put(context);
>>> +}
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>>> new file mode 100644
>>> index 000000000000..725febfd6a53
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>>> @@ -0,0 +1,47 @@
>>> +/* SPDX-License-Identifier: MIT */
>>> +/*
>>> + * Copyright © 2022 Intel Corporation
>>> + */
>>> +
>>> +#ifndef __I915_GEM_EXECBUFFER_COMMON_H
>>> +#define __I915_GEM_EXECBUFFER_COMMON_H
>>> +
>>> +#include <drm/drm_syncobj.h>
>>> +
>>> +#include "gt/intel_context.h"
>>> +
>>> +struct eb_fence {
>>> +    struct drm_syncobj *syncobj;
>>> +    struct dma_fence *dma_fence;
>>> +    u64 value;
>>> +    struct dma_fence_chain *chain_fence;
>>> +};
>>> +
>>> +int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx 
>>> *ww,
>>> +            bool throttle, bool nonblock);
>>> +void __eb_unpin_engine(struct intel_context *ce);
>>> +int __eb_select_engine(struct intel_context *ce);
>>> +void __eb_put_engine(struct intel_context *context, struct intel_gt 
>>> *gt);
>>
>> Two things:
>>
>> 1)
>>
>> Is there enough commonality to maybe avoid multiple arguments and have 
>> like
>>
>> struct i915_execbuffer {
>>
>> };
>>
>> struct i915_execbuffer2 {
>>     struct i915_execbuffer eb;
>>     .. eb2 specific fields ..
>> };
>>
>> struct i915_execbuffer3 {
>>     struct i915_execbuffer eb;
>>     .. eb3 specific fields ..
>> };
>>
>> And then have the common helpers take the pointer to the common struct?
>>
> 
> ...
> This requires updating the legacy execbuf path everywhere, which doesn't
> look like a good idea to me. As discussed during the vm_bind RFC, I think
> it is better to keep execbuf3 to itself and keep it leaner.

To be clear, the amount of almost identical duplicated code worries me 
from the maintenance burden angle. I don't think we have any such 
precedent in the driver. And AFAIR the conclusion during the RFC was to 
keep the ioctls separate and share code where it makes sense.

For instance eb_fences_add - could you have a common helper which takes 
in_fence and out_fence as parameters, pass in -1/-1 from eb3, and end up 
with even more sharing? Same approach as you took in this patch, by 
making the helpers take the arguments they need instead of struct eb.
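
A rough sketch of what such a shared helper could look like (illustrative
only; the name, the exact parameters and the simplified out-fence handling
are assumptions, not code from this series):

static struct sync_file *
__eb_fences_add(struct eb_fence *fences, u64 num_fences,
		struct i915_request *rq,
		struct dma_fence *in_fence, int out_fence_fd)
{
	struct sync_file *out_fence = NULL;
	int err;

	/* Legacy execbuf2 may pass an explicit in-fence; execbuf3 passes NULL */
	if (in_fence) {
		err = i915_request_await_dma_fence(rq, in_fence);
		if (err < 0)
			return ERR_PTR(err);
	}

	/* Syncobj waits are common to both ioctls */
	if (fences) {
		err = await_fence_array(fences, num_fences, rq);
		if (err)
			return ERR_PTR(err);
	}

	/* Legacy execbuf2 may request an out-fence fd; execbuf3 passes -1 */
	if (out_fence_fd != -1) {
		out_fence = sync_file_create(&rq->fence);
		if (!out_fence)
			return ERR_PTR(-ENOMEM);
	}

	return out_fence;
}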

eb_requests_create? Again the same code, if you make eb->batch_pool a 
standalone argument that is passed in.

Haven't looked at more than those in this round..

Regards,

Tvrtko

>> 2)
>>
>> Should we prefix with i915_ everything that is now no longer static?
>>
> 
> Yah, makes sense, will update.
> 
> Niranjana
> 
>> Regards,
>>
>> Tvrtko
>>
>>> +
>>> +struct intel_context *
>>> +eb_find_context(struct intel_context *context, unsigned int 
>>> context_number);
>>> +
>>> +int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
>>> +               struct eb_fence *f, bool wait, bool signal);
>>> +void put_fence_array(struct eb_fence *fences, u64 num_fences);
>>> +int await_fence_array(struct eb_fence *fences, u64 num_fences,
>>> +              struct i915_request *rq);
>>> +void signal_fence_array(struct eb_fence *fences, u64 num_fences,
>>> +            struct dma_fence * const fence);
>>> +
>>> +int eb_requests_add(struct i915_request **requests, unsigned int 
>>> num_batches,
>>> +            struct intel_context *context, struct i915_sched_attr 
>>> sched,
>>> +            int err);
>>> +void eb_requests_get(struct i915_request **requests, unsigned int 
>>> num_batches);
>>> +void eb_requests_put(struct i915_request **requests, unsigned int 
>>> num_batches);
>>> +
>>> +struct dma_fence *__eb_composite_fence_create(struct i915_request 
>>> **requests,
>>> +                          unsigned int num_batches,
>>> +                          struct intel_context *context);
>>> +
>>> +#endif /* __I915_GEM_EXECBUFFER_COMMON_H */

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC v4 02/14] drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-22  9:26     ` Jani Nikula
  -1 siblings, 0 replies; 62+ messages in thread
From: Jani Nikula @ 2022-09-22  9:26 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Wed, 21 Sep 2022, Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:
> Add function __i915_sw_fence_await_reservation() for
> asynchronous wait on a dma-resv object with specified
> dma_resv_usage. This is required for async vma unbind
> with vm_bind.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_sw_fence.c | 25 ++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_sw_fence.h |  7 ++++++-
>  2 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 6fc0d1b89690..0ce8f4efc1ed 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -569,12 +569,11 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  	return ret;
>  }
>  
> -int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> -				    struct dma_resv *resv,
> -				    const struct dma_fence_ops *exclude,
> -				    bool write,
> -				    unsigned long timeout,
> -				    gfp_t gfp)
> +int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> +				      struct dma_resv *resv,
> +				      enum dma_resv_usage usage,
> +				      unsigned long timeout,
> +				      gfp_t gfp)
>  {
>  	struct dma_resv_iter cursor;
>  	struct dma_fence *f;
> @@ -583,7 +582,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>  	debug_fence_assert(fence);
>  	might_sleep_if(gfpflags_allow_blocking(gfp));
>  
> -	dma_resv_iter_begin(&cursor, resv, dma_resv_usage_rw(write));
> +	dma_resv_iter_begin(&cursor, resv, usage);
>  	dma_resv_for_each_fence_unlocked(&cursor, f) {
>  		pending = i915_sw_fence_await_dma_fence(fence, f, timeout,
>  							gfp);
> @@ -598,6 +597,18 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>  	return ret;
>  }
>  
> +int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> +				    struct dma_resv *resv,
> +				    const struct dma_fence_ops *exclude,
> +				    bool write,
> +				    unsigned long timeout,
> +				    gfp_t gfp)
> +{
> +	return __i915_sw_fence_await_reservation(fence, resv,
> +						 dma_resv_usage_rw(write),
> +						 timeout, gfp);
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>  #include "selftests/lib_sw_fence.c"
>  #include "selftests/i915_sw_fence.c"
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 619fc5a22f0c..3cf4b6e16f35 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -10,13 +10,13 @@
>  #define _I915_SW_FENCE_H_
>  
>  #include <linux/dma-fence.h>
> +#include <linux/dma-resv.h>

As a GCC extension you can drop this and forward declare enum
dma_resv_usage. We use it extensively.
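
Roughly, the header could then carry only forward declarations; a sketch of
that suggestion (declaring the enum without its enumerators is the GCC
extension referred to above):

/* i915_sw_fence.h: no #include <linux/dma-resv.h> needed */
struct dma_resv;
enum dma_resv_usage;

int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
				      struct dma_resv *resv,
				      enum dma_resv_usage usage,
				      unsigned long timeout,
				      gfp_t gfp);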

>  #include <linux/gfp.h>
>  #include <linux/kref.h>
>  #include <linux/notifier.h> /* for NOTIFY_DONE */
>  #include <linux/wait.h>
>  
>  struct completion;
> -struct dma_resv;
>  struct i915_sw_fence;
>  
>  enum i915_sw_fence_notify {
> @@ -89,6 +89,11 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  				  unsigned long timeout,
>  				  gfp_t gfp);
>  
> +int __i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> +				      struct dma_resv *resv,
> +				      enum dma_resv_usage usage,
> +				      unsigned long timeout,
> +				      gfp_t gfp);
>  int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>  				    struct dma_resv *resv,
>  				    const struct dma_fence_ops *exclude,

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 04/14] drm/i915/vm_bind: Implement bind and unbind of object
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-09-22  9:29   ` Jani Nikula
  -1 siblings, 0 replies; 62+ messages in thread
From: Jani Nikula @ 2022-09-22  9:29 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld

On Wed, 21 Sep 2022, Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:
> Add uapi and implement support for bind and unbind of an
> object at the specified GPU virtual addresses.
>
> The vm_bind mode is not supported in legacy execbuf2 ioctl.
> It will be supported only in the newer execbuf3 ioctl.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |   1 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   5 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  27 ++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 308 ++++++++++++++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  17 +
>  drivers/gpu/drm/i915/i915_driver.c            |   3 +
>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>  include/uapi/drm/i915_drm.h                   | 167 ++++++++++
>  11 files changed, 554 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index a26edcdadc21..9bf939ef18ea 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -166,6 +166,7 @@ gem-y += \
>  	gem/i915_gem_ttm_move.o \
>  	gem/i915_gem_ttm_pm.o \
>  	gem/i915_gem_userptr.o \
> +	gem/i915_gem_vm_bind_object.o \
>  	gem/i915_gem_wait.o \
>  	gem/i915_gemfs.o
>  i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index cd75b0ca2555..f85f10cf9c34 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -781,6 +781,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
>  	if (unlikely(IS_ERR(ctx)))
>  		return PTR_ERR(ctx);
>  
> +	if (ctx->vm->vm_bind_mode) {
> +		i915_gem_context_put(ctx);
> +		return -EOPNOTSUPP;
> +	}
> +
>  	eb->gem_context = ctx;
>  	if (i915_gem_context_has_full_ppgtt(ctx))
>  		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 000000000000..4f3cfa1f6ef6
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include <linux/types.h>

This one's needed for u64, but none of the below includes are needed.
Please drop them and use forward declarations instead.

As a rule of thumb, don't include headers from headers if it can be
avoided. The interdependencies we have are already huge, and need to be
reduced, not increased.

BR,
Jani.
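
For illustration, the header trimmed along those lines might look like the
sketch below (which forward declarations are actually sufficient depends on
the final contents of the header):

#ifndef __I915_GEM_VM_BIND_H
#define __I915_GEM_VM_BIND_H

#include <linux/types.h>

struct drm_device;
struct drm_file;
struct i915_address_space;
struct i915_vma;

struct i915_vma *
i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);

int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
			   struct drm_file *file);
int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
			     struct drm_file *file);

void i915_gem_vm_unbind_all(struct i915_address_space *vm);

#endif /* __I915_GEM_VM_BIND_H */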

> +#include <drm/drm_file.h>
> +#include <drm/drm_device.h>
> +
> +#include "gt/intel_gtt.h"
> +#include "i915_vma_types.h"
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file);
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +
> +void i915_gem_vm_unbind_all(struct i915_address_space *vm);
> +
> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 000000000000..c24e22657617
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,308 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <uapi/drm/i915_drm.h>
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_context.h"
> +#include "gem/i915_gem_vm_bind.h"
> +
> +#include "gt/intel_gpu_commands.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +		     START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> + * specified address space (VM). Multiple mappings can map to the same physical
> + * pages of an object (aliasing). These mappings (also referred to as persistent
> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all required
> + * mappings during each submission (as required by older execbuf mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding does not need this lock.
> + *
> + * 2) The object's dma-resv lock will protect i915_vma state and needs
> + *    to be held while binding/unbinding a vma in the async worker and while
> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + * 3) Spinlock/s to protect some of the VM's lists like the list of
> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
> + */
> +
> +/**
> + * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
> + * @vm: virtual address space in which vma needs to be looked for
> + * @va: starting addr of the vma
> + *
> + * retrieves the vma with a starting address from the vm's vma tree.
> + *
> + * Returns: returns vma on success, NULL on failure.
> + */
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
> +{
> +	lockdep_assert_held(&vm->vm_bind_lock);
> +
> +	return i915_vm_bind_it_iter_first(&vm->va, va, va);
> +}
> +
> +/**
> + * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
> + * @vma: vma that needs to be removed
> + * @release_obj: release the object
> + *
> + * Removes the vma from the vm's lists and interval tree
> + */
> +static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +	lockdep_assert_held(&vma->vm->vm_bind_lock);
> +
> +	list_del_init(&vma->vm_bind_link);
> +	i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +	/* Release object */
> +	if (release_obj)
> +		i915_gem_object_put(vma->obj);
> +}
> +
> +static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
> +				  struct drm_i915_gem_vm_unbind *va)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	int ret;
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		return ret;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +
> +	if (!vma)
> +		ret = -ENOENT;
> +	else if (vma->size != va->length)
> +		ret = -EINVAL;
> +
> +	if (ret) {
> +		mutex_unlock(&vm->vm_bind_lock);
> +		return ret;
> +	}
> +
> +	i915_gem_vm_bind_remove(vma, false);
> +
> +	mutex_unlock(&vm->vm_bind_lock);
> +
> +	/* Destroy vma and then release object */
> +	obj = vma->obj;
> +	ret = i915_gem_object_lock(obj, NULL);
> +	if (ret)
> +		return ret;
> +
> +	i915_vma_destroy(vma);
> +	i915_gem_object_unlock(obj);
> +
> +	i915_gem_object_put(obj);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_all() - Unbind all mappings from an address space
> + * @vm: Address space to remove mappings from
> + *
> + * Unbind all userspace requested vm_bind mappings
> + */
> +void i915_gem_vm_unbind_all(struct i915_address_space *vm)
> +{
> +	struct i915_vma *vma, *t;
> +
> +	mutex_lock(&vm->vm_bind_lock);
> +	list_for_each_entry_safe(vma, t, &vm->vm_bind_list, vm_bind_link)
> +		i915_gem_vm_bind_remove(vma, true);
> +	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
> +		i915_gem_vm_bind_remove(vma, true);
> +	mutex_unlock(&vm->vm_bind_lock);
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
> +					struct drm_i915_gem_object *obj,
> +					struct drm_i915_gem_vm_bind *va)
> +{
> +	struct i915_gtt_view view;
> +	struct i915_vma *vma;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +	if (vma)
> +		return ERR_PTR(-EEXIST);
> +
> +	view.type = I915_GTT_VIEW_PARTIAL;
> +	view.partial.offset = va->offset >> PAGE_SHIFT;
> +	view.partial.size = va->length >> PAGE_SHIFT;
> +	vma = i915_vma_instance(obj, vm, &view);
> +	if (IS_ERR(vma))
> +		return vma;
> +
> +	vma->start = va->start;
> +	vma->last = va->start + va->length - 1;
> +
> +	return vma;
> +}
> +
> +static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +				struct drm_i915_gem_vm_bind *va,
> +				struct drm_file *file)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma = NULL;
> +	struct i915_gem_ww_ctx ww;
> +	u64 pin_flags;
> +	int ret = 0;
> +
> +	if (!vm->vm_bind_mode)
> +		return -EOPNOTSUPP;
> +
> +	obj = i915_gem_object_lookup(file, va->handle);
> +	if (!obj)
> +		return -ENOENT;
> +
> +	if (!va->length ||
> +	    !IS_ALIGNED(va->offset | va->length,
> +			i915_gem_object_max_page_size(obj->mm.placements,
> +						      obj->mm.n_placements)) ||
> +	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
> +		ret = -EINVAL;
> +		goto put_obj;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		goto put_obj;
> +
> +	vma = vm_bind_get_vma(vm, obj, va);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto unlock_vm;
> +	}
> +
> +	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
> +
> +	for_i915_gem_ww(&ww, ret, true) {
> +		ret = i915_gem_object_lock(vma->obj, &ww);
> +		if (ret)
> +			continue;
> +
> +		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> +		if (ret)
> +			continue;
> +
> +		/* Make it evictable */
> +		__i915_vma_unpin(vma);
> +
> +		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> +		i915_vm_bind_it_insert(vma, &vm->va);
> +
> +		/* Hold object reference until vm_unbind */
> +		i915_gem_object_get(vma->obj);
> +	}
> +
> +	if (ret)
> +		i915_vma_destroy(vma);
> +unlock_vm:
> +	mutex_unlock(&vm->vm_bind_lock);
> +put_obj:
> +	i915_gem_object_put(obj);
> +
> +	return ret;
> +}
> +
> +/**
> + * i915_gem_vm_bind_ioctl() - ioctl function for binding an obj into
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the vm bind required
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to bind the object into the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file)
> +{
> +	struct drm_i915_gem_vm_bind *args = data;
> +	struct i915_address_space *vm;
> +	int ret;
> +
> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +	if (unlikely(!vm))
> +		return -ENOENT;
> +
> +	ret = i915_gem_vm_bind_obj(vm, args, file);
> +
> +	i915_vm_put(vm);
> +	return ret;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an obj from
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the binding that needs to be unbound
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to unbind the object from the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_gem_vm_unbind *args = data;
> +	struct i915_address_space *vm;
> +	int ret;
> +
> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +	if (unlikely(!vm))
> +		return -ENOENT;
> +
> +	ret = i915_gem_vm_unbind_vma(vm, args);
> +
> +	i915_vm_put(vm);
> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index b67831833c9a..0daa70c6ed0d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -12,6 +12,7 @@
>  
>  #include "gem/i915_gem_internal.h"
>  #include "gem/i915_gem_lmem.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "i915_trace.h"
>  #include "i915_utils.h"
>  #include "intel_gt.h"
> @@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>  void i915_address_space_fini(struct i915_address_space *vm)
>  {
>  	drm_mm_takedown(&vm->mm);
> +	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> +	mutex_destroy(&vm->vm_bind_lock);
>  }
>  
>  /**
> @@ -202,6 +205,8 @@ static void __i915_vm_release(struct work_struct *work)
>  	struct i915_address_space *vm =
>  		container_of(work, struct i915_address_space, release_work);
>  
> +	i915_gem_vm_unbind_all(vm);
> +
>  	__i915_vm_close(vm);
>  
>  	/* Synchronize async unbinds. */
> @@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>  
>  	INIT_LIST_HEAD(&vm->bound_list);
>  	INIT_LIST_HEAD(&vm->unbound_list);
> +
> +	vm->va = RB_ROOT_CACHED;
> +	INIT_LIST_HEAD(&vm->vm_bind_list);
> +	INIT_LIST_HEAD(&vm->vm_bound_list);
> +	mutex_init(&vm->vm_bind_lock);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index c0ca53cba9f0..b52061858161 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -259,6 +259,23 @@ struct i915_address_space {
>  	 */
>  	struct list_head unbound_list;
>  
> +	/**
> +	 * @vm_bind_mode: flag to indicate vm_bind method of binding
> +	 *
> +	 * True: allow only vm_bind method of binding.
> +	 * False: allow only legacy execbuf method of binding.
> +	 */
> +	bool vm_bind_mode:1;
> +
> +	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
> +	struct mutex vm_bind_lock;
> +	/** @vm_bind_list: List of vm_binding in process */
> +	struct list_head vm_bind_list;
> +	/** @vm_bound_list: List of vm_binding completed */
> +	struct list_head vm_bound_list;
> +	/* @va: tree of persistent vmas */
> +	struct rb_root_cached va;
> +
>  	/* Global GTT */
>  	bool is_ggtt:1;
>  
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 9d1fc2477f80..f9e4a784dd0e 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -69,6 +69,7 @@
>  #include "gem/i915_gem_ioctls.h"
>  #include "gem/i915_gem_mman.h"
>  #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_gt.h"
>  #include "gt/intel_gt_pm.h"
>  #include "gt/intel_rc6.h"
> @@ -1892,6 +1893,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND, i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
>  };
>  
>  /*
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index f17c09ead7d7..33cb0cbc7fb1 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -29,6 +29,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_tiling.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_engine.h"
>  #include "gt/intel_engine_heartbeat.h"
>  #include "gt/intel_gt.h"
> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  	spin_unlock(&obj->vma.lock);
>  	mutex_unlock(&vm->mutex);
>  
> +	INIT_LIST_HEAD(&vma->vm_bind_link);
>  	return vma;
>  
>  err_unlock:
> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>  {
>  	struct i915_vma *vma;
>  
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>  	GEM_BUG_ON(!kref_read(&vm->ref));
>  
>  	spin_lock(&obj->vma.lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index aecd9c64486b..6feef0305fe1 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>  {
>  	ptrdiff_t cmp;
>  
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> -
>  	cmp = ptrdiff(vma->vm, vm);
>  	if (cmp)
>  		return cmp;
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index ec0f6c9f57d0..bed7a344dcd7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -289,6 +289,20 @@ struct i915_vma {
>  	/** This object's place on the active/inactive lists */
>  	struct list_head vm_link;
>  
> +	/** @vm_bind_link: node for the vm_bind related lists of vm */
> +	struct list_head vm_bind_link;
> +
> +	/** Interval tree structures for persistent vma */
> +
> +	/** @rb: node for the interval tree of vm for persistent vmas */
> +	struct rb_node rb;
> +	/** @start: start endpoint of the rb node */
> +	u64 start;
> +	/** @last: Last endpoint of the rb node */
> +	u64 last;
> +	/** @__subtree_last: last in subtree */
> +	u64 __subtree_last;
> +
>  	struct list_head obj_link; /* Link in the object's VMA list */
>  	struct rb_node obj_node;
>  	struct hlist_node obj_hash;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 520ad2691a99..4a4f2a77388c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_VM_CREATE		0x3a
>  #define DRM_I915_GEM_VM_DESTROY		0x3b
>  #define DRM_I915_GEM_CREATE_EXT		0x3c
> +#define DRM_I915_GEM_VM_BIND		0x3d
> +#define DRM_I915_GEM_VM_UNBIND		0x3e
>  /* Must be kept compact -- no holes */
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
> @@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
>  #define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
>  #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>   * on the security mechanisms provided by hardware.
> @@ -1507,6 +1511,41 @@ struct drm_i915_gem_execbuffer2 {
>  #define i915_execbuffer2_get_context_id(eb2) \
>  	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>  
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
> +	__u32 handle;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_TIMELINE_FENCE_WAIT:
> +	 * Wait for the input fence before the operation.
> +	 *
> +	 * I915_TIMELINE_FENCE_SIGNAL:
> +	 * Return operation completion fence as output.
> +	 */
> +	__u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +	 * Value must be 0 for a binary drm_syncobj. A value of 0 for a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
>  struct drm_i915_gem_pin {
>  	/** Handle of the buffer to be pinned. */
>  	__u32 handle;
> @@ -3717,6 +3756,134 @@ struct drm_i915_gem_create_ext_protected_content {
>  /* ID of the protected content session managed by i915 when PXP is active */
>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>  
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> + * virtual address (VA) range to the section of an object that should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (i.e., not currently bound) and can
> + * be mapped to the whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However, DG2
> + * and XEHPSDV have a 64K page size for device local memory and a compact page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
> + * the local memory 64K page and the system memory 4K page bindings in the same
> + * 2M range.
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Currently reserved, MBZ.
> +	 *
> +	 * Note that @fence carries its own flags.
> +	 */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for bind completion signaling.
> +	 *
> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 *
> +	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
> +	 * is not requested and binding is completed synchronously.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to complete.
> + * It is the UMD's responsibility to ensure the mapping is no longer in use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to unbind */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Currently reserved, MBZ.
> +	 *
> +	 * Note that @fence carries its own flags.
> +	 */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for unbind completion signaling.
> +	 *
> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 *
> +	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
> +	 * is not requested and unbinding is completed synchronously.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif
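
To make the uapi above a bit more concrete, here is a minimal, purely
illustrative userspace sketch of driving the two new ioctls (the helper
name, the VA/length values, and how fd, vm_id, bo_handle and
syncobj_handle are obtained are assumptions for the example, not part
of the patch):

#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Bind one BO at a fixed GPU VA with an out fence, then unbind it. */
static int example_vm_bind_unbind(int fd, __u32 vm_id, __u32 bo_handle,
				  __u32 syncobj_handle)
{
	struct drm_i915_gem_vm_bind bind = {
		.vm_id = vm_id,
		.handle = bo_handle,
		.start = 0x100000,	/* GPU VA, 64K aligned (lmem-safe) */
		.offset = 0,
		.length = 0x10000,	/* also 64K aligned */
		.fence = {
			.handle = syncobj_handle,
			.flags = I915_TIMELINE_FENCE_SIGNAL,	/* out fence */
			.value = 0,	/* binary syncobj */
		},
	};
	struct drm_i915_gem_vm_unbind unbind = {
		.vm_id = vm_id,
		.start = 0x100000,
		.length = 0x10000,
	};
	int ret;

	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
	if (ret)
		return ret;

	/* ... submit work through I915_GEM_EXECBUFFER3 and wait for it ... */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);
}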

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 07/14] drm/i915/vm_bind: Add out fence support
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-09-22  9:31   ` Jani Nikula
  -1 siblings, 0 replies; 62+ messages in thread
From: Jani Nikula @ 2022-09-22  9:31 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld

On Wed, 21 Sep 2022, Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:
> Add support for handling out fence for vm_bind call.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  4 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 81 +++++++++++++++++++
>  drivers/gpu/drm/i915/i915_vma.c               |  6 +-
>  drivers/gpu/drm/i915/i915_vma_types.h         |  7 ++
>  4 files changed, 97 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 4f3cfa1f6ef6..facba29ead04 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -6,6 +6,7 @@
>  #ifndef __I915_GEM_VM_BIND_H
>  #define __I915_GEM_VM_BIND_H
>  
> +#include <linux/dma-fence.h>

Unnecessary. Please use forward declarations.
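
For this header a forward declaration next to the existing
<linux/types.h> include should be enough, since dma_fence is only ever
used through a pointer here (just a sketch of the idea):

struct dma_fence;

void i915_vm_bind_signal_fence(struct i915_vma *vma,
			       struct dma_fence * const fence);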

>  #include <linux/types.h>
>  
>  #include <drm/drm_file.h>
> @@ -24,4 +25,7 @@ int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>  
>  void i915_gem_vm_unbind_all(struct i915_address_space *vm);
>  
> +void i915_vm_bind_signal_fence(struct i915_vma *vma,
> +			       struct dma_fence * const fence);
> +
>  #endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 236f901b8b9c..5cd788404ee7 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -7,6 +7,8 @@
>  
>  #include <linux/interval_tree_generic.h>
>  
> +#include <drm/drm_syncobj.h>
> +
>  #include "gem/i915_gem_context.h"
>  #include "gem/i915_gem_vm_bind.h"
>  
> @@ -106,6 +108,75 @@ static void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>  		i915_gem_object_put(vma->obj);
>  }
>  
> +static int i915_vm_bind_add_fence(struct drm_file *file, struct i915_vma *vma,
> +				  u32 handle, u64 point)
> +{
> +	struct drm_syncobj *syncobj;
> +
> +	syncobj = drm_syncobj_find(file, handle);
> +	if (!syncobj) {
> +		DRM_DEBUG("Invalid syncobj handle provided\n");
> +		return -ENOENT;
> +	}
> +
> +	/*
> +	 * For timeline syncobjs we need to preallocate chains for
> +	 * later signaling.
> +	 */
> +	if (point) {
> +		vma->vm_bind_fence.chain_fence = dma_fence_chain_alloc();
> +		if (!vma->vm_bind_fence.chain_fence) {
> +			drm_syncobj_put(syncobj);
> +			return -ENOMEM;
> +		}
> +	} else {
> +		vma->vm_bind_fence.chain_fence = NULL;
> +	}
> +	vma->vm_bind_fence.syncobj = syncobj;
> +	vma->vm_bind_fence.value = point;
> +
> +	return 0;
> +}
> +
> +static void i915_vm_bind_put_fence(struct i915_vma *vma)
> +{
> +	if (!vma->vm_bind_fence.syncobj)
> +		return;
> +
> +	drm_syncobj_put(vma->vm_bind_fence.syncobj);
> +	dma_fence_chain_free(vma->vm_bind_fence.chain_fence);
> +}
> +
> +/**
> + * i915_vm_bind_signal_fence() - Add fence to vm_bind syncobj
> + * @vma: vma mapping requiring signaling
> + * @fence: fence to be added
> + *
> + * Associate specified @fence with the @vma's syncobj to be
> + * signaled after the @fence work completes.
> + */
> +void i915_vm_bind_signal_fence(struct i915_vma *vma,
> +			       struct dma_fence * const fence)
> +{
> +	struct drm_syncobj *syncobj = vma->vm_bind_fence.syncobj;
> +
> +	if (!syncobj)
> +		return;
> +
> +	if (vma->vm_bind_fence.chain_fence) {
> +		drm_syncobj_add_point(syncobj,
> +				      vma->vm_bind_fence.chain_fence,
> +				      fence, vma->vm_bind_fence.value);
> +		/*
> +		 * The chain's ownership is transferred to the
> +		 * timeline.
> +		 */
> +		vma->vm_bind_fence.chain_fence = NULL;
> +	} else {
> +		drm_syncobj_replace_fence(syncobj, fence);
> +	}
> +}
> +
>  static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
>  				  struct drm_i915_gem_vm_unbind *va)
>  {
> @@ -233,6 +304,13 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>  		goto unlock_vm;
>  	}
>  
> +	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
> +		ret = i915_vm_bind_add_fence(file, vma, va->fence.handle,
> +					     va->fence.value);
> +		if (ret)
> +			goto put_vma;
> +	}
> +
>  	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
>  
>  	for_i915_gem_ww(&ww, ret, true) {
> @@ -257,6 +335,9 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>  		i915_gem_object_get(vma->obj);
>  	}
>  
> +	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL)
> +		i915_vm_bind_put_fence(vma);
> +put_vma:
>  	if (ret)
>  		i915_vma_destroy(vma);
>  unlock_vm:
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index ff216e9a2c8d..f7d711e675d6 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -1540,8 +1540,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>  err_vma_res:
>  	i915_vma_resource_free(vma_res);
>  err_fence:
> -	if (work)
> +	if (work) {
> +		if (i915_vma_is_persistent(vma))
> +			i915_vm_bind_signal_fence(vma, &work->base.dma);
> +
>  		dma_fence_work_commit_imm(&work->base);
> +	}
>  err_rpm:
>  	if (wakeref)
>  		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index d21bf97febaa..7fdbf73666e9 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -311,6 +311,13 @@ struct i915_vma {
>  	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
>  	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>  
> +	/** Timeline fence for vm_bind completion notification */
> +	struct {
> +		struct dma_fence_chain *chain_fence;
> +		struct drm_syncobj *syncobj;
> +		u64 value;
> +	} vm_bind_fence;
> +
>  	/** Interval tree structures for persistent vma */
>  
>  	/** @rb: node for the interval tree of vm for persistent vmas */

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-22  9:54     ` Jani Nikula
  -1 siblings, 0 replies; 62+ messages in thread
From: Jani Nikula @ 2022-09-22  9:54 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Wed, 21 Sep 2022, Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
> new file mode 100644
> index 000000000000..725febfd6a53
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_EXECBUFFER_COMMON_H
> +#define __I915_GEM_EXECBUFFER_COMMON_H
> +
> +#include <drm/drm_syncobj.h>
> +
> +#include "gt/intel_context.h"

You don't need these includes. Most of it can be handled using forward
declarations. You'll need <linux/types.h>

> +
> +struct eb_fence {
> +	struct drm_syncobj *syncobj;
> +	struct dma_fence *dma_fence;
> +	u64 value;
> +	struct dma_fence_chain *chain_fence;
> +};
> +
> +int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
> +		    bool throttle, bool nonblock);
> +void __eb_unpin_engine(struct intel_context *ce);
> +int __eb_select_engine(struct intel_context *ce);
> +void __eb_put_engine(struct intel_context *context, struct intel_gt *gt);
> +
> +struct intel_context *
> +eb_find_context(struct intel_context *context, unsigned int context_number);
> +
> +int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
> +		       struct eb_fence *f, bool wait, bool signal);
> +void put_fence_array(struct eb_fence *fences, u64 num_fences);
> +int await_fence_array(struct eb_fence *fences, u64 num_fences,
> +		      struct i915_request *rq);
> +void signal_fence_array(struct eb_fence *fences, u64 num_fences,
> +			struct dma_fence * const fence);
> +
> +int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
> +		    struct intel_context *context, struct i915_sched_attr sched,

struct i915_sched_attr is passed by value, so you either need to turn
that into a pointer, or you need the definition. The definition is just
a wrapper around an int. (For strict type safety or for future proofing
or what, I don't know.) And this all brings me to my pet peeve about
gem/gt headers.

To get that definition of a struct wrapper around an int, you need to
include i915_scheduler_types.h, which recursively includes a total of 16
headers. Touch any of those files, and you get a rebuild butterfly
effect.

28% of i915 header files, when modified, cause the rebuild of 83% of the
driver. Please let's not make it worse.
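
As a sketch of the pointer-based variant (the one that avoids pulling
i915_scheduler_types.h into the header; illustrative only):

struct i915_sched_attr;

int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
		    struct intel_context *context,
		    const struct i915_sched_attr *sched,
		    int err);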


BR,
Jani.

> +		    int err);
> +void eb_requests_get(struct i915_request **requests, unsigned int num_batches);
> +void eb_requests_put(struct i915_request **requests, unsigned int num_batches);
> +
> +struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
> +					      unsigned int num_batches,
> +					      struct intel_context *context);
> +
> +#endif /* __I915_GEM_EXECBUFFER_COMMON_H */

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-22  9:05       ` Tvrtko Ursulin
@ 2022-09-22 14:12         ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-22 14:12 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Thu, Sep 22, 2022 at 10:05:34AM +0100, Tvrtko Ursulin wrote:
>
>On 21/09/2022 19:17, Niranjana Vishwanathapura wrote:
>>On Wed, Sep 21, 2022 at 11:18:53AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:

<snip>

>>>
>>>Two things:
>>>
>>>1)
>>>
>>>Is there enough commonality to maybe avoid multiple arguments and 
>>>have like
>>>
>>>struct i915_execbuffer {
>>>
>>>};
>>>
>>>struct i915_execbuffer2 {
>>>    struct i915_execbuffer eb;
>>>    .. eb2 specific fields ..
>>>};
>>>
>>>struct i915_execbuffer3 {
>>>    struct i915_execbuffer eb;
>>>    .. eb3 specific fields ..
>>>};
>>>
>>>And then have the common helpers take the pointer to the common struct?
>>>
>>
>>...
>>This requires updating legacy execbuf path everywhere which doesn't look
>>like a good idea to me. As discussed during vm_bind rfc, I think it is
>>better to keep execbuf3 to itself and keep it leaner.
>
>To be clear, the amount of almost-identical duplicated code worries me 
>from the maintenance burden angle. I don't think we have any such 
>precedent in the driver. And AFAIR the RFC conclusion was to keep the 
>ioctls separate and share code where it makes sense.
>

But if we make a common function that tries to cater to everything with a lot
of 'if/else' statements, that also doesn't look good.
What I took from the RFC discussion was that code should be duplicated,
and shared only where there is a 100% match.

>For instance eb_fences_add - could you have a common helper which 
>takes in_fence and out_fence as parameters. Passing in -1/-1 from eb3 
>and end up with even more sharing? Same approach like you did in this 
>patch by making helpers take arguments they need instead of struct eb.
>
>Eb_requests_create? Again same code if you make eb->batch_pool a 
>standalone argument passed in.
>

I am trying to avoid those things. The legacy execbuf and execbuf3 are
very different here, i.e., execbuf3 doesn't support in/out fences,
the handling of batches is different, and there is no batch_pool, etc.
So, it would be good to have those two paths handle it separately.
Why should execbuf3 pass dummy '-1 or NULL' values etc. when the point of
execbuf3 is to move away from legacy things?

Niranjana

>Haven't looked at more than those in this round..
>




>Regards,
>
>Tvrtko
>
>>>2)
>>>
>>>Should we prefix with i915_ everything that is now no longer static?
>>>
>>
>>Yah, makes sense, will update.
>>
>>Niranjana
>>
>>>Regards,
>>>
>>>Tvrtko
>>>
>>>>+
>>>>+struct intel_context *
>>>>+eb_find_context(struct intel_context *context, unsigned int 
>>>>context_number);
>>>>+
>>>>+int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
>>>>+               struct eb_fence *f, bool wait, bool signal);
>>>>+void put_fence_array(struct eb_fence *fences, u64 num_fences);
>>>>+int await_fence_array(struct eb_fence *fences, u64 num_fences,
>>>>+              struct i915_request *rq);
>>>>+void signal_fence_array(struct eb_fence *fences, u64 num_fences,
>>>>+            struct dma_fence * const fence);
>>>>+
>>>>+int eb_requests_add(struct i915_request **requests, unsigned 
>>>>int num_batches,
>>>>+            struct intel_context *context, struct 
>>>>i915_sched_attr sched,
>>>>+            int err);
>>>>+void eb_requests_get(struct i915_request **requests, unsigned 
>>>>int num_batches);
>>>>+void eb_requests_put(struct i915_request **requests, unsigned 
>>>>int num_batches);
>>>>+
>>>>+struct dma_fence *__eb_composite_fence_create(struct 
>>>>i915_request **requests,
>>>>+                          unsigned int num_batches,
>>>>+                          struct intel_context *context);
>>>>+
>>>>+#endif /* __I915_GEM_EXECBUFFER_COMMON_H */

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-22  8:09       ` Tvrtko Ursulin
@ 2022-09-22 16:18         ` Matthew Auld
  2022-09-22 16:46             ` Niranjana Vishwanathapura
  2022-09-23  7:45           ` Tvrtko Ursulin
  0 siblings, 2 replies; 62+ messages in thread
From: Matthew Auld @ 2022-09-22 16:18 UTC (permalink / raw)
  To: Tvrtko Ursulin, Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	daniel.vetter, christian.koenig

On 22/09/2022 09:09, Tvrtko Ursulin wrote:
> 
> On 21/09/2022 19:00, Niranjana Vishwanathapura wrote:
>> On Wed, Sep 21, 2022 at 10:13:12AM +0100, Tvrtko Ursulin wrote:
>>>
>>> On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>>> Expose i915_gem_object_max_page_size() function non-static
>>>> which will be used by the vm_bind feature.
>>>>
>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>> <niranjana.vishwanathapura@intel.com>
>>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>> ---
>>>>  drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
>>>>  2 files changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>> index 33673fe7ee0a..3b3ab4abb0a3 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>> @@ -11,14 +11,24 @@
>>>>  #include "pxp/intel_pxp.h"
>>>>  #include "i915_drv.h"
>>>> +#include "i915_gem_context.h"
>>>
>>> I can't spot that you are adding any code which would need this? 
>>> I915_GTT_PAGE_SIZE_4K? It is in intel_gtt.h.
>>
>> This include should have been added in a later patch for calling
>> i915_gem_vm_lookup(). But got added here while patch refactoring.
>> Will fix.
>>
>>>
>>>>  #include "i915_gem_create.h"
>>>>  #include "i915_trace.h"
>>>>  #include "i915_user_extensions.h"
>>>> -static u32 object_max_page_size(struct intel_memory_region 
>>>> **placements,
>>>> -                unsigned int n_placements)
>>>> +/**
>>>> + * i915_gem_object_max_page_size() - max of min_page_size of the 
>>>> regions
>>>> + * @placements:  list of regions
>>>> + * @n_placements: number of the placements
>>>> + *
>>>> + * Calculates the max of the min_page_size of a list of placements 
>>>> passed in.
>>>> + *
>>>> + * Return: max of the min_page_size
>>>> + */
>>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>>> **placements,
>>>> +                  unsigned int n_placements)
>>>>  {
>>>> -    u32 max_page_size = 0;
>>>> +    u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>>>      int i;
>>>>      for (i = 0; i < n_placements; i++) {
>>>> @@ -28,7 +38,6 @@ static u32 object_max_page_size(struct 
>>>> intel_memory_region **placements,
>>>>          max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>>>      }
>>>> -    GEM_BUG_ON(!max_page_size);
>>>>      return max_page_size;
>>>>  }
>>>> @@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct 
>>>> drm_i915_private *i915, u64 size,
>>>>      i915_gem_flush_free_objects(i915);
>>>> -    size = round_up(size, object_max_page_size(placements, 
>>>> n_placements));
>>>> +    size = round_up(size, i915_gem_object_max_page_size(placements,
>>>> +                                n_placements));
>>>>      if (size == 0)
>>>>          return ERR_PTR(-EINVAL);
>>>
>>> Because of the changes above this path is now unreachable. I suppose 
>>> it was meant to tell the user "you have supplied no placements"? But 
>>> then GEM_BUG_ON (which you remove) used to be wrong.
>>>
>>
>> Yah, looks like an existing problem. May be this "size == 0" check
>> should have been made before we do the round_up()? ie., check input 
>> 'size'
>> paramter is not 0?
>> I think for now, I will remove this check as it was unreachable anyhow.
> 
> Hm that's true as well. i915_gem_create_ext_ioctl ensures at least one 
> placement and internal callers do as well.
> 
> To be safe, instead of removing maybe move to before "size = " and 
> change to "if (GEM_WARN_ON(n_placements == 0))"? Not sure.. Matt any 
> thoughts here given the changes in this patch?

The check is also to reject a zero-sized object with args->size = 0, i.e. 
round_up(0, PAGE_SIZE) == 0. So for sure that is still needed here.

> 
> Regards,
> 
> Tvrtko
> 
>>
>> Niranjana
>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>>> index 7317d4102955..8c97bddad921 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>>> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 
>>>> size)
>>>>  }
>>>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>>> **placements,
>>>> +                  unsigned int n_placements);
>>>>  void i915_objects_module_exit(void);
>>>>  int i915_objects_module_init(void);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-22 16:18         ` Matthew Auld
@ 2022-09-22 16:46             ` Niranjana Vishwanathapura
  2022-09-23  7:45           ` Tvrtko Ursulin
  1 sibling, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-22 16:46 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Tvrtko Ursulin, paulo.r.zanoni, intel-gfx, dri-devel,
	thomas.hellstrom, daniel.vetter, christian.koenig

On Thu, Sep 22, 2022 at 05:18:28PM +0100, Matthew Auld wrote:
>On 22/09/2022 09:09, Tvrtko Ursulin wrote:
>>
>>On 21/09/2022 19:00, Niranjana Vishwanathapura wrote:
>>>On Wed, Sep 21, 2022 at 10:13:12AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>>>>Expose i915_gem_object_max_page_size() function non-static
>>>>>which will be used by the vm_bind feature.
>>>>>
>>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>>><niranjana.vishwanathapura@intel.com>
>>>>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>>---
>>>>> drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
>>>>> drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
>>>>> 2 files changed, 17 insertions(+), 5 deletions(-)
>>>>>
>>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>>>>b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>>index 33673fe7ee0a..3b3ab4abb0a3 100644
>>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>>@@ -11,14 +11,24 @@
>>>>> #include "pxp/intel_pxp.h"
>>>>> #include "i915_drv.h"
>>>>>+#include "i915_gem_context.h"
>>>>
>>>>I can't spot that you are adding any code which would need this? 
>>>>I915_GTT_PAGE_SIZE_4K? It is in intel_gtt.h.
>>>
>>>This include should have been added in a later patch for calling
>>>i915_gem_vm_lookup(). But got added here while patch refactoring.
>>>Will fix.
>>>
>>>>
>>>>> #include "i915_gem_create.h"
>>>>> #include "i915_trace.h"
>>>>> #include "i915_user_extensions.h"
>>>>>-static u32 object_max_page_size(struct intel_memory_region 
>>>>>**placements,
>>>>>-                unsigned int n_placements)
>>>>>+/**
>>>>>+ * i915_gem_object_max_page_size() - max of min_page_size of 
>>>>>the regions
>>>>>+ * @placements:  list of regions
>>>>>+ * @n_placements: number of the placements
>>>>>+ *
>>>>>+ * Calculates the max of the min_page_size of a list of 
>>>>>placements passed in.
>>>>>+ *
>>>>>+ * Return: max of the min_page_size
>>>>>+ */
>>>>>+u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>>>>**placements,
>>>>>+                  unsigned int n_placements)
>>>>> {
>>>>>-    u32 max_page_size = 0;
>>>>>+    u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>>>>     int i;
>>>>>     for (i = 0; i < n_placements; i++) {
>>>>>@@ -28,7 +38,6 @@ static u32 object_max_page_size(struct 
>>>>>intel_memory_region **placements,
>>>>>         max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>>>>     }
>>>>>-    GEM_BUG_ON(!max_page_size);
>>>>>     return max_page_size;
>>>>> }
>>>>>@@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct 
>>>>>drm_i915_private *i915, u64 size,
>>>>>     i915_gem_flush_free_objects(i915);
>>>>>-    size = round_up(size, object_max_page_size(placements, 
>>>>>n_placements));
>>>>>+    size = round_up(size, i915_gem_object_max_page_size(placements,
>>>>>+                                n_placements));
>>>>>     if (size == 0)
>>>>>         return ERR_PTR(-EINVAL);
>>>>
>>>>Because of the changes above this path is now unreachable. I 
>>>>suppose it was meant to tell the user "you have supplied no 
>>>>placements"? But then GEM_BUG_ON (which you remove) used to be 
>>>>wrong.
>>>>
>>>
>>>Yah, looks like an existing problem. May be this "size == 0" check
>>>should have been made before we do the round_up()? ie., check 
>>>input 'size'
>>>paramter is not 0?
>>>I think for now, I will remove this check as it was unreachable anyhow.
>>
>>Hm that's true as well. i915_gem_create_ext_ioctl ensures at least 
>>one placement and internal callers do as well.
>>
>>To be safe, instead of removing maybe move to before "size = " and 
>>change to "if (GEM_WARN_ON(n_placements == 0))"? Not sure.. Matt any 
>>thoughts here given the changes in this patch?
>
>The check is also to reject a zero sized object with args->size = 0, 
>i.e round_up(0, PAGE_SIZE) == 0. So for sure that is still needed 
>here.

Thanks Matt.
Yah, we could check for "size == 0" before the round_up(), but doing it
after, as it is here, should be just fine. Will keep it as is.

Niranjana

>
>>
>>Regards,
>>
>>Tvrtko
>>
>>>
>>>Niranjana
>>>
>>>>Regards,
>>>>
>>>>Tvrtko
>>>>
>>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>>>>>b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>>>>index 7317d4102955..8c97bddad921 100644
>>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>>>>@@ -47,6 +47,8 @@ static inline bool 
>>>>>i915_gem_object_size_2big(u64 size)
>>>>> }
>>>>> void i915_gem_init__objects(struct drm_i915_private *i915);
>>>>>+u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>>>>**placements,
>>>>>+                  unsigned int n_placements);
>>>>> void i915_objects_module_exit(void);
>>>>> int i915_objects_module_init(void);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
  2022-09-22 16:18         ` Matthew Auld
  2022-09-22 16:46             ` Niranjana Vishwanathapura
@ 2022-09-23  7:45           ` Tvrtko Ursulin
  1 sibling, 0 replies; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-23  7:45 UTC (permalink / raw)
  To: Matthew Auld, Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	daniel.vetter, christian.koenig


On 22/09/2022 17:18, Matthew Auld wrote:
> On 22/09/2022 09:09, Tvrtko Ursulin wrote:
>>
>> On 21/09/2022 19:00, Niranjana Vishwanathapura wrote:
>>> On Wed, Sep 21, 2022 at 10:13:12AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>>>> Expose i915_gem_object_max_page_size() function non-static
>>>>> which will be used by the vm_bind feature.
>>>>>
>>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>>> <niranjana.vishwanathapura@intel.com>
>>>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/i915/gem/i915_gem_create.c | 20 +++++++++++++++-----
>>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
>>>>>  2 files changed, 17 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>> index 33673fe7ee0a..3b3ab4abb0a3 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>> @@ -11,14 +11,24 @@
>>>>>  #include "pxp/intel_pxp.h"
>>>>>  #include "i915_drv.h"
>>>>> +#include "i915_gem_context.h"
>>>>
>>>> I can't spot that you are adding any code which would need this? 
>>>> I915_GTT_PAGE_SIZE_4K? It is in intel_gtt.h.
>>>
>>> This include should have been added in a later patch for calling
>>> i915_gem_vm_lookup(). But got added here while patch refactoring.
>>> Will fix.
>>>
>>>>
>>>>>  #include "i915_gem_create.h"
>>>>>  #include "i915_trace.h"
>>>>>  #include "i915_user_extensions.h"
>>>>> -static u32 object_max_page_size(struct intel_memory_region 
>>>>> **placements,
>>>>> -                unsigned int n_placements)
>>>>> +/**
>>>>> + * i915_gem_object_max_page_size() - max of min_page_size of the 
>>>>> regions
>>>>> + * @placements:  list of regions
>>>>> + * @n_placements: number of the placements
>>>>> + *
>>>>> + * Calculates the max of the min_page_size of a list of placements 
>>>>> passed in.
>>>>> + *
>>>>> + * Return: max of the min_page_size
>>>>> + */
>>>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>>>> **placements,
>>>>> +                  unsigned int n_placements)
>>>>>  {
>>>>> -    u32 max_page_size = 0;
>>>>> +    u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>>>>      int i;
>>>>>      for (i = 0; i < n_placements; i++) {
>>>>> @@ -28,7 +38,6 @@ static u32 object_max_page_size(struct 
>>>>> intel_memory_region **placements,
>>>>>          max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>>>>      }
>>>>> -    GEM_BUG_ON(!max_page_size);
>>>>>      return max_page_size;
>>>>>  }
>>>>> @@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct 
>>>>> drm_i915_private *i915, u64 size,
>>>>>      i915_gem_flush_free_objects(i915);
>>>>> -    size = round_up(size, object_max_page_size(placements, 
>>>>> n_placements));
>>>>> +    size = round_up(size, i915_gem_object_max_page_size(placements,
>>>>> +                                n_placements));
>>>>>      if (size == 0)
>>>>>          return ERR_PTR(-EINVAL);
>>>>
>>>> Because of the changes above this path is now unreachable. I suppose 
>>>> it was meant to tell the user "you have supplied no placements"? But 
>>>> then GEM_BUG_ON (which you remove) used to be wrong.
>>>>
>>>
>>> Yah, looks like an existing problem. May be this "size == 0" check
>>> should have been made before we do the round_up()? ie., check input 
>>> 'size'
>>> paramter is not 0?
>>> I think for now, I will remove this check as it was unreachable anyhow.
>>
>> Hm that's true as well. i915_gem_create_ext_ioctl ensures at least one 
>> placement and internal callers do as well.
>>
>> To be safe, instead of removing maybe move to before "size = " and 
>> change to "if (GEM_WARN_ON(n_placements == 0))"? Not sure.. Matt any 
>> thoughts here given the changes in this patch?
> 
> The check is also to reject a zero sized object with args->size = 0, i.e 
> round_up(0, PAGE_SIZE) == 0. So for sure that is still needed here.

Oh yeah sneaky round up.. Thanks, my bad.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-09-23  8:40   ` Tvrtko Ursulin
  2022-09-24  4:30     ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-23  8:40 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld


On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
> vma_lookup is tied to a segment of the object instead of a section

Can be, but not only that. It would be more accurate to say it is based 
on gtt views.

> of VA space. Hence, it does not support aliasing (i.e., multiple
> bindings to the same section of the object).
> Skip vma_lookup for persistent vmas as it supports aliasing.

What's broken without this patch? If something is, should it go 
somewhere earlier in the series? If so, it should be mentioned in the commit 
message.

Or is it just a performance optimisation to skip unused tracking? If so, 
it should also be mentioned in the commit message.

> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/display/intel_fb_pin.c   |  2 +-
>   .../drm/i915/display/intel_plane_initial.c    |  2 +-
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
>   .../drm/i915/gem/i915_gem_vm_bind_object.c    |  2 +-
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   | 16 +++----
>   .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
>   .../drm/i915/gem/selftests/i915_gem_context.c | 12 ++---
>   .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  6 ++-
>   .../drm/i915/gem/selftests/igt_gem_utils.c    |  2 +-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  2 +-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
>   drivers/gpu/drm/i915/gt/intel_gt.c            |  2 +-
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +-
>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
>   drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
>   drivers/gpu/drm/i915/gt/intel_ring.c          |  2 +-
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 +-
>   drivers/gpu/drm/i915/gt/intel_timeline.c      |  2 +-
>   drivers/gpu/drm/i915/gt/mock_engine.c         |  2 +-
>   drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
>   drivers/gpu/drm/i915/gt/selftest_execlists.c  | 16 +++----
>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  6 +--
>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 +-
>   .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
>   drivers/gpu/drm/i915/gt/selftest_rps.c        |  2 +-
>   .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  2 +-
>   drivers/gpu/drm/i915/i915_gem.c               |  2 +-
>   drivers/gpu/drm/i915/i915_perf.c              |  2 +-
>   drivers/gpu/drm/i915/i915_vma.c               | 26 +++++++----
>   drivers/gpu/drm/i915/i915_vma.h               |  3 +-
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 44 +++++++++----------
>   drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
>   drivers/gpu/drm/i915/selftests/i915_vma.c     |  2 +-
>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
>   .../drm/i915/selftests/intel_memory_region.c  |  2 +-
>   37 files changed, 106 insertions(+), 93 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c
> index c86e5d4ee016..5a718b247bb3 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
> @@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,
>   		goto err;
>   	}
>   
> -	vma = i915_vma_instance(obj, vm, view);
> +	vma = i915_vma_instance(obj, vm, view, false);

Hey why are you touching all the legacy paths? >:P

>   	if (IS_ERR(vma))
>   		goto err;
>   
> diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> index 76be796df255..7667e2faa3fb 100644
> --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
> +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> @@ -136,7 +136,7 @@ initial_plane_vma(struct drm_i915_private *i915,
>   		goto err_obj;
>   	}
>   
> -	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		goto err_obj;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 363b2a788cdf..0ee43cb601b5 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -876,7 +876,7 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
>   			}
>   		}
>   
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			i915_gem_object_put(obj);
>   			return vma;
> @@ -2208,7 +2208,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
>   	struct i915_vma *vma;
>   	int err;
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma))
>   		return vma;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 3087731cc0c0..4468603af6f1 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -252,7 +252,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>   	view.type = I915_GTT_VIEW_PARTIAL;
>   	view.partial.offset = va->offset >> PAGE_SHIFT;
>   	view.partial.size = va->length >> PAGE_SHIFT;
> -	vma = i915_vma_instance(obj, vm, &view);
> +	vma = i915_vma_instance(obj, vm, &view, true);

This is the only caller passing "true". Leave i915_vma_instance as is, 
and add i915_vma_instance_persistent(), and drop 90% of the patch?
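
Roughly along these lines (the __i915_vma_instance() helper name and the
exact signatures are illustrative only, following the existing
i915_vma_instance() prototype):

/* Keep the existing entry point for all legacy callers... */
struct i915_vma *
i915_vma_instance(struct drm_i915_gem_object *obj,
		  struct i915_address_space *vm,
		  const struct i915_gtt_view *view)
{
	return __i915_vma_instance(obj, vm, view, false);
}

/* ...and let only the vm_bind path ask for a persistent vma. */
struct i915_vma *
i915_vma_instance_persistent(struct drm_i915_gem_object *obj,
			     struct i915_address_space *vm,
			     const struct i915_gtt_view *view)
{
	return __i915_vma_instance(obj, vm, view, true);
}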

>   	if (IS_ERR(vma))
>   		return vma;
>   
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index c570cf780079..6e13a83d0e36 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -454,7 +454,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
>   				goto out_put;
>   			}
>   
> -			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
> +			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>   			if (IS_ERR(vma)) {
>   				err = PTR_ERR(vma);
>   				goto out_put;
> @@ -522,7 +522,7 @@ static int igt_mock_memory_region_huge_pages(void *arg)
>   				goto out_region;
>   			}
>   
> -			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
> +			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>   			if (IS_ERR(vma)) {
>   				err = PTR_ERR(vma);
>   				goto out_put;
> @@ -614,7 +614,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
>   		/* Force the page size for this object */
>   		obj->mm.page_sizes.sg = page_size;
>   
> -		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out_unpin;
> @@ -746,7 +746,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg)
>   
>   		list_add(&obj->st_link, &objects);
>   
> -		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			break;
> @@ -924,7 +924,7 @@ static int igt_mock_ppgtt_64K(void *arg)
>   			 */
>   			obj->mm.page_sizes.sg &= ~I915_GTT_PAGE_SIZE_2M;
>   
> -			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
> +			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>   			if (IS_ERR(vma)) {
>   				err = PTR_ERR(vma);
>   				goto out_object_unpin;
> @@ -1092,7 +1092,7 @@ static int __igt_write_huge(struct intel_context *ce,
>   	struct i915_vma *vma;
>   	int err;
>   
> -	vma = i915_vma_instance(obj, ce->vm, NULL);
> +	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		return PTR_ERR(vma);
>   
> @@ -1587,7 +1587,7 @@ static int igt_tmpfs_fallback(void *arg)
>   	__i915_gem_object_flush_map(obj, 0, 64);
>   	i915_gem_object_unpin_map(obj);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto out_put;
> @@ -1654,7 +1654,7 @@ static int igt_shrink_thp(void *arg)
>   		goto out_vm;
>   	}
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto out_put;
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index 9a6a6b5b722b..e6c6c73bf80e 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -282,7 +282,7 @@ __create_vma(struct tiled_blits *t, size_t size, bool lmem)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, t->ce->vm, NULL);
> +	vma = i915_vma_instance(obj, t->ce->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		i915_gem_object_put(obj);
>   
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index c6ad67b90e8a..570f74df9bef 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -426,7 +426,7 @@ static int gpu_fill(struct intel_context *ce,
>   	GEM_BUG_ON(obj->base.size > ce->vm->total);
>   	GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine));
>   
> -	vma = i915_vma_instance(obj, ce->vm, NULL);
> +	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		return PTR_ERR(vma);
>   
> @@ -930,7 +930,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
>   	if (GRAPHICS_VER(i915) < 8)
>   		return -EINVAL;
>   
> -	vma = i915_vma_instance(obj, ce->vm, NULL);
> +	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		return PTR_ERR(vma);
>   
> @@ -938,7 +938,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
>   	if (IS_ERR(rpcs))
>   		return PTR_ERR(rpcs);
>   
> -	batch = i915_vma_instance(rpcs, ce->vm, NULL);
> +	batch = i915_vma_instance(rpcs, ce->vm, NULL, false);
>   	if (IS_ERR(batch)) {
>   		err = PTR_ERR(batch);
>   		goto err_put;
> @@ -1522,7 +1522,7 @@ static int write_to_scratch(struct i915_gem_context *ctx,
>   	intel_gt_chipset_flush(engine->gt);
>   
>   	vm = i915_gem_context_get_eb_vm(ctx);
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto out_vm;
> @@ -1599,7 +1599,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
>   		const u32 GPR0 = engine->mmio_base + 0x600;
>   
>   		vm = i915_gem_context_get_eb_vm(ctx);
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out_vm;
> @@ -1635,7 +1635,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
>   
>   		/* hsw: register access even to 3DPRIM! is protected */
>   		vm = i915_vm_get(&engine->gt->ggtt->vm);
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out_vm;
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> index fe6c37fd7859..fc235e1e6c12 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> @@ -201,7 +201,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   		return PTR_ERR(obj);
>   
>   	if (vm) {
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out_put;
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index b73c91aa5450..e07c91dc33ba 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -546,7 +546,8 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
>   		struct i915_gem_ww_ctx ww;
>   		int err;
>   
> -		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm,
> +					NULL, false);
>   		if (IS_ERR(vma))
>   			return PTR_ERR(vma);
>   
> @@ -1587,7 +1588,8 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
>   		struct i915_vma *vma;
>   		struct i915_gem_ww_ctx ww;
>   
> -		vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
> +		vma = i915_vma_instance(obj, engine->kernel_context->vm,
> +					NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out_unmap;
> diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
> index 3c55e77b0f1b..4184e198c824 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
> @@ -91,7 +91,7 @@ igt_emit_store_dw(struct i915_vma *vma,
>   
>   	intel_gt_chipset_flush(vma->vm->gt);
>   
> -	vma = i915_vma_instance(obj, vma->vm, NULL);
> +	vma = i915_vma_instance(obj, vma->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index 1bb766c79dcb..a0af2aa50533 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -395,7 +395,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
>   	pd->pt.base->base.resv = i915_vm_resv_get(&ppgtt->base.vm);
>   	pd->pt.base->shares_resv_from = &ppgtt->base.vm;
>   
> -	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL);
> +	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL, false);
>   	if (IS_ERR(ppgtt->vma)) {
>   		err = PTR_ERR(ppgtt->vma);
>   		ppgtt->vma = NULL;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 2ddcad497fa3..8146bf811d0f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1001,7 +1001,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>   
>   	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>   
> -	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		ret = PTR_ERR(vma);
>   		goto err_put;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index b367cfff48d5..8a78c6cec7b4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -441,7 +441,7 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
>   		return PTR_ERR(obj);
>   	}
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		ret = PTR_ERR(vma);
>   		goto err_unref;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 401202391649..c9bc33149ad7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -628,7 +628,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>   
>   	i915_gem_object_set_cache_coherency(obj, I915_CACHING_CACHED);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return vma;
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 3955292483a6..570d097a2492 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1029,7 +1029,7 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return vma;
> @@ -1685,7 +1685,7 @@ static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
>   	if (IS_ERR(obj))
>   		return PTR_ERR(obj);
>   
> -	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
> index 5121e6dc2fa5..bc7a2d4421db 100644
> --- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
> @@ -157,7 +157,7 @@ int intel_renderstate_init(struct intel_renderstate *so,
>   		if (IS_ERR(obj))
>   			return PTR_ERR(obj);
>   
> -		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> +		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>   		if (IS_ERR(so->vma)) {
>   			err = PTR_ERR(so->vma);
>   			goto err_obj;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
> index 15ec64d881c4..24c8b738a394 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c
> @@ -130,7 +130,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
>   	if (vm->has_read_only)
>   		i915_gem_object_set_readonly(obj);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma))
>   		goto err;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index d5d6f1fadcae..5e93a4052140 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -551,7 +551,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
>   	if (IS_IVYBRIDGE(i915))
>   		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>   
> -	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> @@ -1291,7 +1291,7 @@ static struct i915_vma *gen7_ctx_vma(struct intel_engine_cs *engine)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
> +	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return ERR_CAST(vma);
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index b9640212d659..31f56996f100 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -28,7 +28,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>   
>   	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		i915_gem_object_put(obj);
>   
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index c0637bf799a3..6f3578308395 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -46,7 +46,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma))
>   		goto err;
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
> index 1b75f478d1b8..16fcaba7c980 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
> @@ -85,7 +85,7 @@ static struct i915_vma *create_empty_batch(struct intel_context *ce)
>   
>   	i915_gem_object_flush_map(obj);
>   
> -	vma = i915_vma_instance(obj, ce->vm, NULL);
> +	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_unpin;
> @@ -222,7 +222,7 @@ static struct i915_vma *create_nop_batch(struct intel_context *ce)
>   
>   	i915_gem_object_flush_map(obj);
>   
> -	vma = i915_vma_instance(obj, ce->vm, NULL);
> +	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_unpin;
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1e08b2473b99..643ffcb3964a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -1000,7 +1000,7 @@ static int live_timeslice_preempt(void *arg)
>   	if (IS_ERR(obj))
>   		return PTR_ERR(obj);
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> @@ -1307,7 +1307,7 @@ static int live_timeslice_queue(void *arg)
>   	if (IS_ERR(obj))
>   		return PTR_ERR(obj);
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> @@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg)
>   		goto err_obj;
>   	}
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_map;
> @@ -2716,7 +2716,7 @@ static int create_gang(struct intel_engine_cs *engine,
>   		goto err_ce;
>   	}
>   
> -	vma = i915_vma_instance(obj, ce->vm, NULL);
> +	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> @@ -3060,7 +3060,7 @@ create_gpr_user(struct intel_engine_cs *engine,
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, result->vm, NULL);
> +	vma = i915_vma_instance(obj, result->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return vma;
> @@ -3130,7 +3130,7 @@ static struct i915_vma *create_global(struct intel_gt *gt, size_t sz)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return vma;
> @@ -3159,7 +3159,7 @@ create_gpr_client(struct intel_engine_cs *engine,
>   	if (IS_ERR(ce))
>   		return ERR_CAST(ce);
>   
> -	vma = i915_vma_instance(global->obj, ce->vm, NULL);
> +	vma = i915_vma_instance(global->obj, ce->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto out_ce;
> @@ -3501,7 +3501,7 @@ static int smoke_submit(struct preempt_smoke *smoke,
>   		struct i915_address_space *vm;
>   
>   		vm = i915_gem_context_get_eb_vm(ctx);
> -		vma = i915_vma_instance(batch, vm, NULL);
> +		vma = i915_vma_instance(batch, vm, NULL, false);
>   		i915_vm_put(vm);
>   		if (IS_ERR(vma))
>   			return PTR_ERR(vma);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 7f3bb1d34dfb..0b021a32d0e0 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -147,13 +147,13 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
>   	h->obj = obj;
>   	h->batch = vaddr;
>   
> -	vma = i915_vma_instance(h->obj, vm, NULL);
> +	vma = i915_vma_instance(h->obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_vm_put(vm);
>   		return ERR_CAST(vma);
>   	}
>   
> -	hws = i915_vma_instance(h->hws, vm, NULL);
> +	hws = i915_vma_instance(h->hws, vm, NULL, false);
>   	if (IS_ERR(hws)) {
>   		i915_vm_put(vm);
>   		return ERR_CAST(hws);
> @@ -1474,7 +1474,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>   		}
>   	}
>   
> -	arg.vma = i915_vma_instance(obj, vm, NULL);
> +	arg.vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(arg.vma)) {
>   		err = PTR_ERR(arg.vma);
>   		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 82d3f8058995..32867049b3bf 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -938,7 +938,7 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return vma;
> diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> index 70f9ac1ec2c7..7e9361104620 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> @@ -17,7 +17,7 @@ static struct i915_vma *create_wally(struct intel_engine_cs *engine)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
> +	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		i915_gem_object_put(obj);
>   		return vma;
> diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> index cfb4708dd62e..327558828bef 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> @@ -78,7 +78,7 @@ create_spin_counter(struct intel_engine_cs *engine,
>   
>   	end = obj->base.size / sizeof(u32) - 1;
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_put;
> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> index 67a9aab801dd..d893ea763ac6 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> @@ -122,7 +122,7 @@ read_nonprivs(struct intel_context *ce)
>   	i915_gem_object_flush_map(result);
>   	i915_gem_object_unpin_map(result);
>   
> -	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> @@ -389,7 +389,7 @@ static struct i915_vma *create_batch(struct i915_address_space *vm)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index bac06e3d6f2c..d56b1f82250c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -737,7 +737,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma))
>   		goto err;
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 88df9a35e0fe..bb6b1f56836f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -934,7 +934,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
>   	}
>   
>   new_vma:
> -	vma = i915_vma_instance(obj, &ggtt->vm, view);
> +	vma = i915_vma_instance(obj, &ggtt->vm, view, false);
>   	if (IS_ERR(vma))
>   		return vma;
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 0defbb43ceea..d8f5ef9fd00f 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1920,7 +1920,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>   
>   	oa_bo->vma = i915_vma_instance(obj,
>   				       &stream->engine->gt->ggtt->vm,
> -				       NULL);
> +				       NULL, false);
>   	if (IS_ERR(oa_bo->vma)) {
>   		err = PTR_ERR(oa_bo->vma);
>   		goto out_ww;
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 24f171588f56..ef709a61fd54 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -110,7 +110,8 @@ static void __i915_vma_retire(struct i915_active *ref)
>   static struct i915_vma *
>   vma_create(struct drm_i915_gem_object *obj,
>   	   struct i915_address_space *vm,
> -	   const struct i915_gtt_view *view)
> +	   const struct i915_gtt_view *view,
> +	   bool persistent)
>   {
>   	struct i915_vma *pos = ERR_PTR(-E2BIG);
>   	struct i915_vma *vma;
> @@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
>   		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>   	}
>   
> +	if (persistent)
> +		goto skip_rb_insert;

Oh, so you don't use the gtt_views fully at all. I now have reservations 
about whether that was the right approach, since you are not using the 
existing rb tree tracking, I mean.

You know if a vma is persistent, right? So you could have just added a 
special case for persistent vmas to __i915_vma_get_pages and still call 
intel_partial_pages from there. Maybe a union over struct i915_gtt_view in 
i915_vma, holding either the view or a struct intel_partial_info for 
persistent ones.
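
Rough sketch of what I mean (member names are illustrative only):

	/* in struct i915_vma, purely illustrative: */
	union {
		/* non-persistent vmas: full view, used for rb tree lookup */
		struct i915_gtt_view view;
		/* persistent vmas: only the partial range is needed */
		struct intel_partial_info partial;
	};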

Regards,

Tvrtko

> +
>   	rb = NULL;
>   	p = &obj->vma.tree.rb_node;
>   	while (*p) {
> @@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   	rb_link_node(&vma->obj_node, rb, p);
>   	rb_insert_color(&vma->obj_node, &obj->vma.tree);
>   
> +skip_rb_insert:
>   	if (i915_vma_is_ggtt(vma))
>   		/*
>   		 * We put the GGTT vma at the start of the vma-list, followed
> @@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>    * @obj: parent &struct drm_i915_gem_object to be mapped
>    * @vm: address space in which the mapping is located
>    * @view: additional mapping requirements
> + * @persistent: Whether the vma is persistent
>    *
>    * i915_vma_instance() looks up an existing VMA of the @obj in the @vm with
>    * the same @view characteristics. If a match is not found, one is created.
> @@ -290,19 +296,22 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>   struct i915_vma *
>   i915_vma_instance(struct drm_i915_gem_object *obj,
>   		  struct i915_address_space *vm,
> -		  const struct i915_gtt_view *view)
> +		  const struct i915_gtt_view *view,
> +		  bool persistent)
>   {
> -	struct i915_vma *vma;
> +	struct i915_vma *vma = NULL;
>   
>   	GEM_BUG_ON(!kref_read(&vm->ref));
>   
> -	spin_lock(&obj->vma.lock);
> -	vma = i915_vma_lookup(obj, vm, view);
> -	spin_unlock(&obj->vma.lock);
> +	if (!persistent) {
> +		spin_lock(&obj->vma.lock);
> +		vma = i915_vma_lookup(obj, vm, view);
> +		spin_unlock(&obj->vma.lock);
> +	}
>   
>   	/* vma_create() will resolve the race if another creates the vma */
>   	if (unlikely(!vma))
> -		vma = vma_create(obj, vm, view);
> +		vma = vma_create(obj, vm, view, persistent);
>   
>   	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
>   	return vma;
> @@ -1704,7 +1713,8 @@ static void release_references(struct i915_vma *vma, struct intel_gt *gt,
>   
>   	spin_lock(&obj->vma.lock);
>   	list_del(&vma->obj_link);
> -	if (!RB_EMPTY_NODE(&vma->obj_node))
> +	if (!i915_vma_is_persistent(vma) &&
> +	    !RB_EMPTY_NODE(&vma->obj_node))
>   		rb_erase(&vma->obj_node, &obj->vma.tree);
>   
>   	spin_unlock(&obj->vma.lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 3a47db2d85f5..b8e805c6532f 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -43,7 +43,8 @@
>   struct i915_vma *
>   i915_vma_instance(struct drm_i915_gem_object *obj,
>   		  struct i915_address_space *vm,
> -		  const struct i915_gtt_view *view);
> +		  const struct i915_gtt_view *view,
> +		  bool persistent);
>   
>   void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
>   #define I915_VMA_RELEASE_MAP BIT(0)
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index e050a2de5fd1..d8ffbdf91498 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -390,7 +390,7 @@ static void close_object_list(struct list_head *objects,
>   	list_for_each_entry_safe(obj, on, objects, st_link) {
>   		struct i915_vma *vma;
>   
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (!IS_ERR(vma))
>   			ignored = i915_vma_unbind_unlocked(vma);
>   
> @@ -452,7 +452,7 @@ static int fill_hole(struct i915_address_space *vm,
>   					u64 aligned_size = round_up(obj->base.size,
>   								    min_alignment);
>   
> -					vma = i915_vma_instance(obj, vm, NULL);
> +					vma = i915_vma_instance(obj, vm, NULL, false);
>   					if (IS_ERR(vma))
>   						continue;
>   
> @@ -492,7 +492,7 @@ static int fill_hole(struct i915_address_space *vm,
>   					u64 aligned_size = round_up(obj->base.size,
>   								    min_alignment);
>   
> -					vma = i915_vma_instance(obj, vm, NULL);
> +					vma = i915_vma_instance(obj, vm, NULL, false);
>   					if (IS_ERR(vma))
>   						continue;
>   
> @@ -531,7 +531,7 @@ static int fill_hole(struct i915_address_space *vm,
>   					u64 aligned_size = round_up(obj->base.size,
>   								    min_alignment);
>   
> -					vma = i915_vma_instance(obj, vm, NULL);
> +					vma = i915_vma_instance(obj, vm, NULL, false);
>   					if (IS_ERR(vma))
>   						continue;
>   
> @@ -571,7 +571,7 @@ static int fill_hole(struct i915_address_space *vm,
>   					u64 aligned_size = round_up(obj->base.size,
>   								    min_alignment);
>   
> -					vma = i915_vma_instance(obj, vm, NULL);
> +					vma = i915_vma_instance(obj, vm, NULL, false);
>   					if (IS_ERR(vma))
>   						continue;
>   
> @@ -653,7 +653,7 @@ static int walk_hole(struct i915_address_space *vm,
>   		if (IS_ERR(obj))
>   			break;
>   
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto err_put;
> @@ -728,7 +728,7 @@ static int pot_hole(struct i915_address_space *vm,
>   	if (IS_ERR(obj))
>   		return PTR_ERR(obj);
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_obj;
> @@ -837,7 +837,7 @@ static int drunk_hole(struct i915_address_space *vm,
>   			break;
>   		}
>   
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto err_obj;
> @@ -920,7 +920,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>   
>   		list_add(&obj->st_link, &objects);
>   
> -		vma = i915_vma_instance(obj, vm, NULL);
> +		vma = i915_vma_instance(obj, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			break;
> @@ -1018,7 +1018,7 @@ static int shrink_boom(struct i915_address_space *vm,
>   		if (IS_ERR(purge))
>   			return PTR_ERR(purge);
>   
> -		vma = i915_vma_instance(purge, vm, NULL);
> +		vma = i915_vma_instance(purge, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto err_purge;
> @@ -1041,7 +1041,7 @@ static int shrink_boom(struct i915_address_space *vm,
>   		vm->fault_attr.interval = 1;
>   		atomic_set(&vm->fault_attr.times, -1);
>   
> -		vma = i915_vma_instance(explode, vm, NULL);
> +		vma = i915_vma_instance(explode, vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto err_explode;
> @@ -1088,7 +1088,7 @@ static int misaligned_case(struct i915_address_space *vm, struct intel_memory_re
>   		return PTR_ERR(obj);
>   	}
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err_put;
> @@ -1560,7 +1560,7 @@ static int igt_gtt_reserve(void *arg)
>   		}
>   
>   		list_add(&obj->st_link, &objects);
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -1606,7 +1606,7 @@ static int igt_gtt_reserve(void *arg)
>   
>   		list_add(&obj->st_link, &objects);
>   
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -1636,7 +1636,7 @@ static int igt_gtt_reserve(void *arg)
>   		struct i915_vma *vma;
>   		u64 offset;
>   
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -1783,7 +1783,7 @@ static int igt_gtt_insert(void *arg)
>   
>   		list_add(&obj->st_link, &objects);
>   
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -1809,7 +1809,7 @@ static int igt_gtt_insert(void *arg)
>   	list_for_each_entry(obj, &objects, st_link) {
>   		struct i915_vma *vma;
>   
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -1829,7 +1829,7 @@ static int igt_gtt_insert(void *arg)
>   		struct i915_vma *vma;
>   		u64 offset;
>   
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -1882,7 +1882,7 @@ static int igt_gtt_insert(void *arg)
>   
>   		list_add(&obj->st_link, &objects);
>   
> -		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
> +		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>   		if (IS_ERR(vma)) {
>   			err = PTR_ERR(vma);
>   			goto out;
> @@ -2091,7 +2091,7 @@ static int igt_cs_tlb(void *arg)
>   	}
>   	i915_gem_object_set_cache_coherency(out, I915_CACHING_CACHED);
>   
> -	vma = i915_vma_instance(out, vm, NULL);
> +	vma = i915_vma_instance(out, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto out_put_out;
> @@ -2131,7 +2131,7 @@ static int igt_cs_tlb(void *arg)
>   
>   			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
>   
> -			vma = i915_vma_instance(bbe, vm, NULL);
> +			vma = i915_vma_instance(bbe, vm, NULL, false);
>   			if (IS_ERR(vma)) {
>   				err = PTR_ERR(vma);
>   				goto end;
> @@ -2203,7 +2203,7 @@ static int igt_cs_tlb(void *arg)
>   				goto end;
>   			}
>   
> -			vma = i915_vma_instance(act, vm, NULL);
> +			vma = i915_vma_instance(act, vm, NULL, false);
>   			if (IS_ERR(vma)) {
>   				kfree(vma_res);
>   				err = PTR_ERR(vma);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 818a4909c1f3..297c1d4ebf44 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -961,7 +961,7 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915)
>   
>   	intel_gt_chipset_flush(to_gt(i915));
>   
> -	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
> +	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err;
> @@ -1100,7 +1100,7 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> -	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL);
> +	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto err;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
> index 71b52d5efef4..3899c2252de3 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
> @@ -68,7 +68,7 @@ checked_vma_instance(struct drm_i915_gem_object *obj,
>   	struct i915_vma *vma;
>   	bool ok = true;
>   
> -	vma = i915_vma_instance(obj, vm, view);
> +	vma = i915_vma_instance(obj, vm, view, false);
>   	if (IS_ERR(vma))
>   		return vma;
>   
> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> index 0c22594ae274..6901f94ff076 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> @@ -47,7 +47,7 @@ static void *igt_spinner_pin_obj(struct intel_context *ce,
>   	void *vaddr;
>   	int ret;
>   
> -	*vma = i915_vma_instance(obj, ce->vm, NULL);
> +	*vma = i915_vma_instance(obj, ce->vm, NULL, false);
>   	if (IS_ERR(*vma))
>   		return ERR_CAST(*vma);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index 3b18e5905c86..551d0c958a3b 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -745,7 +745,7 @@ static int igt_gpu_write(struct i915_gem_context *ctx,
>   	if (!order)
>   		return -ENOMEM;
>   
> -	vma = i915_vma_instance(obj, vm, NULL);
> +	vma = i915_vma_instance(obj, vm, NULL, false);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		goto out_free;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions
  2022-09-22  9:54     ` [Intel-gfx] " Jani Nikula
@ 2022-09-24  4:22       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-24  4:22 UTC (permalink / raw)
  To: Jani Nikula
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, intel-gfx,
	lionel.g.landwerlin, thomas.hellstrom, dri-devel, jason,
	daniel.vetter, christian.koenig, matthew.auld

On Thu, Sep 22, 2022 at 12:54:09PM +0300, Jani Nikula wrote:
>On Wed, 21 Sep 2022, Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> wrote:
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>> new file mode 100644
>> index 000000000000..725febfd6a53
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>> @@ -0,0 +1,47 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#ifndef __I915_GEM_EXECBUFFER_COMMON_H
>> +#define __I915_GEM_EXECBUFFER_COMMON_H
>> +
>> +#include <drm/drm_syncobj.h>
>> +
>> +#include "gt/intel_context.h"
>
>You don't need these includes. Most of it can be handled using forward
>declarations. You'll need <linux/types.h>
>

Thanks Jani,
Sure, here and everywhere I will remove the unwanted includes and use
forward declarations instead.
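
Something like this at the top of i915_gem_execbuffer_common.h, as a rough
sketch (the exact list of forward declarations would of course be trimmed
to what the final prototypes actually need):

	#include <linux/types.h>

	struct dma_fence;
	struct dma_fence_chain;
	struct drm_file;
	struct drm_syncobj;
	struct i915_gem_ww_ctx;
	struct i915_request;
	struct intel_context;
	struct intel_gt;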

>> +
>> +struct eb_fence {
>> +	struct drm_syncobj *syncobj;
>> +	struct dma_fence *dma_fence;
>> +	u64 value;
>> +	struct dma_fence_chain *chain_fence;
>> +};
>> +
>> +int __eb_pin_engine(struct intel_context *ce, struct i915_gem_ww_ctx *ww,
>> +		    bool throttle, bool nonblock);
>> +void __eb_unpin_engine(struct intel_context *ce);
>> +int __eb_select_engine(struct intel_context *ce);
>> +void __eb_put_engine(struct intel_context *context, struct intel_gt *gt);
>> +
>> +struct intel_context *
>> +eb_find_context(struct intel_context *context, unsigned int context_number);
>> +
>> +int add_timeline_fence(struct drm_file *file, u32 handle, u64 point,
>> +		       struct eb_fence *f, bool wait, bool signal);
>> +void put_fence_array(struct eb_fence *fences, u64 num_fences);
>> +int await_fence_array(struct eb_fence *fences, u64 num_fences,
>> +		      struct i915_request *rq);
>> +void signal_fence_array(struct eb_fence *fences, u64 num_fences,
>> +			struct dma_fence * const fence);
>> +
>> +int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
>> +		    struct intel_context *context, struct i915_sched_attr sched,
>
>struct i915_sched_attr is passed by value, so you either need to turn
>that into a pointer, or you need the definition. The definition is just
>a wrapper around an int. (For strict type safety or for future proofing
>or what, I don't know.) And this all brings me to my pet peeve about
>gem/gt headers.
>
>To get that definition of a struct wrapper around an int, you need to
>include i915_scheduler_types.h, which recursively includes a total of 16
>headers. Touch any of those files, and you get a rebuild butterfly
>effect.
>
>28% of i915 header files, when modified, cause the rebuild of 83% of the
>driver. Please let's not make it worse.
>

Ok. I think it is passed by value because it is just a wrapper around an int.
I am just moving this function to a separate file, so I will keep it as such,
but will forward declare it instead of including i915_scheduler_types.h.
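
For i915_sched_attr specifically, a forward declaration should be enough for
the prototype itself, since C allows parameters of incomplete type in a
declaration that is not a definition; roughly (sketch):

	/* i915_gem_execbuffer_common.h */
	struct i915_sched_attr;

	int eb_requests_add(struct i915_request **requests, unsigned int num_batches,
			    struct intel_context *context, struct i915_sched_attr sched,
			    int err);

	/*
	 * i915_gem_execbuffer_common.c (and any caller) still includes
	 * i915_scheduler_types.h to get the complete type.
	 */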

Regards,
Niranjana

>
>BR,
>Jani.
>
>> +		    int err);
>> +void eb_requests_get(struct i915_request **requests, unsigned int num_batches);
>> +void eb_requests_put(struct i915_request **requests, unsigned int num_batches);
>> +
>> +struct dma_fence *__eb_composite_fence_create(struct i915_request **requests,
>> +					      unsigned int num_batches,
>> +					      struct intel_context *context);
>> +
>> +#endif /* __I915_GEM_EXECBUFFER_COMMON_H */
>
>-- 
>Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-23  8:40   ` Tvrtko Ursulin
@ 2022-09-24  4:30     ` Niranjana Vishwanathapura
  2022-09-26 16:26       ` Tvrtko Ursulin
  0 siblings, 1 reply; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-24  4:30 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Fri, Sep 23, 2022 at 09:40:20AM +0100, Tvrtko Ursulin wrote:
>
>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>vma_lookup is tied to segment of the object instead of section
>
>Can be, but not only that. It would be more accurate to say it is 
>based of gtt views.

Yah, but the new code is also based on gtt views; the only difference
is that now there can be multiple mappings (at different VAs)
to the same gtt_view of the object.

>
>>of VA space. Hence, it do not support aliasing (ie., multiple
>>bindings to the same section of the object).
>>Skip vma_lookup for persistent vmas as it supports aliasing.
>
>What's broken without this patch? If something is, should it go 
>somewhere earlier in the series? If so should be mentioned in the 
>commit message.
>
>Or is it just a performance optimisation to skip unused tracking? If 
>so should also be mentioned in the commit message.
>

No, it is not a performance optimization.
The vma_lookup is based on the fact that there can be only one mapping
for a given gtt_view of the object, so it looks up the gtt_view to find
that mapping.

But now, as I mentioned above, there can be multiple mappings for a
given gtt_view of the object, so the vma_lookup method won't work here.
Hence it is being skipped for persistent vmas.
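
As an illustration, with the signature from this patch (sketch only):

	struct i915_gtt_view view = {
		.type = I915_GTT_VIEW_PARTIAL,
		/* same .partial.offset / .partial.size in both cases */
	};
	struct i915_vma *vma1, *vma2;

	/* persistent: lookup skipped, each VM_BIND gets its own vma */
	vma1 = i915_vma_instance(obj, vm, &view, true);
	vma2 = i915_vma_instance(obj, vm, &view, true);
	/* vma1 != vma2, both alias the same pages of obj */

	/* non-persistent (execbuf path): rb tree lookup finds the match */
	vma1 = i915_vma_instance(obj, vm, &view, false);
	vma2 = i915_vma_instance(obj, vm, &view, false);
	/* vma1 == vma2, a single vma per (vm, gtt_view) */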

>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>---
>>  drivers/gpu/drm/i915/display/intel_fb_pin.c   |  2 +-
>>  .../drm/i915/display/intel_plane_initial.c    |  2 +-
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  2 +-
>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 16 +++----
>>  .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
>>  .../drm/i915/gem/selftests/i915_gem_context.c | 12 ++---
>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>  .../drm/i915/gem/selftests/i915_gem_mman.c    |  6 ++-
>>  .../drm/i915/gem/selftests/igt_gem_utils.c    |  2 +-
>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  2 +-
>>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
>>  drivers/gpu/drm/i915/gt/intel_gt.c            |  2 +-
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +-
>>  drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
>>  drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
>>  drivers/gpu/drm/i915/gt/intel_ring.c          |  2 +-
>>  .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 +-
>>  drivers/gpu/drm/i915/gt/intel_timeline.c      |  2 +-
>>  drivers/gpu/drm/i915/gt/mock_engine.c         |  2 +-
>>  drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
>>  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 16 +++----
>>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  6 +--
>>  drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 +-
>>  .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
>>  drivers/gpu/drm/i915/gt/selftest_rps.c        |  2 +-
>>  .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
>>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  2 +-
>>  drivers/gpu/drm/i915/i915_gem.c               |  2 +-
>>  drivers/gpu/drm/i915/i915_perf.c              |  2 +-
>>  drivers/gpu/drm/i915/i915_vma.c               | 26 +++++++----
>>  drivers/gpu/drm/i915/i915_vma.h               |  3 +-
>>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 44 +++++++++----------
>>  drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
>>  drivers/gpu/drm/i915/selftests/i915_vma.c     |  2 +-
>>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
>>  .../drm/i915/selftests/intel_memory_region.c  |  2 +-
>>  37 files changed, 106 insertions(+), 93 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c
>>index c86e5d4ee016..5a718b247bb3 100644
>>--- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
>>+++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
>>@@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,
>>  		goto err;
>>  	}
>>-	vma = i915_vma_instance(obj, vm, view);
>>+	vma = i915_vma_instance(obj, vm, view, false);
>
>Hey why are you touching all the legacy paths? >:P
>
>>  	if (IS_ERR(vma))
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
>>index 76be796df255..7667e2faa3fb 100644
>>--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
>>+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
>>@@ -136,7 +136,7 @@ initial_plane_vma(struct drm_i915_private *i915,
>>  		goto err_obj;
>>  	}
>>-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		goto err_obj;
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>index 363b2a788cdf..0ee43cb601b5 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>@@ -876,7 +876,7 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
>>  			}
>>  		}
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			i915_gem_object_put(obj);
>>  			return vma;
>>@@ -2208,7 +2208,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
>>  	struct i915_vma *vma;
>>  	int err;
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>index 3087731cc0c0..4468603af6f1 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>@@ -252,7 +252,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>>  	view.type = I915_GTT_VIEW_PARTIAL;
>>  	view.partial.offset = va->offset >> PAGE_SHIFT;
>>  	view.partial.size = va->length >> PAGE_SHIFT;
>>-	vma = i915_vma_instance(obj, vm, &view);
>>+	vma = i915_vma_instance(obj, vm, &view, true);
>
>This is the only caller passing "true". Leave i915_vma_instance as is, 
>and add i915_vma_instance_persistent(), and drop 90% of the patch?

Yah, makes sense. Will fix it.

>
>>  	if (IS_ERR(vma))
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>index c570cf780079..6e13a83d0e36 100644
>>--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>@@ -454,7 +454,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
>>  				goto out_put;
>>  			}
>>-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>  			if (IS_ERR(vma)) {
>>  				err = PTR_ERR(vma);
>>  				goto out_put;
>>@@ -522,7 +522,7 @@ static int igt_mock_memory_region_huge_pages(void *arg)
>>  				goto out_region;
>>  			}
>>-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>  			if (IS_ERR(vma)) {
>>  				err = PTR_ERR(vma);
>>  				goto out_put;
>>@@ -614,7 +614,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
>>  		/* Force the page size for this object */
>>  		obj->mm.page_sizes.sg = page_size;
>>-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out_unpin;
>>@@ -746,7 +746,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg)
>>  		list_add(&obj->st_link, &objects);
>>-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			break;
>>@@ -924,7 +924,7 @@ static int igt_mock_ppgtt_64K(void *arg)
>>  			 */
>>  			obj->mm.page_sizes.sg &= ~I915_GTT_PAGE_SIZE_2M;
>>-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>  			if (IS_ERR(vma)) {
>>  				err = PTR_ERR(vma);
>>  				goto out_object_unpin;
>>@@ -1092,7 +1092,7 @@ static int __igt_write_huge(struct intel_context *ce,
>>  	struct i915_vma *vma;
>>  	int err;
>>-	vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		return PTR_ERR(vma);
>>@@ -1587,7 +1587,7 @@ static int igt_tmpfs_fallback(void *arg)
>>  	__i915_gem_object_flush_map(obj, 0, 64);
>>  	i915_gem_object_unpin_map(obj);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto out_put;
>>@@ -1654,7 +1654,7 @@ static int igt_shrink_thp(void *arg)
>>  		goto out_vm;
>>  	}
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto out_put;
>>diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>index 9a6a6b5b722b..e6c6c73bf80e 100644
>>--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>@@ -282,7 +282,7 @@ __create_vma(struct tiled_blits *t, size_t size, bool lmem)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, t->ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, t->ce->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		i915_gem_object_put(obj);
>>diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>index c6ad67b90e8a..570f74df9bef 100644
>>--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>@@ -426,7 +426,7 @@ static int gpu_fill(struct intel_context *ce,
>>  	GEM_BUG_ON(obj->base.size > ce->vm->total);
>>  	GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine));
>>-	vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		return PTR_ERR(vma);
>>@@ -930,7 +930,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
>>  	if (GRAPHICS_VER(i915) < 8)
>>  		return -EINVAL;
>>-	vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		return PTR_ERR(vma);
>>@@ -938,7 +938,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
>>  	if (IS_ERR(rpcs))
>>  		return PTR_ERR(rpcs);
>>-	batch = i915_vma_instance(rpcs, ce->vm, NULL);
>>+	batch = i915_vma_instance(rpcs, ce->vm, NULL, false);
>>  	if (IS_ERR(batch)) {
>>  		err = PTR_ERR(batch);
>>  		goto err_put;
>>@@ -1522,7 +1522,7 @@ static int write_to_scratch(struct i915_gem_context *ctx,
>>  	intel_gt_chipset_flush(engine->gt);
>>  	vm = i915_gem_context_get_eb_vm(ctx);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto out_vm;
>>@@ -1599,7 +1599,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
>>  		const u32 GPR0 = engine->mmio_base + 0x600;
>>  		vm = i915_gem_context_get_eb_vm(ctx);
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out_vm;
>>@@ -1635,7 +1635,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
>>  		/* hsw: register access even to 3DPRIM! is protected */
>>  		vm = i915_vm_get(&engine->gt->ggtt->vm);
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out_vm;
>>diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>index fe6c37fd7859..fc235e1e6c12 100644
>>--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>@@ -201,7 +201,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>>  		return PTR_ERR(obj);
>>  	if (vm) {
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out_put;
>>diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>index b73c91aa5450..e07c91dc33ba 100644
>>--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>@@ -546,7 +546,8 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
>>  		struct i915_gem_ww_ctx ww;
>>  		int err;
>>-		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm,
>>+					NULL, false);
>>  		if (IS_ERR(vma))
>>  			return PTR_ERR(vma);
>>@@ -1587,7 +1588,8 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
>>  		struct i915_vma *vma;
>>  		struct i915_gem_ww_ctx ww;
>>-		vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
>>+		vma = i915_vma_instance(obj, engine->kernel_context->vm,
>>+					NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out_unmap;
>>diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
>>index 3c55e77b0f1b..4184e198c824 100644
>>--- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
>>+++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
>>@@ -91,7 +91,7 @@ igt_emit_store_dw(struct i915_vma *vma,
>>  	intel_gt_chipset_flush(vma->vm->gt);
>>-	vma = i915_vma_instance(obj, vma->vm, NULL);
>>+	vma = i915_vma_instance(obj, vma->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>index 1bb766c79dcb..a0af2aa50533 100644
>>--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>@@ -395,7 +395,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
>>  	pd->pt.base->base.resv = i915_vm_resv_get(&ppgtt->base.vm);
>>  	pd->pt.base->shares_resv_from = &ppgtt->base.vm;
>>-	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL);
>>+	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL, false);
>>  	if (IS_ERR(ppgtt->vma)) {
>>  		err = PTR_ERR(ppgtt->vma);
>>  		ppgtt->vma = NULL;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>index 2ddcad497fa3..8146bf811d0f 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>@@ -1001,7 +1001,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>  	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		ret = PTR_ERR(vma);
>>  		goto err_put;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>index b367cfff48d5..8a78c6cec7b4 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>@@ -441,7 +441,7 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
>>  		return PTR_ERR(obj);
>>  	}
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		ret = PTR_ERR(vma);
>>  		goto err_unref;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>index 401202391649..c9bc33149ad7 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>@@ -628,7 +628,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>  	i915_gem_object_set_cache_coherency(obj, I915_CACHING_CACHED);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>index 3955292483a6..570d097a2492 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>@@ -1029,7 +1029,7 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return vma;
>>@@ -1685,7 +1685,7 @@ static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
>>  	if (IS_ERR(obj))
>>  		return PTR_ERR(obj);
>>-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
>>index 5121e6dc2fa5..bc7a2d4421db 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
>>@@ -157,7 +157,7 @@ int intel_renderstate_init(struct intel_renderstate *so,
>>  		if (IS_ERR(obj))
>>  			return PTR_ERR(obj);
>>-		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>+		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>  		if (IS_ERR(so->vma)) {
>>  			err = PTR_ERR(so->vma);
>>  			goto err_obj;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
>>index 15ec64d881c4..24c8b738a394 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_ring.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
>>@@ -130,7 +130,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
>>  	if (vm->has_read_only)
>>  		i915_gem_object_set_readonly(obj);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>index d5d6f1fadcae..5e93a4052140 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>@@ -551,7 +551,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>  	if (IS_IVYBRIDGE(i915))
>>  		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>>-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>@@ -1291,7 +1291,7 @@ static struct i915_vma *gen7_ctx_vma(struct intel_engine_cs *engine)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
>>+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return ERR_CAST(vma);
>>diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>index b9640212d659..31f56996f100 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>@@ -28,7 +28,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>  	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		i915_gem_object_put(obj);
>>diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
>>index c0637bf799a3..6f3578308395 100644
>>--- a/drivers/gpu/drm/i915/gt/mock_engine.c
>>+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
>>@@ -46,7 +46,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
>>index 1b75f478d1b8..16fcaba7c980 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
>>@@ -85,7 +85,7 @@ static struct i915_vma *create_empty_batch(struct intel_context *ce)
>>  	i915_gem_object_flush_map(obj);
>>-	vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_unpin;
>>@@ -222,7 +222,7 @@ static struct i915_vma *create_nop_batch(struct intel_context *ce)
>>  	i915_gem_object_flush_map(obj);
>>-	vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_unpin;
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>index 1e08b2473b99..643ffcb3964a 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>@@ -1000,7 +1000,7 @@ static int live_timeslice_preempt(void *arg)
>>  	if (IS_ERR(obj))
>>  		return PTR_ERR(obj);
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>@@ -1307,7 +1307,7 @@ static int live_timeslice_queue(void *arg)
>>  	if (IS_ERR(obj))
>>  		return PTR_ERR(obj);
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>@@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg)
>>  		goto err_obj;
>>  	}
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_map;
>>@@ -2716,7 +2716,7 @@ static int create_gang(struct intel_engine_cs *engine,
>>  		goto err_ce;
>>  	}
>>-	vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>@@ -3060,7 +3060,7 @@ create_gpr_user(struct intel_engine_cs *engine,
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, result->vm, NULL);
>>+	vma = i915_vma_instance(obj, result->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return vma;
>>@@ -3130,7 +3130,7 @@ static struct i915_vma *create_global(struct intel_gt *gt, size_t sz)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return vma;
>>@@ -3159,7 +3159,7 @@ create_gpr_client(struct intel_engine_cs *engine,
>>  	if (IS_ERR(ce))
>>  		return ERR_CAST(ce);
>>-	vma = i915_vma_instance(global->obj, ce->vm, NULL);
>>+	vma = i915_vma_instance(global->obj, ce->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto out_ce;
>>@@ -3501,7 +3501,7 @@ static int smoke_submit(struct preempt_smoke *smoke,
>>  		struct i915_address_space *vm;
>>  		vm = i915_gem_context_get_eb_vm(ctx);
>>-		vma = i915_vma_instance(batch, vm, NULL);
>>+		vma = i915_vma_instance(batch, vm, NULL, false);
>>  		i915_vm_put(vm);
>>  		if (IS_ERR(vma))
>>  			return PTR_ERR(vma);
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>index 7f3bb1d34dfb..0b021a32d0e0 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>@@ -147,13 +147,13 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
>>  	h->obj = obj;
>>  	h->batch = vaddr;
>>-	vma = i915_vma_instance(h->obj, vm, NULL);
>>+	vma = i915_vma_instance(h->obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_vm_put(vm);
>>  		return ERR_CAST(vma);
>>  	}
>>-	hws = i915_vma_instance(h->hws, vm, NULL);
>>+	hws = i915_vma_instance(h->hws, vm, NULL, false);
>>  	if (IS_ERR(hws)) {
>>  		i915_vm_put(vm);
>>  		return ERR_CAST(hws);
>>@@ -1474,7 +1474,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>>  		}
>>  	}
>>-	arg.vma = i915_vma_instance(obj, vm, NULL);
>>+	arg.vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(arg.vma)) {
>>  		err = PTR_ERR(arg.vma);
>>  		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
>>index 82d3f8058995..32867049b3bf 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
>>@@ -938,7 +938,7 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
>>index 70f9ac1ec2c7..7e9361104620 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
>>@@ -17,7 +17,7 @@ static struct i915_vma *create_wally(struct intel_engine_cs *engine)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
>>+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		i915_gem_object_put(obj);
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
>>index cfb4708dd62e..327558828bef 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
>>@@ -78,7 +78,7 @@ create_spin_counter(struct intel_engine_cs *engine,
>>  	end = obj->base.size / sizeof(u32) - 1;
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_put;
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>index 67a9aab801dd..d893ea763ac6 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>@@ -122,7 +122,7 @@ read_nonprivs(struct intel_context *ce)
>>  	i915_gem_object_flush_map(result);
>>  	i915_gem_object_unpin_map(result);
>>-	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>@@ -389,7 +389,7 @@ static struct i915_vma *create_batch(struct i915_address_space *vm)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>index bac06e3d6f2c..d56b1f82250c 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>@@ -737,7 +737,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma))
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>index 88df9a35e0fe..bb6b1f56836f 100644
>>--- a/drivers/gpu/drm/i915/i915_gem.c
>>+++ b/drivers/gpu/drm/i915/i915_gem.c
>>@@ -934,7 +934,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
>>  	}
>>  new_vma:
>>-	vma = i915_vma_instance(obj, &ggtt->vm, view);
>>+	vma = i915_vma_instance(obj, &ggtt->vm, view, false);
>>  	if (IS_ERR(vma))
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 0defbb43ceea..d8f5ef9fd00f 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -1920,7 +1920,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>>  	oa_bo->vma = i915_vma_instance(obj,
>>  				       &stream->engine->gt->ggtt->vm,
>>-				       NULL);
>>+				       NULL, false);
>>  	if (IS_ERR(oa_bo->vma)) {
>>  		err = PTR_ERR(oa_bo->vma);
>>  		goto out_ww;
>>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>index 24f171588f56..ef709a61fd54 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>@@ -110,7 +110,8 @@ static void __i915_vma_retire(struct i915_active *ref)
>>  static struct i915_vma *
>>  vma_create(struct drm_i915_gem_object *obj,
>>  	   struct i915_address_space *vm,
>>-	   const struct i915_gtt_view *view)
>>+	   const struct i915_gtt_view *view,
>>+	   bool persistent)
>>  {
>>  	struct i915_vma *pos = ERR_PTR(-E2BIG);
>>  	struct i915_vma *vma;
>>@@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
>>  		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>>  	}
>>+	if (persistent)
>>+		goto skip_rb_insert;
>
>Oh, so you don't use the gtt_views fully at all. I now have
>reservations about whether that was the right approach, since you are
>not using the existing rb tree tracking.
>
>You know if a vma is persistent, right? So you could have just added a
>special case for persistent vmas to __i915_vma_get_pages and still
>call intel_partial_pages from there. Maybe a union over struct
>i915_gtt_view in i915_vma, holding either the view or struct
>intel_partial_info for persistent ones.
>

We are using the gtt_view fully in this patch for persistent vmas.
But as mentioned above, we now support multiple mappings (at different
VAs) for the same gtt_view of the object. For this, the current
vma_lookup() falls short, so we are skipping it.
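
To make the aliasing case concrete: with vm_bind, userspace may bind the
same pages of an object at two different GPU virtual addresses. Both binds
reach vm_bind_get_vma() with identical partial views, roughly as in the
hypothetical sketch below (obj, vm, obj_offset and length stand in for the
values coming from the two bind requests):

/* First bind:  VA A -> pages [obj_offset, obj_offset + length) */
/* Second bind: VA B -> the same pages of the same object       */
struct i915_gtt_view view = {
	.type = I915_GTT_VIEW_PARTIAL,
	.partial.offset = obj_offset >> PAGE_SHIFT,
	.partial.size = length >> PAGE_SHIFT,
};

/* The view is identical for both binds, so a lookup keyed on (vm, view)
 * would hand the second bind the first bind's vma instead of a new one;
 * hence the lookup is skipped and a new persistent vma is created for
 * each bind.
 */
vma = i915_vma_instance(obj, vm, &view, true);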

Regards,
Niranjana

>Regards,
>
>Tvrtko
>
>>+
>>  	rb = NULL;
>>  	p = &obj->vma.tree.rb_node;
>>  	while (*p) {
>>@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>  	rb_link_node(&vma->obj_node, rb, p);
>>  	rb_insert_color(&vma->obj_node, &obj->vma.tree);
>>+skip_rb_insert:
>>  	if (i915_vma_is_ggtt(vma))
>>  		/*
>>  		 * We put the GGTT vma at the start of the vma-list, followed
>>@@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>>   * @obj: parent &struct drm_i915_gem_object to be mapped
>>   * @vm: address space in which the mapping is located
>>   * @view: additional mapping requirements
>>+ * @persistent: Whether the vma is persistent
>>   *
>>   * i915_vma_instance() looks up an existing VMA of the @obj in the @vm with
>>   * the same @view characteristics. If a match is not found, one is created.
>>@@ -290,19 +296,22 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>>  struct i915_vma *
>>  i915_vma_instance(struct drm_i915_gem_object *obj,
>>  		  struct i915_address_space *vm,
>>-		  const struct i915_gtt_view *view)
>>+		  const struct i915_gtt_view *view,
>>+		  bool persistent)
>>  {
>>-	struct i915_vma *vma;
>>+	struct i915_vma *vma = NULL;
>>  	GEM_BUG_ON(!kref_read(&vm->ref));
>>-	spin_lock(&obj->vma.lock);
>>-	vma = i915_vma_lookup(obj, vm, view);
>>-	spin_unlock(&obj->vma.lock);
>>+	if (!persistent) {
>>+		spin_lock(&obj->vma.lock);
>>+		vma = i915_vma_lookup(obj, vm, view);
>>+		spin_unlock(&obj->vma.lock);
>>+	}
>>  	/* vma_create() will resolve the race if another creates the vma */
>>  	if (unlikely(!vma))
>>-		vma = vma_create(obj, vm, view);
>>+		vma = vma_create(obj, vm, view, persistent);
>>  	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
>>  	return vma;
>>@@ -1704,7 +1713,8 @@ static void release_references(struct i915_vma *vma, struct intel_gt *gt,
>>  	spin_lock(&obj->vma.lock);
>>  	list_del(&vma->obj_link);
>>-	if (!RB_EMPTY_NODE(&vma->obj_node))
>>+	if (!i915_vma_is_persistent(vma) &&
>>+	    !RB_EMPTY_NODE(&vma->obj_node))
>>  		rb_erase(&vma->obj_node, &obj->vma.tree);
>>  	spin_unlock(&obj->vma.lock);
>>diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>>index 3a47db2d85f5..b8e805c6532f 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.h
>>+++ b/drivers/gpu/drm/i915/i915_vma.h
>>@@ -43,7 +43,8 @@
>>  struct i915_vma *
>>  i915_vma_instance(struct drm_i915_gem_object *obj,
>>  		  struct i915_address_space *vm,
>>-		  const struct i915_gtt_view *view);
>>+		  const struct i915_gtt_view *view,
>>+		  bool persistent);
>>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
>>  #define I915_VMA_RELEASE_MAP BIT(0)
>>diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>index e050a2de5fd1..d8ffbdf91498 100644
>>--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>@@ -390,7 +390,7 @@ static void close_object_list(struct list_head *objects,
>>  	list_for_each_entry_safe(obj, on, objects, st_link) {
>>  		struct i915_vma *vma;
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (!IS_ERR(vma))
>>  			ignored = i915_vma_unbind_unlocked(vma);
>>@@ -452,7 +452,7 @@ static int fill_hole(struct i915_address_space *vm,
>>  					u64 aligned_size = round_up(obj->base.size,
>>  								    min_alignment);
>>-					vma = i915_vma_instance(obj, vm, NULL);
>>+					vma = i915_vma_instance(obj, vm, NULL, false);
>>  					if (IS_ERR(vma))
>>  						continue;
>>@@ -492,7 +492,7 @@ static int fill_hole(struct i915_address_space *vm,
>>  					u64 aligned_size = round_up(obj->base.size,
>>  								    min_alignment);
>>-					vma = i915_vma_instance(obj, vm, NULL);
>>+					vma = i915_vma_instance(obj, vm, NULL, false);
>>  					if (IS_ERR(vma))
>>  						continue;
>>@@ -531,7 +531,7 @@ static int fill_hole(struct i915_address_space *vm,
>>  					u64 aligned_size = round_up(obj->base.size,
>>  								    min_alignment);
>>-					vma = i915_vma_instance(obj, vm, NULL);
>>+					vma = i915_vma_instance(obj, vm, NULL, false);
>>  					if (IS_ERR(vma))
>>  						continue;
>>@@ -571,7 +571,7 @@ static int fill_hole(struct i915_address_space *vm,
>>  					u64 aligned_size = round_up(obj->base.size,
>>  								    min_alignment);
>>-					vma = i915_vma_instance(obj, vm, NULL);
>>+					vma = i915_vma_instance(obj, vm, NULL, false);
>>  					if (IS_ERR(vma))
>>  						continue;
>>@@ -653,7 +653,7 @@ static int walk_hole(struct i915_address_space *vm,
>>  		if (IS_ERR(obj))
>>  			break;
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto err_put;
>>@@ -728,7 +728,7 @@ static int pot_hole(struct i915_address_space *vm,
>>  	if (IS_ERR(obj))
>>  		return PTR_ERR(obj);
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_obj;
>>@@ -837,7 +837,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>  			break;
>>  		}
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto err_obj;
>>@@ -920,7 +920,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>>  		list_add(&obj->st_link, &objects);
>>-		vma = i915_vma_instance(obj, vm, NULL);
>>+		vma = i915_vma_instance(obj, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			break;
>>@@ -1018,7 +1018,7 @@ static int shrink_boom(struct i915_address_space *vm,
>>  		if (IS_ERR(purge))
>>  			return PTR_ERR(purge);
>>-		vma = i915_vma_instance(purge, vm, NULL);
>>+		vma = i915_vma_instance(purge, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto err_purge;
>>@@ -1041,7 +1041,7 @@ static int shrink_boom(struct i915_address_space *vm,
>>  		vm->fault_attr.interval = 1;
>>  		atomic_set(&vm->fault_attr.times, -1);
>>-		vma = i915_vma_instance(explode, vm, NULL);
>>+		vma = i915_vma_instance(explode, vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto err_explode;
>>@@ -1088,7 +1088,7 @@ static int misaligned_case(struct i915_address_space *vm, struct intel_memory_re
>>  		return PTR_ERR(obj);
>>  	}
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err_put;
>>@@ -1560,7 +1560,7 @@ static int igt_gtt_reserve(void *arg)
>>  		}
>>  		list_add(&obj->st_link, &objects);
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -1606,7 +1606,7 @@ static int igt_gtt_reserve(void *arg)
>>  		list_add(&obj->st_link, &objects);
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -1636,7 +1636,7 @@ static int igt_gtt_reserve(void *arg)
>>  		struct i915_vma *vma;
>>  		u64 offset;
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -1783,7 +1783,7 @@ static int igt_gtt_insert(void *arg)
>>  		list_add(&obj->st_link, &objects);
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -1809,7 +1809,7 @@ static int igt_gtt_insert(void *arg)
>>  	list_for_each_entry(obj, &objects, st_link) {
>>  		struct i915_vma *vma;
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -1829,7 +1829,7 @@ static int igt_gtt_insert(void *arg)
>>  		struct i915_vma *vma;
>>  		u64 offset;
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -1882,7 +1882,7 @@ static int igt_gtt_insert(void *arg)
>>  		list_add(&obj->st_link, &objects);
>>-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>  		if (IS_ERR(vma)) {
>>  			err = PTR_ERR(vma);
>>  			goto out;
>>@@ -2091,7 +2091,7 @@ static int igt_cs_tlb(void *arg)
>>  	}
>>  	i915_gem_object_set_cache_coherency(out, I915_CACHING_CACHED);
>>-	vma = i915_vma_instance(out, vm, NULL);
>>+	vma = i915_vma_instance(out, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto out_put_out;
>>@@ -2131,7 +2131,7 @@ static int igt_cs_tlb(void *arg)
>>  			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
>>-			vma = i915_vma_instance(bbe, vm, NULL);
>>+			vma = i915_vma_instance(bbe, vm, NULL, false);
>>  			if (IS_ERR(vma)) {
>>  				err = PTR_ERR(vma);
>>  				goto end;
>>@@ -2203,7 +2203,7 @@ static int igt_cs_tlb(void *arg)
>>  				goto end;
>>  			}
>>-			vma = i915_vma_instance(act, vm, NULL);
>>+			vma = i915_vma_instance(act, vm, NULL, false);
>>  			if (IS_ERR(vma)) {
>>  				kfree(vma_res);
>>  				err = PTR_ERR(vma);
>>diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
>>index 818a4909c1f3..297c1d4ebf44 100644
>>--- a/drivers/gpu/drm/i915/selftests/i915_request.c
>>+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
>>@@ -961,7 +961,7 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915)
>>  	intel_gt_chipset_flush(to_gt(i915));
>>-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
>>+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err;
>>@@ -1100,7 +1100,7 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>-	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL);
>>+	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto err;
>>diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>index 71b52d5efef4..3899c2252de3 100644
>>--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
>>+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>@@ -68,7 +68,7 @@ checked_vma_instance(struct drm_i915_gem_object *obj,
>>  	struct i915_vma *vma;
>>  	bool ok = true;
>>-	vma = i915_vma_instance(obj, vm, view);
>>+	vma = i915_vma_instance(obj, vm, view, false);
>>  	if (IS_ERR(vma))
>>  		return vma;
>>diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>index 0c22594ae274..6901f94ff076 100644
>>--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>@@ -47,7 +47,7 @@ static void *igt_spinner_pin_obj(struct intel_context *ce,
>>  	void *vaddr;
>>  	int ret;
>>-	*vma = i915_vma_instance(obj, ce->vm, NULL);
>>+	*vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>  	if (IS_ERR(*vma))
>>  		return ERR_CAST(*vma);
>>diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>index 3b18e5905c86..551d0c958a3b 100644
>>--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>@@ -745,7 +745,7 @@ static int igt_gpu_write(struct i915_gem_context *ctx,
>>  	if (!order)
>>  		return -ENOMEM;
>>-	vma = i915_vma_instance(obj, vm, NULL);
>>+	vma = i915_vma_instance(obj, vm, NULL, false);
>>  	if (IS_ERR(vma)) {
>>  		err = PTR_ERR(vma);
>>  		goto out_free;


* Re: [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-24  4:30     ` Niranjana Vishwanathapura
@ 2022-09-26 16:26       ` Tvrtko Ursulin
  2022-09-26 17:09         ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-26 16:26 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig


On 24/09/2022 05:30, Niranjana Vishwanathapura wrote:
> On Fri, Sep 23, 2022 at 09:40:20AM +0100, Tvrtko Ursulin wrote:
>>
>> On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>> vma_lookup is tied to a segment of the object instead of a section
>>
>> Can be, but not only that. It would be more accurate to say it is
>> based on gtt views.
> 
> Yah, but the new code is also based on gtt views; the only difference
> is that now there can be multiple mappings (at different VAs)
> to the same gtt_view of the object.
> 
>>
>>> of VA space. Hence, it does not support aliasing (i.e., multiple
>>> bindings to the same section of the object).
>>> Skip vma_lookup for persistent vmas as they support aliasing.
>>
>> What's broken without this patch? If something is, should it go
>> somewhere earlier in the series? If so, that should be mentioned in
>> the commit message.
>>
>> Or is it just a performance optimisation to skip unused tracking? If
>> so, that should also be mentioned in the commit message.
>>
> 
> No, it is not a performance optimization.
> The vma_lookup is based on the fact that there can be only one mapping
> for a given gtt_view of the object, so it looks up the gtt_view to
> find the mapping.
> 
> But now, as I mentioned above, there can be multiple mappings for a
> given gtt_view of the object, so the vma_lookup method won't work
> here. Hence, it is being skipped for persistent vmas.

Right, so in that case isn't this patch too late in the series? Granted,
you only allow _userspace_ to use vm bind in 14/14, but the kernel
infrastructure is there, and if there were a selftest it would be able
to fail without this patch, no?
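
For illustration, such a selftest could look roughly like the sketch below
(the test name and structure are hypothetical; only the new
i915_vma_instance() signature from this patch is assumed):

static int igt_vm_bind_skip_lookup(void *arg)
{
	struct intel_gt *gt = arg;
	struct drm_i915_gem_object *obj;
	struct i915_gtt_view view = { .type = I915_GTT_VIEW_PARTIAL };
	struct i915_vma *vma1, *vma2;
	int err = 0;

	obj = i915_gem_object_create_internal(gt->i915, SZ_64K);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* One partial view covering the whole object... */
	view.partial.offset = 0;
	view.partial.size = obj->base.size >> PAGE_SHIFT;

	/* ...instantiated twice as persistent vmas, i.e. two aliased
	 * bindings of the same object section.
	 */
	vma1 = i915_vma_instance(obj, gt->vm, &view, true);
	vma2 = i915_vma_instance(obj, gt->vm, &view, true);

	/* A view-keyed vma_lookup() would return the same vma both times,
	 * so the second binding could never get its own GPU address.
	 */
	if (!IS_ERR(vma1) && !IS_ERR(vma2) && vma1 == vma2)
		err = -EINVAL;

	i915_gem_object_put(obj);
	return err;
}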

>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_fb_pin.c   |  2 +-
>>>  .../drm/i915/display/intel_plane_initial.c    |  2 +-
>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
>>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  2 +-
>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 16 +++----
>>>  .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
>>>  .../drm/i915/gem/selftests/i915_gem_context.c | 12 ++---
>>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>>  .../drm/i915/gem/selftests/i915_gem_mman.c    |  6 ++-
>>>  .../drm/i915/gem/selftests/igt_gem_utils.c    |  2 +-
>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  2 +-
>>>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
>>>  drivers/gpu/drm/i915/gt/intel_gt.c            |  2 +-
>>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +-
>>>  drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
>>>  drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
>>>  drivers/gpu/drm/i915/gt/intel_ring.c          |  2 +-
>>>  .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 +-
>>>  drivers/gpu/drm/i915/gt/intel_timeline.c      |  2 +-
>>>  drivers/gpu/drm/i915/gt/mock_engine.c         |  2 +-
>>>  drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
>>>  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 16 +++----
>>>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  6 +--
>>>  drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 +-
>>>  .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
>>>  drivers/gpu/drm/i915/gt/selftest_rps.c        |  2 +-
>>>  .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  2 +-
>>>  drivers/gpu/drm/i915/i915_gem.c               |  2 +-
>>>  drivers/gpu/drm/i915/i915_perf.c              |  2 +-
>>>  drivers/gpu/drm/i915/i915_vma.c               | 26 +++++++----
>>>  drivers/gpu/drm/i915/i915_vma.h               |  3 +-
>>>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 44 +++++++++----------
>>>  drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
>>>  drivers/gpu/drm/i915/selftests/i915_vma.c     |  2 +-
>>>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
>>>  .../drm/i915/selftests/intel_memory_region.c  |  2 +-
>>>  37 files changed, 106 insertions(+), 93 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c 
>>> b/drivers/gpu/drm/i915/display/intel_fb_pin.c
>>> index c86e5d4ee016..5a718b247bb3 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
>>> @@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,
>>>          goto err;
>>>      }
>>> -    vma = i915_vma_instance(obj, vm, view);
>>> +    vma = i915_vma_instance(obj, vm, view, false);
>>
>> Hey why are you touching all the legacy paths? >:P
>>
>>>      if (IS_ERR(vma))
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
>>> b/drivers/gpu/drm/i915/display/intel_plane_initial.c
>>> index 76be796df255..7667e2faa3fb 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
>>> @@ -136,7 +136,7 @@ initial_plane_vma(struct drm_i915_private *i915,
>>>          goto err_obj;
>>>      }
>>> -    vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          goto err_obj;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index 363b2a788cdf..0ee43cb601b5 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -876,7 +876,7 @@ static struct i915_vma *eb_lookup_vma(struct 
>>> i915_execbuffer *eb, u32 handle)
>>>              }
>>>          }
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              i915_gem_object_put(obj);
>>>              return vma;
>>> @@ -2208,7 +2208,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
>>>      struct i915_vma *vma;
>>>      int err;
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> index 3087731cc0c0..4468603af6f1 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> @@ -252,7 +252,7 @@ static struct i915_vma *vm_bind_get_vma(struct 
>>> i915_address_space *vm,
>>>      view.type = I915_GTT_VIEW_PARTIAL;
>>>      view.partial.offset = va->offset >> PAGE_SHIFT;
>>>      view.partial.size = va->length >> PAGE_SHIFT;
>>> -    vma = i915_vma_instance(obj, vm, &view);
>>> +    vma = i915_vma_instance(obj, vm, &view, true);
>>
>> This is the only caller passing "true". Leave i915_vma_instance as is, 
>> and add i915_vma_instance_persistent(), and drop 90% of the patch?
> 
> Yah, makes sense. Will fix it.
> 
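
A rough sketch of what that could look like (assuming vma_create() keeps
the persistent parameter from this patch; this is only an illustration of
the suggestion, not the final code):

/* Only the vm_bind path asks for a persistent vma; aliasing is allowed
 * there, so the view-keyed lookup is skipped and a new vma is always
 * created. All other callers keep using the existing three-argument
 * i915_vma_instance().
 */
struct i915_vma *
i915_vma_instance_persistent(struct drm_i915_gem_object *obj,
			     struct i915_address_space *vm,
			     const struct i915_gtt_view *view)
{
	GEM_BUG_ON(!kref_read(&vm->ref));

	return vma_create(obj, vm, view, true);
}
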
>>
>>>      if (IS_ERR(vma))
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> index c570cf780079..6e13a83d0e36 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> @@ -454,7 +454,7 @@ static int 
>>> igt_mock_exhaust_device_supported_pages(void *arg)
>>>                  goto out_put;
>>>              }
>>> -            vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>> +            vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>>              if (IS_ERR(vma)) {
>>>                  err = PTR_ERR(vma);
>>>                  goto out_put;
>>> @@ -522,7 +522,7 @@ static int igt_mock_memory_region_huge_pages(void 
>>> *arg)
>>>                  goto out_region;
>>>              }
>>> -            vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>> +            vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>>              if (IS_ERR(vma)) {
>>>                  err = PTR_ERR(vma);
>>>                  goto out_put;
>>> @@ -614,7 +614,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
>>>          /* Force the page size for this object */
>>>          obj->mm.page_sizes.sg = page_size;
>>> -        vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out_unpin;
>>> @@ -746,7 +746,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg)
>>>          list_add(&obj->st_link, &objects);
>>> -        vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              break;
>>> @@ -924,7 +924,7 @@ static int igt_mock_ppgtt_64K(void *arg)
>>>               */
>>>              obj->mm.page_sizes.sg &= ~I915_GTT_PAGE_SIZE_2M;
>>> -            vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
>>> +            vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
>>>              if (IS_ERR(vma)) {
>>>                  err = PTR_ERR(vma);
>>>                  goto out_object_unpin;
>>> @@ -1092,7 +1092,7 @@ static int __igt_write_huge(struct 
>>> intel_context *ce,
>>>      struct i915_vma *vma;
>>>      int err;
>>> -    vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          return PTR_ERR(vma);
>>> @@ -1587,7 +1587,7 @@ static int igt_tmpfs_fallback(void *arg)
>>>      __i915_gem_object_flush_map(obj, 0, 64);
>>>      i915_gem_object_unpin_map(obj);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto out_put;
>>> @@ -1654,7 +1654,7 @@ static int igt_shrink_thp(void *arg)
>>>          goto out_vm;
>>>      }
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto out_put;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> index 9a6a6b5b722b..e6c6c73bf80e 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> @@ -282,7 +282,7 @@ __create_vma(struct tiled_blits *t, size_t size, 
>>> bool lmem)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, t->ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, t->ce->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          i915_gem_object_put(obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>> index c6ad67b90e8a..570f74df9bef 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>> @@ -426,7 +426,7 @@ static int gpu_fill(struct intel_context *ce,
>>>      GEM_BUG_ON(obj->base.size > ce->vm->total);
>>>      GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine));
>>> -    vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          return PTR_ERR(vma);
>>> @@ -930,7 +930,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
>>>      if (GRAPHICS_VER(i915) < 8)
>>>          return -EINVAL;
>>> -    vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          return PTR_ERR(vma);
>>> @@ -938,7 +938,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
>>>      if (IS_ERR(rpcs))
>>>          return PTR_ERR(rpcs);
>>> -    batch = i915_vma_instance(rpcs, ce->vm, NULL);
>>> +    batch = i915_vma_instance(rpcs, ce->vm, NULL, false);
>>>      if (IS_ERR(batch)) {
>>>          err = PTR_ERR(batch);
>>>          goto err_put;
>>> @@ -1522,7 +1522,7 @@ static int write_to_scratch(struct 
>>> i915_gem_context *ctx,
>>>      intel_gt_chipset_flush(engine->gt);
>>>      vm = i915_gem_context_get_eb_vm(ctx);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto out_vm;
>>> @@ -1599,7 +1599,7 @@ static int read_from_scratch(struct 
>>> i915_gem_context *ctx,
>>>          const u32 GPR0 = engine->mmio_base + 0x600;
>>>          vm = i915_gem_context_get_eb_vm(ctx);
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out_vm;
>>> @@ -1635,7 +1635,7 @@ static int read_from_scratch(struct 
>>> i915_gem_context *ctx,
>>>          /* hsw: register access even to 3DPRIM! is protected */
>>>          vm = i915_vm_get(&engine->gt->ggtt->vm);
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out_vm;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> index fe6c37fd7859..fc235e1e6c12 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> @@ -201,7 +201,7 @@ static int __igt_lmem_pages_migrate(struct 
>>> intel_gt *gt,
>>>          return PTR_ERR(obj);
>>>      if (vm) {
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out_put;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> index b73c91aa5450..e07c91dc33ba 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> @@ -546,7 +546,8 @@ static int make_obj_busy(struct 
>>> drm_i915_gem_object *obj)
>>>          struct i915_gem_ww_ctx ww;
>>>          int err;
>>> -        vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &engine->gt->ggtt->vm,
>>> +                    NULL, false);
>>>          if (IS_ERR(vma))
>>>              return PTR_ERR(vma);
>>> @@ -1587,7 +1588,8 @@ static int __igt_mmap_gpu(struct 
>>> drm_i915_private *i915,
>>>          struct i915_vma *vma;
>>>          struct i915_gem_ww_ctx ww;
>>> -        vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
>>> +        vma = i915_vma_instance(obj, engine->kernel_context->vm,
>>> +                    NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out_unmap;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
>>> index 3c55e77b0f1b..4184e198c824 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
>>> @@ -91,7 +91,7 @@ igt_emit_store_dw(struct i915_vma *vma,
>>>      intel_gt_chipset_flush(vma->vm->gt);
>>> -    vma = i915_vma_instance(obj, vma->vm, NULL);
>>> +    vma = i915_vma_instance(obj, vma->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
>>> b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> index 1bb766c79dcb..a0af2aa50533 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> @@ -395,7 +395,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
>>>      pd->pt.base->base.resv = i915_vm_resv_get(&ppgtt->base.vm);
>>>      pd->pt.base->shares_resv_from = &ppgtt->base.vm;
>>> -    ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL);
>>> +    ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL, 
>>> false);
>>>      if (IS_ERR(ppgtt->vma)) {
>>>          err = PTR_ERR(ppgtt->vma);
>>>          ppgtt->vma = NULL;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index 2ddcad497fa3..8146bf811d0f 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1001,7 +1001,7 @@ static int init_status_page(struct 
>>> intel_engine_cs *engine)
>>>      i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> -    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          ret = PTR_ERR(vma);
>>>          goto err_put;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> index b367cfff48d5..8a78c6cec7b4 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -441,7 +441,7 @@ static int intel_gt_init_scratch(struct intel_gt 
>>> *gt, unsigned int size)
>>>          return PTR_ERR(obj);
>>>      }
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          ret = PTR_ERR(vma);
>>>          goto err_unref;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 401202391649..c9bc33149ad7 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -628,7 +628,7 @@ __vm_create_scratch_for_read(struct 
>>> i915_address_space *vm, unsigned long size)
>>>      i915_gem_object_set_cache_coherency(obj, I915_CACHING_CACHED);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
>>> b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index 3955292483a6..570d097a2492 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1029,7 +1029,7 @@ __lrc_alloc_state(struct intel_context *ce, 
>>> struct intel_engine_cs *engine)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return vma;
>>> @@ -1685,7 +1685,7 @@ static int lrc_create_wa_ctx(struct 
>>> intel_engine_cs *engine)
>>>      if (IS_ERR(obj))
>>>          return PTR_ERR(obj);
>>> -    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c 
>>> b/drivers/gpu/drm/i915/gt/intel_renderstate.c
>>> index 5121e6dc2fa5..bc7a2d4421db 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
>>> @@ -157,7 +157,7 @@ int intel_renderstate_init(struct 
>>> intel_renderstate *so,
>>>          if (IS_ERR(obj))
>>>              return PTR_ERR(obj);
>>> -        so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>> +        so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, 
>>> NULL, false);
>>>          if (IS_ERR(so->vma)) {
>>>              err = PTR_ERR(so->vma);
>>>              goto err_obj;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c 
>>> b/drivers/gpu/drm/i915/gt/intel_ring.c
>>> index 15ec64d881c4..24c8b738a394 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ring.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c
>>> @@ -130,7 +130,7 @@ static struct i915_vma *create_ring_vma(struct 
>>> i915_ggtt *ggtt, int size)
>>>      if (vm->has_read_only)
>>>          i915_gem_object_set_readonly(obj);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
>>> b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> index d5d6f1fadcae..5e93a4052140 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> @@ -551,7 +551,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>>      if (IS_IVYBRIDGE(i915))
>>>          i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>>> -    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> @@ -1291,7 +1291,7 @@ static struct i915_vma *gen7_ctx_vma(struct 
>>> intel_engine_cs *engine)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, engine->gt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return ERR_CAST(vma);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
>>> b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> index b9640212d659..31f56996f100 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> @@ -28,7 +28,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt 
>>> *gt)
>>>      i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          i915_gem_object_put(obj);
>>> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c 
>>> b/drivers/gpu/drm/i915/gt/mock_engine.c
>>> index c0637bf799a3..6f3578308395 100644
>>> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
>>> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
>>> @@ -46,7 +46,7 @@ static struct i915_vma *create_ring_vma(struct 
>>> i915_ggtt *ggtt, int size)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
>>> index 1b75f478d1b8..16fcaba7c980 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
>>> @@ -85,7 +85,7 @@ static struct i915_vma *create_empty_batch(struct 
>>> intel_context *ce)
>>>      i915_gem_object_flush_map(obj);
>>> -    vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_unpin;
>>> @@ -222,7 +222,7 @@ static struct i915_vma *create_nop_batch(struct 
>>> intel_context *ce)
>>>      i915_gem_object_flush_map(obj);
>>> -    vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_unpin;
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>> index 1e08b2473b99..643ffcb3964a 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>> @@ -1000,7 +1000,7 @@ static int live_timeslice_preempt(void *arg)
>>>      if (IS_ERR(obj))
>>>          return PTR_ERR(obj);
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> @@ -1307,7 +1307,7 @@ static int live_timeslice_queue(void *arg)
>>>      if (IS_ERR(obj))
>>>          return PTR_ERR(obj);
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> @@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg)
>>>          goto err_obj;
>>>      }
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_map;
>>> @@ -2716,7 +2716,7 @@ static int create_gang(struct intel_engine_cs 
>>> *engine,
>>>          goto err_ce;
>>>      }
>>> -    vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> @@ -3060,7 +3060,7 @@ create_gpr_user(struct intel_engine_cs *engine,
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, result->vm, NULL);
>>> +    vma = i915_vma_instance(obj, result->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return vma;
>>> @@ -3130,7 +3130,7 @@ static struct i915_vma *create_global(struct 
>>> intel_gt *gt, size_t sz)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return vma;
>>> @@ -3159,7 +3159,7 @@ create_gpr_client(struct intel_engine_cs *engine,
>>>      if (IS_ERR(ce))
>>>          return ERR_CAST(ce);
>>> -    vma = i915_vma_instance(global->obj, ce->vm, NULL);
>>> +    vma = i915_vma_instance(global->obj, ce->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto out_ce;
>>> @@ -3501,7 +3501,7 @@ static int smoke_submit(struct preempt_smoke 
>>> *smoke,
>>>          struct i915_address_space *vm;
>>>          vm = i915_gem_context_get_eb_vm(ctx);
>>> -        vma = i915_vma_instance(batch, vm, NULL);
>>> +        vma = i915_vma_instance(batch, vm, NULL, false);
>>>          i915_vm_put(vm);
>>>          if (IS_ERR(vma))
>>>              return PTR_ERR(vma);
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> index 7f3bb1d34dfb..0b021a32d0e0 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> @@ -147,13 +147,13 @@ hang_create_request(struct hang *h, struct 
>>> intel_engine_cs *engine)
>>>      h->obj = obj;
>>>      h->batch = vaddr;
>>> -    vma = i915_vma_instance(h->obj, vm, NULL);
>>> +    vma = i915_vma_instance(h->obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_vm_put(vm);
>>>          return ERR_CAST(vma);
>>>      }
>>> -    hws = i915_vma_instance(h->hws, vm, NULL);
>>> +    hws = i915_vma_instance(h->hws, vm, NULL, false);
>>>      if (IS_ERR(hws)) {
>>>          i915_vm_put(vm);
>>>          return ERR_CAST(hws);
>>> @@ -1474,7 +1474,7 @@ static int __igt_reset_evict_vma(struct 
>>> intel_gt *gt,
>>>          }
>>>      }
>>> -    arg.vma = i915_vma_instance(obj, vm, NULL);
>>> +    arg.vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(arg.vma)) {
>>>          err = PTR_ERR(arg.vma);
>>>          pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c
>>> index 82d3f8058995..32867049b3bf 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
>>> @@ -938,7 +938,7 @@ create_user_vma(struct i915_address_space *vm, 
>>> unsigned long size)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
>>> index 70f9ac1ec2c7..7e9361104620 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
>>> @@ -17,7 +17,7 @@ static struct i915_vma *create_wally(struct 
>>> intel_engine_cs *engine)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, engine->gt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          i915_gem_object_put(obj);
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_rps.c
>>> index cfb4708dd62e..327558828bef 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
>>> @@ -78,7 +78,7 @@ create_spin_counter(struct intel_engine_cs *engine,
>>>      end = obj->base.size / sizeof(u32) - 1;
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_put;
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c 
>>> b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> index 67a9aab801dd..d893ea763ac6 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> @@ -122,7 +122,7 @@ read_nonprivs(struct intel_context *ce)
>>>      i915_gem_object_flush_map(result);
>>>      i915_gem_object_unpin_map(result);
>>> -    vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL, 
>>> false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> @@ -389,7 +389,7 @@ static struct i915_vma *create_batch(struct 
>>> i915_address_space *vm)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> index bac06e3d6f2c..d56b1f82250c 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> @@ -737,7 +737,7 @@ struct i915_vma *intel_guc_allocate_vma(struct 
>>> intel_guc *guc, u32 size)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma))
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index 88df9a35e0fe..bb6b1f56836f 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -934,7 +934,7 @@ i915_gem_object_ggtt_pin_ww(struct 
>>> drm_i915_gem_object *obj,
>>>      }
>>>  new_vma:
>>> -    vma = i915_vma_instance(obj, &ggtt->vm, view);
>>> +    vma = i915_vma_instance(obj, &ggtt->vm, view, false);
>>>      if (IS_ERR(vma))
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 0defbb43ceea..d8f5ef9fd00f 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1920,7 +1920,7 @@ alloc_oa_config_buffer(struct i915_perf_stream 
>>> *stream,
>>>      oa_bo->vma = i915_vma_instance(obj,
>>>                         &stream->engine->gt->ggtt->vm,
>>> -                       NULL);
>>> +                       NULL, false);
>>>      if (IS_ERR(oa_bo->vma)) {
>>>          err = PTR_ERR(oa_bo->vma);
>>>          goto out_ww;
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>> b/drivers/gpu/drm/i915/i915_vma.c
>>> index 24f171588f56..ef709a61fd54 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -110,7 +110,8 @@ static void __i915_vma_retire(struct i915_active 
>>> *ref)
>>>  static struct i915_vma *
>>>  vma_create(struct drm_i915_gem_object *obj,
>>>         struct i915_address_space *vm,
>>> -       const struct i915_gtt_view *view)
>>> +       const struct i915_gtt_view *view,
>>> +       bool persistent)
>>>  {
>>>      struct i915_vma *pos = ERR_PTR(-E2BIG);
>>>      struct i915_vma *vma;
>>> @@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
>>>          __set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>>>      }
>>> +    if (persistent)
>>> +        goto skip_rb_insert;
>>
>> Oh, so you don't use the gtt_views fully at all. I now have
>> reservations about whether that was the right approach. Since you are not
>> using the existing rb tree tracking, I mean...
>>
>> You know if a vma is persistent right? So you could have just added 
>> special case for persistent vmas to __i915_vma_get_pages and still 
>> call intel_partial_pages from there. Maybe union over struct 
>> i915_gtt_view in i915_vma for either the view or struct 
>> intel_partial_info for persistent ones.
>>
> 
> We are using the gtt_view fully in this patch for persistent vmas.

I guess your definition of "fully" and mine are different. :)

> But as mentioned above, now we support multiple mappings
> for the same gtt_view of the object. For this, the current
> vma_lookup() falls short. So, we are skipping it.

I get it - but then, having only now noticed how it will be used, I am 
less convinced touching the ggtt_view code was the right approach.

What about what I proposed above? That you just add code to 
__i915_vma_get_pages, which in case of a persistent VMA would call 
intel_partial_pages from there.

If that works I think it's cleaner and we'd just revert the ggtt_view to 
gtt_view rename.
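
(A rough sketch of that suggestion, added here purely for illustration and not
taken from the posted series: it assumes the __i915_vma_get_pages() /
intel_partial_pages() helpers in i915_vma.c, the i915_vma_is_persistent() flag
added earlier in the series, and the vma->gtt_view / vma->pages fields; the
rotated/remapped view cases and error logging are elided.)

static int __i915_vma_get_pages(struct i915_vma *vma)
{
        struct sg_table *pages;

        /*
         * A persistent (VM_BIND) vma maps an offset+size range of the
         * object, so it could reuse the partial-view sg construction and
         * never touch the obj->vma.tree lookup machinery.
         */
        if (i915_vma_is_persistent(vma))
                pages = intel_partial_pages(&vma->gtt_view, vma->obj);
        else
                pages = vma->obj->mm.pages;     /* normal, full-object view */

        if (IS_ERR(pages))
                return PTR_ERR(pages);

        vma->pages = pages;
        return 0;
}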

Regards,

Tvrtko

> 
> Regards,
> Niranjana
> 
>> Regards,
>>
>> Tvrtko
>>
>>> +
>>>      rb = NULL;
>>>      p = &obj->vma.tree.rb_node;
>>>      while (*p) {
>>> @@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>      rb_link_node(&vma->obj_node, rb, p);
>>>      rb_insert_color(&vma->obj_node, &obj->vma.tree);
>>> +skip_rb_insert:
>>>      if (i915_vma_is_ggtt(vma))
>>>          /*
>>>           * We put the GGTT vma at the start of the vma-list, followed
>>> @@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>>>   * @obj: parent &struct drm_i915_gem_object to be mapped
>>>   * @vm: address space in which the mapping is located
>>>   * @view: additional mapping requirements
>>> + * @persistent: Whether the vma is persistent
>>>   *
>>>   * i915_vma_instance() looks up an existing VMA of the @obj in the 
>>> @vm with
>>>   * the same @view characteristics. If a match is not found, one is 
>>> created.
>>> @@ -290,19 +296,22 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>>>  struct i915_vma *
>>>  i915_vma_instance(struct drm_i915_gem_object *obj,
>>>            struct i915_address_space *vm,
>>> -          const struct i915_gtt_view *view)
>>> +          const struct i915_gtt_view *view,
>>> +          bool persistent)
>>>  {
>>> -    struct i915_vma *vma;
>>> +    struct i915_vma *vma = NULL;
>>>      GEM_BUG_ON(!kref_read(&vm->ref));
>>> -    spin_lock(&obj->vma.lock);
>>> -    vma = i915_vma_lookup(obj, vm, view);
>>> -    spin_unlock(&obj->vma.lock);
>>> +    if (!persistent) {
>>> +        spin_lock(&obj->vma.lock);
>>> +        vma = i915_vma_lookup(obj, vm, view);
>>> +        spin_unlock(&obj->vma.lock);
>>> +    }
>>>      /* vma_create() will resolve the race if another creates the vma */
>>>      if (unlikely(!vma))
>>> -        vma = vma_create(obj, vm, view);
>>> +        vma = vma_create(obj, vm, view, persistent);
>>>      GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
>>>      return vma;
>>> @@ -1704,7 +1713,8 @@ static void release_references(struct i915_vma 
>>> *vma, struct intel_gt *gt,
>>>      spin_lock(&obj->vma.lock);
>>>      list_del(&vma->obj_link);
>>> -    if (!RB_EMPTY_NODE(&vma->obj_node))
>>> +    if (!i915_vma_is_persistent(vma) &&
>>> +        !RB_EMPTY_NODE(&vma->obj_node))
>>>          rb_erase(&vma->obj_node, &obj->vma.tree);
>>>      spin_unlock(&obj->vma.lock);
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.h 
>>> b/drivers/gpu/drm/i915/i915_vma.h
>>> index 3a47db2d85f5..b8e805c6532f 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.h
>>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>>> @@ -43,7 +43,8 @@
>>>  struct i915_vma *
>>>  i915_vma_instance(struct drm_i915_gem_object *obj,
>>>            struct i915_address_space *vm,
>>> -          const struct i915_gtt_view *view);
>>> +          const struct i915_gtt_view *view,
>>> +          bool persistent);
>>>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned 
>>> int flags);
>>>  #define I915_VMA_RELEASE_MAP BIT(0)
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> index e050a2de5fd1..d8ffbdf91498 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> @@ -390,7 +390,7 @@ static void close_object_list(struct list_head 
>>> *objects,
>>>      list_for_each_entry_safe(obj, on, objects, st_link) {
>>>          struct i915_vma *vma;
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (!IS_ERR(vma))
>>>              ignored = i915_vma_unbind_unlocked(vma);
>>> @@ -452,7 +452,7 @@ static int fill_hole(struct i915_address_space *vm,
>>>                      u64 aligned_size = round_up(obj->base.size,
>>>                                      min_alignment);
>>> -                    vma = i915_vma_instance(obj, vm, NULL);
>>> +                    vma = i915_vma_instance(obj, vm, NULL, false);
>>>                      if (IS_ERR(vma))
>>>                          continue;
>>> @@ -492,7 +492,7 @@ static int fill_hole(struct i915_address_space *vm,
>>>                      u64 aligned_size = round_up(obj->base.size,
>>>                                      min_alignment);
>>> -                    vma = i915_vma_instance(obj, vm, NULL);
>>> +                    vma = i915_vma_instance(obj, vm, NULL, false);
>>>                      if (IS_ERR(vma))
>>>                          continue;
>>> @@ -531,7 +531,7 @@ static int fill_hole(struct i915_address_space *vm,
>>>                      u64 aligned_size = round_up(obj->base.size,
>>>                                      min_alignment);
>>> -                    vma = i915_vma_instance(obj, vm, NULL);
>>> +                    vma = i915_vma_instance(obj, vm, NULL, false);
>>>                      if (IS_ERR(vma))
>>>                          continue;
>>> @@ -571,7 +571,7 @@ static int fill_hole(struct i915_address_space *vm,
>>>                      u64 aligned_size = round_up(obj->base.size,
>>>                                      min_alignment);
>>> -                    vma = i915_vma_instance(obj, vm, NULL);
>>> +                    vma = i915_vma_instance(obj, vm, NULL, false);
>>>                      if (IS_ERR(vma))
>>>                          continue;
>>> @@ -653,7 +653,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>          if (IS_ERR(obj))
>>>              break;
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto err_put;
>>> @@ -728,7 +728,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>      if (IS_ERR(obj))
>>>          return PTR_ERR(obj);
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_obj;
>>> @@ -837,7 +837,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>>              break;
>>>          }
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto err_obj;
>>> @@ -920,7 +920,7 @@ static int __shrink_hole(struct 
>>> i915_address_space *vm,
>>>          list_add(&obj->st_link, &objects);
>>> -        vma = i915_vma_instance(obj, vm, NULL);
>>> +        vma = i915_vma_instance(obj, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              break;
>>> @@ -1018,7 +1018,7 @@ static int shrink_boom(struct 
>>> i915_address_space *vm,
>>>          if (IS_ERR(purge))
>>>              return PTR_ERR(purge);
>>> -        vma = i915_vma_instance(purge, vm, NULL);
>>> +        vma = i915_vma_instance(purge, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto err_purge;
>>> @@ -1041,7 +1041,7 @@ static int shrink_boom(struct 
>>> i915_address_space *vm,
>>>          vm->fault_attr.interval = 1;
>>>          atomic_set(&vm->fault_attr.times, -1);
>>> -        vma = i915_vma_instance(explode, vm, NULL);
>>> +        vma = i915_vma_instance(explode, vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto err_explode;
>>> @@ -1088,7 +1088,7 @@ static int misaligned_case(struct 
>>> i915_address_space *vm, struct intel_memory_re
>>>          return PTR_ERR(obj);
>>>      }
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err_put;
>>> @@ -1560,7 +1560,7 @@ static int igt_gtt_reserve(void *arg)
>>>          }
>>>          list_add(&obj->st_link, &objects);
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -1606,7 +1606,7 @@ static int igt_gtt_reserve(void *arg)
>>>          list_add(&obj->st_link, &objects);
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -1636,7 +1636,7 @@ static int igt_gtt_reserve(void *arg)
>>>          struct i915_vma *vma;
>>>          u64 offset;
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -1783,7 +1783,7 @@ static int igt_gtt_insert(void *arg)
>>>          list_add(&obj->st_link, &objects);
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -1809,7 +1809,7 @@ static int igt_gtt_insert(void *arg)
>>>      list_for_each_entry(obj, &objects, st_link) {
>>>          struct i915_vma *vma;
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -1829,7 +1829,7 @@ static int igt_gtt_insert(void *arg)
>>>          struct i915_vma *vma;
>>>          u64 offset;
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -1882,7 +1882,7 @@ static int igt_gtt_insert(void *arg)
>>>          list_add(&obj->st_link, &objects);
>>> -        vma = i915_vma_instance(obj, &ggtt->vm, NULL);
>>> +        vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
>>>          if (IS_ERR(vma)) {
>>>              err = PTR_ERR(vma);
>>>              goto out;
>>> @@ -2091,7 +2091,7 @@ static int igt_cs_tlb(void *arg)
>>>      }
>>>      i915_gem_object_set_cache_coherency(out, I915_CACHING_CACHED);
>>> -    vma = i915_vma_instance(out, vm, NULL);
>>> +    vma = i915_vma_instance(out, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto out_put_out;
>>> @@ -2131,7 +2131,7 @@ static int igt_cs_tlb(void *arg)
>>>              memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
>>> -            vma = i915_vma_instance(bbe, vm, NULL);
>>> +            vma = i915_vma_instance(bbe, vm, NULL, false);
>>>              if (IS_ERR(vma)) {
>>>                  err = PTR_ERR(vma);
>>>                  goto end;
>>> @@ -2203,7 +2203,7 @@ static int igt_cs_tlb(void *arg)
>>>                  goto end;
>>>              }
>>> -            vma = i915_vma_instance(act, vm, NULL);
>>> +            vma = i915_vma_instance(act, vm, NULL, false);
>>>              if (IS_ERR(vma)) {
>>>                  kfree(vma_res);
>>>                  err = PTR_ERR(vma);
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
>>> b/drivers/gpu/drm/i915/selftests/i915_request.c
>>> index 818a4909c1f3..297c1d4ebf44 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
>>> @@ -961,7 +961,7 @@ static struct i915_vma *empty_batch(struct 
>>> drm_i915_private *i915)
>>>      intel_gt_chipset_flush(to_gt(i915));
>>> -    vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
>>> +    vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err;
>>> @@ -1100,7 +1100,7 @@ static struct i915_vma *recursive_batch(struct 
>>> drm_i915_private *i915)
>>>      if (IS_ERR(obj))
>>>          return ERR_CAST(obj);
>>> -    vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL);
>>> +    vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto err;
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c 
>>> b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> index 71b52d5efef4..3899c2252de3 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> @@ -68,7 +68,7 @@ checked_vma_instance(struct drm_i915_gem_object *obj,
>>>      struct i915_vma *vma;
>>>      bool ok = true;
>>> -    vma = i915_vma_instance(obj, vm, view);
>>> +    vma = i915_vma_instance(obj, vm, view, false);
>>>      if (IS_ERR(vma))
>>>          return vma;
>>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c 
>>> b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> index 0c22594ae274..6901f94ff076 100644
>>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> @@ -47,7 +47,7 @@ static void *igt_spinner_pin_obj(struct 
>>> intel_context *ce,
>>>      void *vaddr;
>>>      int ret;
>>> -    *vma = i915_vma_instance(obj, ce->vm, NULL);
>>> +    *vma = i915_vma_instance(obj, ce->vm, NULL, false);
>>>      if (IS_ERR(*vma))
>>>          return ERR_CAST(*vma);
>>> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c 
>>> b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>> index 3b18e5905c86..551d0c958a3b 100644
>>> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>> @@ -745,7 +745,7 @@ static int igt_gpu_write(struct i915_gem_context 
>>> *ctx,
>>>      if (!order)
>>>          return -ENOMEM;
>>> -    vma = i915_vma_instance(obj, vm, NULL);
>>> +    vma = i915_vma_instance(obj, vm, NULL, false);
>>>      if (IS_ERR(vma)) {
>>>          err = PTR_ERR(vma);
>>>          goto out_free;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-26 16:26       ` Tvrtko Ursulin
@ 2022-09-26 17:09         ` Niranjana Vishwanathapura
  2022-09-27  9:28           ` Tvrtko Ursulin
  0 siblings, 1 reply; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-26 17:09 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Mon, Sep 26, 2022 at 05:26:12PM +0100, Tvrtko Ursulin wrote:
>
>On 24/09/2022 05:30, Niranjana Vishwanathapura wrote:
>>On Fri, Sep 23, 2022 at 09:40:20AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>>>vma_lookup is tied to a segment of the object instead of a section
>>>
>>>Can be, but not only that. It would be more accurate to say it is
>>>based on gtt views.
>>
>>Yah, but the new code is also based on gtt views; the only difference
>>is that now there can be multiple mappings (at different VAs)
>>to the same gtt_view of the object.
>>
>>>
>>>>of VA space. Hence, it does not support aliasing (i.e., multiple
>>>>bindings to the same section of the object).
>>>>Skip vma_lookup for persistent vmas as it supports aliasing.
>>>
>>>What's broken without this patch? If something is, should it go 
>>>somewhere earlier in the series? If so should be mentioned in the 
>>>commit message.
>>>
>>>Or is it just a performance optimisation to skip unused tracking? 
>>>If so should also be mentioned in the commit message.
>>>
>>
>>No, it is not a performance optimization.
>>The vma_lookup is based on the fact that there can be only one mapping
>>for a given gtt_view of the object.
>>So, it looked up the gtt_view to find the mapping.
>>
>>But now, as I mentioned above, there can be multiple mappings for a
>>given gtt_view of the object. Hence the vma_lookup method won't work
>>here, and it is being skipped for persistent vmas.
>
>Right, so in that case isn't this patch too late in the series? 
>Granted you only allow _userspace_ to use vm bind in 14/14, but the 
>kernel infrastructure is there and if there was a selftest it would be 
>able to fail without this patch, no?
>

Yes, the patch ordering is incorrect. I am fixing it by moving this patch
earlier in the series and adding a new i915_vma_create_persistent()
function, so that i915_vma_instance() does not need to be touched
everywhere (as you suggested).
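
(For illustration only, not code from any posted revision: one shape such a
helper could take on top of this patch's four-argument vma_create(), so that
VM_BIND keeps its own entry point while i915_vma_instance() keeps its current
three-argument signature.)

struct i915_vma *
i915_vma_create_persistent(struct drm_i915_gem_object *obj,
                           struct i915_address_space *vm,
                           const struct i915_gtt_view *view)
{
        GEM_BUG_ON(!kref_read(&vm->ref));

        /*
         * No i915_vma_lookup(): several persistent vmas may alias the same
         * gtt_view of an object at different virtual addresses, so each
         * VM_BIND call simply creates a fresh vma.
         */
        return vma_create(obj, vm, view, true);
}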

<snip>

>>>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>@@ -110,7 +110,8 @@ static void __i915_vma_retire(struct 
>>>>i915_active *ref)
>>>> static struct i915_vma *
>>>> vma_create(struct drm_i915_gem_object *obj,
>>>>        struct i915_address_space *vm,
>>>>-       const struct i915_gtt_view *view)
>>>>+       const struct i915_gtt_view *view,
>>>>+       bool persistent)
>>>> {
>>>>     struct i915_vma *pos = ERR_PTR(-E2BIG);
>>>>     struct i915_vma *vma;
>>>>@@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>         __set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>>>>     }
>>>>+    if (persistent)
>>>>+        goto skip_rb_insert;
>>>
>>>Oh, so you don't use the gtt_views fully at all. I now have
>>>reservations about whether that was the right approach. Since you are
>>>not using the existing rb tree tracking, I mean...
>>>
>>>You know if a vma is persistent right? So you could have just 
>>>added special case for persistent vmas to __i915_vma_get_pages and 
>>>still call intel_partial_pages from there. Maybe union over struct 
>>>i915_gtt_view in i915_vma for either the view or struct 
>>>intel_partial_info for persistent ones.
>>>
>>
>>We are using the gtt_view fully in this patch for persistent vmas.
>
>I guess yours and mine definition of fully are different. :)
>
>>But as mentioned above, now we support multiple mappings
>>for the same gtt_view of the object. For this, the current
>>vma_lookup() falls short. So, we are skipping it.
>
>I get it - but then, having only now noticed how it will be used, I am 
>less convinced touching the ggtt_view code was the right approach.
>
>What about what I proposed above? That you just add code to 
>__i915_vma_get_pages, which in case of a persistent VMA would call 
>intel_partial_pages from there.
>
>If that works I think it's cleaner and we'd just revert the ggtt_view 
>to gtt_view rename.
>

I don't think that is any cleaner. We need to store the partial view
information somewhere for the persistent vmas as well. Why not use
the existing gtt_view for that instead of a new data structure?
In fact, a while back I had such an implementation; it looked odd,
and it was suggested that I use the existing infrastructure (gtt_view).

Besides, I think the current i915_vma_lookup method is no longer valid.
(Ever since we have had softpinning, lookup should have been based on the VA
and not on the vma's view of the object.)
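
(Sketch of that point, not code quoted from the series: persistent vmas
already live in the vm->va interval tree keyed by GPU virtual address -- see
vma->start/vma->last and i915_vm_bind_it_remove() in the patch quoted below --
so their lookup naturally keys on the VA rather than on the object's view.
The i915_vm_bind_it_iter_first() accessor name is assumed to be the
interval-tree helper generated elsewhere in the series.)

static struct i915_vma *
vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
{
        struct i915_vma *vma;

        lockdep_assert_held(&vm->vm_bind_lock);

        /* Find a persistent vma whose [start, last] range contains @va... */
        vma = i915_vm_bind_it_iter_first(&vm->va, va, va);

        /* ...and treat it as a match only if it starts exactly at @va. */
        return vma && vma->start == va ? vma : NULL;
}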

Regards,
Niranjana

>Regards,
>
>Tvrtko
>
>>
>>Regards,
>>Niranjana
>>
>>>Regards,
>>>
>>>Tvrtko
>>>
>>>>+
>>>>     rb = NULL;
>>>>     p = &obj->vma.tree.rb_node;
>>>>     while (*p) {
>>>>@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>     rb_link_node(&vma->obj_node, rb, p);
>>>>     rb_insert_color(&vma->obj_node, &obj->vma.tree);
>>>>+skip_rb_insert:
>>>>     if (i915_vma_is_ggtt(vma))
>>>>         /*
>>>>          * We put the GGTT vma at the start of the vma-list, followed
>>>>@@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
>>>>  * @obj: parent &struct drm_i915_gem_object to be mapped
>>>>  * @vm: address space in which the mapping is located
>>>>  * @view: additional mapping requirements
>>>>+ * @persistent: Whether the vma is persistent
>>>>  *
>>>>  * i915_vma_instance() looks up an existing VMA of the @obj in 
>>>>the @vm with
>>>>  * the same @view characteristics. If a match is not found, one 
>>>>is created.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [Intel-gfx] [RFC v4 06/14] drm/i915/vm_bind: Handle persistent vmas
  2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-09-27  2:36     ` Zeng, Oak
  -1 siblings, 0 replies; 62+ messages in thread
From: Zeng, Oak @ 2022-09-27  2:36 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana, intel-gfx, dri-devel
  Cc: Vetter, Daniel, christian.koenig, Hellstrom, Thomas, Zanoni,
	Paulo R, Auld, Matthew



Regards,
Oak

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Niranjana
> Vishwanathapura
> Sent: September 21, 2022 3:10 AM
> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
> <thomas.hellstrom@intel.com>; Auld, Matthew <matthew.auld@intel.com>; Vetter,
> Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com
> Subject: [Intel-gfx] [RFC v4 06/14] drm/i915/vm_bind: Handle persistent vmas
> 
> Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
> them during the request submission in the execbuf path.
> 
> Support eviction by maintaining a list of evicted persistent vmas
> for rebinding during next submission.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  7 +++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
>  drivers/gpu/drm/i915/i915_gem_gtt.c           | 39 ++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 ++
>  drivers/gpu/drm/i915/i915_vma.c               | 46 +++++++++++++++++++
>  drivers/gpu/drm/i915/i915_vma.h               | 45 +++++++++++++-----
>  drivers/gpu/drm/i915/i915_vma_types.h         | 17 +++++++
>  8 files changed, 151 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 7ca6a41fc981..236f901b8b9c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -91,6 +91,12 @@ static void i915_gem_vm_bind_remove(struct i915_vma
> *vma, bool release_obj)
>  {
>  	lockdep_assert_held(&vma->vm->vm_bind_lock);
> 
> +	spin_lock(&vma->vm->vm_rebind_lock);
> +	if (!list_empty(&vma->vm_rebind_link))
> +		list_del_init(&vma->vm_rebind_link);
> +	i915_vma_set_purged(vma);
> +	spin_unlock(&vma->vm->vm_rebind_lock);
> +
>  	list_del_init(&vma->vm_bind_link);
>  	list_del_init(&vma->non_priv_vm_bind_link);
>  	i915_vm_bind_it_remove(vma, &vma->vm->va);
> @@ -181,6 +187,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> i915_address_space *vm,
> 
>  	vma->start = va->start;
>  	vma->last = va->start + va->length - 1;
> +	i915_vma_set_persistent(vma);
> 
>  	return vma;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index da4f9dee0397..6db31197fa87 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space
> *vm, int subclass)
>  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>  	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>  	GEM_BUG_ON(IS_ERR(vm->root_obj));
> +	INIT_LIST_HEAD(&vm->vm_rebind_list);
> +	spin_lock_init(&vm->vm_rebind_lock);
>  }
> 
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 3f2e87d3bf34..b73d35b4e05d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -273,6 +273,10 @@ struct i915_address_space {
>  	struct list_head vm_bind_list;
>  	/** @vm_bound_list: List of vm_binding completed */
>  	struct list_head vm_bound_list;
> +	/* @vm_rebind_list: list of vmas to be rebinded */
> +	struct list_head vm_rebind_list;
> +	/* @vm_rebind_lock: protects vm_rebound_list */
> +	spinlock_t vm_rebind_lock;
>  	/* @va: tree of persistent vmas */
>  	struct rb_root_cached va;
>  	struct list_head non_priv_vm_bind_list;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 329ff75b80b9..b7d0844de561 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -25,6 +25,45 @@
>  #include "i915_trace.h"
>  #include "i915_vgpu.h"
> 
> +/**
> + * i915_vm_sync() - Wait until address space is not in use
> + * @vm: address space
> + *
> + * Waits until all requests using the address space are complete.
> + *
> + * Returns: 0 if success, -ve err code upon failure
> + */
> +int i915_vm_sync(struct i915_address_space *vm)
> +{
> +	int ret;
> +
> +	/* Wait for all requests under this vm to finish */
> +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> +				    DMA_RESV_USAGE_BOOKKEEP, false,
> +				    MAX_SCHEDULE_TIMEOUT);
> +	if (ret < 0)
> +		return ret;
> +	else if (ret > 0)
> +		return 0;
> +	else
> +		return -ETIMEDOUT;
> +}
> +
> +/**
> + * i915_vm_is_active() - Check if address space is being used
> + * @vm: address space
> + *
> + * Check if any request using the specified address space is
> + * active.
> + *
> + * Returns: true if address space is active, false otherwise.
> + */
> +bool i915_vm_is_active(const struct i915_address_space *vm)
> +{
> +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
> +				       DMA_RESV_USAGE_BOOKKEEP);
> +}
> +
>  int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
>  			       struct sg_table *pages)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8c2f57eb5dda..a5bbdc59d9df 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
> 
>  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
> 
> +int i915_vm_sync(struct i915_address_space *vm);
> +bool i915_vm_is_active(const struct i915_address_space *vm);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index aa332ad69ec2..ff216e9a2c8d 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
> 
>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> +	INIT_LIST_HEAD(&vma->vm_rebind_link);
>  	return vma;
> 
>  err_unlock:
> @@ -387,6 +388,24 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
>  	return err;
>  }
> 
> +/**
> + * i915_vma_sync() - Wait for the vma to be idle
> + * @vma: vma to be tested
> + *
> + * Returns 0 on success and error code on failure
> + */
> +int i915_vma_sync(struct i915_vma *vma)
> +{
> +	int ret;
> +
> +	/* Wait for the asynchronous bindings and pending GPU reads */
> +	ret = i915_active_wait(&vma->active);
> +	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
> +		return ret;
> +
> +	return i915_vm_sync(vma->vm);

Hi, I am trying to understand why we call vm_sync here. As I understand it, each vm has many vmas. The vma_sync function waits for a single vma to be idle, i.e. it waits for all the requests/GPU tasks using this vma to complete, while vm_sync waits for the whole vm to be idle. To me, vm_sync essentially waits for all the vmas in this vm to be idle. Is that right?

Thanks,
Oak

> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>  static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>  {
> @@ -1654,6 +1673,13 @@ static void force_unbind(struct i915_vma *vma)
>  	if (!drm_mm_node_allocated(&vma->node))
>  		return;
> 
> +	/*
> +	 * Mark persistent vma as purged to avoid it waiting
> +	 * for VM to be released.
> +	 */
> +	if (i915_vma_is_persistent(vma))
> +		i915_vma_set_purged(vma);
> +
>  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>  	WARN_ON(__i915_vma_unbind(vma));
>  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> @@ -1846,6 +1872,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
>  	int err;
> 
>  	assert_object_held(obj);
> +	if (i915_vma_is_persistent(vma))
> +		return -EINVAL;
> 
>  	GEM_BUG_ON(!vma->pages);
> 
> @@ -2015,6 +2043,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>  	__i915_vma_evict(vma, false);
> 
>  	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
> +
> +	if (i915_vma_is_persistent(vma)) {
> +		spin_lock(&vma->vm->vm_rebind_lock);
> +		if (list_empty(&vma->vm_rebind_link) &&
> +		    !i915_vma_is_purged(vma))
> +			list_add_tail(&vma->vm_rebind_link,
> +				      &vma->vm->vm_rebind_list);
> +		spin_unlock(&vma->vm->vm_rebind_lock);
> +	}
> +
>  	return 0;
>  }
> 
> @@ -2046,6 +2084,14 @@ static struct dma_fence
> *__i915_vma_unbind_async(struct i915_vma *vma)
>  		return ERR_PTR(-EBUSY);
>  	}
> 
> +	if (__i915_sw_fence_await_reservation(&vma->resource->chain,
> +					      vma->obj->base.resv,
> +					      DMA_RESV_USAGE_BOOKKEEP,
> +					      i915_fence_timeout(vma->vm->i915),
> +					      I915_FENCE_GFP) < 0) {
> +		return ERR_PTR(-EBUSY);
> +	}
> +
>  	fence = __i915_vma_evict(vma, true);
> 
>  	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 6feef0305fe1..aa536c9ce472 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
> 
>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
>  #define I915_VMA_RELEASE_MAP BIT(0)
> -
> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
> -{
> -	return !i915_active_is_idle(&vma->active);
> -}
> -
>  /* do not reserve memory to prevent deadlocks */
>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
> 
> @@ -138,6 +132,38 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
>  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
>  }
> 
> +static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
> +{
> +	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
> +{
> +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
> +{
> +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_purged(struct i915_vma *vma)
> +{
> +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
> +{
> +	if (i915_vma_is_persistent(vma)) {
> +		if (i915_vma_is_purged(vma))
> +			return false;
> +
> +		return i915_vm_is_active(vma->vm);
> +	}
> +
> +	return !i915_active_is_idle(&vma->active);
> +}
> +
>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>  {
>  	i915_gem_object_get(vma->obj);
> @@ -406,12 +432,7 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
>  void i915_vma_make_purgeable(struct i915_vma *vma);
> 
>  int i915_vma_wait_for_bind(struct i915_vma *vma);
> -
> -static inline int i915_vma_sync(struct i915_vma *vma)
> -{
> -	/* Wait for the asynchronous bindings and pending GPU reads */
> -	return i915_active_wait(&vma->active);
> -}
> +int i915_vma_sync(struct i915_vma *vma);
> 
>  /**
>   * i915_vma_get_current_resource - Get the current resource of the vma
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index 6d727c2d9802..d21bf97febaa 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -264,6 +264,21 @@ struct i915_vma {
>  #define I915_VMA_SCANOUT_BIT	17
>  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
> 
> +  /**
> +   * I915_VMA_PERSISTENT_BIT:
> +   * The vma is persistent (created with VM_BIND call).
> +   *
> +   * I915_VMA_PURGED_BIT:
> +   * The persistent vma is force unbound either due to VM_UNBIND call
> +   * from UMD or VM is released. Do not check/wait for VM activeness
> +   * in i915_vma_is_active() and i915_vma_sync() calls.
> +   */
> +#define I915_VMA_PERSISTENT_BIT	19
> +#define I915_VMA_PURGED_BIT	20
> +
> +#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
> +#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
> +
>  	struct i915_active active;
> 
>  #define I915_VMA_PAGES_BIAS 24
> @@ -293,6 +308,8 @@ struct i915_vma {
>  	struct list_head vm_bind_link;
>  	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
>  	struct list_head non_priv_vm_bind_link;
> +	/* @vm_rebind_link: link to vm_rebind_list and protected by
> vm_rebind_lock */
> +	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
> 
>  	/** Interval tree structures for persistent vma */
> 
> --
> 2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 06/14] drm/i915/vm_bind: Handle persistent vmas
  2022-09-27  2:36     ` Zeng, Oak
  (?)
@ 2022-09-27  5:45     ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-27  5:45 UTC (permalink / raw)
  To: Zeng, Oak
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, christian.koenig

On Mon, Sep 26, 2022 at 07:36:24PM -0700, Zeng, Oak wrote:
>
>
>Regards,
>Oak
>
>> -----Original Message-----
>> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Niranjana
>> Vishwanathapura
>> Sent: September 21, 2022 3:10 AM
>> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
>> Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
>> <thomas.hellstrom@intel.com>; Auld, Matthew <matthew.auld@intel.com>; Vetter,
>> Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com
>> Subject: [Intel-gfx] [RFC v4 06/14] drm/i915/vm_bind: Handle persistent vmas
>>
>> Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
>> them during the request submission in the execbuf path.
>>
>> Support eviction by maintaining a list of evicted persistent vmas
>> for rebinding during next submission.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  7 +++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
>>  drivers/gpu/drm/i915/i915_gem_gtt.c           | 39 ++++++++++++++++
>>  drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 ++
>>  drivers/gpu/drm/i915/i915_vma.c               | 46 +++++++++++++++++++
>>  drivers/gpu/drm/i915/i915_vma.h               | 45 +++++++++++++-----
>>  drivers/gpu/drm/i915/i915_vma_types.h         | 17 +++++++
>>  8 files changed, 151 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> index 7ca6a41fc981..236f901b8b9c 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -91,6 +91,12 @@ static void i915_gem_vm_bind_remove(struct i915_vma
>> *vma, bool release_obj)
>>  {
>>       lockdep_assert_held(&vma->vm->vm_bind_lock);
>>
>> +     spin_lock(&vma->vm->vm_rebind_lock);
>> +     if (!list_empty(&vma->vm_rebind_link))
>> +             list_del_init(&vma->vm_rebind_link);
>> +     i915_vma_set_purged(vma);
>> +     spin_unlock(&vma->vm->vm_rebind_lock);
>> +
>>       list_del_init(&vma->vm_bind_link);
>>       list_del_init(&vma->non_priv_vm_bind_link);
>>       i915_vm_bind_it_remove(vma, &vma->vm->va);
>> @@ -181,6 +187,7 @@ static struct i915_vma *vm_bind_get_vma(struct
>> i915_address_space *vm,
>>
>>       vma->start = va->start;
>>       vma->last = va->start + va->length - 1;
>> +     i915_vma_set_persistent(vma);
>>
>>       return vma;
>>  }
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index da4f9dee0397..6db31197fa87 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space
>> *vm, int subclass)
>>       INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>       vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>>       GEM_BUG_ON(IS_ERR(vm->root_obj));
>> +     INIT_LIST_HEAD(&vm->vm_rebind_list);
>> +     spin_lock_init(&vm->vm_rebind_lock);
>>  }
>>
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 3f2e87d3bf34..b73d35b4e05d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -273,6 +273,10 @@ struct i915_address_space {
>>       struct list_head vm_bind_list;
>>       /** @vm_bound_list: List of vm_binding completed */
>>       struct list_head vm_bound_list;
>> +     /* @vm_rebind_list: list of vmas to be rebound */
>> +     struct list_head vm_rebind_list;
>> +     /* @vm_rebind_lock: protects vm_rebind_list */
>> +     spinlock_t vm_rebind_lock;
>>       /* @va: tree of persistent vmas */
>>       struct rb_root_cached va;
>>       struct list_head non_priv_vm_bind_list;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 329ff75b80b9..b7d0844de561 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -25,6 +25,45 @@
>>  #include "i915_trace.h"
>>  #include "i915_vgpu.h"
>>
>> +/**
>> + * i915_vm_sync() - Wait until address space is not in use
>> + * @vm: address space
>> + *
>> + * Waits until all requests using the address space are complete.
>> + *
>> + * Returns: 0 if success, -ve err code upon failure
>> + */
>> +int i915_vm_sync(struct i915_address_space *vm)
>> +{
>> +     int ret;
>> +
>> +     /* Wait for all requests under this vm to finish */
>> +     ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
>> +                                 DMA_RESV_USAGE_BOOKKEEP, false,
>> +                                 MAX_SCHEDULE_TIMEOUT);
>> +     if (ret < 0)
>> +             return ret;
>> +     else if (ret > 0)
>> +             return 0;
>> +     else
>> +             return -ETIMEDOUT;
>> +}
>> +
>> +/**
>> + * i915_vm_is_active() - Check if address space is being used
>> + * @vm: address space
>> + *
>> + * Check if any request using the specified address space is
>> + * active.
>> + *
>> + * Returns: true if address space is active, false otherwise.
>> + */
>> +bool i915_vm_is_active(const struct i915_address_space *vm)
>> +{
>> +     return !dma_resv_test_signaled(vm->root_obj->base.resv,
>> +                                    DMA_RESV_USAGE_BOOKKEEP);
>> +}
>> +
>>  int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
>>                              struct sg_table *pages)
>>  {
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 8c2f57eb5dda..a5bbdc59d9df 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
>>
>>  #define PIN_OFFSET_MASK              I915_GTT_PAGE_MASK
>>
>> +int i915_vm_sync(struct i915_address_space *vm);
>> +bool i915_vm_is_active(const struct i915_address_space *vm);
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index aa332ad69ec2..ff216e9a2c8d 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>
>>       INIT_LIST_HEAD(&vma->vm_bind_link);
>>       INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>> +     INIT_LIST_HEAD(&vma->vm_rebind_link);
>>       return vma;
>>
>>  err_unlock:
>> @@ -387,6 +388,24 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
>>       return err;
>>  }
>>
>> +/**
>> + * i915_vma_sync() - Wait for the vma to be idle
>> + * @vma: vma to be tested
>> + *
>> + * Returns 0 on success and error code on failure
>> + */
>> +int i915_vma_sync(struct i915_vma *vma)
>> +{
>> +     int ret;
>> +
>> +     /* Wait for the asynchronous bindings and pending GPU reads */
>> +     ret = i915_active_wait(&vma->active);
>> +     if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
>> +             return ret;
>> +
>> +     return i915_vm_sync(vma->vm);
>
>Hi, I am trying to understand why we call vm_sync here. As I understand it, each vm has many vmas. The vma_sync function waits for a single vma to be idle, i.e., it waits for all the requests/GPU tasks using this vma to complete, while vm_sync waits for the whole vm to be idle. To me, vm_sync essentially waits for all the vmas in this vm to be idle. Isn't that right?
>

Yah. But a persistent vma is active as long as the VM is active (also check out i915_vma_is_active()). Hence we call i915_vm_sync() here.
I am rearranging the patches a bit and this part will go into a separate patch with more description. Hope that helps.
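To put it concretely: the request fence of a submission ends up on the
VM-level dma-resv (vm->root_obj->base.resv) with BOOKKEEP usage, which is
exactly what i915_vm_sync()/i915_vm_is_active() above wait on and test.
A minimal hypothetical sketch (the helper name and locking assumptions are
mine, not code from this series):

static int vm_publish_request_fence(struct i915_address_space *vm,
                                    struct dma_fence *fence)
{
        struct dma_resv *resv = vm->root_obj->base.resv;
        int err;

        /* Sketch assumes the caller holds the dma-resv lock of root_obj */
        err = dma_resv_reserve_fences(resv, 1);
        if (err)
                return err;

        /* BOOKKEEP usage is what i915_vm_sync() above waits on */
        dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_BOOKKEEP);
        return 0;
}

So every persistent vma of the VM reads as busy until all such fences
signal, without any per-vma activity tracking.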

Regards,
Niranjana

>Thanks,
>Oak
>
>> +}
>> +
>>  #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>>  static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>>  {
>> @@ -1654,6 +1673,13 @@ static void force_unbind(struct i915_vma *vma)
>>       if (!drm_mm_node_allocated(&vma->node))
>>               return;
>>
>> +     /*
>> +      * Mark persistent vma as purged to avoid it waiting
>> +      * for VM to be released.
>> +      */
>> +     if (i915_vma_is_persistent(vma))
>> +             i915_vma_set_purged(vma);
>> +
>>       atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>>       WARN_ON(__i915_vma_unbind(vma));
>>       GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
>> @@ -1846,6 +1872,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
>>       int err;
>>
>>       assert_object_held(obj);
>> +     if (i915_vma_is_persistent(vma))
>> +             return -EINVAL;
>>
>>       GEM_BUG_ON(!vma->pages);
>>
>> @@ -2015,6 +2043,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>>       __i915_vma_evict(vma, false);
>>
>>       drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
>> +
>> +     if (i915_vma_is_persistent(vma)) {
>> +             spin_lock(&vma->vm->vm_rebind_lock);
>> +             if (list_empty(&vma->vm_rebind_link) &&
>> +                 !i915_vma_is_purged(vma))
>> +                     list_add_tail(&vma->vm_rebind_link,
>> +                                   &vma->vm->vm_rebind_list);
>> +             spin_unlock(&vma->vm->vm_rebind_lock);
>> +     }
>> +
>>       return 0;
>>  }
>>
>> @@ -2046,6 +2084,14 @@ static struct dma_fence
>> *__i915_vma_unbind_async(struct i915_vma *vma)
>>               return ERR_PTR(-EBUSY);
>>       }
>>
>> +     if (__i915_sw_fence_await_reservation(&vma->resource->chain,
>> +                                           vma->obj->base.resv,
>> +                                           DMA_RESV_USAGE_BOOKKEEP,
>> +                                           i915_fence_timeout(vma->vm->i915),
>> +                                           I915_FENCE_GFP) < 0) {
>> +             return ERR_PTR(-EBUSY);
>> +     }
>> +
>>       fence = __i915_vma_evict(vma, true);
>>
>>       drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>> index 6feef0305fe1..aa536c9ce472 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>>
>>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
>>  #define I915_VMA_RELEASE_MAP BIT(0)
>> -
>> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
>> -{
>> -     return !i915_active_is_idle(&vma->active);
>> -}
>> -
>>  /* do not reserve memory to prevent deadlocks */
>>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>>
>> @@ -138,6 +132,38 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
>>       return i915_vm_to_ggtt(vma->vm)->pin_bias;
>>  }
>>
>> +static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
>> +{
>> +     return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
>> +{
>> +     set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
>> +{
>> +     return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_purged(struct i915_vma *vma)
>> +{
>> +     set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
>> +{
>> +     if (i915_vma_is_persistent(vma)) {
>> +             if (i915_vma_is_purged(vma))
>> +                     return false;
>> +
>> +             return i915_vm_is_active(vma->vm);
>> +     }
>> +
>> +     return !i915_active_is_idle(&vma->active);
>> +}
>> +
>>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>>  {
>>       i915_gem_object_get(vma->obj);
>> @@ -406,12 +432,7 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
>>  void i915_vma_make_purgeable(struct i915_vma *vma);
>>
>>  int i915_vma_wait_for_bind(struct i915_vma *vma);
>> -
>> -static inline int i915_vma_sync(struct i915_vma *vma)
>> -{
>> -     /* Wait for the asynchronous bindings and pending GPU reads */
>> -     return i915_active_wait(&vma->active);
>> -}
>> +int i915_vma_sync(struct i915_vma *vma);
>>
>>  /**
>>   * i915_vma_get_current_resource - Get the current resource of the vma
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> b/drivers/gpu/drm/i915/i915_vma_types.h
>> index 6d727c2d9802..d21bf97febaa 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -264,6 +264,21 @@ struct i915_vma {
>>  #define I915_VMA_SCANOUT_BIT 17
>>  #define I915_VMA_SCANOUT     ((int)BIT(I915_VMA_SCANOUT_BIT))
>>
>> +  /**
>> +   * I915_VMA_PERSISTENT_BIT:
>> +   * The vma is persistent (created with VM_BIND call).
>> +   *
>> +   * I915_VMA_PURGED_BIT:
>> +   * The persistent vma is force unbound either due to VM_UNBIND call
>> +   * from UMD or VM is released. Do not check/wait for VM activeness
>> +   * in i915_vma_is_active() and i915_vma_sync() calls.
>> +   */
>> +#define I915_VMA_PERSISTENT_BIT      19
>> +#define I915_VMA_PURGED_BIT  20
>> +
>> +#define I915_VMA_PERSISTENT  ((int)BIT(I915_VMA_PERSISTENT_BIT))
>> +#define I915_VMA_PURGED              ((int)BIT(I915_VMA_PURGED_BIT))
>> +
>>       struct i915_active active;
>>
>>  #define I915_VMA_PAGES_BIAS 24
>> @@ -293,6 +308,8 @@ struct i915_vma {
>>       struct list_head vm_bind_link;
>>       /* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
>>       struct list_head non_priv_vm_bind_link;
>> +     /* @vm_rebind_link: link to vm_rebind_list and protected by
>> vm_rebind_lock */
>> +     struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>>
>>       /** Interval tree structures for persistent vma */
>>
>> --
>> 2.21.0.rc0.32.g243a4c7e27
>
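As an aside on the eviction support above: evicted persistent vmas simply
sit on vm_rebind_list until the next submission, which rebinds them before
executing. Roughly like this (a purely illustrative sketch, not code from
this series; ww/object locking around the pin and re-queueing on failure
are elided):

static int vm_rebind_evicted_vmas(struct i915_address_space *vm)
{
        struct i915_vma *vma, *next;
        LIST_HEAD(rebind);
        int err = 0;

        /* Detach under the spinlock, rebind outside it (binding can sleep) */
        spin_lock(&vm->vm_rebind_lock);
        list_splice_init(&vm->vm_rebind_list, &rebind);
        spin_unlock(&vm->vm_rebind_lock);

        list_for_each_entry_safe(vma, next, &rebind, vm_rebind_link) {
                list_del_init(&vma->vm_rebind_link);
                /* Re-pin at the original soft-pinned address */
                err = i915_vma_pin(vma, 0, 0,
                                   vma->start | PIN_OFFSET_FIXED | PIN_USER);
                if (err)
                        break;
                i915_vma_unpin(vma);
        }
        return err;
}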

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-26 17:09         ` Niranjana Vishwanathapura
@ 2022-09-27  9:28           ` Tvrtko Ursulin
  2022-09-27 15:37             ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 62+ messages in thread
From: Tvrtko Ursulin @ 2022-09-27  9:28 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig


On 26/09/2022 18:09, Niranjana Vishwanathapura wrote:
> On Mon, Sep 26, 2022 at 05:26:12PM +0100, Tvrtko Ursulin wrote:
>>
>> On 24/09/2022 05:30, Niranjana Vishwanathapura wrote:
>>> On Fri, Sep 23, 2022 at 09:40:20AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>>>> vma_lookup is tied to segment of the object instead of section
>>>>
>>>> Can be, but not only that. It would be more accurate to say it is 
>>>> based of gtt views.
>>>
>>> Yah, but new code is also based on gtt views, the only difference
>>> is that now there can be multiple mappings (at different VAs)
>>> to the same gtt_view of the object.
>>>
>>>>
>>>>> of VA space. Hence, it does not support aliasing (i.e., multiple
>>>>> bindings to the same section of the object).
>>>>> Skip vma_lookup for persistent vmas as it supports aliasing.
>>>>
>>>> What's broken without this patch? If something is, should it go 
>>>> somewhere earlier in the series? If so should be mentioned in the 
>>>> commit message.
>>>>
>>>> Or is it just a performance optimisation to skip unused tracking? If 
>>>> so should also be mentioned in the commit message.
>>>>
>>>
>>> No, it is not a performance optimization.
>>> The vma_lookup is based on the fact that there can be only one mapping
>>> for a given gtt_view of the object.
>>> So, it was looking for gtt_view to find the mapping.
>>>
>>> But now, as I mentioned above, there can be multiple mappings for a
>>> given gtt_view of the object. Hence the vma_lookup method won't work
>>> here. Hence, it is being skipped for persistent vmas.
>>
>> Right, so in that case isn't this patch too late in the series? 
>> Granted you only allow _userspace_ to use vm bind in 14/14, but the 
>> kernel infrastructure is there and if there was a selftest it would be 
>> able to fail without this patch, no?
>>
> 
> Yes it is incorrect patch ordering. I am fixing it by moving this patch
> to early in the series and adding a new i915_vma_create_persistent()
> function and avoid touching i915_vma_instance() everywhere (as you
> suggested).
> 
> <snip>
> 
>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>> @@ -110,7 +110,8 @@ static void __i915_vma_retire(struct 
>>>>> i915_active *ref)
>>>>>  static struct i915_vma *
>>>>>  vma_create(struct drm_i915_gem_object *obj,
>>>>>         struct i915_address_space *vm,
>>>>> -       const struct i915_gtt_view *view)
>>>>> +       const struct i915_gtt_view *view,
>>>>> +       bool persistent)
>>>>>  {
>>>>>      struct i915_vma *pos = ERR_PTR(-E2BIG);
>>>>>      struct i915_vma *vma;
>>>>> @@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>>          __set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>>>>>      }
>>>>> +    if (persistent)
>>>>> +        goto skip_rb_insert;
>>>>
>>>> Oh so you don't use the gtt_view's fully at all. I now have 
>>>> reservations whether that was the right approach. Since you are not 
>>>> using the existing rb tree tracking I mean..
>>>>
>>>> You know if a vma is persistent right? So you could have just added 
>>>> special case for persistent vmas to __i915_vma_get_pages and still 
>>>> call intel_partial_pages from there. Maybe union over struct 
>>>> i915_gtt_view in i915_vma for either the view or struct 
>>>> intel_partial_info for persistent ones.
>>>>
>>>
>>> We are using the gtt_view fully in this patch for persistent vmas.
>>
>> I guess yours and mine definition of fully are different. :)
>>
>>> But as mentioned above, now we support multiple mappings
>>> for the same gtt_view of the object. For this, the current
>>> vma_lookup() falls short. So, we are skipping it.
>>
>> I get it - but then, having only now noticed how it will be used, I am 
>> less convinced touching the ggtt_view code was the right approach.
>>
>> What about what I proposed above? That you just add code to 
>> __i915_vma_get_pages, which in case of a persistent VMA would call 
>> intel_partial_pages from there.
>>
>> If that works I think it's cleaner and we'd just revert the ggtt_view 
>> to gtt_view rename.
>>
> 
> I don't think that is any cleaner. We need to store the partial view
> information somewhere for the persistent vmas as well. Why not use
> the existing gtt_view for that instead of a new data structure?
> In fact long back I had such an implementation and it was looking
> odd and was suggested to use the existing infrastructure (gtt_view).
> 
> Besides, I think the current i915_vma_lookup method is no longer valid.
> (Ever since we had softpinning, lookup should have been based on the VA
> and not the vma's view of the object).

As a side note, I don't think soft pinning was a problem. It did not establish a partial VMA concept, nor did it have any interaction with ggtt_views. It was still a one obj - one vma per vm relationship.

But okay, it is okay to do it like this. I think when you change to separate create/lookup entry points for persistent it will become much cleaner. I do acknowledge you have to "hide" them from normal lookup to avoid confusing the legacy code paths.
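Something along these lines is what I have in mind - only a sketch, and it
assumes vma_create() keeps the 'persistent' parameter from this patch; the
function name is just illustrative:

static struct i915_vma *
i915_vma_create_persistent(struct drm_i915_gem_object *obj,
                           struct i915_address_space *vm,
                           const struct i915_gtt_view *view)
{
        GEM_BUG_ON(!kref_read(&vm->ref));

        /*
         * Sketch: never insert into obj->vma.tree, so i915_vma_instance()
         * lookups cannot find persistent (VM_BIND) vmas, legacy paths stay
         * untouched and aliased mappings of the same view remain possible.
         */
        return vma_create(obj, vm, view, true /* persistent */);
}

That way the "hiding" is a property of the creation path rather than a
special case sprinkled through the lookup code.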

One more note - I think patch 6 should be before or together with patch 4. In general infrastructure to handle vm bind should all be in place before code starts using it.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-gfx] [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-09-27  9:28           ` Tvrtko Ursulin
@ 2022-09-27 15:37             ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 62+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-27 15:37 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Tue, Sep 27, 2022 at 10:28:03AM +0100, Tvrtko Ursulin wrote:
>
>On 26/09/2022 18:09, Niranjana Vishwanathapura wrote:
>>On Mon, Sep 26, 2022 at 05:26:12PM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 24/09/2022 05:30, Niranjana Vishwanathapura wrote:
>>>>On Fri, Sep 23, 2022 at 09:40:20AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>On 21/09/2022 08:09, Niranjana Vishwanathapura wrote:
>>>>>>vma_lookup is tied to segment of the object instead of section
>>>>>
>>>>>Can be, but not only that. It would be more accurate to say it 
>>>>>is based of gtt views.
>>>>
>>>>Yah, but new code is also based on gtt views, the only difference
>>>>is that now there can be multiple mappings (at different VAs)
>>>>to the same gtt_view of the object.
>>>>
>>>>>
>>>>>>of VA space. Hence, it does not support aliasing (i.e., multiple
>>>>>>bindings to the same section of the object).
>>>>>>Skip vma_lookup for persistent vmas as it supports aliasing.
>>>>>
>>>>>What's broken without this patch? If something is, should it 
>>>>>go somewhere earlier in the series? If so should be mentioned 
>>>>>in the commit message.
>>>>>
>>>>>Or is it just a performance optimisation to skip unused 
>>>>>tracking? If so should also be mentioned in the commit 
>>>>>message.
>>>>>
>>>>
>>>>No, it is not a performance optimization.
>>>>The vma_lookup is based on the fact that there can be only one mapping
>>>>for a given gtt_view of the object.
>>>>So, it was looking for gtt_view to find the mapping.
>>>>
>>>>But now, as I mentioned above, there can be multiple mappings for a
>>>>given gtt_view of the object. Hence the vma_lookup method won't work
>>>>here. Hence, it is being skipped for persistent vmas.
>>>
>>>Right, so in that case isn't this patch too late in the series? 
>>>Granted you only allow _userspace_ to use vm bind in 14/14, but 
>>>the kernel infrastructure is there and if there was a selftest it 
>>>would be able to fail without this patch, no?
>>>
>>
>>Yes it is incorrect patch ordering. I am fixing it by moving this patch
>>to early in the series and adding a new i915_vma_create_persistent()
>>function and avoid touching i915_vma_instance() everywhere (as you
>>suggested).
>>
>><snip>
>>
>>>>>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>>>>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>@@ -110,7 +110,8 @@ static void __i915_vma_retire(struct 
>>>>>>i915_active *ref)
>>>>>> static struct i915_vma *
>>>>>> vma_create(struct drm_i915_gem_object *obj,
>>>>>>        struct i915_address_space *vm,
>>>>>>-       const struct i915_gtt_view *view)
>>>>>>+       const struct i915_gtt_view *view,
>>>>>>+       bool persistent)
>>>>>> {
>>>>>>     struct i915_vma *pos = ERR_PTR(-E2BIG);
>>>>>>     struct i915_vma *vma;
>>>>>>@@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>>>         __set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>>>>>>     }
>>>>>>+    if (persistent)
>>>>>>+        goto skip_rb_insert;
>>>>>
>>>>>Oh so you don't use the gtt_view's fully at all. I now have 
>>>>>reservations whether that was the right approach. Since you 
>>>>>are not using the existing rb tree tracking I mean..
>>>>>
>>>>>You know if a vma is persistent right? So you could have just 
>>>>>added special case for persistent vmas to __i915_vma_get_pages 
>>>>>and still call intel_partial_pages from there. Maybe union 
>>>>>over struct i915_gtt_view in i915_vma for either the view or 
>>>>>struct intel_partial_info for persistent ones.
>>>>>
>>>>
>>>>We are using the gtt_view fully in this patch for persistent vmas.
>>>
>>>I guess yours and mine definition of fully are different. :)
>>>
>>>>But as mentioned above, now we support multiple mappings
>>>>for the same gtt_view of the object. For this, the current
>>>>vma_lookup() falls short. So, we are skipping it.
>>>
>>>I get it - but then, having only now noticed how it will be used, 
>>>I am less convinced touching the ggtt_view code was the right 
>>>approach.
>>>
>>>What about what I proposed above? That you just add code to 
>>>__i915_vma_get_pages, which in case of a persistent VMA would call 
>>>intel_partial_pages from there.
>>>
>>>If that works I think it's cleaner and we'd just revert the 
>>>ggtt_view to gtt_view rename.
>>>
>>
>>I don't think that is any cleaner. We need to store the partial view
>>information somewhere for the persistent vmas as well. Why not use
>>the existing gtt_view for that instead of a new data structure?
>>In fact long back I had such an implementation and it was looking
>>odd and was suggested to use the existing infrastructure (gtt_view).
>>
>>Besides, I think the current i915_vma_lookup method is no longer valid.
>>(Ever since we had softpinning, lookup should have been based on the VA
>>and not the vma's view of the object).
>
>As a side note, I don't think soft pinning was a problem. It did not establish a partial VMA concept, nor did it have any interaction with ggtt_views. It was still a one obj - one vma per vm relationship.
>
>But okay, it is okay to do it like this. I think when you change to separate create/lookup entry points for persistent it will become much cleaner. I do acknowledge you have to "hide" them from normal lookup to avoid confusing the legacy code paths.
>
>One more note - I think patch 6 should be before or together with patch 4. In general infrastructure to handle vm bind should all be in place before code starts using it.
>

Thanks. Yah, separating it out is looking a lot cleaner. I have further split the patches, including patch 6, and moved part of it to before patch 4.
Everything is looking much cleaner now. Will be posting the updated series soon.

Regards,
Niranjana

>Regards,
>
>Tvrtko

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2022-09-27 15:38 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-21  7:09 [RFC v4 00/14] drm/i915/vm_bind: Add VM_BIND functionality Niranjana Vishwanathapura
2022-09-21  7:09 ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 01/14] drm/i915/vm_bind: Expose vm lookup function Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 02/14] drm/i915/vm_bind: Add __i915_sw_fence_await_reservation() Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  9:06   ` Tvrtko Ursulin
2022-09-21 17:47     ` Niranjana Vishwanathapura
2022-09-22  9:26   ` Jani Nikula
2022-09-22  9:26     ` [Intel-gfx] " Jani Nikula
2022-09-21  7:09 ` [RFC v4 03/14] drm/i915/vm_bind: Expose i915_gem_object_max_page_size() Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  9:13   ` Tvrtko Ursulin
2022-09-21 18:00     ` Niranjana Vishwanathapura
2022-09-22  8:09       ` Tvrtko Ursulin
2022-09-22 16:18         ` Matthew Auld
2022-09-22 16:46           ` Niranjana Vishwanathapura
2022-09-22 16:46             ` Niranjana Vishwanathapura
2022-09-23  7:45           ` Tvrtko Ursulin
2022-09-21  7:09 ` [RFC v4 04/14] drm/i915/vm_bind: Implement bind and unbind of object Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-22  9:29   ` Jani Nikula
2022-09-21  7:09 ` [RFC v4 05/14] drm/i915/vm_bind: Support for VM private BOs Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 06/14] drm/i915/vm_bind: Handle persistent vmas Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-27  2:36   ` Zeng, Oak
2022-09-27  2:36     ` Zeng, Oak
2022-09-27  5:45     ` Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 07/14] drm/i915/vm_bind: Add out fence support Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-22  9:31   ` Jani Nikula
2022-09-21  7:09 ` [RFC v4 08/14] drm/i915/vm_bind: Abstract out common execbuf functions Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21 10:18   ` Tvrtko Ursulin
2022-09-21 18:17     ` Niranjana Vishwanathapura
2022-09-22  9:05       ` Tvrtko Ursulin
2022-09-22 14:12         ` Niranjana Vishwanathapura
2022-09-22  9:54   ` Jani Nikula
2022-09-22  9:54     ` [Intel-gfx] " Jani Nikula
2022-09-24  4:22     ` Niranjana Vishwanathapura
2022-09-24  4:22       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 09/14] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 10/14] drm/i915/vm_bind: Update i915_vma_verify_bind_complete() Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 11/14] drm/i915/vm_bind: Handle persistent vmas in execbuf3 Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 12/14] drm/i915/vm_bind: userptr dma-resv changes Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 13/14] drm/i915/vm_bind: Skip vma_lookup for persistent vmas Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-23  8:40   ` Tvrtko Ursulin
2022-09-24  4:30     ` Niranjana Vishwanathapura
2022-09-26 16:26       ` Tvrtko Ursulin
2022-09-26 17:09         ` Niranjana Vishwanathapura
2022-09-27  9:28           ` Tvrtko Ursulin
2022-09-27 15:37             ` Niranjana Vishwanathapura
2022-09-21  7:09 ` [RFC v4 14/14] drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode Niranjana Vishwanathapura
2022-09-21  7:09   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-09-21  8:33 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/vm_bind: Add VM_BIND functionality (rev3) Patchwork
2022-09-21  8:55 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
