* [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality
@ 2022-08-27 19:43 Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 01/17] drm/i915: Expose vm_lookup in i915_gem_context.h Andi Shyti
                   ` (17 more replies)
  0 siblings, 18 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

Hi,

just sending Niranjana's original patches as an RFC. This is v3, as v2
was reviewed offline with Ramalingam.

I'm keeping most of the original structure, though further discussion
can start from here.

Copy-pasting Niranjana's original cover letter below:

The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow a UMD to bind/unbind GEM
buffer objects (BOs), or sections of a BO, at specified GPU virtual
addresses on a specified address space (VM). Multiple mappings can map
to the same physical pages of an object (aliasing). These mappings (also
referred to as persistent mappings) persist across multiple GPU
submissions (execbuf calls) issued by the UMD, without the user having
to provide a list of all required mappings during each submission (as
required by the older execbuf mode).

This patch series supports VM_BIND version 1, as described by the
I915_PARAM_VM_BIND_VERSION parameter.

Add a new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only works in
vm_bind mode, and vm_bind mode only works with this new execbuf3 ioctl.
The new execbuf3 ioctl has no execlist support, and all the legacy
support such as relocations is removed.
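
As a rough illustration of the intended userspace flow (not part of the
series; the DRM_IOCTL_I915_GEM_VM_BIND request number is only wired up
by a later patch, so that name is an assumption here), creating a VM in
vm_bind mode and synchronously binding a whole BO could look like:

#include <stdint.h>

#include <xf86drm.h>
#include <drm/i915_drm.h>	/* the updated uapi header from this series */

/* Sketch only: DRM_IOCTL_I915_GEM_VM_BIND is enabled by a later patch. */
static int bind_whole_bo(int fd, uint32_t bo_handle,
			 uint64_t gpu_va, uint64_t bo_size)
{
	struct drm_i915_gem_vm_control ctl = {
		/* opt in to vm_bind mode at VM creation time (patch 02) */
		.flags = I915_VM_CREATE_FLAGS_USE_VM_BIND,
	};
	struct drm_i915_gem_vm_bind bind = { 0 };
	int ret;

	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl);
	if (ret)
		return ret;

	bind.vm_id = ctl.vm_id;		/* VM created above, in vm_bind mode */
	bind.handle = bo_handle;	/* GEM BO to map */
	bind.start = gpu_va;		/* GPU virtual address of the mapping */
	bind.offset = 0;		/* map the whole object */
	bind.length = bo_size;
	/* fence.flags left clear: no out fence, bind completes synchronously */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
}

Submission on such a VM then goes through the new execbuf3 ioctl, with
no per-submission list of objects.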

TODOs:
* Support out fence for VM_UNBIND ioctl.
* Async VM_UNBIND support.
* Share code between execbuf2 and execbuf3 where possible.
* Cleanups and optimizations.

NOTEs:
* It is based on the VM_BIND design+uapi patch series below.
  https://lists.freedesktop.org/archives/intel-gfx/2022-July/300760.html

* The IGT RFC series is posted as,
  [RFC 0/5] vm_bind: Add VM_BIND validation support

Niranjana Vishwanathapura (17):
  drm/i915: Expose vm_lookup in i915_gem_context.h
  drm/i915: Mark vm for vm_bind usage at creation
  drm/i915/gem: expose i915_gem_object_max_page_size() in
    i915_gem_object.h
  drm/i915: Implement bind and unbind of object
  drm/i915: Support for VM private BOs
  drm/i915/dmabuf: Deny the dmabuf export for VM private BOs
  drm/i915/vm_bind: Handle persistent vmas
  drm/i915/vm_bind: Add out fence support
  drm/i915: Do not support vm_bind mode in execbuf2
  drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  drm/i915: Add i915_vma_is_bind_complete()
  drm/i915/vm_bind: Handle persistent vmas in execbuf3
  drm/i915/vm_bind: userptr dma-resv changes
  drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  drm/i915: Extend getparm for VM_BIND capability
  drm/i915/ioctl: Enable the vm_bind/unbind ioctls
  drm/i915: Enable execbuf3 ioctl for vm_bind

 drivers/gpu/drm/i915/Makefile                 |    2 +
 drivers/gpu/drm/i915/display/intel_fb_pin.c   |    2 +-
 .../drm/i915/display/intel_plane_initial.c    |    2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   16 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    3 +
 drivers/gpu/drm/i915/gem/i915_gem_create.c    |   16 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |    6 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    9 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1275 +++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |    2 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |    3 +
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   10 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |   24 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  437 ++++++
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   16 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  |    2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |   12 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |    2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |    6 +-
 .../drm/i915/gem/selftests/igt_gem_utils.c    |    2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |    2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |    2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |    2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   20 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   27 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |    4 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |    2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c          |    2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |    4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |    2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |    2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |    4 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   16 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |    6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |    2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |    2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |    2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |    4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |    2 +-
 drivers/gpu/drm/i915/i915_driver.c            |    4 +
 drivers/gpu/drm/i915/i915_gem.c               |    2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   38 +
 drivers/gpu/drm/i915/i915_gem_gtt.h           |    3 +
 drivers/gpu/drm/i915/i915_getparam.c          |    3 +
 drivers/gpu/drm/i915/i915_perf.c              |    2 +-
 drivers/gpu/drm/i915/i915_vma.c               |  114 +-
 drivers/gpu/drm/i915/i915_vma.h               |   62 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |   49 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   44 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |    4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |    2 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |    2 +-
 .../drm/i915/selftests/intel_memory_region.c  |    2 +-
 include/uapi/drm/i915_drm.h                   |  255 +++-
 56 files changed, 2424 insertions(+), 119 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c

-- 
2.34.1



* [RFC PATCH v3 01/17] drm/i915: Expose vm_lookup in i915_gem_context.h
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 02/17] drm/i915: Mark vm for vm_bind usage at creation Andi Shyti
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

To reuse i915_gem_vm_lookup() in the upcoming vm_bind implementation,
expose it in i915_gem_context.h.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 11 ++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_context.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dabdfe09f5e51..fdd3e3bfd4088 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -346,7 +346,16 @@ static int proto_context_register(struct drm_i915_file_private *fpriv,
 	return ret;
 }
 
-static struct i915_address_space *
+/**
+ * i915_gem_vm_lookup() - looks up the VM reference for a given vm id
+ * @file_priv: the private data associated with the user's file
+ * @id: the VM id
+ *
+ * Finds the VM reference associated with a specific id.
+ *
+ * Returns the VM pointer on success, NULL in case of failure.
+ */
+struct i915_address_space *
 i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
 {
 	struct i915_address_space *vm;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index e5b0f66ea1feb..899fa8f1e0fed 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -139,6 +139,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
 
+struct i915_address_space *
+i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id);
+
 struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id);
 
-- 
2.34.1



* [RFC PATCH v3 02/17] drm/i915: Mark vm for vm_bind usage at creation
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 01/17] drm/i915: Expose vm_lookup in i915_gem_context.h Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 03/17] drm/i915/gem: expose i915_gem_object_max_page_size() in i915_gem_object.h Andi Shyti
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

At VM creation time, add a flag to indicate that the new VM will use
only the vm_bind method for object binding.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 ++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h         | 8 ++++++++
 include/uapi/drm/i915_drm.h                 | 3 ++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fdd3e3bfd4088..2e25341f78ab6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1808,7 +1808,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (!HAS_FULL_PPGTT(i915))
 		return -ENODEV;
 
-	if (args->flags)
+	if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
 		return -EINVAL;
 
 	ppgtt = i915_ppgtt_create(to_gt(i915), 0);
@@ -1828,6 +1828,9 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		goto err_put;
 
+	if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
+		ppgtt->vm.vm_bind_mode = true;
+
 	GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt */
 	args->vm_id = id;
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index e639434e97fdb..da21088890b3b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -271,6 +271,14 @@ struct i915_address_space {
 	/* Skip pte rewrite on unbind for suspend. Protected by @mutex */
 	bool skip_pte_rewrite:1;
 
+	/**
+	 * @vm_bind_mode: flag to indicate vm_bind method of binding
+	 *
+	 * True: allow only the vm_bind method of binding.
+	 * False: allow only the legacy execbuf method of binding.
+	 */
+	bool vm_bind_mode:1;
+
 	u8 top;
 	u8 pd_shift;
 	u8 scratch_order;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 520ad2691a99d..12435db751eb8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2522,7 +2522,8 @@ struct drm_i915_gem_vm_control {
 	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
 
-	/** @flags: reserved for future usage, currently MBZ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1u << 0)
+#define I915_VM_CREATE_FLAGS_UNKNOWN	(-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
 	__u32 flags;
 
 	/** @vm_id: Id of the VM created or to be destroyed */
-- 
2.34.1



* [RFC PATCH v3 03/17] drm/i915/gem: expose i915_gem_object_max_page_size() in i915_gem_object.h
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 01/17] drm/i915: Expose vm_lookup in i915_gem_context.h Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 02/17] drm/i915: Mark vm for vm_bind usage at creation Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

To reuse i915_gem_object_max_page_size() in the upcoming vm_bind
implementation, expose it in i915_gem_object.h.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 16 +++++++++++++---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 33673fe7ee0ac..b0aebcc52f83c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -15,8 +15,17 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
-static u32 object_max_page_size(struct intel_memory_region **placements,
-				unsigned int n_placements)
+/**
+ * i915_gem_object_max_page_size() - max of min_page_size of the regions
+ * @placements:  list of regions
+ * @n_placements: number of placements
+ *
+ * Calculates the max of the min_page_size of a list of placements passed in.
+ *
+ * Return: max of the min_page_size
+ */
+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
+				  unsigned int n_placements)
 {
 	u32 max_page_size = 0;
 	int i;
@@ -99,7 +108,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
 
 	i915_gem_flush_free_objects(i915);
 
-	size = round_up(size, object_max_page_size(placements, n_placements));
+	size = round_up(size, i915_gem_object_max_page_size(placements,
+							    n_placements));
 	if (size == 0)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 6f0a3ce355670..650de22248435 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
 }
 
 void i915_gem_init__objects(struct drm_i915_private *i915);
+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
+				  unsigned int n_placements);
 
 void i915_objects_module_exit(void);
 int i915_objects_module_init(void);
-- 
2.34.1



* [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (2 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 03/17] drm/i915/gem: expose i915_gem_object_max_page_size() in i915_gem_object.h Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-30 17:37   ` Matthew Auld
                     ` (3 more replies)
  2022-08-27 19:43 ` [RFC PATCH v3 05/17] drm/i915: Support for VM private BOs Andi Shyti
                   ` (13 subsequent siblings)
  17 siblings, 4 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Implement the bind and unbind of an object at the specified GPU virtual
addresses.
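
As a rough userspace sketch of the uapi added here (illustrative only;
the DRM_IOCTL_I915_GEM_VM_BIND/UNBIND request numbers are only enabled
by a later patch in the series, so those names are assumptions): bind a
64K section of a BO with an out fence on a binary drm_syncobj, then
later unbind the same range.

#include <stdint.h>

#include <xf86drm.h>
#include <drm/i915_drm.h>	/* with the structs added by this patch */

static int bind_section_async(int fd, uint32_t vm_id, uint32_t bo_handle,
			      uint64_t gpu_va, uint32_t *out_syncobj)
{
	struct drm_i915_gem_vm_bind bind = { 0 };
	int ret;

	/* binary syncobj that will signal once the bind has completed */
	ret = drmSyncobjCreate(fd, 0, out_syncobj);
	if (ret)
		return ret;

	bind.vm_id = vm_id;		/* VM created in vm_bind mode */
	bind.handle = bo_handle;
	bind.start = gpu_va;
	bind.offset = 64 * 1024;	/* bind a section of the BO... */
	bind.length = 64 * 1024;	/* ...64K long, at offset 64K */
	bind.fence.handle = *out_syncobj;
	bind.fence.flags = I915_TIMELINE_FENCE_SIGNAL;	/* request out fence */
	bind.fence.value = 0;		/* 0 for a binary drm_syncobj */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
}

static int unbind_section(int fd, uint32_t vm_id, uint64_t gpu_va)
{
	struct drm_i915_gem_vm_unbind unbind = { 0 };

	unbind.vm_id = vm_id;
	unbind.start = gpu_va;
	unbind.length = 64 * 1024;
	/* out fence for VM_UNBIND is still a TODO, so none is requested */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);
}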

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
 drivers/gpu/drm/i915/i915_driver.c            |   1 +
 drivers/gpu/drm/i915/i915_vma.c               |   3 +-
 drivers/gpu/drm/i915/i915_vma.h               |   2 -
 drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
 include/uapi/drm/i915_drm.h                   | 163 +++++++++
 10 files changed, 543 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 522ef9b4aff32..4e1627e96c6e0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -165,6 +165,7 @@ gem-y += \
 	gem/i915_gem_ttm_move.o \
 	gem/i915_gem_ttm_pm.o \
 	gem/i915_gem_userptr.o \
+	gem/i915_gem_vm_bind_object.o \
 	gem/i915_gem_wait.o \
 	gem/i915_gemfs.o
 i915-y += \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
new file mode 100644
index 0000000000000..ebc493b7dafc1
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __I915_GEM_VM_BIND_H
+#define __I915_GEM_VM_BIND_H
+
+#include "i915_drv.h"
+
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
+
+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file);
+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+
+void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
+#endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
new file mode 100644
index 0000000000000..dadd1d4b1761b
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/interval_tree_generic.h>
+
+#include "gem/i915_gem_vm_bind.h"
+#include "gem/i915_gem_context.h"
+#include "gt/gen8_engine_cs.h"
+
+#include "i915_drv.h"
+#include "i915_gem_gtt.h"
+
+#define START(node) ((node)->start)
+#define LAST(node) ((node)->last)
+
+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
+		     START, LAST, static inline, i915_vm_bind_it)
+
+#undef START
+#undef LAST
+
+/**
+ * DOC: VM_BIND/UNBIND ioctls
+ *
+ * The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow a UMD to bind/unbind GEM buffer
+ * objects (BOs), or sections of a BO, at specified GPU virtual addresses on a
+ * specified address space (VM). Multiple mappings can map to the same physical
+ * pages of an object (aliasing). These mappings (also referred to as persistent
+ * mappings) persist across multiple GPU submissions (execbuf calls)
+ * issued by the UMD, without the user having to provide a list of all required
+ * mappings during each submission (as required by the older execbuf mode).
+ *
+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+ * signaling the completion of bind/unbind operation.
+ *
+ * The VM_BIND feature is advertised to the user via I915_PARAM_VM_BIND_VERSION.
+ * The user has to opt in to the VM_BIND mode of binding for an address space
+ * (VM) at VM creation time via the I915_VM_CREATE_FLAGS_USE_VM_BIND flag.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
+ * done asynchronously, when a valid out fence is specified.
+ *
+ * VM_BIND locking order is as below.
+ *
+ * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+ *    mapping.
+ *
+ *    In the future, when GPU page faults are supported, we can potentially
+ *    use a rwsem instead, so that multiple page fault handlers can take the
+ *    read side lock to look up the mapping and hence can run in parallel.
+ *    The older execbuf mode of binding does not need this lock.
+ *
+ * 2) The object's dma-resv lock will protect i915_vma state and needs
+ *    to be held while binding/unbinding a vma in the async worker and while
+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
+ *    will all share a dma-resv object.
+ *
+ * 3) Spinlock/s to protect some of the VM's lists like the list of
+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
+ */
+
+/**
+ * i915_gem_vm_bind_lookup_vma() - look up the vma with a given start address
+ * @vm: virtual address space in which the vma needs to be looked up
+ * @va: start address of the vma
+ *
+ * Retrieves the vma with the given start address from the vm's vma tree.
+ *
+ * Returns: the vma on success, NULL on failure.
+ */
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
+{
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	return i915_vm_bind_it_iter_first(&vm->va, va, va);
+}
+
+/**
+ * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
+ * @vma: vma that needs to be removed
+ * @release_obj: whether to release the object reference
+ *
+ * Removes the vma from the vm's lists and custom interval tree.
+ */
+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
+{
+	lockdep_assert_held(&vma->vm->vm_bind_lock);
+
+	if (!list_empty(&vma->vm_bind_link)) {
+		list_del_init(&vma->vm_bind_link);
+		i915_vm_bind_it_remove(vma, &vma->vm->va);
+
+		/* Release object */
+		if (release_obj)
+			i915_gem_object_put(vma->obj);
+	}
+}
+
+static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
+				  struct i915_vma *vma,
+				  struct drm_i915_gem_vm_unbind *va)
+{
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	if (vma) {
+		obj = vma->obj;
+		i915_vma_destroy(vma);
+
+		goto exit;
+	}
+
+	if (!va)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
+	if (ret)
+		return ret;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+
+	if (!vma)
+		ret = -ENOENT;
+	else if (vma->size != va->length)
+		ret = -EINVAL;
+
+	if (ret) {
+		mutex_unlock(&vm->vm_bind_lock);
+		return ret;
+	}
+
+	i915_gem_vm_bind_remove(vma, false);
+
+	mutex_unlock(&vm->vm_bind_lock);
+
+	/* Destroy vma and then release object */
+	obj = vma->obj;
+	ret = i915_gem_object_lock(obj, NULL);
+	if (ret)
+		return ret;
+
+	i915_vma_destroy(vma);
+	i915_gem_object_unlock(obj);
+
+exit:
+	i915_gem_object_put(obj);
+
+	return 0;
+}
+
+/**
+ * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
+ * @vm: Address space from which vma bindings need to be removed
+ *
+ * Unbind all userspace-requested object bindings.
+ */
+void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
+{
+	struct i915_vma *vma, *t;
+
+	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
+		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
+}
+
+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
+					struct drm_i915_gem_object *obj,
+					struct drm_i915_gem_vm_bind *va)
+{
+	struct i915_ggtt_view view;
+	struct i915_vma *vma;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (vma)
+		return ERR_PTR(-EEXIST);
+
+	view.type = I915_GGTT_VIEW_PARTIAL;
+	view.partial.offset = va->offset >> PAGE_SHIFT;
+	view.partial.size = va->length >> PAGE_SHIFT;
+	vma = i915_vma_instance(obj, vm, &view);
+	if (IS_ERR(vma))
+		return vma;
+
+	vma->start = va->start;
+	vma->last = va->start + va->length - 1;
+
+	return vma;
+}
+
+static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
+				struct drm_i915_gem_vm_bind *va,
+				struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma = NULL;
+	struct i915_gem_ww_ctx ww;
+	u64 pin_flags;
+	int ret = 0;
+
+	if (!vm->vm_bind_mode)
+		return -EOPNOTSUPP;
+
+	obj = i915_gem_object_lookup(file, va->handle);
+	if (!obj)
+		return -ENOENT;
+
+	if (!va->length ||
+	    !IS_ALIGNED(va->offset | va->length,
+			i915_gem_object_max_page_size(obj->mm.placements,
+						      obj->mm.n_placements)) ||
+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
+	if (ret)
+		goto put_obj;
+
+	vma = vm_bind_get_vma(vm, obj, va);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto unlock_vm;
+	}
+
+	for_i915_gem_ww(&ww, ret, true) {
+retry:
+		ret = i915_gem_object_lock(vma->obj, &ww);
+		if (ret)
+			goto out_ww;
+
+		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
+		if (ret)
+			goto out_ww;
+
+		/* Make it evictable */
+		__i915_vma_unpin(vma);
+
+		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+		i915_vm_bind_it_insert(vma, &vm->va);
+
+out_ww:
+		if (ret == -EDEADLK) {
+			ret = i915_gem_ww_ctx_backoff(&ww);
+			if (!ret)
+				goto retry;
+		} else {
+			/* Hold object reference until vm_unbind */
+			i915_gem_object_get(vma->obj);
+		}
+	}
+
+unlock_vm:
+	mutex_unlock(&vm->vm_bind_lock);
+
+put_obj:
+	i915_gem_object_put(obj);
+
+	return ret;
+}
+
+/**
+ * i915_gem_vm_bind_ioctl() - ioctl function for binding an object at a
+ * virtual address
+ * @dev: drm device associated with the virtual address
+ * @data: data describing the requested vm bind
+ * @file: drm_file related to the ioctl
+ *
+ * Binds the object at the specified virtual address.
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_vm_bind *args = data;
+	struct i915_address_space *vm;
+	int ret;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	ret = i915_gem_vm_bind_obj(vm, args, file);
+
+	i915_vm_put(vm);
+	return ret;
+}
+
+/**
+ * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an object from a
+ * virtual address
+ * @dev: drm device associated with the virtual address
+ * @data: data describing the binding that needs to be unbound
+ * @file: drm_file related to the ioctl
+ *
+ * Unbinds the object from the specified virtual address.
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_gem_vm_unbind *args = data;
+	struct i915_address_space *vm;
+	int ret;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	ret = i915_gem_vm_unbind_vma(vm, NULL, args);
+
+	i915_vm_put(vm);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index b67831833c9a3..cb188377b7bd9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -12,6 +12,7 @@
 
 #include "gem/i915_gem_internal.h"
 #include "gem/i915_gem_lmem.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "i915_trace.h"
 #include "i915_utils.h"
 #include "intel_gt.h"
@@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
+	mutex_destroy(&vm->vm_bind_lock);
 }
 
 /**
@@ -204,6 +207,8 @@ static void __i915_vm_release(struct work_struct *work)
 
 	__i915_vm_close(vm);
 
+	i915_gem_vm_unbind_vma_all(vm);
+
 	/* Synchronize async unbinds. */
 	i915_vma_resource_bind_dep_sync_all(vm);
 
@@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	INIT_LIST_HEAD(&vm->bound_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+
+	vm->va = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&vm->vm_bind_list);
+	INIT_LIST_HEAD(&vm->vm_bound_list);
+	mutex_init(&vm->vm_bind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index da21088890b3b..06a259475816b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -259,6 +259,15 @@ struct i915_address_space {
 	 */
 	struct list_head unbound_list;
 
+	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
+	struct mutex vm_bind_lock;
+	/** @vm_bind_list: List of vm_binding in process */
+	struct list_head vm_bind_list;
+	/** @vm_bound_list: List of vm_binding completed */
+	struct list_head vm_bound_list;
+	/* @va: tree of persistent vmas */
+	struct rb_root_cached va;
+
 	/* Global GTT */
 	bool is_ggtt:1;
 
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 1332c70370a68..9a9010fd9ecfa 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -68,6 +68,7 @@
 #include "gem/i915_gem_ioctls.h"
 #include "gem/i915_gem_mman.h"
 #include "gem/i915_gem_pm.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_rc6.h"
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 2603717164900..092ae4309d8a1 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -29,6 +29,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_tiling.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	spin_unlock(&obj->vma.lock);
 	mutex_unlock(&vm->mutex);
 
+	INIT_LIST_HEAD(&vma->vm_bind_link);
 	return vma;
 
 err_unlock:
@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
 	spin_lock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 33a58f605d75c..15eac55a3e274 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
 {
 	ptrdiff_t cmp;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
-
 	cmp = ptrdiff(vma->vm, vm);
 	if (cmp)
 		return cmp;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index be6e028c3b57d..f746fecae85ed 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -289,6 +289,20 @@ struct i915_vma {
 	/** This object's place on the active/inactive lists */
 	struct list_head vm_link;
 
+	/** @vm_bind_link: node for the vm_bind related lists of vm */
+	struct list_head vm_bind_link;
+
+	/** Interval tree structures for persistent vma */
+
+	/** @rb: node for the interval tree of vm for persistent vmas */
+	struct rb_node rb;
+	/** @start: start endpoint of the rb node */
+	u64 start;
+	/** @last: Last endpoint of the rb node */
+	u64 last;
+	/** @__subtree_last: last in subtree */
+	u64 __subtree_last;
+
 	struct list_head obj_link; /* Link in the object's VMA list */
 	struct rb_node obj_node;
 	struct hlist_node obj_hash;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 12435db751eb8..3da0e07f84bbd 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1507,6 +1507,41 @@ struct drm_i915_gem_execbuffer2 {
 #define i915_execbuffer2_get_context_id(eb2) \
 	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
 
+/**
+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
+ *
+ * The operation will wait for input fence to signal.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * operation.
+ */
+struct drm_i915_gem_timeline_fence {
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
+	__u32 handle;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_TIMELINE_FENCE_WAIT:
+	 * Wait for the input fence before the operation.
+	 *
+	 * I915_TIMELINE_FENCE_SIGNAL:
+	 * Return operation completion fence as output.
+	 */
+	__u32 flags;
+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
@@ -3718,6 +3753,134 @@ struct drm_i915_gem_create_ext_protected_content {
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length must be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local memory and a compact
+ * page table. On those platforms, for binding device local-memory objects,
+ * the @start, @offset and @length must be 64K aligned. Also, UMDs should not
+ * mix the local memory 64K page and the system memory 4K page bindings in
+ * the same 2M range.
+ *
+ * Error code -EINVAL will be returned if @start, @offset and @length are not
+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
+ * -ENOSPC will be returned if the VA range specified can't be reserved.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for bind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and binding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). VM_UNBIND will force unbind the specified
+ * range from device page table without waiting for any GPU job to complete.
+ * It is the UMD's responsibility to ensure the mapping is no longer in use before
+ * calling VM_UNBIND.
+ *
+ * If the specified mapping is not found, the ioctl will simply return without
+ * any error.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for unbind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and unbinding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.34.1



* [RFC PATCH v3 05/17] drm/i915: Support for VM private BOs
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (3 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-31  6:13   ` Niranjana Vishwanathapura
  2022-08-27 19:43 ` [RFC PATCH v3 06/17] drm/i915/dmabuf: Deny the dmabuf export " Andi Shyti
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Each VM creates a root_obj and shares it with all of its private objects
to use as the dma_resv object. This has a performance advantage in the
execbuf path: it requires a single dma_resv object update for all private
BOs, versus a list of dma_resv object updates for shared BOs.
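
Illustrative kernel-side sketch (not part of this patch) of what the
shared reservation object buys: publishing a request fence for all
private mappings of a VM is a single dma_resv update on vm->root_obj,
instead of one per BO. The helper below is hypothetical; the dma_resv
calls are the existing kernel API.

#include <linux/dma-resv.h>

#include "gt/intel_gtt.h"

/* Illustrative only: one reservation update covers all private BOs. */
static int vm_private_add_fence(struct i915_address_space *vm,
				struct dma_fence *fence)
{
	struct dma_resv *resv = vm->root_obj->base.resv;
	int err;

	err = dma_resv_lock(resv, NULL);
	if (err)
		return err;

	err = dma_resv_reserve_fences(resv, 1);
	if (!err)
		dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_BOOKKEEP);

	dma_resv_unlock(resv);
	return err;
}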

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h   | 3 +++
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c | 9 +++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.c                | 4 ++++
 drivers/gpu/drm/i915/gt/intel_gtt.h                | 2 ++
 drivers/gpu/drm/i915/i915_vma.c                    | 1 +
 drivers/gpu/drm/i915/i915_vma_types.h              | 2 ++
 6 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 9f6b14ec189a2..46308dcf39e99 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
 
 	const struct drm_i915_gem_object_ops *ops;
 
+	/* For VM private BO, points to root_obj in VM. NULL otherwise */
+	struct drm_i915_gem_object *priv_root;
+
 	struct {
 		/**
 		 * @vma.lock: protect the list/tree of vmas
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index dadd1d4b1761b..9ff929f187cfd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -93,6 +93,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 
 	if (!list_empty(&vma->vm_bind_link)) {
 		list_del_init(&vma->vm_bind_link);
+		list_del_init(&vma->non_priv_vm_bind_link);
 		i915_vm_bind_it_remove(vma, &vma->vm->va);
 
 		/* Release object */
@@ -219,6 +220,11 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (obj->priv_root && obj->priv_root != vm->root_obj) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
 	if (ret)
 		goto put_obj;
@@ -244,6 +250,9 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 
 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 		i915_vm_bind_it_insert(vma, &vm->va);
+		if (!obj->priv_root)
+			list_add_tail(&vma->non_priv_vm_bind_link,
+				      &vm->non_priv_vm_bind_list);
 
 out_ww:
 		if (ret == -EDEADLK) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index cb188377b7bd9..c4f75826213ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -177,6 +177,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	i915_gem_object_put(vm->root_obj);
 	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
 	mutex_destroy(&vm->vm_bind_lock);
 }
@@ -292,6 +293,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->vm_bind_list);
 	INIT_LIST_HEAD(&vm->vm_bound_list);
 	mutex_init(&vm->vm_bind_lock);
+	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
+	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
+	GEM_BUG_ON(IS_ERR(vm->root_obj));
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 06a259475816b..9a2665e4ec2e5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -267,6 +267,8 @@ struct i915_address_space {
 	struct list_head vm_bound_list;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
+	struct list_head non_priv_vm_bind_list;
+	struct drm_i915_gem_object *root_obj;
 
 	/* Global GTT */
 	bool is_ggtt:1;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 092ae4309d8a1..239346e0c07f2 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	mutex_unlock(&vm->mutex);
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
+	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
 	return vma;
 
 err_unlock:
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index f746fecae85ed..de5534d518cdd 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -291,6 +291,8 @@ struct i915_vma {
 
 	/** @vm_bind_link: node for the vm_bind related lists of vm */
 	struct list_head vm_bind_link;
+	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
+	struct list_head non_priv_vm_bind_link;
 
 	/** Interval tree structures for persistent vma */
 
-- 
2.34.1



* [RFC PATCH v3 06/17] drm/i915/dmabuf: Deny the dmabuf export for VM private BOs
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (4 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 05/17] drm/i915: Support for VM private BOs Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas Andi Shyti
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

VM private BOs can only be mapped on the specified VM and cannot be
exported as a dma-buf.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c63336..6433173c3e84d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags)
 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 
+	if (obj->priv_root) {
+		drm_dbg(obj->base.dev,
+			"Exporting VM private objects is not allowed\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	exp_info.ops = &i915_dmabuf_ops;
 	exp_info.size = gem_obj->size;
 	exp_info.flags = flags;
-- 
2.34.1



* [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (5 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 06/17] drm/i915/dmabuf: Deny the dmabuf export " Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-31  6:16   ` Niranjana Vishwanathapura
  2022-09-12 13:16   ` [Intel-gfx] " Jani Nikula
  2022-08-27 19:43 ` [RFC PATCH v3 08/17] drm/i915/vm_bind: Add out fence support Andi Shyti
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
them during request submission in the execbuf path.

Support eviction by maintaining a list of evicted persistent vmas
for rebinding during the next submission.
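
As a rough sketch of how a later submission could drain the new rebind
list (the actual handling lands in the execbuf3 patches later in this
series): rebind_one_vma() below is an assumed placeholder helper, and
the caller is assumed to hold vm->vm_bind_lock so the vmas cannot go
away underneath us.

#include "gt/intel_gtt.h"
#include "i915_vma.h"

/* Illustrative sketch only, not part of this patch. */
static int rebind_evicted_vmas(struct i915_address_space *vm)
{
	struct i915_vma *vma;
	int err = 0;

	spin_lock(&vm->vm_rebind_lock);
	while (!list_empty(&vm->vm_rebind_list)) {
		vma = list_first_entry(&vm->vm_rebind_list,
				       struct i915_vma, vm_rebind_link);
		list_del_init(&vma->vm_rebind_link);
		spin_unlock(&vm->vm_rebind_lock);

		err = rebind_one_vma(vma);	/* assumed helper */

		spin_lock(&vm->vm_rebind_lock);
		if (err) {
			/* leave it queued so a later submission can retry */
			list_add_tail(&vma->vm_rebind_link,
				      &vm->vm_rebind_list);
			break;
		}
	}
	spin_unlock(&vm->vm_rebind_lock);

	return err;
}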

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  8 +++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 38 +++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 +
 drivers/gpu/drm/i915/i915_vma.c               | 50 +++++++++++++++--
 drivers/gpu/drm/i915/i915_vma.h               | 56 +++++++++++++++----
 drivers/gpu/drm/i915/i915_vma_types.h         | 24 ++++++++
 9 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 389e9f157ca5e..825dce41f7113 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -38,6 +38,7 @@
 #include "i915_gem_mman.h"
 #include "i915_gem_object.h"
 #include "i915_gem_ttm.h"
+#include "i915_gem_vm_bind.h"
 #include "i915_memcpy.h"
 #include "i915_trace.h"
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 9ff929f187cfd..3b45529fe8d4c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -91,6 +91,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 {
 	lockdep_assert_held(&vma->vm->vm_bind_lock);
 
+	spin_lock(&vma->vm->vm_rebind_lock);
+	if (!list_empty(&vma->vm_rebind_link))
+		list_del_init(&vma->vm_rebind_link);
+	i915_vma_set_purged(vma);
+	i915_vma_set_freed(vma);
+	spin_unlock(&vma->vm->vm_rebind_lock);
+
 	if (!list_empty(&vma->vm_bind_link)) {
 		list_del_init(&vma->vm_bind_link);
 		list_del_init(&vma->non_priv_vm_bind_link);
@@ -190,6 +197,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 
 	vma->start = va->start;
 	vma->last = va->start + va->length - 1;
+	i915_vma_set_persistent(vma);
 
 	return vma;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index c4f75826213ae..97cd0089b516d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
 	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
+	INIT_LIST_HEAD(&vm->vm_rebind_list);
+	spin_lock_init(&vm->vm_rebind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 9a2665e4ec2e5..1f3b1967ec175 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -265,6 +265,10 @@ struct i915_address_space {
 	struct list_head vm_bind_list;
 	/** @vm_bound_list: List of vm_binding completed */
 	struct list_head vm_bound_list;
+	/* @vm_rebind_list: list of vmas to be rebound */
+	struct list_head vm_rebind_list;
+	/* @vm_rebind_lock: protects vm_rebind_list */
+	spinlock_t vm_rebind_lock;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 329ff75b80b97..f083724163deb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -25,6 +25,44 @@
 #include "i915_trace.h"
 #include "i915_vgpu.h"
 
+/**
+ * i915_vm_sync() - Wait for all requests on private vmas of a vm to complete
+ * @vm: address space to wait on for idleness
+ *
+ * Waits until all requests on the vm_bind private objects are completed.
+ *
+ * Returns: 0 on success, negative error code on failure
+ */
+int i915_vm_sync(struct i915_address_space *vm)
+{
+	int ret;
+
+	/* Wait for all requests under this vm to finish */
+	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
+				    DMA_RESV_USAGE_BOOKKEEP, false,
+				    MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+	else if (ret > 0)
+		return 0;
+	else
+		return -ETIMEDOUT;
+}
+
+/**
+ * i915_vm_is_active() - Check whether requests on the vm are still active
+ * @vm: targeted address space
+ *
+ * Check whether all requests on the related private vmas have completed.
+ *
+ * Returns: True when requests have not yet completed, false otherwise.
+ */
+bool i915_vm_is_active(const struct i915_address_space *vm)
+{
+	return !dma_resv_test_signaled(vm->root_obj->base.resv,
+				       DMA_RESV_USAGE_BOOKKEEP);
+}
+
 int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
 			       struct sg_table *pages)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8c2f57eb5ddaa..a5bbdc59d9dfb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 
 #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
 
+int i915_vm_sync(struct i915_address_space *vm);
+bool i915_vm_is_active(const struct i915_address_space *vm);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 239346e0c07f2..0eb7727d62a6f 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
 	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
+	INIT_LIST_HEAD(&vma->vm_rebind_link);
 	return vma;
 
 err_unlock:
@@ -387,8 +388,31 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
 	return err;
 }
 
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
-static int i915_vma_verify_bind_complete(struct i915_vma *vma)
+/**
+ * i915_vma_sync() - Wait for the vma to be idle
+ * @vma: vma to wait on
+ *
+ * Returns 0 on success and error code on failure
+ */
+int i915_vma_sync(struct i915_vma *vma)
+{
+	int ret;
+
+	/* Wait for the asynchronous bindings and pending GPU reads */
+	ret = i915_active_wait(&vma->active);
+	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
+		return ret;
+
+	return i915_vm_sync(vma->vm);
+}
+
+/**
+ * i915_vma_verify_bind_complete() - Check for the vm_bind completion of the vma
+ * @vma: vma submitted for vm_bind
+ *
+ * Returns: 0 if the vm_bind is completed. Error code otherwise.
+ */
+int i915_vma_verify_bind_complete(struct i915_vma *vma)
 {
 	struct dma_fence *fence = i915_active_fence_get(&vma->active.excl);
 	int err;
@@ -405,9 +429,6 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma)
 
 	return err;
 }
-#else
-#define i915_vma_verify_bind_complete(_vma) 0
-#endif
 
 I915_SELFTEST_EXPORT void
 i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
@@ -1654,6 +1675,13 @@ static void force_unbind(struct i915_vma *vma)
 	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
+	/*
+	 * Mark persistent vma as purged to avoid it waiting
+	 * for VM to be released.
+	 */
+	if (i915_vma_is_persistent(vma))
+		i915_vma_set_purged(vma);
+
 	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
 	WARN_ON(__i915_vma_unbind(vma));
 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
@@ -1846,6 +1874,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
 	int err;
 
 	assert_object_held(obj);
+	if (i915_vma_is_persistent(vma))
+		return -EINVAL;
 
 	GEM_BUG_ON(!vma->pages);
 
@@ -2014,6 +2044,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
 	__i915_vma_evict(vma, false);
 
 	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
+
+	if (i915_vma_is_persistent(vma)) {
+		spin_lock(&vma->vm->vm_rebind_lock);
+		if (list_empty(&vma->vm_rebind_link) &&
+		    !i915_vma_is_purged(vma))
+			list_add_tail(&vma->vm_rebind_link,
+				      &vma->vm->vm_rebind_list);
+		spin_unlock(&vma->vm->vm_rebind_lock);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 15eac55a3e274..bf0b5b4abd919 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
-
-static inline bool i915_vma_is_active(const struct i915_vma *vma)
-{
-	return !i915_active_is_idle(&vma->active);
-}
-
 /* do not reserve memory to prevent deadlocks */
 #define __EXEC_OBJECT_NO_RESERVE BIT(31)
 
@@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 	return i915_vm_to_ggtt(vma->vm)->pin_bias;
 }
 
+static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_persistent(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_purged(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_purged(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_freed(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_freed(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
+{
+	if (i915_vma_is_persistent(vma)) {
+		if (i915_vma_is_purged(vma))
+			return false;
+
+		return i915_vm_is_active(vma->vm);
+	}
+
+	return !i915_active_is_idle(&vma->active);
+}
+
 static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
 {
 	i915_gem_object_get(vma->obj);
@@ -406,12 +442,8 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
 void i915_vma_make_purgeable(struct i915_vma *vma);
 
 int i915_vma_wait_for_bind(struct i915_vma *vma);
-
-static inline int i915_vma_sync(struct i915_vma *vma)
-{
-	/* Wait for the asynchronous bindings and pending GPU reads */
-	return i915_active_wait(&vma->active);
-}
+int i915_vma_verify_bind_complete(struct i915_vma *vma);
+int i915_vma_sync(struct i915_vma *vma);
 
 /**
  * i915_vma_get_current_resource - Get the current resource of the vma
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index de5534d518cdd..5483ccf0c82c7 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -264,6 +264,28 @@ struct i915_vma {
 #define I915_VMA_SCANOUT_BIT	17
 #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
 
+  /**
+   * I915_VMA_PERSISTENT_BIT:
+   * The vma is persistent (created with VM_BIND call).
+   *
+   * I915_VMA_PURGED_BIT:
+   * The persistent vma is force unbound either due to VM_UNBIND call
+   * from UMD or VM is released. Do not check/wait for VM activeness
+   * in i915_vma_is_active() and i915_vma_sync() calls.
+   *
+   * I915_VMA_FREED_BIT:
+   * The persistent vma is being released by UMD via VM_UNBIND call.
+   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
+   * already holds the lock.
+   */
+#define I915_VMA_PERSISTENT_BIT	19
+#define I915_VMA_PURGED_BIT	20
+#define I915_VMA_FREED_BIT	21
+
+#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
+#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
+#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
+
 	struct i915_active active;
 
 #define I915_VMA_PAGES_BIAS 24
@@ -293,6 +315,8 @@ struct i915_vma {
 	struct list_head vm_bind_link;
 	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
 	struct list_head non_priv_vm_bind_link;
+	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
+	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
 
 	/** Interval tree structures for persistent vma */
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 08/17] drm/i915/vm_bind: Add out fence support
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (6 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-31  6:22   ` Niranjana Vishwanathapura
  2022-08-27 19:43 ` [RFC PATCH v3 09/17] drm/i915: Do not support vm_bind mode in execbuf2 Andi Shyti
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Add support for handling the out fence of the vm_bind call.
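
For illustration, a hedged userspace usage sketch (not part of this
patch): the .start and .fence members correspond to what the kernel
reads below (va->start, va->fence.handle/value/flags); the remaining
struct fields, the DRM_IOCTL_I915_GEM_VM_BIND wrapper and all local
variables (fd, vm_id, bo_handle, gpu_va, bo_size, syncobj_handle,
point) are assumptions based on the uapi patches elsewhere in this
series (needs xf86drm.h and the updated drm/i915_drm.h):

	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,	/* assumed: VM created in vm_bind mode */
		.handle = bo_handle,	/* assumed: GEM BO to map */
		.start  = gpu_va,	/* GPU VA of the mapping */
		.offset = 0,		/* assumed field */
		.length = bo_size,	/* assumed field */
		.fence  = {
			.handle = syncobj_handle,	/* timeline syncobj */
			.value  = point,		/* 0 for a binary syncobj */
			.flags  = I915_TIMELINE_FENCE_SIGNAL,
		},
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
	/* the out fence signals once the binding is complete */
	drmSyncobjTimelineWait(fd, &syncobj_handle, &point, 1,
			       INT64_MAX, 0, NULL);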

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 82 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c               |  6 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  7 ++
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
index ebc493b7dafc1..d65e6e4fb3972 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -18,4 +18,7 @@ int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file);
 
 void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
+void i915_vm_bind_signal_fence(struct i915_vma *vma,
+			       struct dma_fence * const fence);
+
 #endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 3b45529fe8d4c..e57b9c492a7f9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -5,6 +5,8 @@
 
 #include <linux/interval_tree_generic.h>
 
+#include <drm/drm_syncobj.h>
+
 #include "gem/i915_gem_vm_bind.h"
 #include "gem/i915_gem_context.h"
 #include "gt/gen8_engine_cs.h"
@@ -109,6 +111,67 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 	}
 }
 
+static int i915_vm_bind_add_fence(struct drm_file *file, struct i915_vma *vma,
+				  u32 handle, u64 point)
+{
+	struct drm_syncobj *syncobj;
+
+	syncobj = drm_syncobj_find(file, handle);
+	if (!syncobj) {
+		DRM_DEBUG("Invalid syncobj handle provided\n");
+		return -ENOENT;
+	}
+
+	/*
+	 * For timeline syncobjs we need to preallocate chains for
+	 * later signaling.
+	 */
+	if (point) {
+		vma->vm_bind_fence.chain_fence = dma_fence_chain_alloc();
+		if (!vma->vm_bind_fence.chain_fence) {
+			drm_syncobj_put(syncobj);
+			return -ENOMEM;
+		}
+	} else {
+		vma->vm_bind_fence.chain_fence = NULL;
+	}
+	vma->vm_bind_fence.syncobj = syncobj;
+	vma->vm_bind_fence.value = point;
+
+	return 0;
+}
+
+static void i915_vm_bind_put_fence(struct i915_vma *vma)
+{
+	if (!vma->vm_bind_fence.syncobj)
+		return;
+
+	drm_syncobj_put(vma->vm_bind_fence.syncobj);
+	dma_fence_chain_free(vma->vm_bind_fence.chain_fence);
+}
+
+void i915_vm_bind_signal_fence(struct i915_vma *vma,
+			       struct dma_fence * const fence)
+{
+	struct drm_syncobj *syncobj = vma->vm_bind_fence.syncobj;
+
+	if (!syncobj)
+		return;
+
+	if (vma->vm_bind_fence.chain_fence) {
+		drm_syncobj_add_point(syncobj,
+				      vma->vm_bind_fence.chain_fence,
+				      fence, vma->vm_bind_fence.value);
+		/*
+		 * The chain's ownership is transferred to the
+		 * timeline.
+		 */
+		vma->vm_bind_fence.chain_fence = NULL;
+	} else {
+		drm_syncobj_replace_fence(syncobj, fence);
+	}
+}
+
 static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
 				  struct i915_vma *vma,
 				  struct drm_i915_gem_vm_unbind *va)
@@ -243,6 +306,15 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto unlock_vm;
 	}
 
+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
+		ret = i915_vm_bind_add_fence(file, vma, va->fence.handle,
+					     va->fence.value);
+		if (ret)
+			goto put_vma;
+	}
+
+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
+
 	for_i915_gem_ww(&ww, ret, true) {
 retry:
 		ret = i915_gem_object_lock(vma->obj, &ww);
@@ -267,12 +339,22 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 			ret = i915_gem_ww_ctx_backoff(&ww);
 			if (!ret)
 				goto retry;
+
 		} else {
 			/* Hold object reference until vm_unbind */
 			i915_gem_object_get(vma->obj);
 		}
 	}
 
+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL)
+		i915_vm_bind_put_fence(vma);
+
+put_vma:
+	if (ret && vma) {
+		i915_vma_set_freed(vma);
+		i915_vma_destroy(vma);
+	}
+
 unlock_vm:
 	mutex_unlock(&vm->vm_bind_lock);
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 0eb7727d62a6f..6ca37ce2b35a8 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1542,8 +1542,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 err_vma_res:
 	i915_vma_resource_free(vma_res);
 err_fence:
-	if (work)
+	if (work) {
+		if (i915_vma_is_persistent(vma))
+			i915_vm_bind_signal_fence(vma, &work->base.dma);
+
 		dma_fence_work_commit_imm(&work->base);
+	}
 err_rpm:
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 5483ccf0c82c7..8bf870a0f689b 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -318,6 +318,13 @@ struct i915_vma {
 	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
 	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
 
+	/** Timeline fence for vm_bind completion notification */
+	struct {
+		struct drm_syncobj *syncobj;
+		u64 value;
+		struct dma_fence_chain *chain_fence;
+	} vm_bind_fence;
+
 	/** Interval tree structures for persistent vma */
 
 	/** @rb: node for the interval tree of vm for persistent vmas */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 09/17] drm/i915: Do not support vm_bind mode in execbuf2
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (7 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 08/17] drm/i915/vm_bind: Add out fence support Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-31  5:45   ` Niranjana Vishwanathapura
  2022-08-27 19:43 ` [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl Andi Shyti
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Do not support VMs created in vm_bind mode in the execbuf2 ioctl;
such submissions are rejected with -EOPNOTSUPP.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cd75b0ca2555f..f85f10cf9c34b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -781,6 +781,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	if (unlikely(IS_ERR(ctx)))
 		return PTR_ERR(ctx);
 
+	if (ctx->vm->vm_bind_mode) {
+		i915_gem_context_put(ctx);
+		return -EOPNOTSUPP;
+	}
+
 	eb->gem_context = ctx;
 	if (i915_gem_context_has_full_ppgtt(ctx))
 		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (8 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 09/17] drm/i915: Do not support vm_bind mode in execbuf2 Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-31  7:38   ` [Intel-gfx] " Tvrtko Ursulin
  2022-08-27 19:43 ` [RFC PATCH v3 11/17] drm/i915: Add i915_vma_is_bind_complete() Andi Shyti
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
works in vm_bind mode. The vm_bind mode only works with
this new execbuf3 ioctl.

The new execbuf3 ioctl does not take a list of objects to validate or
bind, as all required object bindings will have been requested by
userspace through VM_BIND calls before submitting the execbuf3.

Legacy features such as relocations are removed.
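
For illustration, a hedged userspace sketch (not part of this patch),
using the struct drm_i915_gem_execbuffer3 layout added below. The
handle/flags members of struct drm_i915_gem_timeline_fence are assumed
from how the kernel reads them here, the DRM_IOCTL_I915_GEM_EXECBUFFER3
wrapper is only enabled by a later patch in the series, and fd, ctx_id,
batch_gpu_va, syncobj_handle and point are placeholder variables:

	struct drm_i915_gem_timeline_fence out_fence = {
		.handle = syncobj_handle,
		.flags  = I915_TIMELINE_FENCE_SIGNAL,
		.value  = point,
	};
	struct drm_i915_gem_execbuffer3 execbuf = {
		.ctx_id          = ctx_id,	/* context with a user engine map */
		.engine_idx      = 0,		/* index into that engine map */
		.batch_address   = batch_gpu_va, /* VA mapped earlier via VM_BIND */
		.fence_count     = 1,
		.timeline_fences = (uintptr_t)&out_fence,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf);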

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |    1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1000 +++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
 include/uapi/drm/i915_drm.h                   |   62 +
 4 files changed, 1065 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4e1627e96c6e0..38cd1c5bc1a55 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -148,6 +148,7 @@ gem-y += \
 	gem/i915_gem_dmabuf.o \
 	gem/i915_gem_domain.o \
 	gem/i915_gem_execbuffer.o \
+	gem/i915_gem_execbuffer3.o \
 	gem/i915_gem_internal.o \
 	gem/i915_gem_object.o \
 	gem/i915_gem_lmem.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
new file mode 100644
index 0000000000000..a3d767cd9f808
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -0,0 +1,1000 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/dma-resv.h>
+#include <linux/sync_file.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_syncobj.h>
+
+#include "gt/intel_context.h"
+#include "gt/intel_gpu_commands.h"
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+#include "gt/intel_ring.h"
+
+#include "i915_drv.h"
+#include "i915_file_private.h"
+#include "i915_gem_context.h"
+#include "i915_gem_ioctls.h"
+#include "i915_gem_vm_bind.h"
+#include "i915_trace.h"
+
+#define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
+#define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
+
+/* Catch emission of unexpected errors for CI! */
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
+#undef EINVAL
+#define EINVAL ({ \
+	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
+	22; \
+})
+#endif
+
+/**
+ * DOC: User command execution with execbuf3 ioctl
+ *
+ * A VM in VM_BIND mode will not support the older execbuf mode of binding.
+ * The execbuf ioctl handling in VM_BIND mode differs significantly from the
+ * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+ * Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+ * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+ * execlist. Hence, no support for implicit sync.
+ *
+ * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
+ * works with execbuf3 ioctl for submission.
+ *
+ * The execbuf3 ioctl directly specifies the batch addresses instead of object
+ * handles as in the execbuf2 ioctl. The execbuf3 ioctl also does not support
+ * many of the older features like in/out/submit fences, fence array, default
+ * gem context etc. (See struct drm_i915_gem_execbuffer3).
+ *
+ * In VM_BIND mode, VA allocation is completely managed by the user instead of
+ * the i915 driver. Hence, VA assignment and eviction are not applicable in
+ * VM_BIND mode. Also, for determining object activeness, VM_BIND mode does not
+ * use the i915_vma active reference tracking; it instead checks the dma-resv
+ * object's fence list for that.
+ *
+ * So, a lot of the code supporting the execbuf2 ioctl, like relocations, VA
+ * evictions, the vma lookup table, implicit sync, vma active reference
+ * tracking etc., is not applicable to the execbuf3 ioctl.
+ */
+
+struct eb_fence {
+	struct drm_syncobj *syncobj;
+	struct dma_fence *dma_fence;
+	u64 value;
+	struct dma_fence_chain *chain_fence;
+};
+
+/**
+ * struct i915_execbuffer - execbuf struct for execbuf3
+ * @i915: reference to the i915 instance we run on
+ * @file: drm file reference
+ * @args: execbuf3 ioctl structure
+ * @gt: reference to the gt instance the ioctl was submitted for
+ * @context: logical state for the request
+ * @gem_context: caller's context
+ * @requests: requests to be built
+ * @composite_fence: used as the excl fence in dma_resv objects when > 1 BB submitted
+ * @ww: i915_gem_ww_ctx instance
+ * @num_batches: number of batches submitted
+ * @batch_addresses: addresses corresponding to the submitted batches
+ * @batches: references to the i915_vmas corresponding to the batches
+ */
+struct i915_execbuffer {
+	struct drm_i915_private *i915;
+	struct drm_file *file;
+	struct drm_i915_gem_execbuffer3 *args;
+
+	struct intel_gt *gt;
+	struct intel_context *context;
+	struct i915_gem_context *gem_context;
+
+	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+	struct dma_fence *composite_fence;
+
+	struct i915_gem_ww_ctx ww;
+
+	unsigned int num_batches;
+	u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
+	struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
+
+	struct eb_fence *fences;
+	unsigned long num_fences;
+};
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
+
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->ctx_id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	eb->gem_context = ctx;
+	return 0;
+}
+
+static struct i915_vma *
+eb_find_vma(struct i915_address_space *vm, u64 addr)
+{
+	u64 va;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
+	return i915_gem_vm_bind_lookup_vma(vm, va);
+}
+
+static int eb_lookup_vma_all(struct i915_execbuffer *eb)
+{
+	unsigned int i, current_batch = 0;
+	struct i915_vma *vma;
+
+	for (i = 0; i < eb->num_batches; i++) {
+		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
+		if (!vma)
+			return -EINVAL;
+
+		eb->batches[current_batch] = vma;
+		++current_batch;
+	}
+
+	return 0;
+}
+
+static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
+{
+}
+
+static int eb_validate_vma_all(struct i915_execbuffer *eb)
+{
+	/* only throttle once, even if we didn't need to throttle */
+	for (bool throttle = true;; throttle = false) {
+		int err;
+
+		err = eb_pin_engine(eb, throttle);
+		if (!err)
+			return 0;
+
+		if (err != -EDEADLK)
+			return err;
+
+		err = i915_gem_ww_ctx_backoff(&eb->ww);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+/*
+ * Using two helper loops for the order of which requests / batches are created
+ * and added the to backend. Requests are created in order from the parent to
+ * the last child. Requests are added in the reverse order, from the last child
+ * to parent. This is done for locking reasons as the timeline lock is acquired
+ * during request creation and released when the request is added to the
+ * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
+ * the ordering.
+ */
+#define for_each_batch_create_order(_eb, _i) \
+	for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
+#define for_each_batch_add_order(_eb, _i) \
+	BUILD_BUG_ON(!typecheck(int, _i)); \
+	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
+
+static int eb_move_to_gpu(struct i915_execbuffer *eb)
+{
+	/* Unconditionally flush any chipset caches (for streaming writes). */
+	intel_gt_chipset_flush(eb->gt);
+
+	return 0;
+}
+
+static int eb_request_submit(struct i915_execbuffer *eb,
+			     struct i915_request *rq,
+			     struct i915_vma *batch,
+			     u64 batch_len)
+{
+	struct intel_engine_cs *engine = rq->context->engine;
+	int err;
+
+	if (intel_context_nopreempt(rq->context))
+		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);
+
+	/*
+	 * After we completed waiting for other engines (using HW semaphores)
+	 * then we can signal that this request/batch is ready to run. This
+	 * allows us to determine if the batch is still waiting on the GPU
+	 * or actually running by checking the breadcrumb.
+	 */
+	if (engine->emit_init_breadcrumb) {
+		err = engine->emit_init_breadcrumb(rq);
+		if (err)
+			return err;
+	}
+
+	return engine->emit_bb_start(rq, batch->node.start, batch_len, 0);
+}
+
+static int eb_submit(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+	int err;
+
+	err = eb_move_to_gpu(eb);
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		trace_i915_request_queue(eb->requests[i], 0);
+		if (!err)
+			err = eb_request_submit(eb, eb->requests[i],
+						eb->batches[i],
+						eb->batches[i]->size);
+	}
+
+	return err;
+}
+
+static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
+{
+	struct intel_ring *ring = ce->ring;
+	struct intel_timeline *tl = ce->timeline;
+	struct i915_request *rq;
+
+	/*
+	 * Completely unscientific finger-in-the-air estimates for suitable
+	 * maximum user request size (to avoid blocking) and then backoff.
+	 */
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request that after waiting upon, there will be at least half
+	 * the ring available. The hysteresis allows us to compete for the
+	 * shared ring and should mean that we sleep less often prior to
+	 * claiming our resources, but not so long that the ring completely
+	 * drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &tl->requests, link) {
+		if (rq->ring != ring)
+			continue;
+
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->link == &tl->requests)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
+			   bool throttle)
+{
+	struct intel_timeline *tl;
+	struct i915_request *rq = NULL;
+
+	/*
+	 * Take a local wakeref for preparing to dispatch the execbuf as
+	 * we expect to access the hardware fairly frequently in the
+	 * process, and require the engine to be kept awake between accesses.
+	 * Upon dispatch, we acquire another prolonged wakeref that we hold
+	 * until the timeline is idle, which in turn releases the wakeref
+	 * taken on the engine, and the parent device.
+	 */
+	tl = intel_context_timeline_lock(ce);
+	if (IS_ERR(tl))
+		return PTR_ERR(tl);
+
+	intel_context_enter(ce);
+	if (throttle)
+		rq = eb_throttle(eb, ce);
+	intel_context_timeline_unlock(tl);
+
+	if (rq) {
+		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
+		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
+
+		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
+				      timeout) < 0) {
+			i915_request_put(rq);
+
+			/*
+			 * Error path, cannot use intel_context_timeline_lock as
+			 * that is user interruptible and this clean up step
+			 * must be done.
+			 */
+			mutex_lock(&ce->timeline->mutex);
+			intel_context_exit(ce);
+			mutex_unlock(&ce->timeline->mutex);
+
+			if (nonblock)
+				return -EWOULDBLOCK;
+			else
+				return -EINTR;
+		}
+		i915_request_put(rq);
+	}
+
+	return 0;
+}
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
+{
+	struct intel_context *ce = eb->context, *child;
+	int err;
+	int i = 0, j = 0;
+
+	GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
+
+	if (unlikely(intel_context_is_banned(ce)))
+		return -EIO;
+
+	/*
+	 * Pinning the contexts may generate requests in order to acquire
+	 * GGTT space, so do this first before we reserve a seqno for
+	 * ourselves.
+	 */
+	err = intel_context_pin_ww(ce, &eb->ww);
+	if (err)
+		return err;
+
+	for_each_child(ce, child) {
+		err = intel_context_pin_ww(child, &eb->ww);
+		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
+	}
+
+	for_each_child(ce, child) {
+		err = eb_pin_timeline(eb, child, throttle);
+		if (err)
+			goto unwind;
+		++i;
+	}
+	err = eb_pin_timeline(eb, ce, throttle);
+	if (err)
+		goto unwind;
+
+	eb->args->flags |= __EXEC3_ENGINE_PINNED;
+	return 0;
+
+unwind:
+	for_each_child(ce, child) {
+		if (j++ < i) {
+			mutex_lock(&child->timeline->mutex);
+			intel_context_exit(child);
+			mutex_unlock(&child->timeline->mutex);
+		}
+	}
+	for_each_child(ce, child)
+		intel_context_unpin(child);
+	intel_context_unpin(ce);
+	return err;
+}
+
+static int
+eb_select_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce, *child;
+	unsigned int idx;
+	int err;
+
+	if (!i915_gem_context_user_engines(eb->gem_context))
+		return -EINVAL;
+
+	idx = eb->args->engine_idx;
+	ce = i915_gem_context_get_engine(eb->gem_context, idx);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	eb->num_batches = ce->parallel.number_children + 1;
+
+	for_each_child(ce, child)
+		intel_context_get(child);
+	intel_gt_pm_get(ce->engine->gt);
+
+	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
+		err = intel_context_alloc_state(ce);
+		if (err)
+			goto err;
+	}
+	for_each_child(ce, child) {
+		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
+			err = intel_context_alloc_state(child);
+			if (err)
+				goto err;
+		}
+	}
+
+	/*
+	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged.
+	 */
+	err = intel_gt_terminally_wedged(ce->engine->gt);
+	if (err)
+		goto err;
+
+	if (!i915_vm_tryget(ce->vm)) {
+		err = -ENOENT;
+		goto err;
+	}
+
+	eb->context = ce;
+	eb->gt = ce->engine->gt;
+
+	/*
+	 * Make sure engine pool stays alive even if we call intel_context_put
+	 * during ww handling. The pool is destroyed when last pm reference
+	 * is dropped, which breaks our -EDEADLK handling.
+	 */
+	return err;
+
+err:
+	intel_gt_pm_put(ce->engine->gt);
+	for_each_child(ce, child)
+		intel_context_put(child);
+	intel_context_put(ce);
+	return err;
+}
+
+static void
+eb_put_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *child;
+
+	i915_vm_put(eb->context->vm);
+	intel_gt_pm_put(eb->gt);
+	for_each_child(eb->context, child)
+		intel_context_put(child);
+	intel_context_put(eb->context);
+}
+
+static void
+__free_fence_array(struct eb_fence *fences, unsigned int n)
+{
+	while (n--) {
+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
+		dma_fence_put(fences[n].dma_fence);
+		dma_fence_chain_free(fences[n].chain_fence);
+	}
+	kvfree(fences);
+}
+
+static int add_timeline_fence_array(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_timeline_fence __user *user_fences;
+	struct eb_fence *f;
+	u64 nfences;
+	int err = 0;
+
+	nfences = eb->args->fence_count;
+	if (!nfences)
+		return 0;
+
+	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
+	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
+	if (nfences > min_t(unsigned long,
+			    ULONG_MAX / sizeof(*user_fences),
+			    SIZE_MAX / sizeof(*f)) - eb->num_fences)
+		return -EINVAL;
+
+	user_fences = u64_to_user_ptr(eb->args->timeline_fences);
+	if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
+		return -EFAULT;
+
+	f = krealloc(eb->fences,
+		     (eb->num_fences + nfences) * sizeof(*f),
+		     __GFP_NOWARN | GFP_KERNEL);
+	if (!f)
+		return -ENOMEM;
+
+	eb->fences = f;
+	f += eb->num_fences;
+
+	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
+		     ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
+
+	while (nfences--) {
+		struct drm_i915_gem_timeline_fence user_fence;
+		struct drm_syncobj *syncobj;
+		struct dma_fence *fence = NULL;
+		u64 point;
+
+		if (__copy_from_user(&user_fence,
+				     user_fences++,
+				     sizeof(user_fence)))
+			return -EFAULT;
+
+		if (user_fence.flags & __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
+			return -EINVAL;
+
+		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
+		if (!syncobj) {
+			DRM_DEBUG("Invalid syncobj handle provided\n");
+			return -ENOENT;
+		}
+
+		fence = drm_syncobj_fence_get(syncobj);
+
+		if (!fence && user_fence.flags &&
+		    !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			DRM_DEBUG("Syncobj handle has no fence\n");
+			drm_syncobj_put(syncobj);
+			return -EINVAL;
+		}
+
+		point = user_fence.value;
+		if (fence)
+			err = dma_fence_chain_find_seqno(&fence, point);
+
+		if (err && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
+			dma_fence_put(fence);
+			drm_syncobj_put(syncobj);
+			return err;
+		}
+
+		/*
+		 * A point might have been signaled already and
+		 * garbage collected from the timeline. In this case
+		 * just ignore the point and carry on.
+		 */
+		if (!fence && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			drm_syncobj_put(syncobj);
+			continue;
+		}
+
+		/*
+		 * For timeline syncobjs we need to preallocate chains for
+		 * later signaling.
+		 */
+		if (point != 0 && user_fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
+			/*
+			 * Waiting and signaling the same point (when point !=
+			 * 0) would break the timeline.
+			 */
+			if (user_fence.flags & I915_TIMELINE_FENCE_WAIT) {
+				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
+				dma_fence_put(fence);
+				drm_syncobj_put(syncobj);
+				return -EINVAL;
+			}
+
+			f->chain_fence = dma_fence_chain_alloc();
+			if (!f->chain_fence) {
+				drm_syncobj_put(syncobj);
+				dma_fence_put(fence);
+				return -ENOMEM;
+			}
+		} else {
+			f->chain_fence = NULL;
+		}
+
+		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
+		f->dma_fence = fence;
+		f->value = point;
+		f++;
+		eb->num_fences++;
+	}
+
+	return 0;
+}
+
+static void put_fence_array(struct eb_fence *fences, int num_fences)
+{
+	if (fences)
+		__free_fence_array(fences, num_fences);
+}
+
+static int
+await_fence_array(struct i915_execbuffer *eb,
+		  struct i915_request *rq)
+{
+	unsigned int n;
+
+	for (n = 0; n < eb->num_fences; n++) {
+		int err;
+
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
+
+		if (!eb->fences[n].dma_fence)
+			continue;
+
+		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+static void signal_fence_array(const struct i915_execbuffer *eb,
+			       struct dma_fence * const fence)
+{
+	unsigned int n;
+
+	for (n = 0; n < eb->num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
+		if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
+			continue;
+
+		if (eb->fences[n].chain_fence) {
+			drm_syncobj_add_point(syncobj,
+					      eb->fences[n].chain_fence,
+					      fence,
+					      eb->fences[n].value);
+			/*
+			 * The chain's ownership is transferred to the
+			 * timeline.
+			 */
+			eb->fences[n].chain_fence = NULL;
+		} else {
+			drm_syncobj_replace_fence(syncobj, fence);
+		}
+	}
+}
+
+static int parse_timeline_fences(struct i915_execbuffer *eb)
+{
+	return add_timeline_fence_array(eb);
+}
+
+static int parse_batch_addresses(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_execbuffer3 *args = eb->args;
+	u64 __user *batch_addr = u64_to_user_ptr(args->batch_address);
+
+	if (copy_from_user(eb->batch_addresses, batch_addr,
+			   sizeof(batch_addr[0]) * eb->num_batches))
+		return -EFAULT;
+
+	return 0;
+}
+
+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
+{
+	struct i915_request *rq, *rn;
+
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
+		if (rq == end || !i915_request_retire(rq))
+			break;
+}
+
+static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
+			  int err, bool last_parallel)
+{
+	struct intel_timeline * const tl = i915_request_timeline(rq);
+	struct i915_sched_attr attr = {};
+	struct i915_request *prev;
+
+	lockdep_assert_held(&tl->mutex);
+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
+
+	trace_i915_request_add(rq);
+
+	prev = __i915_request_commit(rq);
+
+	/* Check that the context wasn't destroyed before submission */
+	if (likely(!intel_context_is_closed(eb->context))) {
+		attr = eb->gem_context->sched;
+	} else {
+		/* Serialise with context_close via the add_to_timeline */
+		i915_request_set_error_once(rq, -ENOENT);
+		__i915_request_skip(rq);
+		err = -ENOENT; /* override any transient errors */
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		if (err) {
+			__i915_request_skip(rq);
+			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
+				&rq->fence.flags);
+		}
+		if (last_parallel)
+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+				&rq->fence.flags);
+	}
+
+	__i915_request_queue(rq, &attr);
+
+	/* Try to clean up the client's timeline after submitting the request */
+	if (prev)
+		retire_requests(tl, prev);
+
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
+static int eb_request_add_all(struct i915_execbuffer *eb, int err)
+{
+	int i;
+
+	/*
+	 * We iterate in reverse order of creation to release timeline mutexes in
+	 * same order.
+	 */
+	for_each_batch_add_order(eb, i) {
+		struct i915_request *rq = eb->requests[i];
+
+		if (!rq)
+			continue;
+
+		err = eb_request_add(eb, rq, err, i == 0);
+	}
+
+	return err;
+}
+
+static void eb_requests_get(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_get(eb->requests[i]);
+	}
+}
+
+static void eb_requests_put(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_put(eb->requests[i]);
+	}
+}
+
+static int
+eb_composite_fence_create(struct i915_execbuffer *eb)
+{
+	struct dma_fence_array *fence_array;
+	struct dma_fence **fences;
+	unsigned int i;
+
+	GEM_BUG_ON(!intel_context_is_parent(eb->context));
+
+	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
+	if (!fences)
+		return -ENOMEM;
+
+	for_each_batch_create_order(eb, i) {
+		fences[i] = &eb->requests[i]->fence;
+		__set_bit(I915_FENCE_FLAG_COMPOSITE,
+			  &eb->requests[i]->fence.flags);
+	}
+
+	fence_array = dma_fence_array_create(eb->num_batches,
+					     fences,
+					     eb->context->parallel.fence_context,
+					     eb->context->parallel.seqno++,
+					     false);
+	if (!fence_array) {
+		kfree(fences);
+		return -ENOMEM;
+	}
+
+	/* Move ownership to the dma_fence_array created above */
+	for_each_batch_create_order(eb, i)
+		dma_fence_get(fences[i]);
+
+	eb->composite_fence = &fence_array->base;
+
+	return 0;
+}
+
+static int
+eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
+{
+	int err;
+
+	if (unlikely(eb->gem_context->syncobj)) {
+		struct dma_fence *fence;
+
+		fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
+		err = i915_request_await_dma_fence(rq, fence);
+		dma_fence_put(fence);
+		if (err)
+			return err;
+	}
+
+	if (eb->fences) {
+		err = await_fence_array(eb, rq);
+		if (err)
+			return err;
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		err = eb_composite_fence_create(eb);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static struct intel_context *
+eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
+{
+	struct intel_context *child;
+
+	if (likely(context_number == 0))
+		return eb->context;
+
+	for_each_child(eb->context, child)
+		if (!--context_number)
+			return child;
+
+	GEM_BUG_ON("Context not found");
+
+	return NULL;
+}
+
+static int eb_requests_create(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+	int err;
+
+	for_each_batch_create_order(eb, i) {
+		/* Allocate a request for this batch buffer nice and early. */
+		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
+		if (IS_ERR(eb->requests[i])) {
+			err = PTR_ERR(eb->requests[i]);
+			eb->requests[i] = NULL;
+			return err;
+		}
+
+		/*
+		 * Only the first request added (committed to backend) has to
+		 * take the in fences into account as all subsequent requests
+		 * will have fences inserted in between them.
+		 */
+		if (i + 1 == eb->num_batches) {
+			err = eb_fences_add(eb, eb->requests[i]);
+			if (err)
+				return err;
+		}
+
+		if (eb->batches[i])
+			eb->requests[i]->batch_res =
+				i915_vma_resource_get(eb->batches[i]->resource);
+	}
+
+	return 0;
+}
+
+static int
+i915_gem_do_execbuffer(struct drm_device *dev,
+		       struct drm_file *file,
+		       struct drm_i915_gem_execbuffer3 *args)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct i915_execbuffer eb;
+	int err;
+
+	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
+
+	eb.i915 = i915;
+	eb.file = file;
+	eb.args = args;
+
+	eb.fences = NULL;
+	eb.num_fences = 0;
+
+	memset(eb.requests, 0, sizeof(struct i915_request *) *
+	       ARRAY_SIZE(eb.requests));
+	eb.composite_fence = NULL;
+
+	err = parse_timeline_fences(&eb);
+	if (err)
+		return err;
+
+	err = eb_select_context(&eb);
+	if (unlikely(err))
+		goto err_fences;
+
+	err = eb_select_engine(&eb);
+	if (unlikely(err))
+		goto err_context;
+
+	err = parse_batch_addresses(&eb);
+	if (unlikely(err))
+		goto err_engine;
+
+	mutex_lock(&eb.context->vm->vm_bind_lock);
+
+	err = eb_lookup_vma_all(&eb);
+	if (err) {
+		eb_release_vma_all(&eb, true);
+		goto err_vm_bind_lock;
+	}
+
+	i915_gem_ww_ctx_init(&eb.ww, true);
+
+	err = eb_validate_vma_all(&eb);
+	if (err)
+		goto err_vma;
+
+	ww_acquire_done(&eb.ww.ctx);
+
+	err = eb_requests_create(&eb);
+	if (err) {
+		if (eb.requests[0])
+			goto err_request;
+		else
+			goto err_vma;
+	}
+
+	err = eb_submit(&eb);
+
+err_request:
+	eb_requests_get(&eb);
+	err = eb_request_add_all(&eb, err);
+
+	if (eb.fences)
+		signal_fence_array(&eb, eb.composite_fence ?
+				   eb.composite_fence :
+				   &eb.requests[0]->fence);
+
+	if (unlikely(eb.gem_context->syncobj)) {
+		drm_syncobj_replace_fence(eb.gem_context->syncobj,
+					  eb.composite_fence ?
+					  eb.composite_fence :
+					  &eb.requests[0]->fence);
+	}
+
+	if (eb.composite_fence)
+		dma_fence_put(eb.composite_fence);
+
+	eb_requests_put(&eb);
+
+err_vma:
+	eb_release_vma_all(&eb, true);
+	WARN_ON(err == -EDEADLK);
+	i915_gem_ww_ctx_fini(&eb.ww);
+err_vm_bind_lock:
+	mutex_unlock(&eb.context->vm->vm_bind_lock);
+err_engine:
+	eb_put_engine(&eb);
+err_context:
+	i915_gem_context_put(eb.gem_context);
+err_fences:
+	put_fence_array(eb.fences, eb.num_fences);
+	return err;
+}
+
+int
+i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_execbuffer3 *args = data;
+	int err;
+
+	if (args->flags & __I915_EXEC3_UNKNOWN_FLAGS)
+		return -EINVAL;
+
+	err = i915_gem_do_execbuffer(dev, file, args);
+
+	args->flags &= ~__I915_EXEC3_UNKNOWN_FLAGS;
+	return err;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
index 28d6526e32ab0..b7a1e9725a841 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
@@ -18,6 +18,8 @@ int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file);
 int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
+int i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *file);
 int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3da0e07f84bbd..ea1906873f278 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1542,6 +1542,68 @@ struct drm_i915_gem_timeline_fence {
 	__u64 value;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/**
+	 * @batch_address: Batch gpu virtual address/es.
+	 *
+	 * For normal submission, it is the gpu virtual address of the batch
+	 * buffer. For parallel submission, it is a pointer to an array of
+	 * batch buffer gpu virtual addresses with array size equal to the
+	 * number of (parallel) engines involved in that submission (See
+	 * struct i915_context_engines_parallel_submit).
+	 */
+	__u64 batch_address;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+#define __I915_EXEC3_UNKNOWN_FLAGS (~0)
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/** @fence_count: Number of fences in @timeline_fences array. */
+	__u32 fence_count;
+
+	/**
+	 * @timeline_fences: Pointer to an array of timeline fences.
+	 *
+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
+	 */
+	__u64 timeline_fences;
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 11/17] drm/i915: Add i915_vma_is_bind_complete()
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (9 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 12/17] drm/i915/vm_bind: Handle persistent vmas in execbuf3 Andi Shyti
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Add i915_vma_is_bind_complete() to check whether the binding of a
specific VMA is complete.
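
For context, a hedged sketch of how the helper is consumed by the
execbuf3 path later in this series (the vm_bind_list/vm_bound_list
names come from the earlier VM_BIND patches):

	/* after dropping the execbuf pin, retire fully bound vmas from
	 * the vm_bind_list to the vm_bound_list
	 */
	list_for_each_entry_safe(vma, vn, &vm->vm_bind_list, vm_bind_link)
		if (i915_vma_is_bind_complete(vma))
			list_move_tail(&vma->vm_bind_link, &vm->vm_bound_list);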

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_vma.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 6ca37ce2b35a8..4b8ae58cd886b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -406,6 +406,34 @@ int i915_vma_sync(struct i915_vma *vma)
 	return i915_vm_sync(vma->vm);
 }
 
+/**
+ * i915_vma_is_bind_complete() - Check if the binding of the vma is complete
+ * @vma: vma that is being checked for
+ * completion of its binding
+ *
+ * Returns true if the binding is complete, otherwise false.
+ */
+bool i915_vma_is_bind_complete(struct i915_vma *vma)
+{
+	/* Ensure vma bind is initiated */
+	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
+		return false;
+
+	/* Ensure any binding started is complete */
+	if (rcu_access_pointer(vma->active.excl.fence)) {
+		struct dma_fence *fence;
+
+		rcu_read_lock();
+		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
+		rcu_read_unlock();
+		if (fence) {
+			dma_fence_put(fence);
+			return false;
+		}
+	}
+	return true;
+}
+
 /**
  * i915_vma_verify_bind_complete() - Check for the vm_bind completion of the vma
  * @vma: vma submitted for vm_bind
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index bf0b5b4abd919..9f8c369c3b466 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -444,6 +444,7 @@ void i915_vma_make_purgeable(struct i915_vma *vma);
 int i915_vma_wait_for_bind(struct i915_vma *vma);
 int i915_vma_verify_bind_complete(struct i915_vma *vma);
 int i915_vma_sync(struct i915_vma *vma);
+bool i915_vma_is_bind_complete(struct i915_vma *vma);
 
 /**
  * i915_vma_get_current_resource - Get the current resource of the vma
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 12/17] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (10 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 11/17] drm/i915: Add i915_vma_is_bind_complete() Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-27 19:43 ` [RFC PATCH v3 13/17] drm/i915/vm_bind: userptr dma-resv changes Andi Shyti
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Handle persistent (VM_BIND) mappings during the request submission
in the execbuf3 path.
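
Because the request fence is attached to these objects only with
DMA_RESV_USAGE_BOOKKEEP (see the DOC update below), the GEM_WAIT and
GEM_BUSY ioctls no longer reflect end of batch for them. A hedged
userspace sketch of the recommended end-of-batch check, waiting on the
execbuf3 timeline out fence (syncobj_handle and point are placeholders,
libdrm helper assumed available):

	/* out fence requested with I915_TIMELINE_FENCE_SIGNAL in execbuf3 */
	drmSyncobjTimelineWait(fd, &syncobj_handle, &point, 1,
			       INT64_MAX,
			       DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);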

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 200 +++++++++++++++++-
 1 file changed, 199 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index a3d767cd9f808..8e0dde26194e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/dma-resv.h>
+#include <linux/lockdep.h>
 #include <linux/sync_file.h>
 #include <linux/uaccess.h>
 
@@ -22,6 +23,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
 
@@ -45,7 +47,9 @@
  * execlist. Hence, no support for implicit sync.
  *
  * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
- * works with execbuf3 ioctl for submission.
+ * works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
+ * VM_BIND call) at the time of execbuf3 call are deemed required for that
+ * submission.
  *
  * The execbuf3 ioctl directly specifies the batch addresses instead of as
  * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
@@ -61,6 +65,13 @@
  * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
  * vma lookup table, implicit sync, vma active reference tracking etc., are not
  * applicable for execbuf3 ioctl.
+ *
+ * During each execbuf submission, the request fence is added to all VM_BIND
+ * mapped objects with DMA_RESV_USAGE_BOOKKEEP. The DMA_RESV_USAGE_BOOKKEEP
+ * usage prevents over-synchronization (see enum dma_resv_usage). Note that the
+ * DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
+ * DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for the end of
+ * batch check. Instead, the execbuf3 timeline out fence should be used.
  */
 
 struct eb_fence {
@@ -108,6 +119,7 @@ struct i915_execbuffer {
 };
 
 static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
+static void eb_unpin_engine(struct i915_execbuffer *eb);
 
 static int eb_select_context(struct i915_execbuffer *eb)
 {
@@ -132,6 +144,19 @@ eb_find_vma(struct i915_address_space *vm, u64 addr)
 	return i915_gem_vm_bind_lookup_vma(vm, va);
 }
 
+static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
+{
+	struct i915_vma *vma, *vn;
+
+	spin_lock(&vm->vm_rebind_lock);
+	list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list, vm_rebind_link) {
+		list_del_init(&vma->vm_rebind_link);
+		if (!list_empty(&vma->vm_bind_link))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bind_list);
+	}
+	spin_unlock(&vm->vm_rebind_lock);
+}
+
 static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
@@ -146,11 +171,119 @@ static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 		++current_batch;
 	}
 
+	eb_scoop_unbound_vma_all(eb->context->vm);
+
 	return 0;
 }
 
+static int eb_lock_vma_all(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma;
+	int err;
+
+	err = i915_gem_object_lock(eb->context->vm->root_obj, &eb->ww);
+	if (err)
+		return err;
+
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link) {
+		err = i915_gem_object_lock(vma->obj, &eb->ww);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static void eb_release_persistent_vma_all(struct i915_execbuffer *eb,
+					  bool final)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma, *vn;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	if (!(eb->args->flags & __EXEC3_HAS_PIN))
+		return;
+
+	assert_object_held(vm->root_obj);
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
+		__i915_vma_unpin(vma);
+
+	eb->args->flags &= ~__EXEC3_HAS_PIN;
+	if (!final)
+		return;
+
+	list_for_each_entry_safe(vma, vn, &vm->vm_bind_list, vm_bind_link)
+		if (i915_vma_is_bind_complete(vma))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+}
+
 static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
 {
+	eb_release_persistent_vma_all(eb, final);
+	eb_unpin_engine(eb);
+}
+
+static int eb_reserve_fence_for_persistent_vma_all(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma;
+	int ret;
+
+	ret = dma_resv_reserve_fences(vm->root_obj->base.resv, 1);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link) {
+		ret = dma_resv_reserve_fences(vma->obj->base.resv, 1);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int eb_validate_persistent_vma_all(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma, *last_pinned_vma = NULL;
+	int ret = 0;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+	assert_object_held(vm->root_obj);
+
+	ret = eb_reserve_fence_for_persistent_vma_all(eb);
+	if (ret)
+		return ret;
+
+	if (list_empty(&vm->vm_bind_list))
+		return 0;
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+		u64 pin_flags = vma->start | PIN_OFFSET_FIXED | PIN_USER;
+
+		ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
+		if (ret)
+			break;
+
+		last_pinned_vma = vma;
+	}
+
+	if (ret && last_pinned_vma) {
+		list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+			__i915_vma_unpin(vma);
+			if (vma == last_pinned_vma)
+				break;
+		}
+	} else if (last_pinned_vma) {
+		eb->args->flags |= __EXEC3_HAS_PIN;
+	}
+
+	return ret;
 }
 
 static int eb_validate_vma_all(struct i915_execbuffer *eb)
@@ -160,6 +293,12 @@ static int eb_validate_vma_all(struct i915_execbuffer *eb)
 		int err;
 
 		err = eb_pin_engine(eb, throttle);
+		if (!err)
+			err = eb_lock_vma_all(eb);
+
+		if (!err)
+			err = eb_validate_persistent_vma_all(eb);
+
 		if (!err)
 			return 0;
 
@@ -189,8 +328,43 @@ static int eb_validate_vma_all(struct i915_execbuffer *eb)
 	BUILD_BUG_ON(!typecheck(int, _i)); \
 	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
 
+static void __eb_persistent_add_shared_fence(struct drm_i915_gem_object *obj,
+					     struct dma_fence *fence)
+{
+	dma_resv_add_fence(obj->base.resv, fence, DMA_RESV_USAGE_BOOKKEEP);
+	obj->write_domain = 0;
+	obj->read_domains |= I915_GEM_GPU_DOMAINS;
+	obj->mm.dirty = true;
+}
+
+static void eb_persistent_add_shared_fence(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct dma_fence *fence;
+	struct i915_vma *vma;
+
+	fence = eb->composite_fence ? eb->composite_fence :
+		&eb->requests[0]->fence;
+
+	__eb_persistent_add_shared_fence(vm->root_obj, fence);
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link)
+		__eb_persistent_add_shared_fence(vma->obj, fence);
+}
+
+static void eb_move_all_persistent_vma_to_active(struct i915_execbuffer *eb)
+{
+	/* Add fence to BOs dma-resv fence list */
+	eb_persistent_add_shared_fence(eb);
+}
+
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	lockdep_assert_held(&eb->context->vm->vm_bind_lock);
+	assert_object_held(eb->context->vm->root_obj);
+
+	eb_move_all_persistent_vma_to_active(eb);
+
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->gt);
 
@@ -381,6 +555,30 @@ static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
 	return err;
 }
 
+static void eb_unpin_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce = eb->context, *child;
+
+	if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
+		return;
+
+	eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
+
+	for_each_child(ce, child) {
+		mutex_lock(&child->timeline->mutex);
+		intel_context_exit(child);
+		mutex_unlock(&child->timeline->mutex);
+
+		intel_context_unpin(child);
+	}
+
+	mutex_lock(&ce->timeline->mutex);
+	intel_context_exit(ce);
+	mutex_unlock(&ce->timeline->mutex);
+
+	intel_context_unpin(ce);
+}
+
 static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 13/17] drm/i915/vm_bind: userptr dma-resv changes
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (11 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 12/17] drm/i915/vm_bind: Handle persistent vmas in execbuf3 Andi Shyti
@ 2022-08-27 19:43 ` Andi Shyti
  2022-08-31  6:45   ` Niranjana Vishwanathapura
  2022-08-27 19:44 ` [RFC PATCH v3 14/17] drm/i915/vm_bind: Skip vma_lookup for persistent vmas Andi Shyti
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

For persistent (vm_bind) vmas of userptr BOs, handle the user page
pinning by using the i915_gem_object_userptr_submit_init() and
i915_gem_object_userptr_submit_done() functions.
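
A condensed sketch of the flow this patch wires into the execbuf3 path
(simplified from the diff below; error handling and list iteration are
omitted):

	/* before taking the ww locks: (re)pin user pages of userptr vmas */
	err = i915_gem_object_userptr_submit_init(vma->obj);

	/* after the request fences are in place, under the notifier lock:
	 * check that the pages were not invalidated in the meantime
	 */
	read_lock(&eb->i915->mm.notifier_lock);
	err = i915_gem_object_userptr_submit_done(vma->obj);
	read_unlock(&eb->i915->mm.notifier_lock);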

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 139 ++++++++++++++----
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  10 ++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  16 ++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   4 +
 drivers/gpu/drm/i915/i915_vma_types.h         |   2 +
 6 files changed, 142 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index 8e0dde26194e0..72d6771da2113 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -23,6 +23,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_USERPTR_USED		BIT_ULL(34)
 #define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
@@ -157,10 +158,45 @@ static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
 	spin_unlock(&vm->vm_rebind_lock);
 }
 
+static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *last_vma = NULL;
+	struct i915_vma *vma;
+	int err;
+
+	lockdep_assert_held(&vm->vm_bind_lock);
+
+	list_for_each_entry(vma, &vm->vm_userptr_invalidated_list,
+			    vm_userptr_invalidated_link) {
+		list_del_init(&vma->vm_userptr_invalidated_link);
+		err = i915_gem_object_userptr_submit_init(vma->obj);
+		if (err)
+			return err;
+
+		last_vma = vma;
+	}
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
+		if (i915_gem_object_is_userptr(vma->obj)) {
+			err = i915_gem_object_userptr_submit_init(vma->obj);
+			if (err)
+				return err;
+
+			last_vma = vma;
+		}
+
+	if (last_vma)
+		eb->args->flags |= __EXEC3_USERPTR_USED;
+
+	return 0;
+}
+
 static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
 	struct i915_vma *vma;
+	int err = 0;
 
 	for (i = 0; i < eb->num_batches; i++) {
 		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
@@ -171,6 +207,10 @@ static int eb_lookup_vma_all(struct i915_execbuffer *eb)
 		++current_batch;
 	}
 
+	err = eb_lookup_persistent_userptr_vmas(eb);
+	if (err)
+		return err;
+
 	eb_scoop_unbound_vma_all(eb->context->vm);
 
 	return 0;
@@ -286,33 +326,6 @@ static int eb_validate_persistent_vma_all(struct i915_execbuffer *eb)
 	return ret;
 }
 
-static int eb_validate_vma_all(struct i915_execbuffer *eb)
-{
-	/* only throttle once, even if we didn't need to throttle */
-	for (bool throttle = true;; throttle = false) {
-		int err;
-
-		err = eb_pin_engine(eb, throttle);
-		if (!err)
-			err = eb_lock_vma_all(eb);
-
-		if (!err)
-			err = eb_validate_persistent_vma_all(eb);
-
-		if (!err)
-			return 0;
-
-		if (err != -EDEADLK)
-			return err;
-
-		err = i915_gem_ww_ctx_backoff(&eb->ww);
-		if (err)
-			return err;
-	}
-
-	return 0;
-}
-
 /*
  * Using two helper loops for the order of which requests / batches are created
  * and added the to backend. Requests are created in order from the parent to
@@ -360,15 +373,51 @@ static void eb_move_all_persistent_vma_to_active(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	int err = 0, j;
+
 	lockdep_assert_held(&eb->context->vm->vm_bind_lock);
 	assert_object_held(eb->context->vm->root_obj);
 
 	eb_move_all_persistent_vma_to_active(eb);
 
-	/* Unconditionally flush any chipset caches (for streaming writes). */
-	intel_gt_chipset_flush(eb->gt);
+#ifdef CONFIG_MMU_NOTIFIER
+	if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
+		struct i915_vma *vma;
 
-	return 0;
+		lockdep_assert_held(&eb->context->vm->vm_bind_lock);
+		assert_object_held(eb->context->vm->root_obj);
+
+		read_lock(&eb->i915->mm.notifier_lock);
+		list_for_each_entry(vma, &eb->context->vm->vm_bind_list,
+				    vm_bind_link) {
+			if (!i915_gem_object_is_userptr(vma->obj))
+				continue;
+
+			err = i915_gem_object_userptr_submit_done(vma->obj);
+			if (err)
+				break;
+		}
+
+		read_unlock(&eb->i915->mm.notifier_lock);
+	}
+#endif
+
+	if (likely(!err)) {
+	/*
+	 * Unconditionally flush any chipset
+	 * caches (for streaming writes).
+	 */
+		intel_gt_chipset_flush(eb->gt);
+		return 0;
+	}
+
+	for_each_batch_create_order(eb, j) {
+		if (!eb->requests[j])
+			break;
+
+		i915_request_set_error_once(eb->requests[j], err);
+	}
+	return err;
 }
 
 static int eb_request_submit(struct i915_execbuffer *eb,
@@ -1088,6 +1137,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 {
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct i915_execbuffer eb;
+	bool throttle = true;
 	int err;
 
 	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
@@ -1121,6 +1171,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 	mutex_lock(&eb.context->vm->vm_bind_lock);
 
+lookup_vmas:
 	err = eb_lookup_vma_all(&eb);
 	if (err) {
 		eb_release_vma_all(&eb, true);
@@ -1129,7 +1180,33 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 	i915_gem_ww_ctx_init(&eb.ww, true);
 
-	err = eb_validate_vma_all(&eb);
+retry_validate:
+	err = eb_pin_engine(&eb, throttle);
+	if (err)
+		goto err_validate;
+
+	/* only throttle once, even if we didn't need to throttle */
+	throttle = false;
+
+	err = eb_lock_vma_all(&eb);
+	if (err)
+		goto err_validate;
+
+	if (!list_empty(&eb.context->vm->vm_rebind_list)) {
+		eb_release_vma_all(&eb, true);
+		i915_gem_ww_ctx_fini(&eb.ww);
+		goto lookup_vmas;
+	}
+
+	err = eb_validate_persistent_vma_all(&eb);
+
+err_validate:
+	if (err == -EDEADLK) {
+		eb_release_vma_all(&eb, false);
+		err = i915_gem_ww_ctx_backoff(&eb.ww);
+		if (!err)
+			goto retry_validate;
+	}
 	if (err)
 		goto err_vma;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 8423df021b713..f980d7443fa27 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -63,6 +63,7 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
 {
 	struct drm_i915_gem_object *obj = container_of(mni, struct drm_i915_gem_object, userptr.notifier);
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct i915_vma *vma;
 	long r;
 
 	if (!mmu_notifier_range_blockable(range))
@@ -85,6 +86,15 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
 	if (current->flags & PF_EXITING)
 		return true;
 
+	spin_lock(&obj->vma.lock);
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
+		spin_lock(&vma->vm->vm_userptr_invalidated_lock);
+		list_add_tail(&vma->vm_userptr_invalidated_link,
+			      &vma->vm->vm_userptr_invalidated_list);
+		spin_unlock(&vma->vm->vm_userptr_invalidated_lock);
+	}
+	spin_unlock(&obj->vma.lock);
+
 	/* we will unbind on next submission, still have userptr pins */
 	r = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP, false,
 				  MAX_SCHEDULE_TIMEOUT);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index e57b9c492a7f9..e6216f49e7d58 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -296,6 +296,12 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (i915_gem_object_is_userptr(obj)) {
+		ret = i915_gem_object_userptr_submit_init(obj);
+		if (ret)
+			goto put_obj;
+	}
+
 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
 	if (ret)
 		goto put_obj;
@@ -328,6 +334,16 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		/* Make it evictable */
 		__i915_vma_unpin(vma);
 
+#ifdef CONFIG_MMU_NOTIFIER
+		if (i915_gem_object_is_userptr(obj)) {
+			read_lock(&vm->i915->mm.notifier_lock);
+			ret = i915_gem_object_userptr_submit_done(obj);
+			read_unlock(&vm->i915->mm.notifier_lock);
+			if (ret)
+				goto out_ww;
+		}
+#endif
+
 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 		i915_vm_bind_it_insert(vma, &vm->va);
 		if (!obj->priv_root)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 97cd0089b516d..f1db8310de4a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -298,6 +298,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
 	INIT_LIST_HEAD(&vm->vm_rebind_list);
 	spin_lock_init(&vm->vm_rebind_lock);
+	spin_lock_init(&vm->vm_userptr_invalidated_lock);
+	INIT_LIST_HEAD(&vm->vm_userptr_invalidated_list);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 1f3b1967ec175..71203d65e1d60 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -269,6 +269,10 @@ struct i915_address_space {
 	struct list_head vm_rebind_list;
 	/* @vm_rebind_lock: protects vm_rebound_list */
 	spinlock_t vm_rebind_lock;
+	/* @vm_userptr_invalidated_list: list of invalidated userptr vmas */
+	struct list_head vm_userptr_invalidated_list;
+	/* @vm_userptr_invalidated_lock: protects vm_userptr_invalidated_list */
+	spinlock_t vm_userptr_invalidated_lock;
 	/* @va: tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 8bf870a0f689b..5b583ca744387 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -317,6 +317,8 @@ struct i915_vma {
 	struct list_head non_priv_vm_bind_link;
 	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
 	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
+	/* @vm_userptr_invalidated_link: link to the vm->vm_userptr_invalidated_list */
+	struct list_head vm_userptr_invalidated_link;
 
 	/** Timeline fence for vm_bind completion notification */
 	struct {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 14/17] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (12 preceding siblings ...)
  2022-08-27 19:43 ` [RFC PATCH v3 13/17] drm/i915/vm_bind: userptr dma-resv changes Andi Shyti
@ 2022-08-27 19:44 ` Andi Shyti
  2022-08-27 19:44 ` [RFC PATCH v3 15/17] drm/i915: Extend getparm for VM_BIND capability Andi Shyti
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:44 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

vma_lookup is tied to a segment of the object rather than a section
of the VA space. Hence, it does not support aliasing (i.e., multiple
bindings to the same section of the object). Skip vma_lookup for
persistent vmas, since they need to support aliasing.
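
For illustration, a minimal sketch (mirroring vm_bind_get_vma() in the
diff below) of how a persistent vma is instantiated; passing
persistent=true skips the obj->vma.tree lookup/insertion, so two such
calls covering the same object pages at different virtual addresses
yield two distinct vmas:

static struct i915_vma *
persistent_vma_sketch(struct drm_i915_gem_object *obj,
		      struct i915_address_space *vm,
		      u64 offset, u64 length)
{
	struct i915_ggtt_view view;

	view.type = I915_GGTT_VIEW_PARTIAL;
	view.partial.offset = offset >> PAGE_SHIFT;
	view.partial.size = length >> PAGE_SHIFT;

	/* persistent=true: no i915_vma_lookup(), no rb-tree insertion */
	return i915_vma_instance(obj, vm, &view, true);
}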

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_fb_pin.c   |  2 +-
 .../drm/i915/display/intel_plane_initial.c    |  2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   | 16 +++----
 .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c | 12 ++---
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  6 ++-
 .../drm/i915/gem/selftests/igt_gem_utils.c    |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c          |  2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 16 +++----
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c               |  2 +-
 drivers/gpu/drm/i915/i915_perf.c              |  2 +-
 drivers/gpu/drm/i915/i915_vma.c               | 26 +++++++----
 drivers/gpu/drm/i915/i915_vma.h               |  3 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 44 +++++++++----------
 drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  2 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
 .../drm/i915/selftests/intel_memory_region.c  |  2 +-
 37 files changed, 106 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c
index bd6e7c98e751d..d4b5cd4d1038c 100644
--- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
+++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
@@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,
 		goto err;
 	}
 
-	vma = i915_vma_instance(obj, vm, view);
+	vma = i915_vma_instance(obj, vm, view, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index d10f27d0b7b09..ce034351b0c9c 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -136,7 +136,7 @@ initial_plane_vma(struct drm_i915_private *i915,
 		goto err_obj;
 	}
 
-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err_obj;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f85f10cf9c34b..a53e19fc48584 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -880,7 +880,7 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
 			}
 		}
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			i915_gem_object_put(obj);
 			return vma;
@@ -2212,7 +2212,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
 	struct i915_vma *vma;
 	int err;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index e6216f49e7d58..3dc5af4600a28 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -254,7 +254,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 	view.type = I915_GGTT_VIEW_PARTIAL;
 	view.partial.offset = va->offset >> PAGE_SHIFT;
 	view.partial.size = va->length >> PAGE_SHIFT;
-	vma = i915_vma_instance(obj, vm, &view);
+	vma = i915_vma_instance(obj, vm, &view, true);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index c570cf780079a..6e13a83d0e363 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -454,7 +454,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 				goto out_put;
 			}
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_put;
@@ -522,7 +522,7 @@ static int igt_mock_memory_region_huge_pages(void *arg)
 				goto out_region;
 			}
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_put;
@@ -614,7 +614,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 		/* Force the page size for this object */
 		obj->mm.page_sizes.sg = page_size;
 
-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_unpin;
@@ -746,7 +746,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			break;
@@ -924,7 +924,7 @@ static int igt_mock_ppgtt_64K(void *arg)
 			 */
 			obj->mm.page_sizes.sg &= ~I915_GTT_PAGE_SIZE_2M;
 
-			vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
+			vma = i915_vma_instance(obj, &ppgtt->vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto out_object_unpin;
@@ -1092,7 +1092,7 @@ static int __igt_write_huge(struct intel_context *ce,
 	struct i915_vma *vma;
 	int err;
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -1587,7 +1587,7 @@ static int igt_tmpfs_fallback(void *arg)
 	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put;
@@ -1654,7 +1654,7 @@ static int igt_shrink_thp(void *arg)
 		goto out_vm;
 	}
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index 3cfc621ef363d..0feae9fc7e81d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -282,7 +282,7 @@ __create_vma(struct tiled_blits *t, size_t size, bool lmem)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, t->ce->vm, NULL);
+	vma = i915_vma_instance(obj, t->ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index c6ad67b90e8af..570f74df9bef5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -426,7 +426,7 @@ static int gpu_fill(struct intel_context *ce,
 	GEM_BUG_ON(obj->base.size > ce->vm->total);
 	GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine));
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -930,7 +930,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (GRAPHICS_VER(i915) < 8)
 		return -EINVAL;
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
@@ -938,7 +938,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (IS_ERR(rpcs))
 		return PTR_ERR(rpcs);
 
-	batch = i915_vma_instance(rpcs, ce->vm, NULL);
+	batch = i915_vma_instance(rpcs, ce->vm, NULL, false);
 	if (IS_ERR(batch)) {
 		err = PTR_ERR(batch);
 		goto err_put;
@@ -1522,7 +1522,7 @@ static int write_to_scratch(struct i915_gem_context *ctx,
 	intel_gt_chipset_flush(engine->gt);
 
 	vm = i915_gem_context_get_eb_vm(ctx);
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_vm;
@@ -1599,7 +1599,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 		const u32 GPR0 = engine->mmio_base + 0x600;
 
 		vm = i915_gem_context_get_eb_vm(ctx);
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_vm;
@@ -1635,7 +1635,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 
 		/* hsw: register access even to 3DPRIM! is protected */
 		vm = i915_vm_get(&engine->gt->ggtt->vm);
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_vm;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index fe6c37fd7859a..fc235e1e6c122 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -201,7 +201,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 		return PTR_ERR(obj);
 
 	if (vm) {
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_put;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 3cff08f04f6ce..10ffa52aad8d4 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -546,7 +546,8 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
 		struct i915_gem_ww_ctx ww;
 		int err;
 
-		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &engine->gt->ggtt->vm,
+					NULL, false);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 
@@ -1587,7 +1588,8 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
 		struct i915_vma *vma;
 		struct i915_gem_ww_ctx ww;
 
-		vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
+		vma = i915_vma_instance(obj, engine->kernel_context->vm,
+					NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out_unmap;
diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
index 3c55e77b0f1b0..4184e198c824a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
+++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c
@@ -91,7 +91,7 @@ igt_emit_store_dw(struct i915_vma *vma,
 
 	intel_gt_chipset_flush(vma->vm->gt);
 
-	vma = i915_vma_instance(obj, vma->vm, NULL);
+	vma = i915_vma_instance(obj, vma->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 1bb766c79dcbe..a0af2aa50533f 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -395,7 +395,7 @@ gen6_alloc_top_pd(struct gen6_ppgtt *ppgtt)
 	pd->pt.base->base.resv = i915_vm_resv_get(&ppgtt->base.vm);
 	pd->pt.base->shares_resv_from = &ppgtt->base.vm;
 
-	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL);
+	ppgtt->vma = i915_vma_instance(pd->pt.base, &ggtt->vm, NULL, false);
 	if (IS_ERR(ppgtt->vma)) {
 		err = PTR_ERR(ppgtt->vma);
 		ppgtt->vma = NULL;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 275ad72940c15..52f8295b85a2b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -991,7 +991,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_put;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e4bac2431e416..d9ddaecdd5b48 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -423,7 +423,7 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
 		return PTR_ERR(obj);
 	}
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unref;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index f1db8310de4a6..59ed3a822483f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -628,7 +628,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHING_CACHED);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 070cec4ff8a48..71fc27df858d4 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1009,7 +1009,7 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -1662,7 +1662,7 @@ static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
index 5121e6dc2fa53..bc7a2d4421dbc 100644
--- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
@@ -157,7 +157,7 @@ int intel_renderstate_init(struct intel_renderstate *so,
 		if (IS_ERR(obj))
 			return PTR_ERR(obj);
 
-		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+		so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 		if (IS_ERR(so->vma)) {
 			err = PTR_ERR(so->vma);
 			goto err_obj;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c44..24c8b738a3945 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -130,7 +130,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	if (vm->has_read_only)
 		i915_gem_object_set_readonly(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index d5d6f1fadcae3..5e93a4052140a 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -551,7 +551,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	if (IS_IVYBRIDGE(i915))
 		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
 
-	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1291,7 +1291,7 @@ static struct i915_vma *gen7_ctx_vma(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return ERR_CAST(vma);
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index b9640212d6595..31f56996f1002 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -28,7 +28,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
 
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index c0637bf799a33..6f35783083953 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -46,7 +46,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 1b75f478d1b83..16fcaba7c9806 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -85,7 +85,7 @@ static struct i915_vma *create_empty_batch(struct intel_context *ce)
 
 	i915_gem_object_flush_map(obj);
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_unpin;
@@ -222,7 +222,7 @@ static struct i915_vma *create_nop_batch(struct intel_context *ce)
 
 	i915_gem_object_flush_map(obj);
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1e08b2473b993..643ffcb3964a8 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -1000,7 +1000,7 @@ static int live_timeslice_preempt(void *arg)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1307,7 +1307,7 @@ static int live_timeslice_queue(void *arg)
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg)
 		goto err_obj;
 	}
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_map;
@@ -2716,7 +2716,7 @@ static int create_gang(struct intel_engine_cs *engine,
 		goto err_ce;
 	}
 
-	vma = i915_vma_instance(obj, ce->vm, NULL);
+	vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -3060,7 +3060,7 @@ create_gpr_user(struct intel_engine_cs *engine,
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, result->vm, NULL);
+	vma = i915_vma_instance(obj, result->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -3130,7 +3130,7 @@ static struct i915_vma *create_global(struct intel_gt *gt, size_t sz)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
@@ -3159,7 +3159,7 @@ create_gpr_client(struct intel_engine_cs *engine,
 	if (IS_ERR(ce))
 		return ERR_CAST(ce);
 
-	vma = i915_vma_instance(global->obj, ce->vm, NULL);
+	vma = i915_vma_instance(global->obj, ce->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_ce;
@@ -3501,7 +3501,7 @@ static int smoke_submit(struct preempt_smoke *smoke,
 		struct i915_address_space *vm;
 
 		vm = i915_gem_context_get_eb_vm(ctx);
-		vma = i915_vma_instance(batch, vm, NULL);
+		vma = i915_vma_instance(batch, vm, NULL, false);
 		i915_vm_put(vm);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7f3bb1d34dfbf..0b021a32d0e03 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -147,13 +147,13 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 	h->obj = obj;
 	h->batch = vaddr;
 
-	vma = i915_vma_instance(h->obj, vm, NULL);
+	vma = i915_vma_instance(h->obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_vm_put(vm);
 		return ERR_CAST(vma);
 	}
 
-	hws = i915_vma_instance(h->hws, vm, NULL);
+	hws = i915_vma_instance(h->hws, vm, NULL, false);
 	if (IS_ERR(hws)) {
 		i915_vm_put(vm);
 		return ERR_CAST(hws);
@@ -1474,7 +1474,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 		}
 	}
 
-	arg.vma = i915_vma_instance(obj, vm, NULL);
+	arg.vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(arg.vma)) {
 		err = PTR_ERR(arg.vma);
 		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 1109088fe8f63..2ddcefffcfd4d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -930,7 +930,7 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
index 70f9ac1ec2c76..7e9361104620c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
@@ -17,7 +17,7 @@ static struct i915_vma *create_wally(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		i915_gem_object_put(obj);
 		return vma;
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index cfb4708dd62e6..327558828bef6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -78,7 +78,7 @@ create_spin_counter(struct intel_engine_cs *engine,
 
 	end = obj->base.size / sizeof(u32) - 1;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_put;
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 67a9aab801ddf..d893ea763ac61 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -122,7 +122,7 @@ read_nonprivs(struct intel_context *ce)
 	i915_gem_object_flush_map(result);
 	i915_gem_object_unpin_map(result);
 
-	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(result, &engine->gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -389,7 +389,7 @@ static struct i915_vma *create_batch(struct i915_address_space *vm)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 24451d000a6a6..cd3e52fa5ea50 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -737,7 +737,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL, false);
 	if (IS_ERR(vma))
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4b76051312dd7..00773b78d71f8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -930,7 +930,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
 	}
 
 new_vma:
-	vma = i915_vma_instance(obj, &ggtt->vm, view);
+	vma = i915_vma_instance(obj, &ggtt->vm, view, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f3c23fe9ad9ce..a7aa03b79ac47 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1919,7 +1919,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
 
 	oa_bo->vma = i915_vma_instance(obj,
 				       &stream->engine->gt->ggtt->vm,
-				       NULL);
+				       NULL, false);
 	if (IS_ERR(oa_bo->vma)) {
 		err = PTR_ERR(oa_bo->vma);
 		goto out_ww;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 4b8ae58cd886b..81b8e33ac085f 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -110,7 +110,8 @@ static void __i915_vma_retire(struct i915_active *ref)
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
-	   const struct i915_ggtt_view *view)
+	   const struct i915_ggtt_view *view,
+	   bool persistent)
 {
 	struct i915_vma *pos = ERR_PTR(-E2BIG);
 	struct i915_vma *vma;
@@ -197,6 +198,9 @@ vma_create(struct drm_i915_gem_object *obj,
 		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
 	}
 
+	if (persistent)
+		goto skip_rb_insert;
+
 	rb = NULL;
 	p = &obj->vma.tree.rb_node;
 	while (*p) {
@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma.tree);
 
+skip_rb_insert:
 	if (i915_vma_is_ggtt(vma))
 		/*
 		 * We put the GGTT vma at the start of the vma-list, followed
@@ -279,6 +284,7 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
  * @obj: parent &struct drm_i915_gem_object to be mapped
  * @vm: address space in which the mapping is located
  * @view: additional mapping requirements
+ * @persistent: Whether the vma is persistent
  *
  * i915_vma_instance() looks up an existing VMA of the @obj in the @vm with
  * the same @view characteristics. If a match is not found, one is created.
@@ -290,19 +296,22 @@ i915_vma_lookup(struct drm_i915_gem_object *obj,
 struct i915_vma *
 i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
-		  const struct i915_ggtt_view *view)
+		  const struct i915_ggtt_view *view,
+		  bool persistent)
 {
-	struct i915_vma *vma;
+	struct i915_vma *vma = NULL;
 
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
-	spin_lock(&obj->vma.lock);
-	vma = i915_vma_lookup(obj, vm, view);
-	spin_unlock(&obj->vma.lock);
+	if (!persistent) {
+		spin_lock(&obj->vma.lock);
+		vma = i915_vma_lookup(obj, vm, view);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	/* vma_create() will resolve the race if another creates the vma */
 	if (unlikely(!vma))
-		vma = vma_create(obj, vm, view);
+		vma = vma_create(obj, vm, view, persistent);
 
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
 	return vma;
@@ -1728,7 +1737,8 @@ static void release_references(struct i915_vma *vma, struct intel_gt *gt,
 
 	spin_lock(&obj->vma.lock);
 	list_del(&vma->obj_link);
-	if (!RB_EMPTY_NODE(&vma->obj_node))
+	if (!i915_vma_is_persistent(vma) &&
+	    !RB_EMPTY_NODE(&vma->obj_node))
 		rb_erase(&vma->obj_node, &obj->vma.tree);
 
 	spin_unlock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 9f8c369c3b466..028d063731aa2 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -43,7 +43,8 @@
 struct i915_vma *
 i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
-		  const struct i915_ggtt_view *view);
+		  const struct i915_ggtt_view *view,
+		  bool persistent);
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fb5e619634792..8552be3820308 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -390,7 +390,7 @@ static void close_object_list(struct list_head *objects,
 	list_for_each_entry_safe(obj, on, objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (!IS_ERR(vma))
 			ignored = i915_vma_unbind_unlocked(vma);
 
@@ -452,7 +452,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -492,7 +492,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -531,7 +531,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -571,7 +571,7 @@ static int fill_hole(struct i915_address_space *vm,
 					u64 aligned_size = round_up(obj->base.size,
 								    min_alignment);
 
-					vma = i915_vma_instance(obj, vm, NULL);
+					vma = i915_vma_instance(obj, vm, NULL, false);
 					if (IS_ERR(vma))
 						continue;
 
@@ -653,7 +653,7 @@ static int walk_hole(struct i915_address_space *vm,
 		if (IS_ERR(obj))
 			break;
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_put;
@@ -728,7 +728,7 @@ static int pot_hole(struct i915_address_space *vm,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_obj;
@@ -837,7 +837,7 @@ static int drunk_hole(struct i915_address_space *vm,
 			break;
 		}
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_obj;
@@ -920,7 +920,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, vm, NULL);
+		vma = i915_vma_instance(obj, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			break;
@@ -1018,7 +1018,7 @@ static int shrink_boom(struct i915_address_space *vm,
 		if (IS_ERR(purge))
 			return PTR_ERR(purge);
 
-		vma = i915_vma_instance(purge, vm, NULL);
+		vma = i915_vma_instance(purge, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_purge;
@@ -1041,7 +1041,7 @@ static int shrink_boom(struct i915_address_space *vm,
 		vm->fault_attr.interval = 1;
 		atomic_set(&vm->fault_attr.times, -1);
 
-		vma = i915_vma_instance(explode, vm, NULL);
+		vma = i915_vma_instance(explode, vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto err_explode;
@@ -1088,7 +1088,7 @@ static int misaligned_case(struct i915_address_space *vm, struct intel_memory_re
 		return PTR_ERR(obj);
 	}
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_put;
@@ -1560,7 +1560,7 @@ static int igt_gtt_reserve(void *arg)
 		}
 
 		list_add(&obj->st_link, &objects);
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1606,7 +1606,7 @@ static int igt_gtt_reserve(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1636,7 +1636,7 @@ static int igt_gtt_reserve(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1783,7 +1783,7 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1809,7 +1809,7 @@ static int igt_gtt_insert(void *arg)
 	list_for_each_entry(obj, &objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1829,7 +1829,7 @@ static int igt_gtt_insert(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1882,7 +1882,7 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL, false);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -2091,7 +2091,7 @@ static int igt_cs_tlb(void *arg)
 	}
 	i915_gem_object_set_cache_coherency(out, I915_CACHING_CACHED);
 
-	vma = i915_vma_instance(out, vm, NULL);
+	vma = i915_vma_instance(out, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_put_out;
@@ -2131,7 +2131,7 @@ static int igt_cs_tlb(void *arg)
 
 			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
 
-			vma = i915_vma_instance(bbe, vm, NULL);
+			vma = i915_vma_instance(bbe, vm, NULL, false);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto end;
@@ -2203,7 +2203,7 @@ static int igt_cs_tlb(void *arg)
 				goto end;
 			}
 
-			vma = i915_vma_instance(act, vm, NULL);
+			vma = i915_vma_instance(act, vm, NULL, false);
 			if (IS_ERR(vma)) {
 				kfree(vma_res);
 				err = PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 818a4909c1f35..297c1d4ebf44c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -961,7 +961,7 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915)
 
 	intel_gt_chipset_flush(to_gt(i915));
 
-	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL);
+	vma = i915_vma_instance(obj, &to_gt(i915)->ggtt->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
@@ -1100,7 +1100,7 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL);
+	vma = i915_vma_instance(obj, to_gt(i915)->vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index e3821398a5b09..ecf2d4abf9290 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -68,7 +68,7 @@ checked_vma_instance(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	bool ok = true;
 
-	vma = i915_vma_instance(obj, vm, view);
+	vma = i915_vma_instance(obj, vm, view, false);
 	if (IS_ERR(vma))
 		return vma;
 
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 0c22594ae2746..6901f94ff0764 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -47,7 +47,7 @@ static void *igt_spinner_pin_obj(struct intel_context *ce,
 	void *vaddr;
 	int ret;
 
-	*vma = i915_vma_instance(obj, ce->vm, NULL);
+	*vma = i915_vma_instance(obj, ce->vm, NULL, false);
 	if (IS_ERR(*vma))
 		return ERR_CAST(*vma);
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 3b18e5905c86b..551d0c958a3bc 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -745,7 +745,7 @@ static int igt_gpu_write(struct i915_gem_context *ctx,
 	if (!order)
 		return -ENOMEM;
 
-	vma = i915_vma_instance(obj, vm, NULL);
+	vma = i915_vma_instance(obj, vm, NULL, false);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto out_free;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 15/17] drm/i915: Extend getparm for VM_BIND capability
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (13 preceding siblings ...)
  2022-08-27 19:44 ` [RFC PATCH v3 14/17] drm/i915/vm_bind: Skip vma_lookup for persistent vmas Andi Shyti
@ 2022-08-27 19:44 ` Andi Shyti
  2022-08-27 19:44 ` [RFC PATCH v3 16/17] drm/i915/ioctl: Enable the vm_bind/unbind ioctls Andi Shyti
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:44 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Add getparam support for querying the VM_BIND capability version.
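
A minimal userspace usage sketch (illustrative, not part of the patch;
assumes a DRM fd opened on an i915 device and a uapi header carrying
the new define):

#include <errno.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int query_vm_bind_version(int fd)
{
	int value = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_VM_BIND_VERSION,
		.value = &value,
	};

	/* Older kernels reject the param with -EINVAL; 0 means no VM_BIND. */
	if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return -errno;

	return value;
}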

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_getparam.c |  3 +++
 include/uapi/drm/i915_drm.h          | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 6fd15b39570c1..c1d53febc5de1 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_VM_BIND_VERSION:
+		value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ea1906873f278..b3d3e98efa02a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -749,6 +749,27 @@ typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * VM_BIND feature version supported.
+ *
+ * The following versions of VM_BIND have been defined:
+ *
+ * 0: No VM_BIND support.
+ *
+ * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
+ *    previously with VM_BIND; the ioctl will not support unbinding multiple
+ *    mappings or splitting them. Similarly, VM_BIND calls will not replace
+ *    any existing mappings.
+ *
+ * 2: The restrictions on unbinding partial or multiple mappings are
+ *    lifted. Similarly, binding will replace any mappings in the given range.
+ *
+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
+ *
+ * vm_bind versions are backward compatible.
+ */
+#define I915_PARAM_VM_BIND_VERSION	57
+
 /* Must be kept compact -- no holes and well documented */
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 16/17] drm/i915/ioctl: Enable the vm_bind/unbind ioctls
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (14 preceding siblings ...)
  2022-08-27 19:44 ` [RFC PATCH v3 15/17] drm/i915: Extend getparm for VM_BIND capability Andi Shyti
@ 2022-08-27 19:44 ` Andi Shyti
  2022-08-27 19:44 ` [RFC PATCH v3 17/17] drm/i915: Enable execbuf3 ioctl for vm_bind Andi Shyti
  2022-08-31  7:33 ` [Intel-gfx] [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Tvrtko Ursulin
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:44 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Add ioctls to enable the vm_bind and vm_unbind features.
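
For illustration, a userspace sketch of binding a BO with the new ioctl
(not part of the patch; the field names vm_id/handle/start/offset/length
follow the uapi introduced in patch 04 of this series and should be
checked against the final header):

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int vm_bind_sketch(int fd, __u32 vm_id, __u32 handle,
			  __u64 gpu_va, __u64 obj_offset, __u64 length)
{
	struct drm_i915_gem_vm_bind bind;

	memset(&bind, 0, sizeof(bind));
	bind.vm_id = vm_id;		/* VM id from I915_GEM_VM_CREATE */
	bind.handle = handle;		/* GEM object handle */
	bind.start = gpu_va;		/* GPU virtual address of the mapping */
	bind.offset = obj_offset;	/* offset into the object */
	bind.length = length;		/* length of the mapping */

	return ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind) ? -errno : 0;
}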

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 2 ++
 include/uapi/drm/i915_drm.h        | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 9a9010fd9ecfa..841b5d62c2c01 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1844,6 +1844,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND, i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
 };
 
 /*
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b3d3e98efa02a..b4b844f558b24 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_VM_CREATE		0x3a
 #define DRM_I915_GEM_VM_DESTROY		0x3b
 #define DRM_I915_GEM_CREATE_EXT		0x3c
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
 #define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v3 17/17] drm/i915: Enable execbuf3 ioctl for vm_bind
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (15 preceding siblings ...)
  2022-08-27 19:44 ` [RFC PATCH v3 16/17] drm/i915/ioctl: Enable the vm_bind/unbind ioctls Andi Shyti
@ 2022-08-27 19:44 ` Andi Shyti
  2022-08-31  7:33 ` [Intel-gfx] [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Tvrtko Ursulin
  17 siblings, 0 replies; 41+ messages in thread
From: Andi Shyti @ 2022-08-27 19:44 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Add the uapi for the implemented execbuf3 ioctl to expose it to
userspace. This ioctl can be used only in vm_bind mode, and vm_bind
mapped batchbuffers can be submitted only through the execbuf3 ioctl.
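
For illustration, a userspace sketch of submitting a vm_bind mapped
batch through the new ioctl (not part of the patch; the field names
ctx_id, engine_idx and batch_address are assumptions based on the
execbuf3 uapi added in patch 10 of this series and should be checked
against the final header):

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int execbuf3_sketch(int fd, __u32 ctx_id, __u32 engine_idx,
			   __u64 batch_gpu_va)
{
	struct drm_i915_gem_execbuffer3 execbuf;

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.ctx_id = ctx_id;		/* context created in vm_bind mode */
	execbuf.engine_idx = engine_idx;	/* engine index within the context */
	execbuf.batch_address = batch_gpu_va;	/* batch GPU VA bound via VM_BIND */

	return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf) ? -errno : 0;
}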

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 1 +
 include/uapi/drm/i915_drm.h        | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 841b5d62c2c01..f3b0bbfbe9746 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1805,6 +1805,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER, drm_invalid_op, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER3, i915_gem_execbuffer3_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_RENDER_ALLOW),
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b4b844f558b24..c807d48e1f96c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -472,6 +472,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_CREATE_EXT		0x3c
 #define DRM_I915_GEM_VM_BIND		0x3d
 #define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -538,6 +539,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
 #define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
-- 
2.34.1
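
As a rough illustration of the opt-in flow (sketch only, not part of the
patch): a UMD first probes I915_PARAM_VM_BIND_VERSION and creates its VM in
vm_bind mode, and only then can it submit through execbuf3. The sketch
assumes libdrm's drmIoctl() and the uapi added earlier in this series
(I915_PARAM_VM_BIND_VERSION, I915_VM_CREATE_FLAGS_USE_VM_BIND):

/* Hedged sketch, not from the series: assumes libdrm's drmIoctl() and
 * the uapi additions from earlier patches in this series.
 */
#include <xf86drm.h>
#include <drm/i915_drm.h>

static int create_vm_bind_vm(int fd, __u32 *vm_id)
{
	struct drm_i915_gem_vm_control ctl = { };
	struct drm_i915_getparam gp = { };
	int version = 0;

	gp.param = I915_PARAM_VM_BIND_VERSION;
	gp.value = &version;
	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) || version < 1)
		return -1;	/* VM_BIND not supported on this kernel */

	/* The opt-in happens only at VM creation time. */
	ctl.flags = I915_VM_CREATE_FLAGS_USE_VM_BIND;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl))
		return -1;

	*vm_id = ctl.vm_id;
	return 0;
}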


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
@ 2022-08-30 17:37   ` Matthew Auld
  2022-08-31  6:10     ` Niranjana Vishwanathapura
  2022-08-30 18:19   ` Matthew Auld
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Matthew Auld @ 2022-08-30 17:37 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Niranjana Vishwanathapura, Thomas Hellstrom, Andi Shyti, Ramalingam C

On 27/08/2022 20:43, Andi Shyti wrote:
> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> 
> Implement the bind and unbind of an object at the specified GPU virtual
> addresses.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>   .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>   drivers/gpu/drm/i915/i915_driver.c            |   1 +
>   drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>   drivers/gpu/drm/i915/i915_vma.h               |   2 -
>   drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>   include/uapi/drm/i915_drm.h                   | 163 +++++++++
>   10 files changed, 543 insertions(+), 3 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 522ef9b4aff32..4e1627e96c6e0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -165,6 +165,7 @@ gem-y += \
>   	gem/i915_gem_ttm_move.o \
>   	gem/i915_gem_ttm_pm.o \
>   	gem/i915_gem_userptr.o \
> +	gem/i915_gem_vm_bind_object.o \
>   	gem/i915_gem_wait.o \
>   	gem/i915_gemfs.o
>   i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 0000000000000..ebc493b7dafc1
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include "i915_drv.h"
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
> +
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file);
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 0000000000000..dadd1d4b1761b
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,322 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_vm_bind.h"
> +#include "gem/i915_gem_context.h"
> +#include "gt/gen8_engine_cs.h"
> +
> +#include "i915_drv.h"
> +#include "i915_gem_gtt.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +		     START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> + * specified address space (VM). Multiple mappings can map to the same physical
> + * pages of an object (aliasing). These mappings (also referred to as persistent
> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all required
> + * mappings during each submission (as required by older execbuf mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding does not need this lock.
> + *
> + * 2) The object's dma-resv lock will protect i915_vma state and needs
> + *    to be held while binding/unbinding a vma in the async worker and while
> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + * 3) Spinlock/s to protect some of the VM's lists like the list of
> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
> + */
> +
> +/**
> + * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
> + * @vm: virtual address space in which vma needs to be looked for
> + * @va: starting addr of the vma
> + *
> + * retrieves the vma with a starting address from the vm's vma tree.
> + *
> + * Returns: returns vma on success, NULL on failure.
> + */
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
> +{
> +	lockdep_assert_held(&vm->vm_bind_lock);
> +
> +	return i915_vm_bind_it_iter_first(&vm->va, va, va);
> +}
> +
> +/**
> + * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
> + * @vma: vma that needs to be removed
> + * @release_obj: whether to release the object
> + *
> + * Removes the vma from the vm's lists and custom interval tree
> + */
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +	lockdep_assert_held(&vma->vm->vm_bind_lock);
> +
> +	if (!list_empty(&vma->vm_bind_link)) {
> +		list_del_init(&vma->vm_bind_link);
> +		i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +		/* Release object */
> +		if (release_obj)
> +			i915_gem_object_put(vma->obj);
> +	}
> +}
> +
> +static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
> +				  struct i915_vma *vma,
> +				  struct drm_i915_gem_vm_unbind *va)
> +{
> +	struct drm_i915_gem_object *obj;
> +	int ret;
> +
> +	if (vma) {
> +		obj = vma->obj;
> +		i915_vma_destroy(vma);
> +
> +		goto exit;
> +	}
> +
> +	if (!va)
> +		return -EINVAL;
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		return ret;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +
> +	if (!vma)
> +		ret = -ENOENT;
> +	else if (vma->size != va->length)
> +		ret = -EINVAL;
> +
> +	if (ret) {
> +		mutex_unlock(&vm->vm_bind_lock);
> +		return ret;
> +	}
> +
> +	i915_gem_vm_bind_remove(vma, false);
> +
> +	mutex_unlock(&vm->vm_bind_lock);
> +
> +	/* Destroy vma and then release object */
> +	obj = vma->obj;
> +	ret = i915_gem_object_lock(obj, NULL);
> +	if (ret)
> +		return ret;
> +
> +	i915_vma_destroy(vma);
> +	i915_gem_object_unlock(obj);
> +
> +exit:
> +	i915_gem_object_put(obj);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
> + * @vm: Address space from which vma bindings need to be removed
> + *
> + * Unbind all userspace-requested object bindings
> + */
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
> +{
> +	struct i915_vma *vma, *t;
> +
> +	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
> +		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
> +					struct drm_i915_gem_object *obj,
> +					struct drm_i915_gem_vm_bind *va)
> +{
> +	struct i915_ggtt_view view;
> +	struct i915_vma *vma;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +	if (vma)
> +		return ERR_PTR(-EEXIST);
> +
> +	view.type = I915_GGTT_VIEW_PARTIAL;
> +	view.partial.offset = va->offset >> PAGE_SHIFT;
> +	view.partial.size = va->length >> PAGE_SHIFT;
> +	vma = i915_vma_instance(obj, vm, &view);
> +	if (IS_ERR(vma))
> +		return vma;
> +
> +	vma->start = va->start;
> +	vma->last = va->start + va->length - 1;
> +
> +	return vma;
> +}
> +
> +static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +				struct drm_i915_gem_vm_bind *va,
> +				struct drm_file *file)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma = NULL;
> +	struct i915_gem_ww_ctx ww;
> +	u64 pin_flags;
> +	int ret = 0;
> +
> +	if (!vm->vm_bind_mode)
> +		return -EOPNOTSUPP;
> +
> +	obj = i915_gem_object_lookup(file, va->handle);
> +	if (!obj)
> +		return -ENOENT;
> +
> +	if (!va->length ||
> +	    !IS_ALIGNED(va->offset | va->length,
> +			i915_gem_object_max_page_size(obj->mm.placements,
> +						      obj->mm.n_placements)) ||
> +	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
> +		ret = -EINVAL;
> +		goto put_obj;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		goto put_obj;
> +
> +	vma = vm_bind_get_vma(vm, obj, va);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto unlock_vm;
> +	}
> +
> +	for_i915_gem_ww(&ww, ret, true) {
> +retry:
> +		ret = i915_gem_object_lock(vma->obj, &ww);
> +		if (ret)
> +			goto out_ww;
> +
> +		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> +		if (ret)
> +			goto out_ww;
> +
> +		/* Make it evictable */
> +		__i915_vma_unpin(vma);
> +
> +		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> +		i915_vm_bind_it_insert(vma, &vm->va);
> +
> +out_ww:
> +		if (ret == -EDEADLK) {
> +			ret = i915_gem_ww_ctx_backoff(&ww);
> +			if (!ret)
> +				goto retry;
> +		} else {
> +			/* Hold object reference until vm_unbind */
> +			i915_gem_object_get(vma->obj);
> +		}
> +	}

Just a drive-by-comment, since this looks a little strange at a glance. 
The main idea behind for_i915_gem_ww() is to handle this type of stuff 
for you.

With the usual pattern this would look something like:

for_i915_gem_ww(&ww, ret, true) {
     ret = i915_gem_object_lock(vma->obj, &ww);
     if (ret)
         continue;

     ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
     if (ret)
         continue;

     __i915_vma_unpin(vma);
     list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
     i915_vm_bind_it_insert(vma, &vm->va);
     i915_gem_object_get(vma->obj);
}

Does that not work here? If it doesn't then probably we shouldn't use 
for_i915_gem_ww() here.

> +
> +unlock_vm:
> +	mutex_unlock(&vm->vm_bind_lock);
> +
> +put_obj:
> +	i915_gem_object_put(obj);
> +
> +	return ret;
> +}
> +
> +/**
> + * i915_gem_vm_bind_ioctl() - ioctl function for binding an obj into
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the vm bind required
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to bind the object into the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file)
> +{
> +	struct drm_i915_gem_vm_bind *args = data;
> +	struct i915_address_space *vm;
> +	int ret;
> +
> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +	if (unlikely(!vm))
> +		return -ENOENT;
> +
> +	ret = i915_gem_vm_bind_obj(vm, args, file);
> +
> +	i915_vm_put(vm);
> +	return ret;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an obj from
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the binding that needs to be unbound
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to unbind the object from the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_gem_vm_unbind *args = data;
> +	struct i915_address_space *vm;
> +	int ret;
> +
> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +	if (unlikely(!vm))
> +		return -ENOENT;
> +
> +	ret = i915_gem_vm_unbind_vma(vm, NULL, args);
> +
> +	i915_vm_put(vm);
> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index b67831833c9a3..cb188377b7bd9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -12,6 +12,7 @@
>   
>   #include "gem/i915_gem_internal.h"
>   #include "gem/i915_gem_lmem.h"
> +#include "gem/i915_gem_vm_bind.h"
>   #include "i915_trace.h"
>   #include "i915_utils.h"
>   #include "intel_gt.h"
> @@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>   void i915_address_space_fini(struct i915_address_space *vm)
>   {
>   	drm_mm_takedown(&vm->mm);
> +	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> +	mutex_destroy(&vm->vm_bind_lock);
>   }
>   
>   /**
> @@ -204,6 +207,8 @@ static void __i915_vm_release(struct work_struct *work)
>   
>   	__i915_vm_close(vm);
>   
> +	i915_gem_vm_unbind_vma_all(vm);
> +
>   	/* Synchronize async unbinds. */
>   	i915_vma_resource_bind_dep_sync_all(vm);
>   
> @@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   
>   	INIT_LIST_HEAD(&vm->bound_list);
>   	INIT_LIST_HEAD(&vm->unbound_list);
> +
> +	vm->va = RB_ROOT_CACHED;
> +	INIT_LIST_HEAD(&vm->vm_bind_list);
> +	INIT_LIST_HEAD(&vm->vm_bound_list);
> +	mutex_init(&vm->vm_bind_lock);
>   }
>   
>   void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index da21088890b3b..06a259475816b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -259,6 +259,15 @@ struct i915_address_space {
>   	 */
>   	struct list_head unbound_list;
>   
> +	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
> +	struct mutex vm_bind_lock;
> +	/** @vm_bind_list: List of vm_binding in process */
> +	struct list_head vm_bind_list;
> +	/** @vm_bound_list: List of vm_binding completed */
> +	struct list_head vm_bound_list;
> +	/* @va: tree of persistent vmas */
> +	struct rb_root_cached va;
> +
>   	/* Global GTT */
>   	bool is_ggtt:1;
>   
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 1332c70370a68..9a9010fd9ecfa 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -68,6 +68,7 @@
>   #include "gem/i915_gem_ioctls.h"
>   #include "gem/i915_gem_mman.h"
>   #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_vm_bind.h"
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_pm.h"
>   #include "gt/intel_rc6.h"
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 2603717164900..092ae4309d8a1 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -29,6 +29,7 @@
>   #include "display/intel_frontbuffer.h"
>   #include "gem/i915_gem_lmem.h"
>   #include "gem/i915_gem_tiling.h"
> +#include "gem/i915_gem_vm_bind.h"
>   #include "gt/intel_engine.h"
>   #include "gt/intel_engine_heartbeat.h"
>   #include "gt/intel_gt.h"
> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   	spin_unlock(&obj->vma.lock);
>   	mutex_unlock(&vm->mutex);
>   
> +	INIT_LIST_HEAD(&vma->vm_bind_link);
>   	return vma;
>   
>   err_unlock:
> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>   {
>   	struct i915_vma *vma;
>   
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>   	GEM_BUG_ON(!kref_read(&vm->ref));
>   
>   	spin_lock(&obj->vma.lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 33a58f605d75c..15eac55a3e274 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>   {
>   	ptrdiff_t cmp;
>   
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> -
>   	cmp = ptrdiff(vma->vm, vm);
>   	if (cmp)
>   		return cmp;
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index be6e028c3b57d..f746fecae85ed 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -289,6 +289,20 @@ struct i915_vma {
>   	/** This object's place on the active/inactive lists */
>   	struct list_head vm_link;
>   
> +	/** @vm_bind_link: node for the vm_bind related lists of vm */
> +	struct list_head vm_bind_link;
> +
> +	/** Interval tree structures for persistent vma */
> +
> +	/** @rb: node for the interval tree of vm for persistent vmas */
> +	struct rb_node rb;
> +	/** @start: start endpoint of the rb node */
> +	u64 start;
> +	/** @last: Last endpoint of the rb node */
> +	u64 last;
> +	/** @__subtree_last: last in subtree */
> +	u64 __subtree_last;
> +
>   	struct list_head obj_link; /* Link in the object's VMA list */
>   	struct rb_node obj_node;
>   	struct hlist_node obj_hash;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 12435db751eb8..3da0e07f84bbd 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1507,6 +1507,41 @@ struct drm_i915_gem_execbuffer2 {
>   #define i915_execbuffer2_get_context_id(eb2) \
>   	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>   
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
> +	__u32 handle;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_TIMELINE_FENCE_WAIT:
> +	 * Wait for the input fence before the operation.
> +	 *
> +	 * I915_TIMELINE_FENCE_SIGNAL:
> +	 * Return operation completion fence as output.
> +	 */
> +	__u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
>   struct drm_i915_gem_pin {
>   	/** Handle of the buffer to be pinned. */
>   	__u32 handle;
> @@ -3718,6 +3753,134 @@ struct drm_i915_gem_create_ext_protected_content {
>   /* ID of the protected content session managed by i915 when PXP is active */
>   #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>   
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> + * virtual address (VA) range to the section of an object that should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the DG2
> + * and XEHPSDV has 64K page size for device local memory and has compact page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
> + * the local memory 64K page and the system memory 4K page bindings in the same
> + * 2M range.
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Currently reserved, MBZ.
> +	 *
> +	 * Note that @fence carries its own flags.
> +	 */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for bind completion signaling.
> +	 *
> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 *
> +	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
> +	 * is not requested and binding is completed synchronously.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to complete.
> + * It is the UMD's responsibility to ensure the mapping is no longer in use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Currently reserved, MBZ.
> +	 *
> +	 * Note that @fence carries its own flags.
> +	 */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for unbind completion signaling.
> +	 *
> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 *
> +	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
> +	 * is not requested and unbinding is completed synchronously.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
>   #if defined(__cplusplus)
>   }
>   #endif
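
For reference, a minimal userspace sketch of a synchronous bind using only
the fields documented above; error handling, partial binds and the
out-fence path (I915_TIMELINE_FENCE_SIGNAL) are left out, and
fd/vm_id/handle/size are assumed to come from the usual drm open /
VM_CREATE / GEM_CREATE flow:

/* Hedged sketch, not part of the patch: bind a whole BO at gpu_va,
 * synchronously (no out fence requested, .fence left zeroed).
 */
static int bind_whole_bo(int fd, __u32 vm_id, __u32 handle,
			 __u64 gpu_va, __u64 size)
{
	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,
		.handle = handle,
		.start  = gpu_va,	/* 4K aligned; 64K for DG2/XEHPSDV lmem */
		.offset = 0,
		.length = size,		/* same alignment rules as @start */
		.flags  = 0,		/* MBZ */
	};

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
}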

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
  2022-08-30 17:37   ` Matthew Auld
@ 2022-08-30 18:19   ` Matthew Auld
  2022-08-31  7:28     ` [Intel-gfx] " Tvrtko Ursulin
  2022-09-01  5:18     ` Niranjana Vishwanathapura
  2022-09-01  5:31   ` Dave Airlie
  2022-09-12 13:11   ` Jani Nikula
  3 siblings, 2 replies; 41+ messages in thread
From: Matthew Auld @ 2022-08-30 18:19 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Niranjana Vishwanathapura, Thomas Hellstrom, Andi Shyti, Ramalingam C

On 27/08/2022 20:43, Andi Shyti wrote:
> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> 
> Implement the bind and unbind of an object at the specified GPU virtual
> addresses.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>   .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>   drivers/gpu/drm/i915/i915_driver.c            |   1 +
>   drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>   drivers/gpu/drm/i915/i915_vma.h               |   2 -
>   drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>   include/uapi/drm/i915_drm.h                   | 163 +++++++++
>   10 files changed, 543 insertions(+), 3 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 522ef9b4aff32..4e1627e96c6e0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -165,6 +165,7 @@ gem-y += \
>   	gem/i915_gem_ttm_move.o \
>   	gem/i915_gem_ttm_pm.o \
>   	gem/i915_gem_userptr.o \
> +	gem/i915_gem_vm_bind_object.o \
>   	gem/i915_gem_wait.o \
>   	gem/i915_gemfs.o
>   i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 0000000000000..ebc493b7dafc1
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include "i915_drv.h"
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
> +
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file);
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 0000000000000..dadd1d4b1761b
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,322 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_vm_bind.h"
> +#include "gem/i915_gem_context.h"
> +#include "gt/gen8_engine_cs.h"
> +
> +#include "i915_drv.h"
> +#include "i915_gem_gtt.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +		     START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> + * specified address space (VM). Multiple mappings can map to the same physical
> + * pages of an object (aliasing). These mappings (also referred to as persistent
> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all required
> + * mappings during each submission (as required by older execbuf mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding does not need this lock.
> + *
> + * 2) The object's dma-resv lock will protect i915_vma state and needs
> + *    to be held while binding/unbinding a vma in the async worker and while
> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + * 3) Spinlock/s to protect some of the VM's lists like the list of
> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
> + */
> +
> +/**
> + * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
> + * @vm: virtual address space in which vma needs to be looked for
> + * @va: starting addr of the vma
> + *
> + * retrieves the vma with a starting address from the vm's vma tree.
> + *
> + * Returns: returns vma on success, NULL on failure.
> + */
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
> +{
> +	lockdep_assert_held(&vm->vm_bind_lock);
> +
> +	return i915_vm_bind_it_iter_first(&vm->va, va, va);
> +}
> +
> +/**
> + * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
> + * @vma: vma that needs to be removed
> + * @release_obj: whether to release the object
> + *
> + * Removes the vma from the vm's lists and custom interval tree
> + */
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +	lockdep_assert_held(&vma->vm->vm_bind_lock);
> +
> +	if (!list_empty(&vma->vm_bind_link)) {
> +		list_del_init(&vma->vm_bind_link);
> +		i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +		/* Release object */
> +		if (release_obj)
> +			i915_gem_object_put(vma->obj);
> +	}
> +}
> +
> +static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
> +				  struct i915_vma *vma,
> +				  struct drm_i915_gem_vm_unbind *va)
> +{
> +	struct drm_i915_gem_object *obj;
> +	int ret;
> +
> +	if (vma) {
> +		obj = vma->obj;
> +		i915_vma_destroy(vma);
> +
> +		goto exit;
> +	}
> +
> +	if (!va)
> +		return -EINVAL;
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		return ret;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +
> +	if (!vma)
> +		ret = -ENOENT;
> +	else if (vma->size != va->length)
> +		ret = -EINVAL;
> +
> +	if (ret) {
> +		mutex_unlock(&vm->vm_bind_lock);
> +		return ret;
> +	}
> +
> +	i915_gem_vm_bind_remove(vma, false);
> +
> +	mutex_unlock(&vm->vm_bind_lock);
> +
> +	/* Destroy vma and then release object */
> +	obj = vma->obj;
> +	ret = i915_gem_object_lock(obj, NULL);
> +	if (ret)
> +		return ret;
> +
> +	i915_vma_destroy(vma);
> +	i915_gem_object_unlock(obj);
> +
> +exit:
> +	i915_gem_object_put(obj);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
> + * @vm: Address space from which vma bindings need to be removed
> + *
> + * Unbind all userspace-requested object bindings
> + */
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
> +{
> +	struct i915_vma *vma, *t;
> +
> +	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
> +		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
> +					struct drm_i915_gem_object *obj,
> +					struct drm_i915_gem_vm_bind *va)
> +{
> +	struct i915_ggtt_view view;

Should that be renamed to i915_gtt_view? So all of this just works with 
ppgtt insertion, as-is? I'm impressed.

> +	struct i915_vma *vma;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +	if (vma)
> +		return ERR_PTR(-EEXIST);
> +
> +	view.type = I915_GGTT_VIEW_PARTIAL;
> +	view.partial.offset = va->offset >> PAGE_SHIFT;
> +	view.partial.size = va->length >> PAGE_SHIFT;
> +	vma = i915_vma_instance(obj, vm, &view);
> +	if (IS_ERR(vma))
> +		return vma;
> +
> +	vma->start = va->start;
> +	vma->last = va->start + va->length - 1;
> +
> +	return vma;
> +}
> +
> +static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +				struct drm_i915_gem_vm_bind *va,
> +				struct drm_file *file)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma = NULL;
> +	struct i915_gem_ww_ctx ww;
> +	u64 pin_flags;
> +	int ret = 0;
> +
> +	if (!vm->vm_bind_mode)
> +		return -EOPNOTSUPP;
> +
> +	obj = i915_gem_object_lookup(file, va->handle);

AFAICT this doesn't have to be an object from gem_create/ext...

> +	if (!obj)
> +		return -ENOENT;
> +
> +	if (!va->length ||
> +	    !IS_ALIGNED(va->offset | va->length,
> +			i915_gem_object_max_page_size(obj->mm.placements,
> +						      obj->mm.n_placements)) ||

...and so here max_page_size() can BUG_ON() if n_placements = 0. Also 
what should this return in that case?
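
One possible way to make that safe is an explicit fallback for objects
without a placement list, e.g. the sketch below (illustration only,
assuming the helper keeps the (placements, n_placements) shape used at the
call site; whether 4K is the right default for such objects is exactly the
open question):

u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
				  unsigned int n_placements)
{
	u32 max_page_size = 0;
	unsigned int i;

	/* No placement list (e.g. not from gem_create_ext): assume 4K pages. */
	if (!n_placements)
		return I915_GTT_PAGE_SIZE_4K;

	for (i = 0; i < n_placements; i++)
		max_page_size = max_t(u32, max_page_size,
				      placements[i]->min_page_size);

	return max_page_size;
}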

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 09/17] drm/i915: Do not support vm_bind mode in execbuf2
  2022-08-27 19:43 ` [RFC PATCH v3 09/17] drm/i915: Do not support vm_bind mode in execbuf2 Andi Shyti
@ 2022-08-31  5:45   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-31  5:45 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Sat, Aug 27, 2022 at 09:43:55PM +0200, Andi Shyti wrote:
>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>Do not support the vm in vm_bind_mode in execbuf2 ioctl.
>
>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>index cd75b0ca2555f..f85f10cf9c34b 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>@@ -781,6 +781,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
> 	if (unlikely(IS_ERR(ctx)))
> 		return PTR_ERR(ctx);
>
>+	if (ctx->vm->vm_bind_mode) {
>+		i915_gem_context_put(ctx);
>+		return -EOPNOTSUPP;
>+	}
>+
> 	eb->gem_context = ctx;
> 	if (i915_gem_context_has_full_ppgtt(ctx))
> 		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;

This should probably be merged with patch #2, which introduces the vm_bind_mode uapi.

Niranjana

>-- 
>2.34.1
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-30 17:37   ` Matthew Auld
@ 2022-08-31  6:10     ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-31  6:10 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Andi Shyti, Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Andi Shyti

On Tue, Aug 30, 2022 at 06:37:55PM +0100, Matthew Auld wrote:
>On 27/08/2022 20:43, Andi Shyti wrote:
>>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>>Implement the bind and unbind of an object at the specified GPU virtual
>>addresses.
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |   1 +
>>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>>  include/uapi/drm/i915_drm.h                   | 163 +++++++++
>>  10 files changed, 543 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>index 522ef9b4aff32..4e1627e96c6e0 100644
>>--- a/drivers/gpu/drm/i915/Makefile
>>+++ b/drivers/gpu/drm/i915/Makefile
>>@@ -165,6 +165,7 @@ gem-y += \
>>  	gem/i915_gem_ttm_move.o \
>>  	gem/i915_gem_ttm_pm.o \
>>  	gem/i915_gem_userptr.o \
>>+	gem/i915_gem_vm_bind_object.o \
>>  	gem/i915_gem_wait.o \
>>  	gem/i915_gemfs.o
>>  i915-y += \
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>new file mode 100644
>>index 0000000000000..ebc493b7dafc1
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>@@ -0,0 +1,21 @@
>>+/* SPDX-License-Identifier: MIT */
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#ifndef __I915_GEM_VM_BIND_H
>>+#define __I915_GEM_VM_BIND_H
>>+
>>+#include "i915_drv.h"
>>+
>>+struct i915_vma *
>>+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>+
>>+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>>+			   struct drm_file *file);
>>+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>>+			     struct drm_file *file);
>>+
>>+void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
>>+#endif /* __I915_GEM_VM_BIND_H */
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>new file mode 100644
>>index 0000000000000..dadd1d4b1761b
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>@@ -0,0 +1,322 @@
>>+// SPDX-License-Identifier: MIT
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#include <linux/interval_tree_generic.h>
>>+
>>+#include "gem/i915_gem_vm_bind.h"
>>+#include "gem/i915_gem_context.h"
>>+#include "gt/gen8_engine_cs.h"
>>+
>>+#include "i915_drv.h"
>>+#include "i915_gem_gtt.h"
>>+
>>+#define START(node) ((node)->start)
>>+#define LAST(node) ((node)->last)
>>+
>>+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>>+		     START, LAST, static inline, i915_vm_bind_it)
>>+
>>+#undef START
>>+#undef LAST
>>+
>>+/**
>>+ * DOC: VM_BIND/UNBIND ioctls
>>+ *
>>+ * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>>+ * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>>+ * specified address space (VM). Multiple mappings can map to the same physical
>>+ * pages of an object (aliasing). These mappings (also referred to as persistent
>>+ * mappings) will be persistent across multiple GPU submissions (execbuf calls)
>>+ * issued by the UMD, without user having to provide a list of all required
>>+ * mappings during each submission (as required by older execbuf mode).
>>+ *
>>+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
>>+ * signaling the completion of bind/unbind operation.
>>+ *
>>+ * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
>>+ * User has to opt-in for VM_BIND mode of binding for an address space (VM)
>>+ * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>>+ *
>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>>+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
>>+ * done asynchronously, when valid out fence is specified.
>>+ *
>>+ * VM_BIND locking order is as below.
>>+ *
>>+ * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
>>+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>>+ *    mapping.
>>+ *
>>+ *    In future, when GPU page faults are supported, we can potentially use a
>>+ *    rwsem instead, so that multiple page fault handlers can take the read
>>+ *    side lock to lookup the mapping and hence can run in parallel.
>>+ *    The older execbuf mode of binding does not need this lock.
>>+ *
>>+ * 2) The object's dma-resv lock will protect i915_vma state and needs
>>+ *    to be held while binding/unbinding a vma in the async worker and while
>>+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
>>+ *    will all share a dma-resv object.
>>+ *
>>+ * 3) Spinlock/s to protect some of the VM's lists like the list of
>>+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
>>+ */
>>+
>>+/**
>>+ * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
>>+ * @vm: virtual address space in which vma needs to be looked for
>>+ * @va: starting addr of the vma
>>+ *
>>+ * retrieves the vma with a starting address from the vm's vma tree.
>>+ *
>>+ * Returns: returns vma on success, NULL on failure.
>>+ */
>>+struct i915_vma *
>>+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>>+{
>>+	lockdep_assert_held(&vm->vm_bind_lock);
>>+
>>+	return i915_vm_bind_it_iter_first(&vm->va, va, va);
>>+}
>>+
>>+/**
>>+ * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
>>+ * @vma: vma that needs to be removed
>>+ * @release_obj: whether to release the object
>>+ *
>>+ * Removes the vma from the vm's lists and custom interval tree
>>+ */
>>+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>+{
>>+	lockdep_assert_held(&vma->vm->vm_bind_lock);
>>+
>>+	if (!list_empty(&vma->vm_bind_link)) {
>>+		list_del_init(&vma->vm_bind_link);
>>+		i915_vm_bind_it_remove(vma, &vma->vm->va);
>>+
>>+		/* Release object */
>>+		if (release_obj)
>>+			i915_gem_object_put(vma->obj);
>>+	}
>>+}
>>+
>>+static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
>>+				  struct i915_vma *vma,
>>+				  struct drm_i915_gem_vm_unbind *va)
>>+{
>>+	struct drm_i915_gem_object *obj;
>>+	int ret;
>>+
>>+	if (vma) {
>>+		obj = vma->obj;
>>+		i915_vma_destroy(vma);
>>+
>>+		goto exit;
>>+	}

This function overloading based on whether vma is NULL or not
doesn't look right. More on this below.

>>+
>>+	if (!va)
>>+		return -EINVAL;
>>+
>>+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
>>+	if (ret)
>>+		return ret;
>>+
>>+	va->start = gen8_noncanonical_addr(va->start);
>>+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>+
>>+	if (!vma)
>>+		ret = -ENOENT;
>>+	else if (vma->size != va->length)
>>+		ret = -EINVAL;
>>+
>>+	if (ret) {
>>+		mutex_unlock(&vm->vm_bind_lock);
>>+		return ret;
>>+	}
>>+
>>+	i915_gem_vm_bind_remove(vma, false);
>>+
>>+	mutex_unlock(&vm->vm_bind_lock);
>>+
>>+	/* Destroy vma and then release object */
>>+	obj = vma->obj;
>>+	ret = i915_gem_object_lock(obj, NULL);
>>+	if (ret)
>>+		return ret;
>>+
>>+	i915_vma_destroy(vma);
>>+	i915_gem_object_unlock(obj);
>>+
>>+exit:
>>+	i915_gem_object_put(obj);
>>+
>>+	return 0;
>>+}
>>+
>>+/**
>>+ * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
>>+ * @vm: Address space from which vma bindings need to be removed
>>+ *
>>+ * Unbind all userspace-requested object bindings
>>+ */
>>+void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
>>+{
>>+	struct i915_vma *vma, *t;
>>+
>>+	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
>>+		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));

I think this unbinding should be done for vmas in both vm_bound_list and
vm_bind_list.
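
I.e. roughly (sketch of the suggestion):

	list_for_each_entry_safe(vma, t, &vm->vm_bind_list, vm_bind_link)
		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));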

>>+}
>>+
>>+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>>+					struct drm_i915_gem_object *obj,
>>+					struct drm_i915_gem_vm_bind *va)
>>+{
>>+	struct i915_ggtt_view view;
>>+	struct i915_vma *vma;
>>+
>>+	va->start = gen8_noncanonical_addr(va->start);
>>+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>+	if (vma)
>>+		return ERR_PTR(-EEXIST);
>>+
>>+	view.type = I915_GGTT_VIEW_PARTIAL;
>>+	view.partial.offset = va->offset >> PAGE_SHIFT;
>>+	view.partial.size = va->length >> PAGE_SHIFT;
>>+	vma = i915_vma_instance(obj, vm, &view);
>>+	if (IS_ERR(vma))
>>+		return vma;
>>+
>>+	vma->start = va->start;
>>+	vma->last = va->start + va->length - 1;
>>+
>>+	return vma;
>>+}
>>+
>>+static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>+				struct drm_i915_gem_vm_bind *va,
>>+				struct drm_file *file)
>>+{
>>+	struct drm_i915_gem_object *obj;
>>+	struct i915_vma *vma = NULL;
>>+	struct i915_gem_ww_ctx ww;
>>+	u64 pin_flags;
>>+	int ret = 0;
>>+
>>+	if (!vm->vm_bind_mode)
>>+		return -EOPNOTSUPP;
>>+
>>+	obj = i915_gem_object_lookup(file, va->handle);
>>+	if (!obj)
>>+		return -ENOENT;
>>+
>>+	if (!va->length ||
>>+	    !IS_ALIGNED(va->offset | va->length,
>>+			i915_gem_object_max_page_size(obj->mm.placements,
>>+						      obj->mm.n_placements)) ||
>>+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
>>+		ret = -EINVAL;
>>+		goto put_obj;
>>+	}
>>+
>>+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
>>+	if (ret)
>>+		goto put_obj;
>>+
>>+	vma = vm_bind_get_vma(vm, obj, va);
>>+	if (IS_ERR(vma)) {
>>+		ret = PTR_ERR(vma);
>>+		goto unlock_vm;
>>+	}
>>+
>>+	for_i915_gem_ww(&ww, ret, true) {
>>+retry:
>>+		ret = i915_gem_object_lock(vma->obj, &ww);
>>+		if (ret)
>>+			goto out_ww;
>>+
>>+		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>>+		if (ret)
>>+			goto out_ww;
>>+
>>+		/* Make it evictable */
>>+		__i915_vma_unpin(vma);
>>+
>>+		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>+		i915_vm_bind_it_insert(vma, &vm->va);
>>+
>>+out_ww:
>>+		if (ret == -EDEADLK) {
>>+			ret = i915_gem_ww_ctx_backoff(&ww);
>>+			if (!ret)
>>+				goto retry;
>>+		} else {
>>+			/* Hold object reference until vm_unbind */
>>+			i915_gem_object_get(vma->obj);
>>+		}
>>+	}
>
>Just a drive-by-comment, since this looks a little strange at a 
>glance. The main idea behind for_i915_gem_ww() is to handle this type 
>of stuff for you.
>
>With the usual pattern this would look something like:
>
>for_i915_gem_ww(&ww, ret, true) {
>    ret = i915_gem_object_lock(vma->obj, &ww);
>    if (ret)
>        continue;
>
>    ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>    if (ret)
>        continue;
>
>    __i915_vma_unpin(vma);
>    list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>    i915_vm_bind_it_insert(vma, &vm->va);
>    i915_gem_object_get(vma->obj);
>}
>
>Does that not work here? If it doesn't then probably we shouldn't use 
>for_i915_gem_ww() here.
>

Yah, the for_i915_gem_ww() usage here is wrong.
The above pattern by Matt should work here.

Also, vma destruction upon error is missing here and only gets added in a
later patch. There is no reason not to handle the error condition here
in this patch.
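
A rough sketch with both points folded in (Matt's loop pattern plus
destroying the vma on failure); pin_flags setup and the rest of the
function are assumed unchanged, so this is only an illustration:

	/* pin_flags is assumed to have been set up earlier in the function */
	for_i915_gem_ww(&ww, ret, true) {
		ret = i915_gem_object_lock(vma->obj, &ww);
		if (ret)
			continue;	/* the macro retries on -EDEADLK */

		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
		if (ret)
			continue;

		/* Keep the vma evictable, but track it as persistent. */
		__i915_vma_unpin(vma);
		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
		i915_vm_bind_it_insert(vma, &vm->va);

		/* Hold the object reference until vm_unbind. */
		i915_gem_object_get(vma->obj);
	}

	if (ret) {
		struct drm_i915_gem_object *obj = vma->obj;

		/* Don't leak the vma when the bind failed. */
		i915_gem_object_lock(obj, NULL);
		i915_vma_destroy(vma);
		i915_gem_object_unlock(obj);
	}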

Niranjana

>>+
>>+unlock_vm:
>>+	mutex_unlock(&vm->vm_bind_lock);
>>+
>>+put_obj:
>>+	i915_gem_object_put(obj);
>>+
>>+	return ret;
>>+}
>>+
>>+/**
>>+ * i915_gem_vm_bind_ioctl() - ioctl function for binding an obj into
>>+ * virtual address
>>+ * @dev: drm device associated to the virtual address
>>+ * @data: data related to the vm bind required
>>+ * @file: drm_file related to the ioctl
>>+ *
>>+ * Implements a function to bind the object into the virtual address
>>+ *
>>+ * Returns 0 on success, error code on failure.
>>+ */
>>+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>>+			   struct drm_file *file)
>>+{
>>+	struct drm_i915_gem_vm_bind *args = data;
>>+	struct i915_address_space *vm;
>>+	int ret;
>>+
>>+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>+	if (unlikely(!vm))
>>+		return -ENOENT;
>>+
>>+	ret = i915_gem_vm_bind_obj(vm, args, file);
>>+
>>+	i915_vm_put(vm);
>>+	return ret;
>>+}
>>+
>>+/**
>>+ * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an obj from
>>+ * virtual address
>>+ * @dev: drm device associated to the virtual address
>>+ * @data: data related to the binding that needs to be unbound
>>+ * @file: drm_file related to the ioctl
>>+ *
>>+ * Implements a function to unbind the object from the virtual address
>>+ *
>>+ * Returns 0 on success, error code on failure.
>>+ */
>>+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>>+			     struct drm_file *file)
>>+{
>>+	struct drm_i915_gem_vm_unbind *args = data;
>>+	struct i915_address_space *vm;
>>+	int ret;
>>+
>>+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>+	if (unlikely(!vm))
>>+		return -ENOENT;
>>+
>>+	ret = i915_gem_vm_unbind_vma(vm, NULL, args);

By passing NULL for vma, we are essentially calling i915_vma_destroy()
here. But I think we should be calling i915_gem_vm_bind_remove() also.
So, this function overloading based on vma being NULL or not doesn't
look right to me.
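
One way to avoid the overloading (sketch only, the helper name below is
made up) is to keep the VA lookup in the ioctl path and split the common
destroy step out:

/* Common tail: unmap the vma and drop the reference taken at bind time. */
static void vm_unbind_vma_and_put(struct i915_vma *vma)
{
	struct drm_i915_gem_object *obj = vma->obj;

	/* Cannot deadlock without a ww context, so no error to handle here. */
	i915_gem_object_lock(obj, NULL);
	i915_vma_destroy(vma);
	i915_gem_object_unlock(obj);
	i915_gem_object_put(obj);
}

/* ioctl path: look the vma up by VA, unlink it, then destroy it. */
static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
				  struct drm_i915_gem_vm_unbind *va)
{
	struct i915_vma *vma;
	int ret;

	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
	if (ret)
		return ret;

	va->start = gen8_noncanonical_addr(va->start);
	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
	if (!vma)
		ret = -ENOENT;
	else if (vma->size != va->length)
		ret = -EINVAL;
	else
		i915_gem_vm_bind_remove(vma, false);
	mutex_unlock(&vm->vm_bind_lock);

	if (!ret)
		vm_unbind_vma_and_put(vma);
	return ret;
}

The vm-release path (unbind_vma_all) could then call
i915_gem_vm_bind_remove() plus the same helper directly, instead of passing
a NULL descriptor around.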

>>+
>>+	i915_vm_put(vm);
>>+	return ret;
>>+}
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>index b67831833c9a3..cb188377b7bd9 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>@@ -12,6 +12,7 @@
>>  #include "gem/i915_gem_internal.h"
>>  #include "gem/i915_gem_lmem.h"
>>+#include "gem/i915_gem_vm_bind.h"
>>  #include "i915_trace.h"
>>  #include "i915_utils.h"
>>  #include "intel_gt.h"
>>@@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>  	drm_mm_takedown(&vm->mm);
>>+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>+	mutex_destroy(&vm->vm_bind_lock);
>>  }
>>  /**
>>@@ -204,6 +207,8 @@ static void __i915_vm_release(struct work_struct *work)
>>  	__i915_vm_close(vm);
>>+	i915_gem_vm_unbind_vma_all(vm);
>>+
>>  	/* Synchronize async unbinds. */
>>  	i915_vma_resource_bind_dep_sync_all(vm);
>>@@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>  	INIT_LIST_HEAD(&vm->bound_list);
>>  	INIT_LIST_HEAD(&vm->unbound_list);
>>+
>>+	vm->va = RB_ROOT_CACHED;
>>+	INIT_LIST_HEAD(&vm->vm_bind_list);
>>+	INIT_LIST_HEAD(&vm->vm_bound_list);
>>+	mutex_init(&vm->vm_bind_lock);
>>  }
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>index da21088890b3b..06a259475816b 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>@@ -259,6 +259,15 @@ struct i915_address_space {
>>  	 */
>>  	struct list_head unbound_list;
>>+	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
>>+	struct mutex vm_bind_lock;
>>+	/** @vm_bind_list: List of vm_binding in process */
>>+	struct list_head vm_bind_list;
>>+	/** @vm_bound_list: List of vm_binding completed */
>>+	struct list_head vm_bound_list;
>>+	/* @va: tree of persistent vmas */
>>+	struct rb_root_cached va;
>>+
>>  	/* Global GTT */
>>  	bool is_ggtt:1;
>>diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>index 1332c70370a68..9a9010fd9ecfa 100644
>>--- a/drivers/gpu/drm/i915/i915_driver.c
>>+++ b/drivers/gpu/drm/i915/i915_driver.c
>>@@ -68,6 +68,7 @@
>>  #include "gem/i915_gem_ioctls.h"
>>  #include "gem/i915_gem_mman.h"
>>  #include "gem/i915_gem_pm.h"
>>+#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_gt.h"
>>  #include "gt/intel_gt_pm.h"
>>  #include "gt/intel_rc6.h"

Not used, should be removed.

>>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>index 2603717164900..092ae4309d8a1 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>@@ -29,6 +29,7 @@
>>  #include "display/intel_frontbuffer.h"
>>  #include "gem/i915_gem_lmem.h"
>>  #include "gem/i915_gem_tiling.h"
>>+#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_engine.h"
>>  #include "gt/intel_engine_heartbeat.h"
>>  #include "gt/intel_gt.h"
>>@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>  	spin_unlock(&obj->vma.lock);
>>  	mutex_unlock(&vm->mutex);
>>+	INIT_LIST_HEAD(&vma->vm_bind_link);
>>  	return vma;
>>  err_unlock:
>>@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>>  {
>>  	struct i915_vma *vma;
>>-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>  	GEM_BUG_ON(!kref_read(&vm->ref));
>>  	spin_lock(&obj->vma.lock);
>>diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>>index 33a58f605d75c..15eac55a3e274 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.h
>>+++ b/drivers/gpu/drm/i915/i915_vma.h
>>@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>>  {
>>  	ptrdiff_t cmp;
>>-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>-
>>  	cmp = ptrdiff(vma->vm, vm);
>>  	if (cmp)
>>  		return cmp;
>>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>>index be6e028c3b57d..f746fecae85ed 100644
>>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>@@ -289,6 +289,20 @@ struct i915_vma {
>>  	/** This object's place on the active/inactive lists */
>>  	struct list_head vm_link;
>>+	/** @vm_bind_link: node for the vm_bind related lists of vm */
>>+	struct list_head vm_bind_link;
>>+
>>+	/** Interval tree structures for persistent vma */
>>+
>>+	/** @rb: node for the interval tree of vm for persistent vmas */
>>+	struct rb_node rb;
>>+	/** @start: start endpoint of the rb node */
>>+	u64 start;
>>+	/** @last: Last endpoint of the rb node */
>>+	u64 last;
>>+	/** @__subtree_last: last in subtree */
>>+	u64 __subtree_last;
>>+
>>  	struct list_head obj_link; /* Link in the object's VMA list */
>>  	struct rb_node obj_node;
>>  	struct hlist_node obj_hash;
>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>index 12435db751eb8..3da0e07f84bbd 100644
>>--- a/include/uapi/drm/i915_drm.h
>>+++ b/include/uapi/drm/i915_drm.h
>>@@ -1507,6 +1507,41 @@ struct drm_i915_gem_execbuffer2 {
>>  #define i915_execbuffer2_get_context_id(eb2) \
>>  	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>>+/**
>>+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>>+ *
>>+ * The operation will wait for input fence to signal.
>>+ *
>>+ * The returned output fence will be signaled after the completion of the
>>+ * operation.
>>+ */
>>+struct drm_i915_gem_timeline_fence {
>>+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
>>+	__u32 handle;
>>+
>>+	/**
>>+	 * @flags: Supported flags are:
>>+	 *
>>+	 * I915_TIMELINE_FENCE_WAIT:
>>+	 * Wait for the input fence before the operation.
>>+	 *
>>+	 * I915_TIMELINE_FENCE_SIGNAL:
>>+	 * Return operation completion fence as output.
>>+	 */
>>+	__u32 flags;
>>+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>+
>>+	/**
>>+	 * @value: A point in the timeline.
>>+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>>+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>>+	 * binary one.
>>+	 */
>>+	__u64 value;
>>+};
>>+
>>  struct drm_i915_gem_pin {
>>  	/** Handle of the buffer to be pinned. */
>>  	__u32 handle;
>>@@ -3718,6 +3753,134 @@ struct drm_i915_gem_create_ext_protected_content {
>>  /* ID of the protected content session managed by i915 when PXP is active */
>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>+/**
>>+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>+ *
>>+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>>+ * virtual address (VA) range to the section of an object that should be bound
>>+ * in the device page table of the specified address space (VM).
>>+ * The VA range specified must be unique (ie., not currently bound) and can
>>+ * be mapped to whole object or a section of the object (partial binding).
>>+ * Multiple VA mappings can be created to the same section of the object
>>+ * (aliasing).
>>+ *
>>+ * The @start, @offset and @length must be 4K page aligned. However the DG2
>>+ * and XEHPSDV has 64K page size for device local memory and has compact page
>>+ * table. On those platforms, for binding device local-memory objects, the
>>+ * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
>>+ * the local memory 64K page and the system memory 4K page bindings in the same
>>+ * 2M range.
>>+ *
>>+ * Error code -EINVAL will be returned if @start, @offset and @length are not
>>+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>>+ * -ENOSPC will be returned if the VA range specified can't be reserved.
>>+ *
>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>>+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>>+ * asynchronously, if valid @fence is specified.
>>+ */
>>+struct drm_i915_gem_vm_bind {
>>+	/** @vm_id: VM (address space) id to bind */
>>+	__u32 vm_id;
>>+
>>+	/** @handle: Object handle */
>>+	__u32 handle;
>>+
>>+	/** @start: Virtual Address start to bind */
>>+	__u64 start;
>>+
>>+	/** @offset: Offset in object to bind */
>>+	__u64 offset;
>>+
>>+	/** @length: Length of mapping to bind */
>>+	__u64 length;
>>+
>>+	/**
>>+	 * @flags: Currently reserved, MBZ.
>>+	 *
>>+	 * Note that @fence carries its own flags.
>>+	 */
>>+	__u64 flags;
>>+
>>+	/**
>>+	 * @fence: Timeline fence for bind completion signaling.
>>+	 *
>>+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
>>+	 *
>>+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>+	 * is invalid, and an error will be returned.
>>+	 *
>>+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
>>+	 * is not requested and binding is completed synchronously.
>>+	 */
>>+	struct drm_i915_gem_timeline_fence fence;
>>+
>>+	/**
>>+	 * @extensions: Zero-terminated chain of extensions.
>>+	 *
>>+	 * For future extensions. See struct i915_user_extension.
>>+	 */
>>+	__u64 extensions;
>>+};
>>+
>>+/**
>>+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>>+ *
>>+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>>+ * address (VA) range that should be unbound from the device page table of the
>>+ * specified address space (VM). VM_UNBIND will force unbind the specified
>>+ * range from device page table without waiting for any GPU job to complete.
>>+ * It is UMDs responsibility to ensure the mapping is no longer in use before
>>+ * calling VM_UNBIND.
>>+ *
>>+ * If the specified mapping is not found, the ioctl will simply return without
>>+ * any error.
>>+ *
>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>>+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>>+ * asynchronously, if valid @fence is specified.
>>+ */
>>+struct drm_i915_gem_vm_unbind {
>>+	/** @vm_id: VM (address space) id to bind */
>>+	__u32 vm_id;
>>+
>>+	/** @rsvd: Reserved, MBZ */
>>+	__u32 rsvd;
>>+
>>+	/** @start: Virtual Address start to unbind */
>>+	__u64 start;
>>+
>>+	/** @length: Length of mapping to unbind */
>>+	__u64 length;
>>+
>>+	/**
>>+	 * @flags: Currently reserved, MBZ.
>>+	 *
>>+	 * Note that @fence carries its own flags.
>>+	 */
>>+	__u64 flags;
>>+
>>+	/**
>>+	 * @fence: Timeline fence for unbind completion signaling.
>>+	 *
>>+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
>>+	 *
>>+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>+	 * is invalid, and an error will be returned.
>>+	 *
>>+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
>>+	 * is not requested and unbinding is completed synchronously.
>>+	 */
>>+	struct drm_i915_gem_timeline_fence fence;
>>+
>>+	/**
>>+	 * @extensions: Zero-terminated chain of extensions.
>>+	 *
>>+	 * For future extensions. See struct i915_user_extension.
>>+	 */
>>+	__u64 extensions;
>>+};
>>+
>>  #if defined(__cplusplus)
>>  }
>>  #endif
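
As a usage note for the two structs above, a minimal userspace sequence
would look roughly like the sketch below. Handle values, the GPU VA and
the DRM_IOCTL_I915_GEM_VM_BIND/UNBIND wrapper names are illustrative only
(the ioctl plumbing is only enabled later in this series):

	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,	/* VM created in vm_bind mode */
		.handle = bo_handle,	/* GEM object to map */
		.start  = 0x1000000,	/* GPU VA, 4K (64K for lmem) aligned */
		.offset = 0,		/* map from the start of the object */
		.length = bo_size,	/* map the whole object */
		.fence  = {
			.handle = syncobj_handle,	/* out fence */
			.flags  = I915_TIMELINE_FENCE_SIGNAL,
			.value  = 0,			/* binary syncobj */
		},
	};
	struct drm_i915_gem_vm_unbind unbind = {
		.vm_id  = vm_id,
		.start  = 0x1000000,
		.length = bo_size,
		/* rsvd, flags, fence and extensions left zeroed */
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind))
		return -errno;	/* e.g. -ENOSPC if the VA range is taken */
	/* wait for bind completion on syncobj_handle (e.g. drmSyncobjWait()) */
	/* ... execbuf3 submissions using the VA, wait for them to finish ... */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind))
		return -errno;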


* Re: [RFC PATCH v3 05/17] drm/i915: Support for VM private BOs
  2022-08-27 19:43 ` [RFC PATCH v3 05/17] drm/i915: Support for VM private BOs Andi Shyti
@ 2022-08-31  6:13   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-31  6:13 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Sat, Aug 27, 2022 at 09:43:51PM +0200, Andi Shyti wrote:
>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>Each VM creates a root_obj and shares it with all of its private objects
>to use it as dma_resv object. This has a performance advantage as it
>requires a single dma_resv object update for all private BOs vs list of
>dma_resv objects update for shared BOs, in the execbuf path.
>
>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_object_types.h   | 3 +++
> drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c | 9 +++++++++
> drivers/gpu/drm/i915/gt/intel_gtt.c                | 4 ++++
> drivers/gpu/drm/i915/gt/intel_gtt.h                | 2 ++
> drivers/gpu/drm/i915/i915_vma.c                    | 1 +
> drivers/gpu/drm/i915/i915_vma_types.h              | 2 ++
> 6 files changed, 21 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>index 9f6b14ec189a2..46308dcf39e99 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>
> 	const struct drm_i915_gem_object_ops *ops;
>
>+	/* For VM private BO, points to root_obj in VM. NULL otherwise */
>+	struct drm_i915_gem_object *priv_root;
>+
> 	struct {
> 		/**
> 		 * @vma.lock: protect the list/tree of vmas
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>index dadd1d4b1761b..9ff929f187cfd 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>@@ -93,6 +93,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>
> 	if (!list_empty(&vma->vm_bind_link)) {
> 		list_del_init(&vma->vm_bind_link);
>+		list_del_init(&vma->non_priv_vm_bind_link);
> 		i915_vm_bind_it_remove(vma, &vma->vm->va);
>
> 		/* Release object */
>@@ -219,6 +220,11 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> 		goto put_obj;
> 	}
>
>+	if (obj->priv_root && obj->priv_root != vm->root_obj) {
>+		ret = -EINVAL;
>+		goto put_obj;
>+	}
>+
> 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> 	if (ret)
> 		goto put_obj;
>@@ -244,6 +250,9 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>
> 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> 		i915_vm_bind_it_insert(vma, &vm->va);
>+		if (!obj->priv_root)
>+			list_add_tail(&vma->non_priv_vm_bind_link,
>+				      &vm->non_priv_vm_bind_list);
>
> out_ww:
> 		if (ret == -EDEADLK) {
>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>index cb188377b7bd9..c4f75826213ae 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>@@ -177,6 +177,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
> void i915_address_space_fini(struct i915_address_space *vm)
> {
> 	drm_mm_takedown(&vm->mm);
>+	i915_gem_object_put(vm->root_obj);
> 	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> 	mutex_destroy(&vm->vm_bind_lock);
> }
>@@ -292,6 +293,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
> 	INIT_LIST_HEAD(&vm->vm_bind_list);
> 	INIT_LIST_HEAD(&vm->vm_bound_list);
> 	mutex_init(&vm->vm_bind_lock);
>+	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>+	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>+	GEM_BUG_ON(IS_ERR(vm->root_obj));
> }
>
> void *__px_vaddr(struct drm_i915_gem_object *p)
>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>index 06a259475816b..9a2665e4ec2e5 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>@@ -267,6 +267,8 @@ struct i915_address_space {
> 	struct list_head vm_bound_list;
> 	/* @va: tree of persistent vmas */
> 	struct rb_root_cached va;
>+	struct list_head non_priv_vm_bind_list;
>+	struct drm_i915_gem_object *root_obj;
>
> 	/* Global GTT */
> 	bool is_ggtt:1;
>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>index 092ae4309d8a1..239346e0c07f2 100644
>--- a/drivers/gpu/drm/i915/i915_vma.c
>+++ b/drivers/gpu/drm/i915/i915_vma.c
>@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
> 	mutex_unlock(&vm->mutex);
>
> 	INIT_LIST_HEAD(&vma->vm_bind_link);
>+	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> 	return vma;
>
> err_unlock:
>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>index f746fecae85ed..de5534d518cdd 100644
>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>@@ -291,6 +291,8 @@ struct i915_vma {
>
> 	/** @vm_bind_link: node for the vm_bind related lists of vm */
> 	struct list_head vm_bind_link;
>+	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
>+	struct list_head non_priv_vm_bind_link;
>
> 	/** Interval tree structures for persistent vma */
>

I am not seeing the uapi part to allow the user to specify an object as private.
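
Something along these lines on the gem_create path is roughly what I
would expect (sketch only, the extension name and layout here are
illustrative):

	struct drm_i915_gem_create_ext_vm_private {
		/** @base: Extension link. See struct i915_user_extension. */
		struct i915_user_extension base;

		/** @vm_id: Id of the VM to which the object is private */
		__u32 vm_id;
	};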

Niranjana

>-- 
>2.34.1
>


* Re: [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas
  2022-08-27 19:43 ` [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas Andi Shyti
@ 2022-08-31  6:16   ` Niranjana Vishwanathapura
  2022-09-12 13:16   ` [Intel-gfx] " Jani Nikula
  1 sibling, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-31  6:16 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Sat, Aug 27, 2022 at 09:43:53PM +0200, Andi Shyti wrote:
>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
>them during the request submission in the execbuf path.
>
>Support eviction by maintaining a list of evicted persistent vmas
>for rebinding during next submission.
>
>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
> .../drm/i915/gem/i915_gem_vm_bind_object.c    |  8 +++
> drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
> drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
> drivers/gpu/drm/i915/i915_gem_gtt.c           | 38 +++++++++++++
> drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 +
> drivers/gpu/drm/i915/i915_vma.c               | 50 +++++++++++++++--
> drivers/gpu/drm/i915/i915_vma.h               | 56 +++++++++++++++----
> drivers/gpu/drm/i915/i915_vma_types.h         | 24 ++++++++
> 9 files changed, 169 insertions(+), 17 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>index 389e9f157ca5e..825dce41f7113 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>@@ -38,6 +38,7 @@
> #include "i915_gem_mman.h"
> #include "i915_gem_object.h"
> #include "i915_gem_ttm.h"
>+#include "i915_gem_vm_bind.h"
> #include "i915_memcpy.h"
> #include "i915_trace.h"
>

Not needed, should be removed.

>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>index 9ff929f187cfd..3b45529fe8d4c 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>@@ -91,6 +91,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> {
> 	lockdep_assert_held(&vma->vm->vm_bind_lock);
>
>+	spin_lock(&vma->vm->vm_rebind_lock);
>+	if (!list_empty(&vma->vm_rebind_link))
>+		list_del_init(&vma->vm_rebind_link);
>+	i915_vma_set_purged(vma);
>+	i915_vma_set_freed(vma);
>+	spin_unlock(&vma->vm->vm_rebind_lock);
>+
> 	if (!list_empty(&vma->vm_bind_link)) {
> 		list_del_init(&vma->vm_bind_link);
> 		list_del_init(&vma->non_priv_vm_bind_link);
>@@ -190,6 +197,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>
> 	vma->start = va->start;
> 	vma->last = va->start + va->length - 1;
>+	i915_vma_set_persistent(vma);

This can be set in vma_create() now that it knows whether the vma
is persistent or not.
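
E.g. something like this (sketch only, assuming vma_create() and
i915_vma_instance() grow a "persistent" argument):

	static struct i915_vma *
	vma_create(struct drm_i915_gem_object *obj,
		   struct i915_address_space *vm,
		   const struct i915_ggtt_view *view,
		   bool persistent)	/* assumed new argument */
	{
		...

		if (persistent)
			i915_vma_set_persistent(vma);

		return vma;
	}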

Niranjana

>
> 	return vma;
> }
>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>index c4f75826213ae..97cd0089b516d 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>@@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
> 	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> 	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
> 	GEM_BUG_ON(IS_ERR(vm->root_obj));
>+	INIT_LIST_HEAD(&vm->vm_rebind_list);
>+	spin_lock_init(&vm->vm_rebind_lock);
> }
>
> void *__px_vaddr(struct drm_i915_gem_object *p)
>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>index 9a2665e4ec2e5..1f3b1967ec175 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>@@ -265,6 +265,10 @@ struct i915_address_space {
> 	struct list_head vm_bind_list;
> 	/** @vm_bound_list: List of vm_binding completed */
> 	struct list_head vm_bound_list;
>+	/* @vm_rebind_list: list of vmas to be rebinded */
>+	struct list_head vm_rebind_list;
>+	/* @vm_rebind_lock: protects vm_rebound_list */
>+	spinlock_t vm_rebind_lock;
> 	/* @va: tree of persistent vmas */
> 	struct rb_root_cached va;
> 	struct list_head non_priv_vm_bind_list;
>diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>index 329ff75b80b97..f083724163deb 100644
>--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>@@ -25,6 +25,44 @@
> #include "i915_trace.h"
> #include "i915_vgpu.h"
>
>+/**
>+ * i915_vm_sync() - Wait for all requests on private vmas of a vm to be completed
>+ * @vm: address space we need to wait for idle
>+ *
>+ * Waits until all requests on the vm_bind private objects are completed.
>+ *
>+ * Returns: 0 on success -ve errcode on failure
>+ */
>+int i915_vm_sync(struct i915_address_space *vm)
>+{
>+	int ret;
>+
>+	/* Wait for all requests under this vm to finish */
>+	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
>+				    DMA_RESV_USAGE_BOOKKEEP, false,
>+				    MAX_SCHEDULE_TIMEOUT);
>+	if (ret < 0)
>+		return ret;
>+	else if (ret > 0)
>+		return 0;
>+	else
>+		return -ETIMEDOUT;
>+}
>+
>+/**
>+ * i915_vm_is_active() - Check for activeness of requests of vm
>+ * @vm: address space targeted
>+ *
>+ * Check whether all the requests on the related private vmas are completed or not
>+ *
>+ * Returns: True when requests are not completed yet. False otherwise.
>+ */
>+bool i915_vm_is_active(const struct i915_address_space *vm)
>+{
>+	return !dma_resv_test_signaled(vm->root_obj->base.resv,
>+				       DMA_RESV_USAGE_BOOKKEEP);
>+}
>+
> int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
> 			       struct sg_table *pages)
> {
>diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>index 8c2f57eb5ddaa..a5bbdc59d9dfb 100644
>--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>@@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
>
> #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
>
>+int i915_vm_sync(struct i915_address_space *vm);
>+bool i915_vm_is_active(const struct i915_address_space *vm);
>+
> #endif
>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>index 239346e0c07f2..0eb7727d62a6f 100644
>--- a/drivers/gpu/drm/i915/i915_vma.c
>+++ b/drivers/gpu/drm/i915/i915_vma.c
>@@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>
> 	INIT_LIST_HEAD(&vma->vm_bind_link);
> 	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>+	INIT_LIST_HEAD(&vma->vm_rebind_link);
> 	return vma;
>
> err_unlock:
>@@ -387,8 +388,31 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
> 	return err;
> }
>
>-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>-static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>+/**
>+ * i915_vma_sync() - Wait for the vma to be idle
>+ * @vma: vma to be tested
>+ *
>+ * Returns 0 on success and error code on failure
>+ */
>+int i915_vma_sync(struct i915_vma *vma)
>+{
>+	int ret;
>+
>+	/* Wait for the asynchronous bindings and pending GPU reads */
>+	ret = i915_active_wait(&vma->active);
>+	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
>+		return ret;
>+
>+	return i915_vm_sync(vma->vm);
>+}
>+
>+/**
>+ * i915_vma_verify_bind_complete() - Check for the vm_bind completion of the vma
>+ * @vma: vma submitted for vm_bind
>+ *
>+ * Returns: 0 if the vm_bind is completed. Error code otherwise.
>+ */
>+int i915_vma_verify_bind_complete(struct i915_vma *vma)
> {
> 	struct dma_fence *fence = i915_active_fence_get(&vma->active.excl);
> 	int err;
>@@ -405,9 +429,6 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>
> 	return err;
> }
>-#else
>-#define i915_vma_verify_bind_complete(_vma) 0
>-#endif
>
> I915_SELFTEST_EXPORT void
> i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>@@ -1654,6 +1675,13 @@ static void force_unbind(struct i915_vma *vma)
> 	if (!drm_mm_node_allocated(&vma->node))
> 		return;
>
>+	/*
>+	 * Mark persistent vma as purged to avoid it waiting
>+	 * for VM to be released.
>+	 */
>+	if (i915_vma_is_persistent(vma))
>+		i915_vma_set_purged(vma);
>+
> 	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> 	WARN_ON(__i915_vma_unbind(vma));
> 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
>@@ -1846,6 +1874,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
> 	int err;
>
> 	assert_object_held(obj);
>+	if (i915_vma_is_persistent(vma))
>+		return -EINVAL;
>
> 	GEM_BUG_ON(!vma->pages);
>
>@@ -2014,6 +2044,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
> 	__i915_vma_evict(vma, false);
>
> 	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
>+
>+	if (i915_vma_is_persistent(vma)) {
>+		spin_lock(&vma->vm->vm_rebind_lock);
>+		if (list_empty(&vma->vm_rebind_link) &&
>+		    !i915_vma_is_purged(vma))
>+			list_add_tail(&vma->vm_rebind_link,
>+				      &vma->vm->vm_rebind_list);
>+		spin_unlock(&vma->vm->vm_rebind_lock);
>+	}
>+
> 	return 0;
> }
>
>diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>index 15eac55a3e274..bf0b5b4abd919 100644
>--- a/drivers/gpu/drm/i915/i915_vma.h
>+++ b/drivers/gpu/drm/i915/i915_vma.h
>@@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>
> void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
> #define I915_VMA_RELEASE_MAP BIT(0)
>-
>-static inline bool i915_vma_is_active(const struct i915_vma *vma)
>-{
>-	return !i915_active_is_idle(&vma->active);
>-}
>-
> /* do not reserve memory to prevent deadlocks */
> #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>
>@@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
> 	return i915_vm_to_ggtt(vma->vm)->pin_bias;
> }
>
>+static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
>+{
>+	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>+}
>+
>+static inline void i915_vma_set_persistent(struct i915_vma *vma)
>+{
>+	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>+}
>+
>+static inline bool i915_vma_is_purged(const struct i915_vma *vma)
>+{
>+	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>+}
>+
>+static inline void i915_vma_set_purged(struct i915_vma *vma)
>+{
>+	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>+}
>+
>+static inline bool i915_vma_is_freed(const struct i915_vma *vma)
>+{
>+	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
>+}
>+
>+static inline void i915_vma_set_freed(struct i915_vma *vma)
>+{
>+	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
>+}
>+
>+static inline bool i915_vma_is_active(const struct i915_vma *vma)
>+{
>+	if (i915_vma_is_persistent(vma)) {
>+		if (i915_vma_is_purged(vma))
>+			return false;
>+
>+		return i915_vm_is_active(vma->vm);
>+	}
>+
>+	return !i915_active_is_idle(&vma->active);
>+}
>+
> static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
> {
> 	i915_gem_object_get(vma->obj);
>@@ -406,12 +442,8 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
> void i915_vma_make_purgeable(struct i915_vma *vma);
>
> int i915_vma_wait_for_bind(struct i915_vma *vma);
>-
>-static inline int i915_vma_sync(struct i915_vma *vma)
>-{
>-	/* Wait for the asynchronous bindings and pending GPU reads */
>-	return i915_active_wait(&vma->active);
>-}
>+int i915_vma_verify_bind_complete(struct i915_vma *vma);
>+int i915_vma_sync(struct i915_vma *vma);
>
> /**
>  * i915_vma_get_current_resource - Get the current resource of the vma
>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>index de5534d518cdd..5483ccf0c82c7 100644
>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>@@ -264,6 +264,28 @@ struct i915_vma {
> #define I915_VMA_SCANOUT_BIT	17
> #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
>
>+  /**
>+   * I915_VMA_PERSISTENT_BIT:
>+   * The vma is persistent (created with VM_BIND call).
>+   *
>+   * I915_VMA_PURGED_BIT:
>+   * The persistent vma is force unbound either due to VM_UNBIND call
>+   * from UMD or VM is released. Do not check/wait for VM activeness
>+   * in i915_vma_is_active() and i915_vma_sync() calls.
>+   *
>+   * I915_VMA_FREED_BIT:
>+   * The persistent vma is being released by UMD via VM_UNBIND call.
>+   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
>+   * already holds the lock.
>+   */
>+#define I915_VMA_PERSISTENT_BIT	19
>+#define I915_VMA_PURGED_BIT	20
>+#define I915_VMA_FREED_BIT	21
>+
>+#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
>+#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
>+#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
>+
> 	struct i915_active active;
>
> #define I915_VMA_PAGES_BIAS 24
>@@ -293,6 +315,8 @@ struct i915_vma {
> 	struct list_head vm_bind_link;
> 	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
> 	struct list_head non_priv_vm_bind_link;
>+	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
>+	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>
> 	/** Interval tree structures for persistent vma */
>
>-- 
>2.34.1
>


* Re: [RFC PATCH v3 08/17] drm/i915/vm_bind: Add out fence support
  2022-08-27 19:43 ` [RFC PATCH v3 08/17] drm/i915/vm_bind: Add out fence support Andi Shyti
@ 2022-08-31  6:22   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-31  6:22 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Sat, Aug 27, 2022 at 09:43:54PM +0200, Andi Shyti wrote:
>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>Add support for handling out fence of vm_bind call.
>
>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
> .../drm/i915/gem/i915_gem_vm_bind_object.c    | 82 +++++++++++++++++++
> drivers/gpu/drm/i915/i915_vma.c               |  6 +-
> drivers/gpu/drm/i915/i915_vma_types.h         |  7 ++
> 4 files changed, 97 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>index ebc493b7dafc1..d65e6e4fb3972 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>@@ -18,4 +18,7 @@ int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> 			     struct drm_file *file);
>
> void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
>+void i915_vm_bind_signal_fence(struct i915_vma *vma,
>+			       struct dma_fence * const fence);
>+
> #endif /* __I915_GEM_VM_BIND_H */
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>index 3b45529fe8d4c..e57b9c492a7f9 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>@@ -5,6 +5,8 @@
>
> #include <linux/interval_tree_generic.h>
>
>+#include <drm/drm_syncobj.h>
>+
> #include "gem/i915_gem_vm_bind.h"
> #include "gem/i915_gem_context.h"
> #include "gt/gen8_engine_cs.h"
>@@ -109,6 +111,67 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> 	}
> }
>
>+static int i915_vm_bind_add_fence(struct drm_file *file, struct i915_vma *vma,
>+				  u32 handle, u64 point)
>+{
>+	struct drm_syncobj *syncobj;
>+
>+	syncobj = drm_syncobj_find(file, handle);
>+	if (!syncobj) {
>+		DRM_DEBUG("Invalid syncobj handle provided\n");
>+		return -ENOENT;
>+	}
>+
>+	/*
>+	 * For timeline syncobjs we need to preallocate chains for
>+	 * later signaling.
>+	 */
>+	if (point) {
>+		vma->vm_bind_fence.chain_fence = dma_fence_chain_alloc();
>+		if (!vma->vm_bind_fence.chain_fence) {
>+			drm_syncobj_put(syncobj);
>+			return -ENOMEM;
>+		}
>+	} else {
>+		vma->vm_bind_fence.chain_fence = NULL;
>+	}
>+	vma->vm_bind_fence.syncobj = syncobj;
>+	vma->vm_bind_fence.value = point;
>+
>+	return 0;
>+}
>+
>+static void i915_vm_bind_put_fence(struct i915_vma *vma)
>+{
>+	if (!vma->vm_bind_fence.syncobj)
>+		return;
>+
>+	drm_syncobj_put(vma->vm_bind_fence.syncobj);
>+	dma_fence_chain_free(vma->vm_bind_fence.chain_fence);
>+}
>+
>+void i915_vm_bind_signal_fence(struct i915_vma *vma,
>+			       struct dma_fence * const fence)
>+{
>+	struct drm_syncobj *syncobj = vma->vm_bind_fence.syncobj;
>+
>+	if (!syncobj)
>+		return;
>+
>+	if (vma->vm_bind_fence.chain_fence) {
>+		drm_syncobj_add_point(syncobj,
>+				      vma->vm_bind_fence.chain_fence,
>+				      fence, vma->vm_bind_fence.value);
>+		/*
>+		 * The chain's ownership is transferred to the
>+		 * timeline.
>+		 */
>+		vma->vm_bind_fence.chain_fence = NULL;
>+	} else {
>+		drm_syncobj_replace_fence(syncobj, fence);
>+	}
>+}
>+
> static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
> 				  struct i915_vma *vma,
> 				  struct drm_i915_gem_vm_unbind *va)
>@@ -243,6 +306,15 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> 		goto unlock_vm;
> 	}
>
>+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
>+		ret = i915_vm_bind_add_fence(file, vma, va->fence.handle,
>+					     va->fence.value);
>+		if (ret)
>+			goto put_vma;
>+	}
>+
>+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;

Setting pin_flags should be part of patch #4.

>+
> 	for_i915_gem_ww(&ww, ret, true) {
> retry:
> 		ret = i915_gem_object_lock(vma->obj, &ww);
>@@ -267,12 +339,22 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> 			ret = i915_gem_ww_ctx_backoff(&ww);
> 			if (!ret)
> 				goto retry;
>+

Redundant white space, remove it.

> 		} else {
> 			/* Hold object reference until vm_unbind */
> 			i915_gem_object_get(vma->obj);
> 		}
> 	}
>
>+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL)
>+		i915_vm_bind_put_fence(vma);
>+
>+put_vma:
>+	if (ret && vma) {
>+		i915_vma_set_freed(vma);
>+		i915_vma_destroy(vma);
>+	}
>+

I think destroying the vma upon error should be part of patch #4.

Niranjana

> unlock_vm:
> 	mutex_unlock(&vm->vm_bind_lock);
>
>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>index 0eb7727d62a6f..6ca37ce2b35a8 100644
>--- a/drivers/gpu/drm/i915/i915_vma.c
>+++ b/drivers/gpu/drm/i915/i915_vma.c
>@@ -1542,8 +1542,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
> err_vma_res:
> 	i915_vma_resource_free(vma_res);
> err_fence:
>-	if (work)
>+	if (work) {
>+		if (i915_vma_is_persistent(vma))
>+			i915_vm_bind_signal_fence(vma, &work->base.dma);
>+
> 		dma_fence_work_commit_imm(&work->base);
>+	}
> err_rpm:
> 	if (wakeref)
> 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>index 5483ccf0c82c7..8bf870a0f689b 100644
>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>@@ -318,6 +318,13 @@ struct i915_vma {
> 	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
> 	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>
>+	/** Timeline fence for vm_bind completion notification */
>+	struct {
>+		struct drm_syncobj *syncobj;
>+		u64 value;
>+		struct dma_fence_chain *chain_fence;
>+	} vm_bind_fence;
>+
> 	/** Interval tree structures for persistent vma */
>
> 	/** @rb: node for the interval tree of vm for persistent vmas */
>-- 
>2.34.1
>


* Re: [RFC PATCH v3 13/17] drm/i915/vm_bind: userptr dma-resv changes
  2022-08-27 19:43 ` [RFC PATCH v3 13/17] drm/i915/vm_bind: userptr dma-resv changes Andi Shyti
@ 2022-08-31  6:45   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-08-31  6:45 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Sat, Aug 27, 2022 at 09:43:59PM +0200, Andi Shyti wrote:
>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>For persistent (vm_bind) vmas of userptr BOs, handle the user
>page pinning by using the i915_gem_object_userptr_submit_init()
>/done() functions
>
>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>---
> .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 139 ++++++++++++++----
> drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  10 ++
> .../drm/i915/gem/i915_gem_vm_bind_object.c    |  16 ++
> drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +
> drivers/gpu/drm/i915/gt/intel_gtt.h           |   4 +
> drivers/gpu/drm/i915/i915_vma_types.h         |   2 +
> 6 files changed, 142 insertions(+), 31 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>index 8e0dde26194e0..72d6771da2113 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>@@ -23,6 +23,7 @@
> #include "i915_gem_vm_bind.h"
> #include "i915_trace.h"
>
>+#define __EXEC3_USERPTR_USED		BIT_ULL(34)
> #define __EXEC3_HAS_PIN			BIT_ULL(33)
> #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
> #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
>@@ -157,10 +158,45 @@ static void eb_scoop_unbound_vma_all(struct i915_address_space *vm)
> 	spin_unlock(&vm->vm_rebind_lock);
> }
>
>+static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer *eb)
>+{
>+	struct i915_address_space *vm = eb->context->vm;
>+	struct i915_vma *last_vma = NULL;
>+	struct i915_vma *vma;
>+	int err;
>+
>+	lockdep_assert_held(&vm->vm_bind_lock);
>+
>+	list_for_each_entry(vma, &vm->vm_userptr_invalidated_list,
>+			    vm_userptr_invalidated_link) {
>+		list_del_init(&vma->vm_userptr_invalidated_link);
>+		err = i915_gem_object_userptr_submit_init(vma->obj);
>+		if (err)
>+			return err;
>+
>+		last_vma = vma;
>+	}

This should be done under the list lock. As it is a spinlock, we
should scoop them first under that spinlock and call submit_init()
outside that lock.
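
Something like this (untested sketch, using the same names as in this
patch):

	struct i915_vma *vma, *vn;
	LIST_HEAD(invalidated);

	spin_lock(&vm->vm_userptr_invalidated_lock);
	list_splice_init(&vm->vm_userptr_invalidated_list, &invalidated);
	spin_unlock(&vm->vm_userptr_invalidated_lock);

	list_for_each_entry_safe(vma, vn, &invalidated,
				 vm_userptr_invalidated_link) {
		list_del_init(&vma->vm_userptr_invalidated_link);
		err = i915_gem_object_userptr_submit_init(vma->obj);
		if (err)
			return err;

		last_vma = vma;
	}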

>+
>+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
>+		if (i915_gem_object_is_userptr(vma->obj)) {
>+			err = i915_gem_object_userptr_submit_init(vma->obj);
>+			if (err)
>+				return err;
>+
>+			last_vma = vma;
>+		}
>+
>+	if (last_vma)
>+		eb->args->flags |= __EXEC3_USERPTR_USED;
>+
>+	return 0;
>+}
>+
> static int eb_lookup_vma_all(struct i915_execbuffer *eb)
> {
> 	unsigned int i, current_batch = 0;
> 	struct i915_vma *vma;
>+	int err = 0;
>
> 	for (i = 0; i < eb->num_batches; i++) {
> 		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
>@@ -171,6 +207,10 @@ static int eb_lookup_vma_all(struct i915_execbuffer *eb)
> 		++current_batch;
> 	}
>
>+	err = eb_lookup_persistent_userptr_vmas(eb);
>+	if (err)
>+		return err;
>+
> 	eb_scoop_unbound_vma_all(eb->context->vm);
>
> 	return 0;
>@@ -286,33 +326,6 @@ static int eb_validate_persistent_vma_all(struct i915_execbuffer *eb)
> 	return ret;
> }
>
>-static int eb_validate_vma_all(struct i915_execbuffer *eb)
>-{
>-	/* only throttle once, even if we didn't need to throttle */
>-	for (bool throttle = true;; throttle = false) {
>-		int err;
>-
>-		err = eb_pin_engine(eb, throttle);
>-		if (!err)
>-			err = eb_lock_vma_all(eb);
>-
>-		if (!err)
>-			err = eb_validate_persistent_vma_all(eb);
>-
>-		if (!err)
>-			return 0;
>-
>-		if (err != -EDEADLK)
>-			return err;
>-
>-		err = i915_gem_ww_ctx_backoff(&eb->ww);
>-		if (err)
>-			return err;
>-	}
>-
>-	return 0;
>-}
>-
> /*
>  * Using two helper loops for the order of which requests / batches are created
>  * and added the to backend. Requests are created in order from the parent to
>@@ -360,15 +373,51 @@ static void eb_move_all_persistent_vma_to_active(struct i915_execbuffer *eb)
>
> static int eb_move_to_gpu(struct i915_execbuffer *eb)
> {
>+	int err = 0, j;
>+
> 	lockdep_assert_held(&eb->context->vm->vm_bind_lock);
> 	assert_object_held(eb->context->vm->root_obj);
>
> 	eb_move_all_persistent_vma_to_active(eb);
>
>-	/* Unconditionally flush any chipset caches (for streaming writes). */
>-	intel_gt_chipset_flush(eb->gt);
>+#ifdef CONFIG_MMU_NOTIFIER
>+	if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
>+		struct i915_vma *vma;
>
>-	return 0;
>+		lockdep_assert_held(&eb->context->vm->vm_bind_lock);
>+		assert_object_held(eb->context->vm->root_obj);
>+
>+		read_lock(&eb->i915->mm.notifier_lock);
>+		list_for_each_entry(vma, &eb->context->vm->vm_bind_list,
>+				    vm_bind_link) {
>+			if (!i915_gem_object_is_userptr(vma->obj))
>+				continue;
>+
>+			err = i915_gem_object_userptr_submit_done(vma->obj);
>+			if (err)
>+				break;
>+		}
>+
>+		read_unlock(&eb->i915->mm.notifier_lock);
>+	}
>+#endif
>+
>+	if (likely(!err)) {
>+		/*
>+		 * Unconditionally flush any
>+		 * chipset caches (for streaming writes).
>+		 */
>+		intel_gt_chipset_flush(eb->gt);
>+		return 0;
>+	}
>+
>+	for_each_batch_create_order(eb, j) {
>+		if (!eb->requests[j])
>+			break;
>+
>+		i915_request_set_error_once(eb->requests[j], err);
>+	}
>+	return err;
> }
>
> static int eb_request_submit(struct i915_execbuffer *eb,
>@@ -1088,6 +1137,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> {
> 	struct drm_i915_private *i915 = to_i915(dev);
> 	struct i915_execbuffer eb;
>+	bool throttle = true;
> 	int err;
>
> 	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
>@@ -1121,6 +1171,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>
> 	mutex_lock(&eb.context->vm->vm_bind_lock);
>
>+lookup_vmas:
> 	err = eb_lookup_vma_all(&eb);
> 	if (err) {
> 		eb_release_vma_all(&eb, true);
>@@ -1129,7 +1180,33 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>
> 	i915_gem_ww_ctx_init(&eb.ww, true);
>
>-	err = eb_validate_vma_all(&eb);
>+retry_validate:
>+	err = eb_pin_engine(&eb, throttle);
>+	if (err)
>+		goto err_validate;
>+
>+	/* only throttle once, even if we didn't need to throttle */
>+	throttle = false;
>+
>+	err = eb_lock_vma_all(&eb);
>+	if (err)
>+		goto err_validate;
>+
>+	if (!list_empty(&eb.context->vm->vm_rebind_list)) {
>+		eb_release_vma_all(&eb, true);
>+		i915_gem_ww_ctx_fini(&eb.ww);
>+		goto lookup_vmas;
>+	}
>+
>+	err = eb_validate_persistent_vma_all(&eb);
>+
>+err_validate:
>+	if (err == -EDEADLK) {
>+		eb_release_vma_all(&eb, false);
>+		err = i915_gem_ww_ctx_backoff(&eb.ww);
>+		if (!err)
>+			goto retry_validate;
>+	}
> 	if (err)
> 		goto err_vma;
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>index 8423df021b713..f980d7443fa27 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>@@ -63,6 +63,7 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
> {
> 	struct drm_i915_gem_object *obj = container_of(mni, struct drm_i915_gem_object, userptr.notifier);
> 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>+	struct i915_vma *vma;
> 	long r;
>
> 	if (!mmu_notifier_range_blockable(range))
>@@ -85,6 +86,15 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
> 	if (current->flags & PF_EXITING)
> 		return true;
>
>+	spin_lock(&obj->vma.lock);
>+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
>+		spin_lock(&vma->vm->vm_userptr_invalidated_lock);
>+		list_add_tail(&vma->vm_userptr_invalidated_link,
>+			      &vma->vm->vm_userptr_invalidated_list);
>+		spin_unlock(&vma->vm->vm_userptr_invalidated_lock);

This should be done only if the vma is persistent.
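
E.g. (sketch, using the helper added earlier in this series):

	spin_lock(&obj->vma.lock);
	list_for_each_entry(vma, &obj->vma.list, obj_link) {
		if (!i915_vma_is_persistent(vma))
			continue;

		spin_lock(&vma->vm->vm_userptr_invalidated_lock);
		list_add_tail(&vma->vm_userptr_invalidated_link,
			      &vma->vm->vm_userptr_invalidated_list);
		spin_unlock(&vma->vm->vm_userptr_invalidated_lock);
	}
	spin_unlock(&obj->vma.lock);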

Niranjana

>+	}
>+	spin_unlock(&obj->vma.lock);
>+
> 	/* we will unbind on next submission, still have userptr pins */
> 	r = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP, false,
> 				  MAX_SCHEDULE_TIMEOUT);
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>index e57b9c492a7f9..e6216f49e7d58 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>@@ -296,6 +296,12 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> 		goto put_obj;
> 	}
>
>+	if (i915_gem_object_is_userptr(obj)) {
>+		ret = i915_gem_object_userptr_submit_init(obj);
>+		if (ret)
>+			goto put_obj;
>+	}
>+
> 	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> 	if (ret)
> 		goto put_obj;
>@@ -328,6 +334,16 @@ static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> 		/* Make it evictable */
> 		__i915_vma_unpin(vma);
>
>+#ifdef CONFIG_MMU_NOTIFIER
>+		if (i915_gem_object_is_userptr(obj)) {
>+			read_lock(&vm->i915->mm.notifier_lock);
>+			ret = i915_gem_object_userptr_submit_done(obj);
>+			read_unlock(&vm->i915->mm.notifier_lock);
>+			if (ret)
>+				goto out_ww;
>+		}
>+#endif
>+
> 		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> 		i915_vm_bind_it_insert(vma, &vm->va);
> 		if (!obj->priv_root)
>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>index 97cd0089b516d..f1db8310de4a6 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>@@ -298,6 +298,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
> 	GEM_BUG_ON(IS_ERR(vm->root_obj));
> 	INIT_LIST_HEAD(&vm->vm_rebind_list);
> 	spin_lock_init(&vm->vm_rebind_lock);
>+	spin_lock_init(&vm->vm_userptr_invalidated_lock);
>+	INIT_LIST_HEAD(&vm->vm_userptr_invalidated_list);
> }
>
> void *__px_vaddr(struct drm_i915_gem_object *p)
>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>index 1f3b1967ec175..71203d65e1d60 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>@@ -269,6 +269,10 @@ struct i915_address_space {
> 	struct list_head vm_rebind_list;
> 	/* @vm_rebind_lock: protects vm_rebound_list */
> 	spinlock_t vm_rebind_lock;
>+	/* @vm_userptr_invalidated_list: list of invalidated userptr vmas */
>+	struct list_head vm_userptr_invalidated_list;
>+	/* @vm_userptr_invalidated_lock: protects vm_userptr_invalidated_list */
>+	spinlock_t vm_userptr_invalidated_lock;
> 	/* @va: tree of persistent vmas */
> 	struct rb_root_cached va;
> 	struct list_head non_priv_vm_bind_list;
>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>index 8bf870a0f689b..5b583ca744387 100644
>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>@@ -317,6 +317,8 @@ struct i915_vma {
> 	struct list_head non_priv_vm_bind_link;
> 	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
> 	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>+	/*@vm_userptr_invalidated_link: link to the vm->vm_userptr_invalidated_list */
>+	struct list_head vm_userptr_invalidated_link;
>
> 	/** Timeline fence for vm_bind completion notification */
> 	struct {
>-- 
>2.34.1
>


* Re: [Intel-gfx] [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-30 18:19   ` Matthew Auld
@ 2022-08-31  7:28     ` Tvrtko Ursulin
  2022-09-01  5:18     ` Niranjana Vishwanathapura
  1 sibling, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2022-08-31  7:28 UTC (permalink / raw)
  To: Matthew Auld, Andi Shyti, intel-gfx, dri-devel
  Cc: Thomas Hellstrom, Ramalingam C


On 30/08/2022 19:19, Matthew Auld wrote:
> On 27/08/2022 20:43, Andi Shyti wrote:
>> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>> Implement the bind and unbind of an object at the specified GPU virtual
>> addresses.
>>
>> Signed-off-by: Niranjana Vishwanathapura 
>> <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>

[snip]

>> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>> +                    struct drm_i915_gem_object *obj,
>> +                    struct drm_i915_gem_vm_bind *va)
>> +{
>> +    struct i915_ggtt_view view;
> 
> Should that be renamed to i915_gtt_view? So all of this just works with 
> ppgtt insertion, as-is? I'm impressed.

Yes please, do that refactor first in the series. It has been my standing 
request since January 2021. See 
ab307584-d97b-4fcf-7d4e-4d7de2d943fd@linux.intel.com from about a month ago.

Regards,

Tvrtko


* Re: [Intel-gfx] [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality
  2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
                   ` (16 preceding siblings ...)
  2022-08-27 19:44 ` [RFC PATCH v3 17/17] drm/i915: Enable execbuf3 ioctl for vm_bind Andi Shyti
@ 2022-08-31  7:33 ` Tvrtko Ursulin
  17 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2022-08-31  7:33 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Thomas Hellstrom, Matthew Auld, Ramalingam C


On 27/08/2022 20:43, Andi Shyti wrote:
> Hi,
> 
> just sending the original Niranjana's patch as an RFC. It's v3 as
> the v2 has been reviewed offline with Ramalingam.
> 
> I'm still keeping most of the structure even though some further
> discussion can be done starting from here.
> 
> Copy pasting Niranjana's original cover letter message:
> 
> DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM
> buffer objects (BOs) or sections of a BOs at specified GPU virtual
> addresses on a specified address space (VM). Multiple mappings can map
> to the same physical pages of an object (aliasing). These mappings (also
> referred to as persistent mappings) will be persistent across multiple
> GPU submissions (execbuf calls) issued by the UMD, without user having
> to provide a list of all required mappings during each submission (as
> required by older execbuf mode).
> 
> This patch series support VM_BIND version 1, as described by the param
> I915_PARAM_VM_BIND_VERSION.
> 
> Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only works in
> vm_bind mode. The vm_bind mode only works with this new execbuf3 ioctl.
> The new execbuf3 ioctl will not have any execlist support and all the
> legacy support like relocations etc., are removed.

We should consider not overloading the term execlists when we really mean 
the array of struct gem_exec_object2, before it gets too confusing. At 
least I assume that is what is meant here and that eb3 is not intended to 
be used only with the GuC. Alternatively, correct me if I am wrong and the 
term is already established in this sense and I simply did not realise.

Regards,

Tvrtko


* Re: [Intel-gfx] [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-08-27 19:43 ` [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl Andi Shyti
@ 2022-08-31  7:38   ` Tvrtko Ursulin
  2022-09-01  5:09     ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 41+ messages in thread
From: Tvrtko Ursulin @ 2022-08-31  7:38 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Thomas Hellstrom, Matthew Auld, Ramalingam C


On 27/08/2022 20:43, Andi Shyti wrote:
> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> 
> Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
> works in vm_bind mode. The vm_bind mode only works with
> this new execbuf3 ioctl.
> 
> The new execbuf3 ioctl will not have any list of objects to validate
> bind as all required objects binding would have been requested by the
> userspace before submitting the execbuf3.
> 
> And the legacy support like relocations etc are removed.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                 |    1 +
>   .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1000 +++++++++++++++++
>   drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>   include/uapi/drm/i915_drm.h                   |   62 +
>   4 files changed, 1065 insertions(+)
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 4e1627e96c6e0..38cd1c5bc1a55 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -148,6 +148,7 @@ gem-y += \
>   	gem/i915_gem_dmabuf.o \
>   	gem/i915_gem_domain.o \
>   	gem/i915_gem_execbuffer.o \
> +	gem/i915_gem_execbuffer3.o \
>   	gem/i915_gem_internal.o \
>   	gem/i915_gem_object.o \
>   	gem/i915_gem_lmem.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> new file mode 100644
> index 0000000000000..a3d767cd9f808
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> @@ -0,0 +1,1000 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/dma-resv.h>
> +#include <linux/sync_file.h>
> +#include <linux/uaccess.h>
> +
> +#include <drm/drm_syncobj.h>
> +
> +#include "gt/intel_context.h"
> +#include "gt/intel_gpu_commands.h"
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +#include "gt/intel_ring.h"
> +
> +#include "i915_drv.h"
> +#include "i915_file_private.h"
> +#include "i915_gem_context.h"
> +#include "i915_gem_ioctls.h"
> +#include "i915_gem_vm_bind.h"
> +#include "i915_trace.h"
> +
> +#define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
> +#define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
> +
> +/* Catch emission of unexpected errors for CI! */
> +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> +#undef EINVAL
> +#define EINVAL ({ \
> +	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
> +	22; \
> +})
> +#endif
> +
> +/**
> + * DOC: User command execution with execbuf3 ioctl
> + *
> + * A VM in VM_BIND mode will not support older execbuf mode of binding.
> + * The execbuf ioctl handling in VM_BIND mode differs significantly from the
> + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> + * Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
> + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
> + * execlist. Hence, no support for implicit sync.
> + *
> + * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
> + * works with execbuf3 ioctl for submission.
> + *
> + * The execbuf3 ioctl directly specifies the batch addresses instead of as
> + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
> + * support many of the older features like in/out/submit fences, fence array,
> + * default gem context etc. (See struct drm_i915_gem_execbuffer3).
> + *
> + * In VM_BIND mode, VA allocation is completely managed by the user instead of
> + * the i915 driver. Hence all VA assignment, eviction are not applicable in
> + * VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
> + * be using the i915_vma active reference tracking. It will instead check the
> + * dma-resv object's fence list for that.
> + *
> + * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
> + * vma lookup table, implicit sync, vma active reference tracking etc., are not
> + * applicable for execbuf3 ioctl.
> + */
> +
> +struct eb_fence {
> +	struct drm_syncobj *syncobj;
> +	struct dma_fence *dma_fence;
> +	u64 value;
> +	struct dma_fence_chain *chain_fence;
> +};
> +
> +/**
> + * struct i915_execbuffer - execbuf struct for execbuf3
> + * @i915: reference to the i915 instance we run on
> + * @file: drm file reference
> + * args: execbuf3 ioctl structure
> + * @gt: reference to the gt instance ioctl submitted for
> + * @context: logical state for the request
> + * @gem_context: callers context
> + * @requests: requests to be build
> + * @composite_fence: used for excl fence in dma_resv objects when > 1 BB submitted
> + * @ww: i915_gem_ww_ctx instance
> + * @num_batches: number of batches submitted
> + * @batch_addresses: addresses corresponds to the submitted batches
> + * @batches: references to the i915_vmas corresponding to the batches
> + */
> +struct i915_execbuffer {
> +	struct drm_i915_private *i915;
> +	struct drm_file *file;
> +	struct drm_i915_gem_execbuffer3 *args;
> +
> +	struct intel_gt *gt;
> +	struct intel_context *context;
> +	struct i915_gem_context *gem_context;
> +
> +	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
> +	struct dma_fence *composite_fence;
> +
> +	struct i915_gem_ww_ctx ww;
> +
> +	unsigned int num_batches;
> +	u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
> +	struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
> +
> +	struct eb_fence *fences;
> +	unsigned long num_fences;
> +};
> +
> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
> +
> +static int eb_select_context(struct i915_execbuffer *eb)
> +{
> +	struct i915_gem_context *ctx;
> +
> +	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->ctx_id);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	eb->gem_context = ctx;
> +	return 0;
> +}
> +
> +static struct i915_vma *
> +eb_find_vma(struct i915_address_space *vm, u64 addr)
> +{
> +	u64 va;
> +
> +	lockdep_assert_held(&vm->vm_bind_lock);
> +
> +	va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
> +	return i915_gem_vm_bind_lookup_vma(vm, va);
> +}
> +
> +static int eb_lookup_vma_all(struct i915_execbuffer *eb)
> +{
> +	unsigned int i, current_batch = 0;
> +	struct i915_vma *vma;
> +
> +	for (i = 0; i < eb->num_batches; i++) {
> +		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
> +		if (!vma)
> +			return -EINVAL;
> +
> +		eb->batches[current_batch] = vma;
> +		++current_batch;
> +	}
> +
> +	return 0;
> +}
> +
> +static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
> +{
> +}
> +
> +static int eb_validate_vma_all(struct i915_execbuffer *eb)
> +{
> +	/* only throttle once, even if we didn't need to throttle */
> +	for (bool throttle = true;; throttle = false) {
> +		int err;
> +
> +		err = eb_pin_engine(eb, throttle);
> +		if (!err)
> +			return 0;
> +
> +		if (err != -EDEADLK)
> +			return err;
> +
> +		err = i915_gem_ww_ctx_backoff(&eb->ww);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Using two helper loops for the order of which requests / batches are created
> + * and added the to backend. Requests are created in order from the parent to
> + * the last child. Requests are added in the reverse order, from the last child
> + * to parent. This is done for locking reasons as the timeline lock is acquired
> + * during request creation and released when the request is added to the
> + * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
> + * the ordering.
> + */
> +#define for_each_batch_create_order(_eb, _i) \
> +	for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
> +#define for_each_batch_add_order(_eb, _i) \
> +	BUILD_BUG_ON(!typecheck(int, _i)); \
> +	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
> +
> +static int eb_move_to_gpu(struct i915_execbuffer *eb)
> +{
> +	/* Unconditionally flush any chipset caches (for streaming writes). */
> +	intel_gt_chipset_flush(eb->gt);
> +
> +	return 0;
> +}
> +
> +static int eb_request_submit(struct i915_execbuffer *eb,
> +			     struct i915_request *rq,
> +			     struct i915_vma *batch,
> +			     u64 batch_len)
> +{
> +	struct intel_engine_cs *engine = rq->context->engine;
> +	int err;
> +
> +	if (intel_context_nopreempt(rq->context))
> +		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);
> +
> +	/*
> +	 * After we completed waiting for other engines (using HW semaphores)
> +	 * then we can signal that this request/batch is ready to run. This
> +	 * allows us to determine if the batch is still waiting on the GPU
> +	 * or actually running by checking the breadcrumb.
> +	 */
> +	if (engine->emit_init_breadcrumb) {
> +		err = engine->emit_init_breadcrumb(rq);
> +		if (err)
> +			return err;
> +	}
> +
> +	return engine->emit_bb_start(rq, batch->node.start, batch_len, 0);
> +}
> +
> +static int eb_submit(struct i915_execbuffer *eb)
> +{
> +	unsigned int i;
> +	int err;
> +
> +	err = eb_move_to_gpu(eb);
> +
> +	for_each_batch_create_order(eb, i) {
> +		if (!eb->requests[i])
> +			break;
> +
> +		trace_i915_request_queue(eb->requests[i], 0);
> +		if (!err)
> +			err = eb_request_submit(eb, eb->requests[i],
> +						eb->batches[i],
> +						eb->batches[i]->size);
> +	}
> +
> +	return err;
> +}
> +
> +static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
> +{
> +	struct intel_ring *ring = ce->ring;
> +	struct intel_timeline *tl = ce->timeline;
> +	struct i915_request *rq;
> +
> +	/*
> +	 * Completely unscientific finger-in-the-air estimates for suitable
> +	 * maximum user request size (to avoid blocking) and then backoff.
> +	 */
> +	if (intel_ring_update_space(ring) >= PAGE_SIZE)
> +		return NULL;
> +
> +	/*
> +	 * Find a request that after waiting upon, there will be at least half
> +	 * the ring available. The hysteresis allows us to compete for the
> +	 * shared ring and should mean that we sleep less often prior to
> +	 * claiming our resources, but not so long that the ring completely
> +	 * drains before we can submit our next request.
> +	 */
> +	list_for_each_entry(rq, &tl->requests, link) {
> +		if (rq->ring != ring)
> +			continue;
> +
> +		if (__intel_ring_space(rq->postfix,
> +				       ring->emit, ring->size) > ring->size / 2)
> +			break;
> +	}
> +	if (&rq->link == &tl->requests)
> +		return NULL; /* weird, we will check again later for real */
> +
> +	return i915_request_get(rq);
> +}
> +
> +static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
> +			   bool throttle)
> +{
> +	struct intel_timeline *tl;
> +	struct i915_request *rq = NULL;
> +
> +	/*
> +	 * Take a local wakeref for preparing to dispatch the execbuf as
> +	 * we expect to access the hardware fairly frequently in the
> +	 * process, and require the engine to be kept awake between accesses.
> +	 * Upon dispatch, we acquire another prolonged wakeref that we hold
> +	 * until the timeline is idle, which in turn releases the wakeref
> +	 * taken on the engine, and the parent device.
> +	 */
> +	tl = intel_context_timeline_lock(ce);
> +	if (IS_ERR(tl))
> +		return PTR_ERR(tl);
> +
> +	intel_context_enter(ce);
> +	if (throttle)
> +		rq = eb_throttle(eb, ce);
> +	intel_context_timeline_unlock(tl);
> +
> +	if (rq) {
> +		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
> +		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
> +
> +		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
> +				      timeout) < 0) {
> +			i915_request_put(rq);
> +
> +			/*
> +			 * Error path, cannot use intel_context_timeline_lock as
> +			 * that is user interruptible and this clean up step
> +			 * must be done.
> +			 */
> +			mutex_lock(&ce->timeline->mutex);
> +			intel_context_exit(ce);
> +			mutex_unlock(&ce->timeline->mutex);
> +
> +			if (nonblock)
> +				return -EWOULDBLOCK;
> +			else
> +				return -EINTR;
> +		}
> +		i915_request_put(rq);
> +	}
> +
> +	return 0;
> +}
> +
> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
> +{
> +	struct intel_context *ce = eb->context, *child;
> +	int err;
> +	int i = 0, j = 0;
> +
> +	GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
> +
> +	if (unlikely(intel_context_is_banned(ce)))
> +		return -EIO;
> +
> +	/*
> +	 * Pinning the contexts may generate requests in order to acquire
> +	 * GGTT space, so do this first before we reserve a seqno for
> +	 * ourselves.
> +	 */
> +	err = intel_context_pin_ww(ce, &eb->ww);
> +	if (err)
> +		return err;
> +
> +	for_each_child(ce, child) {
> +		err = intel_context_pin_ww(child, &eb->ww);
> +		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
> +	}
> +
> +	for_each_child(ce, child) {
> +		err = eb_pin_timeline(eb, child, throttle);
> +		if (err)
> +			goto unwind;
> +		++i;
> +	}
> +	err = eb_pin_timeline(eb, ce, throttle);
> +	if (err)
> +		goto unwind;
> +
> +	eb->args->flags |= __EXEC3_ENGINE_PINNED;
> +	return 0;
> +
> +unwind:
> +	for_each_child(ce, child) {
> +		if (j++ < i) {
> +			mutex_lock(&child->timeline->mutex);
> +			intel_context_exit(child);
> +			mutex_unlock(&child->timeline->mutex);
> +		}
> +	}
> +	for_each_child(ce, child)
> +		intel_context_unpin(child);
> +	intel_context_unpin(ce);
> +	return err;
> +}
> +
> +static int
> +eb_select_engine(struct i915_execbuffer *eb)
> +{
> +	struct intel_context *ce, *child;
> +	unsigned int idx;
> +	int err;
> +
> +	if (!i915_gem_context_user_engines(eb->gem_context))
> +		return -EINVAL;
> +
> +	idx = eb->args->engine_idx;
> +	ce = i915_gem_context_get_engine(eb->gem_context, idx);
> +	if (IS_ERR(ce))
> +		return PTR_ERR(ce);
> +
> +	eb->num_batches = ce->parallel.number_children + 1;
> +
> +	for_each_child(ce, child)
> +		intel_context_get(child);
> +	intel_gt_pm_get(ce->engine->gt);
> +
> +	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
> +		err = intel_context_alloc_state(ce);
> +		if (err)
> +			goto err;
> +	}
> +	for_each_child(ce, child) {
> +		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
> +			err = intel_context_alloc_state(child);
> +			if (err)
> +				goto err;
> +		}
> +	}
> +
> +	/*
> +	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
> +	 * EIO if the GPU is already wedged.
> +	 */
> +	err = intel_gt_terminally_wedged(ce->engine->gt);
> +	if (err)
> +		goto err;
> +
> +	if (!i915_vm_tryget(ce->vm)) {
> +		err = -ENOENT;
> +		goto err;
> +	}
> +
> +	eb->context = ce;
> +	eb->gt = ce->engine->gt;
> +
> +	/*
> +	 * Make sure engine pool stays alive even if we call intel_context_put
> +	 * during ww handling. The pool is destroyed when last pm reference
> +	 * is dropped, which breaks our -EDEADLK handling.
> +	 */
> +	return err;
> +
> +err:
> +	intel_gt_pm_put(ce->engine->gt);
> +	for_each_child(ce, child)
> +		intel_context_put(child);
> +	intel_context_put(ce);
> +	return err;
> +}
> +
> +static void
> +eb_put_engine(struct i915_execbuffer *eb)
> +{
> +	struct intel_context *child;
> +
> +	i915_vm_put(eb->context->vm);
> +	intel_gt_pm_put(eb->gt);
> +	for_each_child(eb->context, child)
> +		intel_context_put(child);
> +	intel_context_put(eb->context);
> +}
> +
> +static void
> +__free_fence_array(struct eb_fence *fences, unsigned int n)
> +{
> +	while (n--) {
> +		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
> +		dma_fence_put(fences[n].dma_fence);
> +		dma_fence_chain_free(fences[n].chain_fence);
> +	}
> +	kvfree(fences);
> +}
> +
> +static int add_timeline_fence_array(struct i915_execbuffer *eb)
> +{
> +	struct drm_i915_gem_timeline_fence __user *user_fences;
> +	struct eb_fence *f;
> +	u64 nfences;
> +	int err = 0;
> +
> +	nfences = eb->args->fence_count;
> +	if (!nfences)
> +		return 0;
> +
> +	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
> +	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
> +	if (nfences > min_t(unsigned long,
> +			    ULONG_MAX / sizeof(*user_fences),
> +			    SIZE_MAX / sizeof(*f)) - eb->num_fences)
> +		return -EINVAL;
> +
> +	user_fences = u64_to_user_ptr(eb->args->timeline_fences);
> +	if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
> +		return -EFAULT;
> +
> +	f = krealloc(eb->fences,
> +		     (eb->num_fences + nfences) * sizeof(*f),
> +		     __GFP_NOWARN | GFP_KERNEL);
> +	if (!f)
> +		return -ENOMEM;
> +
> +	eb->fences = f;
> +	f += eb->num_fences;
> +
> +	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
> +		     ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
> +
> +	while (nfences--) {
> +		struct drm_i915_gem_timeline_fence user_fence;
> +		struct drm_syncobj *syncobj;
> +		struct dma_fence *fence = NULL;
> +		u64 point;
> +
> +		if (__copy_from_user(&user_fence,
> +				     user_fences++,
> +				     sizeof(user_fence)))
> +			return -EFAULT;
> +
> +		if (user_fence.flags & __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
> +			return -EINVAL;
> +
> +		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
> +		if (!syncobj) {
> +			DRM_DEBUG("Invalid syncobj handle provided\n");
> +			return -ENOENT;
> +		}
> +
> +		fence = drm_syncobj_fence_get(syncobj);
> +
> +		if (!fence && user_fence.flags &&
> +		    !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
> +			DRM_DEBUG("Syncobj handle has no fence\n");
> +			drm_syncobj_put(syncobj);
> +			return -EINVAL;
> +		}
> +
> +		point = user_fence.value;
> +		if (fence)
> +			err = dma_fence_chain_find_seqno(&fence, point);
> +
> +		if (err && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
> +			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
> +			dma_fence_put(fence);
> +			drm_syncobj_put(syncobj);
> +			return err;
> +		}
> +
> +		/*
> +		 * A point might have been signaled already and
> +		 * garbage collected from the timeline. In this case
> +		 * just ignore the point and carry on.
> +		 */
> +		if (!fence && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
> +			drm_syncobj_put(syncobj);
> +			continue;
> +		}
> +
> +		/*
> +		 * For timeline syncobjs we need to preallocate chains for
> +		 * later signaling.
> +		 */
> +		if (point != 0 && user_fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
> +			/*
> +			 * Waiting and signaling the same point (when point !=
> +			 * 0) would break the timeline.
> +			 */
> +			if (user_fence.flags & I915_TIMELINE_FENCE_WAIT) {
> +				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
> +				dma_fence_put(fence);
> +				drm_syncobj_put(syncobj);
> +				return -EINVAL;
> +			}
> +
> +			f->chain_fence = dma_fence_chain_alloc();
> +			if (!f->chain_fence) {
> +				drm_syncobj_put(syncobj);
> +				dma_fence_put(fence);
> +				return -ENOMEM;
> +			}
> +		} else {
> +			f->chain_fence = NULL;
> +		}
> +
> +		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
> +		f->dma_fence = fence;
> +		f->value = point;
> +		f++;
> +		eb->num_fences++;
> +	}
> +
> +	return 0;
> +}
> +
> +static void put_fence_array(struct eb_fence *fences, int num_fences)
> +{
> +	if (fences)
> +		__free_fence_array(fences, num_fences);
> +}
> +
> +static int
> +await_fence_array(struct i915_execbuffer *eb,
> +		  struct i915_request *rq)
> +{
> +	unsigned int n;
> +
> +	for (n = 0; n < eb->num_fences; n++) {
> +		int err;
> +
> +		struct drm_syncobj *syncobj;
> +		unsigned int flags;
> +
> +		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
> +
> +		if (!eb->fences[n].dma_fence)
> +			continue;
> +
> +		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
> +		if (err < 0)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static void signal_fence_array(const struct i915_execbuffer *eb,
> +			       struct dma_fence * const fence)
> +{
> +	unsigned int n;
> +
> +	for (n = 0; n < eb->num_fences; n++) {
> +		struct drm_syncobj *syncobj;
> +		unsigned int flags;
> +
> +		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
> +		if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
> +			continue;
> +
> +		if (eb->fences[n].chain_fence) {
> +			drm_syncobj_add_point(syncobj,
> +					      eb->fences[n].chain_fence,
> +					      fence,
> +					      eb->fences[n].value);
> +			/*
> +			 * The chain's ownership is transferred to the
> +			 * timeline.
> +			 */
> +			eb->fences[n].chain_fence = NULL;
> +		} else {
> +			drm_syncobj_replace_fence(syncobj, fence);
> +		}
> +	}
> +}
Semi-random place to ask - how much of the code here is a direct copy of 
existing functions from i915_gem_execbuffer.c? There seem to be some 
100% copies at least, and then some more with small tweaks. Could you spend 
some time and try to figure out some code sharing?

Regards,

Tvrtko

> +
> +static int parse_timeline_fences(struct i915_execbuffer *eb)
> +{
> +	return add_timeline_fence_array(eb);
> +}
> +
> +static int parse_batch_addresses(struct i915_execbuffer *eb)
> +{
> +	struct drm_i915_gem_execbuffer3 *args = eb->args;
> +	u64 __user *batch_addr = u64_to_user_ptr(args->batch_address);
> +
> +	if (copy_from_user(eb->batch_addresses, batch_addr,
> +			   sizeof(batch_addr[0]) * eb->num_batches))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
> +{
> +	struct i915_request *rq, *rn;
> +
> +	list_for_each_entry_safe(rq, rn, &tl->requests, link)
> +		if (rq == end || !i915_request_retire(rq))
> +			break;
> +}
> +
> +static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
> +			  int err, bool last_parallel)
> +{
> +	struct intel_timeline * const tl = i915_request_timeline(rq);
> +	struct i915_sched_attr attr = {};
> +	struct i915_request *prev;
> +
> +	lockdep_assert_held(&tl->mutex);
> +	lockdep_unpin_lock(&tl->mutex, rq->cookie);
> +
> +	trace_i915_request_add(rq);
> +
> +	prev = __i915_request_commit(rq);
> +
> +	/* Check that the context wasn't destroyed before submission */
> +	if (likely(!intel_context_is_closed(eb->context))) {
> +		attr = eb->gem_context->sched;
> +	} else {
> +		/* Serialise with context_close via the add_to_timeline */
> +		i915_request_set_error_once(rq, -ENOENT);
> +		__i915_request_skip(rq);
> +		err = -ENOENT; /* override any transient errors */
> +	}
> +
> +	if (intel_context_is_parallel(eb->context)) {
> +		if (err) {
> +			__i915_request_skip(rq);
> +			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
> +				&rq->fence.flags);
> +		}
> +		if (last_parallel)
> +			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
> +				&rq->fence.flags);
> +	}
> +
> +	__i915_request_queue(rq, &attr);
> +
> +	/* Try to clean up the client's timeline after submitting the request */
> +	if (prev)
> +		retire_requests(tl, prev);
> +
> +	mutex_unlock(&tl->mutex);
> +
> +	return err;
> +}
> +
> +static int eb_request_add_all(struct i915_execbuffer *eb, int err)
> +{
> +	int i;
> +
> +	/*
> +	 * We iterate in reverse order of creation to release timeline mutexes in
> +	 * same order.
> +	 */
> +	for_each_batch_add_order(eb, i) {
> +		struct i915_request *rq = eb->requests[i];
> +
> +		if (!rq)
> +			continue;
> +
> +		err = eb_request_add(eb, rq, err, i == 0);
> +	}
> +
> +	return err;
> +}
> +
> +static void eb_requests_get(struct i915_execbuffer *eb)
> +{
> +	unsigned int i;
> +
> +	for_each_batch_create_order(eb, i) {
> +		if (!eb->requests[i])
> +			break;
> +
> +		i915_request_get(eb->requests[i]);
> +	}
> +}
> +
> +static void eb_requests_put(struct i915_execbuffer *eb)
> +{
> +	unsigned int i;
> +
> +	for_each_batch_create_order(eb, i) {
> +		if (!eb->requests[i])
> +			break;
> +
> +		i915_request_put(eb->requests[i]);
> +	}
> +}
> +
> +static int
> +eb_composite_fence_create(struct i915_execbuffer *eb)
> +{
> +	struct dma_fence_array *fence_array;
> +	struct dma_fence **fences;
> +	unsigned int i;
> +
> +	GEM_BUG_ON(!intel_context_is_parent(eb->context));
> +
> +	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
> +	if (!fences)
> +		return -ENOMEM;
> +
> +	for_each_batch_create_order(eb, i) {
> +		fences[i] = &eb->requests[i]->fence;
> +		__set_bit(I915_FENCE_FLAG_COMPOSITE,
> +			  &eb->requests[i]->fence.flags);
> +	}
> +
> +	fence_array = dma_fence_array_create(eb->num_batches,
> +					     fences,
> +					     eb->context->parallel.fence_context,
> +					     eb->context->parallel.seqno++,
> +					     false);
> +	if (!fence_array) {
> +		kfree(fences);
> +		return -ENOMEM;
> +	}
> +
> +	/* Move ownership to the dma_fence_array created above */
> +	for_each_batch_create_order(eb, i)
> +		dma_fence_get(fences[i]);
> +
> +	eb->composite_fence = &fence_array->base;
> +
> +	return 0;
> +}
> +
> +static int
> +eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
> +{
> +	int err;
> +
> +	if (unlikely(eb->gem_context->syncobj)) {
> +		struct dma_fence *fence;
> +
> +		fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
> +		err = i915_request_await_dma_fence(rq, fence);
> +		dma_fence_put(fence);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (eb->fences) {
> +		err = await_fence_array(eb, rq);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (intel_context_is_parallel(eb->context)) {
> +		err = eb_composite_fence_create(eb);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static struct intel_context *
> +eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
> +{
> +	struct intel_context *child;
> +
> +	if (likely(context_number == 0))
> +		return eb->context;
> +
> +	for_each_child(eb->context, child)
> +		if (!--context_number)
> +			return child;
> +
> +	GEM_BUG_ON("Context not found");
> +
> +	return NULL;
> +}
> +
> +static int eb_requests_create(struct i915_execbuffer *eb)
> +{
> +	unsigned int i;
> +	int err;
> +
> +	for_each_batch_create_order(eb, i) {
> +		/* Allocate a request for this batch buffer nice and early. */
> +		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
> +		if (IS_ERR(eb->requests[i])) {
> +			err = PTR_ERR(eb->requests[i]);
> +			eb->requests[i] = NULL;
> +			return err;
> +		}
> +
> +		/*
> +		 * Only the first request added (committed to backend) has to
> +		 * take the in fences into account as all subsequent requests
> +		 * will have fences inserted in between them.
> +		 */
> +		if (i + 1 == eb->num_batches) {
> +			err = eb_fences_add(eb, eb->requests[i]);
> +			if (err)
> +				return err;
> +		}
> +
> +		if (eb->batches[i])
> +			eb->requests[i]->batch_res =
> +				i915_vma_resource_get(eb->batches[i]->resource);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +i915_gem_do_execbuffer(struct drm_device *dev,
> +		       struct drm_file *file,
> +		       struct drm_i915_gem_execbuffer3 *args)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct i915_execbuffer eb;
> +	int err;
> +
> +	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
> +
> +	eb.i915 = i915;
> +	eb.file = file;
> +	eb.args = args;
> +
> +	eb.fences = NULL;
> +	eb.num_fences = 0;
> +
> +	memset(eb.requests, 0, sizeof(struct i915_request *) *
> +	       ARRAY_SIZE(eb.requests));
> +	eb.composite_fence = NULL;
> +
> +	err = parse_timeline_fences(&eb);
> +	if (err)
> +		return err;
> +
> +	err = eb_select_context(&eb);
> +	if (unlikely(err))
> +		goto err_fences;
> +
> +	err = eb_select_engine(&eb);
> +	if (unlikely(err))
> +		goto err_context;
> +
> +	err = parse_batch_addresses(&eb);
> +	if (unlikely(err))
> +		goto err_engine;
> +
> +	mutex_lock(&eb.context->vm->vm_bind_lock);
> +
> +	err = eb_lookup_vma_all(&eb);
> +	if (err) {
> +		eb_release_vma_all(&eb, true);
> +		goto err_vm_bind_lock;
> +	}
> +
> +	i915_gem_ww_ctx_init(&eb.ww, true);
> +
> +	err = eb_validate_vma_all(&eb);
> +	if (err)
> +		goto err_vma;
> +
> +	ww_acquire_done(&eb.ww.ctx);
> +
> +	err = eb_requests_create(&eb);
> +	if (err) {
> +		if (eb.requests[0])
> +			goto err_request;
> +		else
> +			goto err_vma;
> +	}
> +
> +	err = eb_submit(&eb);
> +
> +err_request:
> +	eb_requests_get(&eb);
> +	err = eb_request_add_all(&eb, err);
> +
> +	if (eb.fences)
> +		signal_fence_array(&eb, eb.composite_fence ?
> +				   eb.composite_fence :
> +				   &eb.requests[0]->fence);
> +
> +	if (unlikely(eb.gem_context->syncobj)) {
> +		drm_syncobj_replace_fence(eb.gem_context->syncobj,
> +					  eb.composite_fence ?
> +					  eb.composite_fence :
> +					  &eb.requests[0]->fence);
> +	}
> +
> +	if (eb.composite_fence)
> +		dma_fence_put(eb.composite_fence);
> +
> +	eb_requests_put(&eb);
> +
> +err_vma:
> +	eb_release_vma_all(&eb, true);
> +	WARN_ON(err == -EDEADLK);
> +	i915_gem_ww_ctx_fini(&eb.ww);
> +err_vm_bind_lock:
> +	mutex_unlock(&eb.context->vm->vm_bind_lock);
> +err_engine:
> +	eb_put_engine(&eb);
> +err_context:
> +	i915_gem_context_put(eb.gem_context);
> +err_fences:
> +	put_fence_array(eb.fences, eb.num_fences);
> +	return err;
> +}
> +
> +int
> +i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file)
> +{
> +	struct drm_i915_gem_execbuffer3 *args = data;
> +	int err;
> +
> +	if (args->flags & __I915_EXEC3_UNKNOWN_FLAGS)
> +		return -EINVAL;
> +
> +	err = i915_gem_do_execbuffer(dev, file, args);
> +
> +	args->flags &= ~__I915_EXEC3_UNKNOWN_FLAGS;
> +	return err;
> +}
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
> index 28d6526e32ab0..b7a1e9725a841 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
> @@ -18,6 +18,8 @@ int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>   			      struct drm_file *file);
>   int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>   			       struct drm_file *file);
> +int i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
> +			       struct drm_file *file);
>   int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>   				struct drm_file *file);
>   int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 3da0e07f84bbd..ea1906873f278 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1542,6 +1542,68 @@ struct drm_i915_gem_timeline_fence {
>   	__u64 value;
>   };
>   
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +	/**
> +	 * @ctx_id: Context id
> +	 *
> +	 * Only contexts with user engine map are allowed.
> +	 */
> +	__u32 ctx_id;
> +
> +	/**
> +	 * @engine_idx: Engine index
> +	 *
> +	 * An index in the user engine map of the context specified by @ctx_id.
> +	 */
> +	__u32 engine_idx;
> +
> +	/**
> +	 * @batch_address: Batch gpu virtual address/es.
> +	 *
> +	 * For normal submission, it is the gpu virtual address of the batch
> +	 * buffer. For parallel submission, it is a pointer to an array of
> +	 * batch buffer gpu virtual addresses with array size equal to the
> +	 * number of (parallel) engines involved in that submission (See
> +	 * struct i915_context_engines_parallel_submit).
> +	 */
> +	__u64 batch_address;
> +
> +	/** @flags: Currently reserved, MBZ */
> +	__u64 flags;
> +#define __I915_EXEC3_UNKNOWN_FLAGS (~0)
> +
> +	/** @rsvd1: Reserved, MBZ */
> +	__u32 rsvd1;
> +
> +	/** @fence_count: Number of fences in @timeline_fences array. */
> +	__u32 fence_count;
> +
> +	/**
> +	 * @timeline_fences: Pointer to an array of timeline fences.
> +	 *
> +	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> +	 */
> +	__u64 timeline_fences;
> +
> +	/** @rsvd2: Reserved, MBZ */
> +	__u64 rsvd2;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
>   struct drm_i915_gem_pin {
>   	/** Handle of the buffer to be pinned. */
>   	__u32 handle;
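
A minimal usage sketch of the uapi above, for illustration only: fill struct
drm_i915_gem_execbuffer3 with the context id, the engine index from that
context's user engine map and a GPU virtual address that was previously bound
with VM_BIND, and optionally request a timeline syncobj point to be signalled
on completion. The DRM_IOCTL_I915_GEM_EXECBUFFER3 wrapper is only wired up
later in this series, so that macro, the helper name and the include path
below are assumptions of the RFC rather than final ABI.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>	/* assumes installed uapi headers with the RFC additions */

static int exec3_submit(int drm_fd, uint32_t ctx_id, uint32_t engine_idx,
			uint64_t batch_va, uint32_t syncobj, uint64_t point)
{
	/* Signal the syncobj at timeline 'point' once the batch completes. */
	struct drm_i915_gem_timeline_fence fence = {
		.handle = syncobj,
		.flags  = I915_TIMELINE_FENCE_SIGNAL,
		.value  = point,
	};
	struct drm_i915_gem_execbuffer3 execbuf = {
		.ctx_id          = ctx_id,	/* context with a user engine map */
		.engine_idx      = engine_idx,	/* index into that engine map */
		.batch_address   = batch_va,	/* GPU VA from an earlier VM_BIND */
		.fence_count     = 1,
		.timeline_fences = (uint64_t)(uintptr_t)&fence,
	};

	/* flags, rsvd1, rsvd2 and extensions stay zero (MBZ / no extensions). */
	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf);
}

For a parallel engine set, batch_address would instead point to a userspace
array holding one bound GPU VA per engine in the set, which is what
parse_batch_addresses() above copies in.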


* Re: [Intel-gfx] [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-08-31  7:38   ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-09-01  5:09     ` Niranjana Vishwanathapura
  2022-09-01  7:58       ` Tvrtko Ursulin
  0 siblings, 1 reply; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-01  5:09 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Wed, Aug 31, 2022 at 08:38:48AM +0100, Tvrtko Ursulin wrote:
>
>On 27/08/2022 20:43, Andi Shyti wrote:
>>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>>Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>>works in vm_bind mode. The vm_bind mode only works with
>>this new execbuf3 ioctl.
>>
>>The new execbuf3 ioctl will not have any list of objects to validate
>>bind as all required objects binding would have been requested by the
>>userspace before submitting the execbuf3.
>>
>>And the legacy support like relocations etc are removed.
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>---
>>  drivers/gpu/drm/i915/Makefile                 |    1 +
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1000 +++++++++++++++++
>>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>>  include/uapi/drm/i915_drm.h                   |   62 +
>>  4 files changed, 1065 insertions(+)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>>
>>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>index 4e1627e96c6e0..38cd1c5bc1a55 100644
>>--- a/drivers/gpu/drm/i915/Makefile
>>+++ b/drivers/gpu/drm/i915/Makefile
>>@@ -148,6 +148,7 @@ gem-y += \
>>  	gem/i915_gem_dmabuf.o \
>>  	gem/i915_gem_domain.o \
>>  	gem/i915_gem_execbuffer.o \
>>+	gem/i915_gem_execbuffer3.o \
>>  	gem/i915_gem_internal.o \
>>  	gem/i915_gem_object.o \
>>  	gem/i915_gem_lmem.o \
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>>new file mode 100644
>>index 0000000000000..a3d767cd9f808
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>>@@ -0,0 +1,1000 @@
>>+// SPDX-License-Identifier: MIT
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#include <linux/dma-resv.h>
>>+#include <linux/sync_file.h>
>>+#include <linux/uaccess.h>
>>+
>>+#include <drm/drm_syncobj.h>
>>+
>>+#include "gt/intel_context.h"
>>+#include "gt/intel_gpu_commands.h"
>>+#include "gt/intel_gt.h"
>>+#include "gt/intel_gt_pm.h"
>>+#include "gt/intel_ring.h"
>>+
>>+#include "i915_drv.h"
>>+#include "i915_file_private.h"
>>+#include "i915_gem_context.h"
>>+#include "i915_gem_ioctls.h"
>>+#include "i915_gem_vm_bind.h"
>>+#include "i915_trace.h"
>>+
>>+#define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
>>+#define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
>>+
>>+/* Catch emission of unexpected errors for CI! */
>>+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>>+#undef EINVAL
>>+#define EINVAL ({ \
>>+	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
>>+	22; \
>>+})
>>+#endif
>>+
>>+/**
>>+ * DOC: User command execution with execbuf3 ioctl
>>+ *
>>+ * A VM in VM_BIND mode will not support older execbuf mode of binding.
>>+ * The execbuf ioctl handling in VM_BIND mode differs significantly from the
>>+ * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>>+ * Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
>>+ * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
>>+ * execlist. Hence, no support for implicit sync.
>>+ *
>>+ * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
>>+ * works with execbuf3 ioctl for submission.
>>+ *
>>+ * The execbuf3 ioctl directly specifies the batch addresses instead of as
>>+ * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>>+ * support many of the older features like in/out/submit fences, fence array,
>>+ * default gem context etc. (See struct drm_i915_gem_execbuffer3).
>>+ *
>>+ * In VM_BIND mode, VA allocation is completely managed by the user instead of
>>+ * the i915 driver. Hence all VA assignment, eviction are not applicable in
>>+ * VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
>>+ * be using the i915_vma active reference tracking. It will instead check the
>>+ * dma-resv object's fence list for that.
>>+ *
>>+ * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
>>+ * vma lookup table, implicit sync, vma active reference tracking etc., are not
>>+ * applicable for execbuf3 ioctl.
>>+ */
>>+
>>+struct eb_fence {
>>+	struct drm_syncobj *syncobj;
>>+	struct dma_fence *dma_fence;
>>+	u64 value;
>>+	struct dma_fence_chain *chain_fence;
>>+};
>>+
>>+/**
>>+ * struct i915_execbuffer - execbuf struct for execbuf3
>>+ * @i915: reference to the i915 instance we run on
>>+ * @file: drm file reference
>>+ * @args: execbuf3 ioctl structure
>>+ * @gt: reference to the gt instance ioctl submitted for
>>+ * @context: logical state for the request
>>+ * @gem_context: callers context
>>+ * @requests: requests to be built
>>+ * @composite_fence: used for excl fence in dma_resv objects when > 1 BB submitted
>>+ * @ww: i915_gem_ww_ctx instance
>>+ * @num_batches: number of batches submitted
>>+ * @batch_addresses: addresses corresponding to the submitted batches
>>+ * @batches: references to the i915_vmas corresponding to the batches
>>+ */
>>+struct i915_execbuffer {
>>+	struct drm_i915_private *i915;
>>+	struct drm_file *file;
>>+	struct drm_i915_gem_execbuffer3 *args;
>>+
>>+	struct intel_gt *gt;
>>+	struct intel_context *context;
>>+	struct i915_gem_context *gem_context;
>>+
>>+	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
>>+	struct dma_fence *composite_fence;
>>+
>>+	struct i915_gem_ww_ctx ww;
>>+
>>+	unsigned int num_batches;
>>+	u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
>>+	struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
>>+
>>+	struct eb_fence *fences;
>>+	unsigned long num_fences;
>>+};
>>+
>>+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
>>+
>>+static int eb_select_context(struct i915_execbuffer *eb)
>>+{
>>+	struct i915_gem_context *ctx;
>>+
>>+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->ctx_id);
>>+	if (IS_ERR(ctx))
>>+		return PTR_ERR(ctx);
>>+
>>+	eb->gem_context = ctx;
>>+	return 0;
>>+}
>>+
>>+static struct i915_vma *
>>+eb_find_vma(struct i915_address_space *vm, u64 addr)
>>+{
>>+	u64 va;
>>+
>>+	lockdep_assert_held(&vm->vm_bind_lock);
>>+
>>+	va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
>>+	return i915_gem_vm_bind_lookup_vma(vm, va);
>>+}
>>+
>>+static int eb_lookup_vma_all(struct i915_execbuffer *eb)
>>+{
>>+	unsigned int i, current_batch = 0;
>>+	struct i915_vma *vma;
>>+
>>+	for (i = 0; i < eb->num_batches; i++) {
>>+		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
>>+		if (!vma)
>>+			return -EINVAL;
>>+
>>+		eb->batches[current_batch] = vma;
>>+		++current_batch;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static void eb_release_vma_all(struct i915_execbuffer *eb, bool final)
>>+{
>>+}
>>+
>>+static int eb_validate_vma_all(struct i915_execbuffer *eb)
>>+{
>>+	/* only throttle once, even if we didn't need to throttle */
>>+	for (bool throttle = true;; throttle = false) {
>>+		int err;
>>+
>>+		err = eb_pin_engine(eb, throttle);
>>+		if (!err)
>>+			return 0;
>>+
>>+		if (err != -EDEADLK)
>>+			return err;
>>+
>>+		err = i915_gem_ww_ctx_backoff(&eb->ww);
>>+		if (err)
>>+			return err;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+/*
>>+ * Using two helper loops for the order in which requests / batches are created
>>+ * and added to the backend. Requests are created in order from the parent to
>>+ * the last child. Requests are added in the reverse order, from the last child
>>+ * to parent. This is done for locking reasons as the timeline lock is acquired
>>+ * during request creation and released when the request is added to the
>>+ * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
>>+ * the ordering.
>>+ */
>>+#define for_each_batch_create_order(_eb, _i) \
>>+	for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
>>+#define for_each_batch_add_order(_eb, _i) \
>>+	BUILD_BUG_ON(!typecheck(int, _i)); \
>>+	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
>>+
>>+static int eb_move_to_gpu(struct i915_execbuffer *eb)
>>+{
>>+	/* Unconditionally flush any chipset caches (for streaming writes). */
>>+	intel_gt_chipset_flush(eb->gt);
>>+
>>+	return 0;
>>+}
>>+
>>+static int eb_request_submit(struct i915_execbuffer *eb,
>>+			     struct i915_request *rq,
>>+			     struct i915_vma *batch,
>>+			     u64 batch_len)
>>+{
>>+	struct intel_engine_cs *engine = rq->context->engine;
>>+	int err;
>>+
>>+	if (intel_context_nopreempt(rq->context))
>>+		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);
>>+
>>+	/*
>>+	 * After we completed waiting for other engines (using HW semaphores)
>>+	 * then we can signal that this request/batch is ready to run. This
>>+	 * allows us to determine if the batch is still waiting on the GPU
>>+	 * or actually running by checking the breadcrumb.
>>+	 */
>>+	if (engine->emit_init_breadcrumb) {
>>+		err = engine->emit_init_breadcrumb(rq);
>>+		if (err)
>>+			return err;
>>+	}
>>+
>>+	return engine->emit_bb_start(rq, batch->node.start, batch_len, 0);
>>+}
>>+
>>+static int eb_submit(struct i915_execbuffer *eb)
>>+{
>>+	unsigned int i;
>>+	int err;
>>+
>>+	err = eb_move_to_gpu(eb);
>>+
>>+	for_each_batch_create_order(eb, i) {
>>+		if (!eb->requests[i])
>>+			break;
>>+
>>+		trace_i915_request_queue(eb->requests[i], 0);
>>+		if (!err)
>>+			err = eb_request_submit(eb, eb->requests[i],
>>+						eb->batches[i],
>>+						eb->batches[i]->size);
>>+	}
>>+
>>+	return err;
>>+}
>>+
>>+static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
>>+{
>>+	struct intel_ring *ring = ce->ring;
>>+	struct intel_timeline *tl = ce->timeline;
>>+	struct i915_request *rq;
>>+
>>+	/*
>>+	 * Completely unscientific finger-in-the-air estimates for suitable
>>+	 * maximum user request size (to avoid blocking) and then backoff.
>>+	 */
>>+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
>>+		return NULL;
>>+
>>+	/*
>>+	 * Find a request that after waiting upon, there will be at least half
>>+	 * the ring available. The hysteresis allows us to compete for the
>>+	 * shared ring and should mean that we sleep less often prior to
>>+	 * claiming our resources, but not so long that the ring completely
>>+	 * drains before we can submit our next request.
>>+	 */
>>+	list_for_each_entry(rq, &tl->requests, link) {
>>+		if (rq->ring != ring)
>>+			continue;
>>+
>>+		if (__intel_ring_space(rq->postfix,
>>+				       ring->emit, ring->size) > ring->size / 2)
>>+			break;
>>+	}
>>+	if (&rq->link == &tl->requests)
>>+		return NULL; /* weird, we will check again later for real */
>>+
>>+	return i915_request_get(rq);
>>+}
>>+
>>+static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
>>+			   bool throttle)
>>+{
>>+	struct intel_timeline *tl;
>>+	struct i915_request *rq = NULL;
>>+
>>+	/*
>>+	 * Take a local wakeref for preparing to dispatch the execbuf as
>>+	 * we expect to access the hardware fairly frequently in the
>>+	 * process, and require the engine to be kept awake between accesses.
>>+	 * Upon dispatch, we acquire another prolonged wakeref that we hold
>>+	 * until the timeline is idle, which in turn releases the wakeref
>>+	 * taken on the engine, and the parent device.
>>+	 */
>>+	tl = intel_context_timeline_lock(ce);
>>+	if (IS_ERR(tl))
>>+		return PTR_ERR(tl);
>>+
>>+	intel_context_enter(ce);
>>+	if (throttle)
>>+		rq = eb_throttle(eb, ce);
>>+	intel_context_timeline_unlock(tl);
>>+
>>+	if (rq) {
>>+		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
>>+		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>>+
>>+		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>>+				      timeout) < 0) {
>>+			i915_request_put(rq);
>>+
>>+			/*
>>+			 * Error path, cannot use intel_context_timeline_lock as
>>+			 * that is user interruptible and this clean up step
>>+			 * must be done.
>>+			 */
>>+			mutex_lock(&ce->timeline->mutex);
>>+			intel_context_exit(ce);
>>+			mutex_unlock(&ce->timeline->mutex);
>>+
>>+			if (nonblock)
>>+				return -EWOULDBLOCK;
>>+			else
>>+				return -EINTR;
>>+		}
>>+		i915_request_put(rq);
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
>>+{
>>+	struct intel_context *ce = eb->context, *child;
>>+	int err;
>>+	int i = 0, j = 0;
>>+
>>+	GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
>>+
>>+	if (unlikely(intel_context_is_banned(ce)))
>>+		return -EIO;
>>+
>>+	/*
>>+	 * Pinning the contexts may generate requests in order to acquire
>>+	 * GGTT space, so do this first before we reserve a seqno for
>>+	 * ourselves.
>>+	 */
>>+	err = intel_context_pin_ww(ce, &eb->ww);
>>+	if (err)
>>+		return err;
>>+
>>+	for_each_child(ce, child) {
>>+		err = intel_context_pin_ww(child, &eb->ww);
>>+		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
>>+	}
>>+
>>+	for_each_child(ce, child) {
>>+		err = eb_pin_timeline(eb, child, throttle);
>>+		if (err)
>>+			goto unwind;
>>+		++i;
>>+	}
>>+	err = eb_pin_timeline(eb, ce, throttle);
>>+	if (err)
>>+		goto unwind;
>>+
>>+	eb->args->flags |= __EXEC3_ENGINE_PINNED;
>>+	return 0;
>>+
>>+unwind:
>>+	for_each_child(ce, child) {
>>+		if (j++ < i) {
>>+			mutex_lock(&child->timeline->mutex);
>>+			intel_context_exit(child);
>>+			mutex_unlock(&child->timeline->mutex);
>>+		}
>>+	}
>>+	for_each_child(ce, child)
>>+		intel_context_unpin(child);
>>+	intel_context_unpin(ce);
>>+	return err;
>>+}
>>+
>>+static int
>>+eb_select_engine(struct i915_execbuffer *eb)
>>+{
>>+	struct intel_context *ce, *child;
>>+	unsigned int idx;
>>+	int err;
>>+
>>+	if (!i915_gem_context_user_engines(eb->gem_context))
>>+		return -EINVAL;
>>+
>>+	idx = eb->args->engine_idx;
>>+	ce = i915_gem_context_get_engine(eb->gem_context, idx);
>>+	if (IS_ERR(ce))
>>+		return PTR_ERR(ce);
>>+
>>+	eb->num_batches = ce->parallel.number_children + 1;
>>+
>>+	for_each_child(ce, child)
>>+		intel_context_get(child);
>>+	intel_gt_pm_get(ce->engine->gt);
>>+
>>+	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>+		err = intel_context_alloc_state(ce);
>>+		if (err)
>>+			goto err;
>>+	}
>>+	for_each_child(ce, child) {
>>+		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>>+			err = intel_context_alloc_state(child);
>>+			if (err)
>>+				goto err;
>>+		}
>>+	}
>>+
>>+	/*
>>+	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
>>+	 * EIO if the GPU is already wedged.
>>+	 */
>>+	err = intel_gt_terminally_wedged(ce->engine->gt);
>>+	if (err)
>>+		goto err;
>>+
>>+	if (!i915_vm_tryget(ce->vm)) {
>>+		err = -ENOENT;
>>+		goto err;
>>+	}
>>+
>>+	eb->context = ce;
>>+	eb->gt = ce->engine->gt;
>>+
>>+	/*
>>+	 * Make sure engine pool stays alive even if we call intel_context_put
>>+	 * during ww handling. The pool is destroyed when last pm reference
>>+	 * is dropped, which breaks our -EDEADLK handling.
>>+	 */
>>+	return err;
>>+
>>+err:
>>+	intel_gt_pm_put(ce->engine->gt);
>>+	for_each_child(ce, child)
>>+		intel_context_put(child);
>>+	intel_context_put(ce);
>>+	return err;
>>+}
>>+
>>+static void
>>+eb_put_engine(struct i915_execbuffer *eb)
>>+{
>>+	struct intel_context *child;
>>+
>>+	i915_vm_put(eb->context->vm);
>>+	intel_gt_pm_put(eb->gt);
>>+	for_each_child(eb->context, child)
>>+		intel_context_put(child);
>>+	intel_context_put(eb->context);
>>+}
>>+
>>+static void
>>+__free_fence_array(struct eb_fence *fences, unsigned int n)
>>+{
>>+	while (n--) {
>>+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>+		dma_fence_put(fences[n].dma_fence);
>>+		dma_fence_chain_free(fences[n].chain_fence);
>>+	}
>>+	kvfree(fences);
>>+}
>>+
>>+static int add_timeline_fence_array(struct i915_execbuffer *eb)
>>+{
>>+	struct drm_i915_gem_timeline_fence __user *user_fences;
>>+	struct eb_fence *f;
>>+	u64 nfences;
>>+	int err = 0;
>>+
>>+	nfences = eb->args->fence_count;
>>+	if (!nfences)
>>+		return 0;
>>+
>>+	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
>>+	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
>>+	if (nfences > min_t(unsigned long,
>>+			    ULONG_MAX / sizeof(*user_fences),
>>+			    SIZE_MAX / sizeof(*f)) - eb->num_fences)
>>+		return -EINVAL;
>>+
>>+	user_fences = u64_to_user_ptr(eb->args->timeline_fences);
>>+	if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
>>+		return -EFAULT;
>>+
>>+	f = krealloc(eb->fences,
>>+		     (eb->num_fences + nfences) * sizeof(*f),
>>+		     __GFP_NOWARN | GFP_KERNEL);
>>+	if (!f)
>>+		return -ENOMEM;
>>+
>>+	eb->fences = f;
>>+	f += eb->num_fences;
>>+
>>+	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
>>+		     ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
>>+
>>+	while (nfences--) {
>>+		struct drm_i915_gem_timeline_fence user_fence;
>>+		struct drm_syncobj *syncobj;
>>+		struct dma_fence *fence = NULL;
>>+		u64 point;
>>+
>>+		if (__copy_from_user(&user_fence,
>>+				     user_fences++,
>>+				     sizeof(user_fence)))
>>+			return -EFAULT;
>>+
>>+		if (user_fence.flags & __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
>>+			return -EINVAL;
>>+
>>+		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
>>+		if (!syncobj) {
>>+			DRM_DEBUG("Invalid syncobj handle provided\n");
>>+			return -ENOENT;
>>+		}
>>+
>>+		fence = drm_syncobj_fence_get(syncobj);
>>+
>>+		if (!fence && user_fence.flags &&
>>+		    !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
>>+			DRM_DEBUG("Syncobj handle has no fence\n");
>>+			drm_syncobj_put(syncobj);
>>+			return -EINVAL;
>>+		}
>>+
>>+		point = user_fence.value;
>>+		if (fence)
>>+			err = dma_fence_chain_find_seqno(&fence, point);
>>+
>>+		if (err && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
>>+			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
>>+			dma_fence_put(fence);
>>+			drm_syncobj_put(syncobj);
>>+			return err;
>>+		}
>>+
>>+		/*
>>+		 * A point might have been signaled already and
>>+		 * garbage collected from the timeline. In this case
>>+		 * just ignore the point and carry on.
>>+		 */
>>+		if (!fence && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
>>+			drm_syncobj_put(syncobj);
>>+			continue;
>>+		}
>>+
>>+		/*
>>+		 * For timeline syncobjs we need to preallocate chains for
>>+		 * later signaling.
>>+		 */
>>+		if (point != 0 && user_fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
>>+			/*
>>+			 * Waiting and signaling the same point (when point !=
>>+			 * 0) would break the timeline.
>>+			 */
>>+			if (user_fence.flags & I915_TIMELINE_FENCE_WAIT) {
>>+				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
>>+				dma_fence_put(fence);
>>+				drm_syncobj_put(syncobj);
>>+				return -EINVAL;
>>+			}
>>+
>>+			f->chain_fence = dma_fence_chain_alloc();
>>+			if (!f->chain_fence) {
>>+				drm_syncobj_put(syncobj);
>>+				dma_fence_put(fence);
>>+				return -ENOMEM;
>>+			}
>>+		} else {
>>+			f->chain_fence = NULL;
>>+		}
>>+
>>+		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
>>+		f->dma_fence = fence;
>>+		f->value = point;
>>+		f++;
>>+		eb->num_fences++;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static void put_fence_array(struct eb_fence *fences, int num_fences)
>>+{
>>+	if (fences)
>>+		__free_fence_array(fences, num_fences);
>>+}
>>+
>>+static int
>>+await_fence_array(struct i915_execbuffer *eb,
>>+		  struct i915_request *rq)
>>+{
>>+	unsigned int n;
>>+
>>+	for (n = 0; n < eb->num_fences; n++) {
>>+		int err;
>>+
>>+		struct drm_syncobj *syncobj;
>>+		unsigned int flags;
>>+
>>+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>+
>>+		if (!eb->fences[n].dma_fence)
>>+			continue;
>>+
>>+		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
>>+		if (err < 0)
>>+			return err;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static void signal_fence_array(const struct i915_execbuffer *eb,
>>+			       struct dma_fence * const fence)
>>+{
>>+	unsigned int n;
>>+
>>+	for (n = 0; n < eb->num_fences; n++) {
>>+		struct drm_syncobj *syncobj;
>>+		unsigned int flags;
>>+
>>+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>+		if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>>+			continue;
>>+
>>+		if (eb->fences[n].chain_fence) {
>>+			drm_syncobj_add_point(syncobj,
>>+					      eb->fences[n].chain_fence,
>>+					      fence,
>>+					      eb->fences[n].value);
>>+			/*
>>+			 * The chain's ownership is transferred to the
>>+			 * timeline.
>>+			 */
>>+			eb->fences[n].chain_fence = NULL;
>>+		} else {
>>+			drm_syncobj_replace_fence(syncobj, fence);
>>+		}
>>+	}
>>+}
>Semi-random place to ask - how much of the code here is a direct copy of 
>existing functions from i915_gem_execbuffer.c? There seem to be some 
>100% copies at least, and then some more with small tweaks. Could you spend 
>some time and try to figure out some code sharing?
>

During the VM_BIND design review, maintainers expressed the view that
execbuf3 should be kept completely separate and the legacy execbuf path
left untouched.

I also think execbuf3 should be fully separate. We can do some code
sharing where there is a close-to-100% copy (there is a TODO in the cover
letter). Some changes, like the timeline fence array handling here, look
similar, but the uapi is not exactly the same. We should probably keep
them separate and not try to force code sharing, at least at this point.
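
Purely to illustrate the alternative being weighed here (hypothetical, not
part of the posted series): if the near-identical timeline fence handling
were ever lifted into a shared helper, say an i915_timeline_fences_add()
living next to struct eb_fence in a common header, the execbuf3 side could
shrink to a thin wrapper:

/*
 * Hypothetical sketch only: i915_timeline_fences_add() does not exist in
 * the series; it stands in for the common copy/validate loop that both
 * execbuf2 and execbuf3 would then call with their own user pointer and
 * fence count.
 */
static int parse_timeline_fences(struct i915_execbuffer *eb)
{
	return i915_timeline_fences_add(eb->file,
					u64_to_user_ptr(eb->args->timeline_fences),
					eb->args->fence_count,
					&eb->fences, &eb->num_fences);
}

Whether such a refactor is worth the churn, given the uapi differences, is
exactly the trade-off described above.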

Niranjana

>Regards,
>
>Tvrtko
>
>>+
>>+static int parse_timeline_fences(struct i915_execbuffer *eb)
>>+{
>>+	return add_timeline_fence_array(eb);
>>+}
>>+
>>+static int parse_batch_addresses(struct i915_execbuffer *eb)
>>+{
>>+	struct drm_i915_gem_execbuffer3 *args = eb->args;
>>+	u64 __user *batch_addr = u64_to_user_ptr(args->batch_address);
>>+
>>+	if (copy_from_user(eb->batch_addresses, batch_addr,
>>+			   sizeof(batch_addr[0]) * eb->num_batches))
>>+		return -EFAULT;
>>+
>>+	return 0;
>>+}
>>+
>>+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
>>+{
>>+	struct i915_request *rq, *rn;
>>+
>>+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
>>+		if (rq == end || !i915_request_retire(rq))
>>+			break;
>>+}
>>+
>>+static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
>>+			  int err, bool last_parallel)
>>+{
>>+	struct intel_timeline * const tl = i915_request_timeline(rq);
>>+	struct i915_sched_attr attr = {};
>>+	struct i915_request *prev;
>>+
>>+	lockdep_assert_held(&tl->mutex);
>>+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
>>+
>>+	trace_i915_request_add(rq);
>>+
>>+	prev = __i915_request_commit(rq);
>>+
>>+	/* Check that the context wasn't destroyed before submission */
>>+	if (likely(!intel_context_is_closed(eb->context))) {
>>+		attr = eb->gem_context->sched;
>>+	} else {
>>+		/* Serialise with context_close via the add_to_timeline */
>>+		i915_request_set_error_once(rq, -ENOENT);
>>+		__i915_request_skip(rq);
>>+		err = -ENOENT; /* override any transient errors */
>>+	}
>>+
>>+	if (intel_context_is_parallel(eb->context)) {
>>+		if (err) {
>>+			__i915_request_skip(rq);
>>+			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>>+				&rq->fence.flags);
>>+		}
>>+		if (last_parallel)
>>+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>>+				&rq->fence.flags);
>>+	}
>>+
>>+	__i915_request_queue(rq, &attr);
>>+
>>+	/* Try to clean up the client's timeline after submitting the request */
>>+	if (prev)
>>+		retire_requests(tl, prev);
>>+
>>+	mutex_unlock(&tl->mutex);
>>+
>>+	return err;
>>+}
>>+
>>+static int eb_request_add_all(struct i915_execbuffer *eb, int err)
>>+{
>>+	int i;
>>+
>>+	/*
>>+	 * We iterate in reverse order of creation to release timeline mutexes in
>>+	 * same order.
>>+	 */
>>+	for_each_batch_add_order(eb, i) {
>>+		struct i915_request *rq = eb->requests[i];
>>+
>>+		if (!rq)
>>+			continue;
>>+
>>+		err = eb_request_add(eb, rq, err, i == 0);
>>+	}
>>+
>>+	return err;
>>+}
>>+
>>+static void eb_requests_get(struct i915_execbuffer *eb)
>>+{
>>+	unsigned int i;
>>+
>>+	for_each_batch_create_order(eb, i) {
>>+		if (!eb->requests[i])
>>+			break;
>>+
>>+		i915_request_get(eb->requests[i]);
>>+	}
>>+}
>>+
>>+static void eb_requests_put(struct i915_execbuffer *eb)
>>+{
>>+	unsigned int i;
>>+
>>+	for_each_batch_create_order(eb, i) {
>>+		if (!eb->requests[i])
>>+			break;
>>+
>>+		i915_request_put(eb->requests[i]);
>>+	}
>>+}
>>+
>>+static int
>>+eb_composite_fence_create(struct i915_execbuffer *eb)
>>+{
>>+	struct dma_fence_array *fence_array;
>>+	struct dma_fence **fences;
>>+	unsigned int i;
>>+
>>+	GEM_BUG_ON(!intel_context_is_parent(eb->context));
>>+
>>+	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
>>+	if (!fences)
>>+		return -ENOMEM;
>>+
>>+	for_each_batch_create_order(eb, i) {
>>+		fences[i] = &eb->requests[i]->fence;
>>+		__set_bit(I915_FENCE_FLAG_COMPOSITE,
>>+			  &eb->requests[i]->fence.flags);
>>+	}
>>+
>>+	fence_array = dma_fence_array_create(eb->num_batches,
>>+					     fences,
>>+					     eb->context->parallel.fence_context,
>>+					     eb->context->parallel.seqno++,
>>+					     false);
>>+	if (!fence_array) {
>>+		kfree(fences);
>>+		return -ENOMEM;
>>+	}
>>+
>>+	/* Move ownership to the dma_fence_array created above */
>>+	for_each_batch_create_order(eb, i)
>>+		dma_fence_get(fences[i]);
>>+
>>+	eb->composite_fence = &fence_array->base;
>>+
>>+	return 0;
>>+}
>>+
>>+static int
>>+eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
>>+{
>>+	int err;
>>+
>>+	if (unlikely(eb->gem_context->syncobj)) {
>>+		struct dma_fence *fence;
>>+
>>+		fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
>>+		err = i915_request_await_dma_fence(rq, fence);
>>+		dma_fence_put(fence);
>>+		if (err)
>>+			return err;
>>+	}
>>+
>>+	if (eb->fences) {
>>+		err = await_fence_array(eb, rq);
>>+		if (err)
>>+			return err;
>>+	}
>>+
>>+	if (intel_context_is_parallel(eb->context)) {
>>+		err = eb_composite_fence_create(eb);
>>+		if (err)
>>+			return err;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static struct intel_context *
>>+eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
>>+{
>>+	struct intel_context *child;
>>+
>>+	if (likely(context_number == 0))
>>+		return eb->context;
>>+
>>+	for_each_child(eb->context, child)
>>+		if (!--context_number)
>>+			return child;
>>+
>>+	GEM_BUG_ON("Context not found");
>>+
>>+	return NULL;
>>+}
>>+
>>+static int eb_requests_create(struct i915_execbuffer *eb)
>>+{
>>+	unsigned int i;
>>+	int err;
>>+
>>+	for_each_batch_create_order(eb, i) {
>>+		/* Allocate a request for this batch buffer nice and early. */
>>+		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
>>+		if (IS_ERR(eb->requests[i])) {
>>+			err = PTR_ERR(eb->requests[i]);
>>+			eb->requests[i] = NULL;
>>+			return err;
>>+		}
>>+
>>+		/*
>>+		 * Only the first request added (committed to backend) has to
>>+		 * take the in fences into account as all subsequent requests
>>+		 * will have fences inserted in between them.
>>+		 */
>>+		if (i + 1 == eb->num_batches) {
>>+			err = eb_fences_add(eb, eb->requests[i]);
>>+			if (err)
>>+				return err;
>>+		}
>>+
>>+		if (eb->batches[i])
>>+			eb->requests[i]->batch_res =
>>+				i915_vma_resource_get(eb->batches[i]->resource);
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int
>>+i915_gem_do_execbuffer(struct drm_device *dev,
>>+		       struct drm_file *file,
>>+		       struct drm_i915_gem_execbuffer3 *args)
>>+{
>>+	struct drm_i915_private *i915 = to_i915(dev);
>>+	struct i915_execbuffer eb;
>>+	int err;
>>+
>>+	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
>>+
>>+	eb.i915 = i915;
>>+	eb.file = file;
>>+	eb.args = args;
>>+
>>+	eb.fences = NULL;
>>+	eb.num_fences = 0;
>>+
>>+	memset(eb.requests, 0, sizeof(struct i915_request *) *
>>+	       ARRAY_SIZE(eb.requests));
>>+	eb.composite_fence = NULL;
>>+
>>+	err = parse_timeline_fences(&eb);
>>+	if (err)
>>+		return err;
>>+
>>+	err = eb_select_context(&eb);
>>+	if (unlikely(err))
>>+		goto err_fences;
>>+
>>+	err = eb_select_engine(&eb);
>>+	if (unlikely(err))
>>+		goto err_context;
>>+
>>+	err = parse_batch_addresses(&eb);
>>+	if (unlikely(err))
>>+		goto err_engine;
>>+
>>+	mutex_lock(&eb.context->vm->vm_bind_lock);
>>+
>>+	err = eb_lookup_vma_all(&eb);
>>+	if (err) {
>>+		eb_release_vma_all(&eb, true);
>>+		goto err_vm_bind_lock;
>>+	}
>>+
>>+	i915_gem_ww_ctx_init(&eb.ww, true);
>>+
>>+	err = eb_validate_vma_all(&eb);
>>+	if (err)
>>+		goto err_vma;
>>+
>>+	ww_acquire_done(&eb.ww.ctx);
>>+
>>+	err = eb_requests_create(&eb);
>>+	if (err) {
>>+		if (eb.requests[0])
>>+			goto err_request;
>>+		else
>>+			goto err_vma;
>>+	}
>>+
>>+	err = eb_submit(&eb);
>>+
>>+err_request:
>>+	eb_requests_get(&eb);
>>+	err = eb_request_add_all(&eb, err);
>>+
>>+	if (eb.fences)
>>+		signal_fence_array(&eb, eb.composite_fence ?
>>+				   eb.composite_fence :
>>+				   &eb.requests[0]->fence);
>>+
>>+	if (unlikely(eb.gem_context->syncobj)) {
>>+		drm_syncobj_replace_fence(eb.gem_context->syncobj,
>>+					  eb.composite_fence ?
>>+					  eb.composite_fence :
>>+					  &eb.requests[0]->fence);
>>+	}
>>+
>>+	if (eb.composite_fence)
>>+		dma_fence_put(eb.composite_fence);
>>+
>>+	eb_requests_put(&eb);
>>+
>>+err_vma:
>>+	eb_release_vma_all(&eb, true);
>>+	WARN_ON(err == -EDEADLK);
>>+	i915_gem_ww_ctx_fini(&eb.ww);
>>+err_vm_bind_lock:
>>+	mutex_unlock(&eb.context->vm->vm_bind_lock);
>>+err_engine:
>>+	eb_put_engine(&eb);
>>+err_context:
>>+	i915_gem_context_put(eb.gem_context);
>>+err_fences:
>>+	put_fence_array(eb.fences, eb.num_fences);
>>+	return err;
>>+}
>>+
>>+int
>>+i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
>>+			   struct drm_file *file)
>>+{
>>+	struct drm_i915_gem_execbuffer3 *args = data;
>>+	int err;
>>+
>>+	if (args->flags & __I915_EXEC3_UNKNOWN_FLAGS)
>>+		return -EINVAL;
>>+
>>+	err = i915_gem_do_execbuffer(dev, file, args);
>>+
>>+	args->flags &= ~__I915_EXEC3_UNKNOWN_FLAGS;
>>+	return err;
>>+}
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
>>index 28d6526e32ab0..b7a1e9725a841 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
>>@@ -18,6 +18,8 @@ int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  			      struct drm_file *file);
>>  int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>>  			       struct drm_file *file);
>>+int i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
>>+			       struct drm_file *file);
>>  int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>>  				struct drm_file *file);
>>  int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>index 3da0e07f84bbd..ea1906873f278 100644
>>--- a/include/uapi/drm/i915_drm.h
>>+++ b/include/uapi/drm/i915_drm.h
>>@@ -1542,6 +1542,68 @@ struct drm_i915_gem_timeline_fence {
>>  	__u64 value;
>>  };
>>+/**
>>+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>>+ * ioctl.
>>+ *
>>+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>>+ * only works with this ioctl for submission.
>>+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>+ */
>>+struct drm_i915_gem_execbuffer3 {
>>+	/**
>>+	 * @ctx_id: Context id
>>+	 *
>>+	 * Only contexts with user engine map are allowed.
>>+	 */
>>+	__u32 ctx_id;
>>+
>>+	/**
>>+	 * @engine_idx: Engine index
>>+	 *
>>+	 * An index in the user engine map of the context specified by @ctx_id.
>>+	 */
>>+	__u32 engine_idx;
>>+
>>+	/**
>>+	 * @batch_address: Batch gpu virtual address/es.
>>+	 *
>>+	 * For normal submission, it is the gpu virtual address of the batch
>>+	 * buffer. For parallel submission, it is a pointer to an array of
>>+	 * batch buffer gpu virtual addresses with array size equal to the
>>+	 * number of (parallel) engines involved in that submission (See
>>+	 * struct i915_context_engines_parallel_submit).
>>+	 */
>>+	__u64 batch_address;
>>+
>>+	/** @flags: Currently reserved, MBZ */
>>+	__u64 flags;
>>+#define __I915_EXEC3_UNKNOWN_FLAGS (~0)
>>+
>>+	/** @rsvd1: Reserved, MBZ */
>>+	__u32 rsvd1;
>>+
>>+	/** @fence_count: Number of fences in @timeline_fences array. */
>>+	__u32 fence_count;
>>+
>>+	/**
>>+	 * @timeline_fences: Pointer to an array of timeline fences.
>>+	 *
>>+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>>+	 */
>>+	__u64 timeline_fences;
>>+
>>+	/** @rsvd2: Reserved, MBZ */
>>+	__u64 rsvd2;
>>+
>>+	/**
>>+	 * @extensions: Zero-terminated chain of extensions.
>>+	 *
>>+	 * For future extensions. See struct i915_user_extension.
>>+	 */
>>+	__u64 extensions;
>>+};
>>+
>>  struct drm_i915_gem_pin {
>>  	/** Handle of the buffer to be pinned. */
>>  	__u32 handle;


* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-30 18:19   ` Matthew Auld
  2022-08-31  7:28     ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-09-01  5:18     ` Niranjana Vishwanathapura
  1 sibling, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-01  5:18 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Andi Shyti, Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Andi Shyti

On Tue, Aug 30, 2022 at 07:19:17PM +0100, Matthew Auld wrote:
>On 27/08/2022 20:43, Andi Shyti wrote:
>>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>>Implement the bind and unbind of an object at the specified GPU virtual
>>addresses.
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |   1 +
>>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>>  include/uapi/drm/i915_drm.h                   | 163 +++++++++
>>  10 files changed, 543 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>index 522ef9b4aff32..4e1627e96c6e0 100644
>>--- a/drivers/gpu/drm/i915/Makefile
>>+++ b/drivers/gpu/drm/i915/Makefile
>>@@ -165,6 +165,7 @@ gem-y += \
>>  	gem/i915_gem_ttm_move.o \
>>  	gem/i915_gem_ttm_pm.o \
>>  	gem/i915_gem_userptr.o \
>>+	gem/i915_gem_vm_bind_object.o \
>>  	gem/i915_gem_wait.o \
>>  	gem/i915_gemfs.o
>>  i915-y += \
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>new file mode 100644
>>index 0000000000000..ebc493b7dafc1
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>@@ -0,0 +1,21 @@
>>+/* SPDX-License-Identifier: MIT */
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#ifndef __I915_GEM_VM_BIND_H
>>+#define __I915_GEM_VM_BIND_H
>>+
>>+#include "i915_drv.h"
>>+
>>+struct i915_vma *
>>+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>+
>>+int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>>+			   struct drm_file *file);
>>+int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>>+			     struct drm_file *file);
>>+
>>+void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
>>+#endif /* __I915_GEM_VM_BIND_H */
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>new file mode 100644
>>index 0000000000000..dadd1d4b1761b
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>@@ -0,0 +1,322 @@
>>+// SPDX-License-Identifier: MIT
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#include <linux/interval_tree_generic.h>
>>+
>>+#include "gem/i915_gem_vm_bind.h"
>>+#include "gem/i915_gem_context.h"
>>+#include "gt/gen8_engine_cs.h"
>>+
>>+#include "i915_drv.h"
>>+#include "i915_gem_gtt.h"
>>+
>>+#define START(node) ((node)->start)
>>+#define LAST(node) ((node)->last)
>>+
>>+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>>+		     START, LAST, static inline, i915_vm_bind_it)
>>+
>>+#undef START
>>+#undef LAST
>>+
>>+/**
>>+ * DOC: VM_BIND/UNBIND ioctls
>>+ *
>>+ * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>>+ * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>>+ * specified address space (VM). Multiple mappings can map to the same physical
>>+ * pages of an object (aliasing). These mappings (also referred to as persistent
>>+ * mappings) will be persistent across multiple GPU submissions (execbuf calls)
>>+ * issued by the UMD, without user having to provide a list of all required
>>+ * mappings during each submission (as required by older execbuf mode).
>>+ *
>>+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
>>+ * signaling the completion of bind/unbind operation.
>>+ *
>>+ * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
>>+ * User has to opt-in for VM_BIND mode of binding for an address space (VM)
>>+ * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>>+ *
>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>>+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
>>+ * done asynchronously, when valid out fence is specified.
>>+ *
>>+ * VM_BIND locking order is as below.
>>+ *
>>+ * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
>>+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>>+ *    mapping.
>>+ *
>>+ *    In future, when GPU page faults are supported, we can potentially use a
>>+ *    rwsem instead, so that multiple page fault handlers can take the read
>>+ *    side lock to lookup the mapping and hence can run in parallel.
>>+ *    The older execbuf mode of binding do not need this lock.
>>+ *
>>+ * 2) The object's dma-resv lock will protect i915_vma state and needs
>>+ *    to be held while binding/unbinding a vma in the async worker and while
>>+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
>>+ *    will all share a dma-resv object.
>>+ *
>>+ * 3) Spinlock/s to protect some of the VM's lists like the list of
>>+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
>>+ */
>>+
>>+/**
>>+ * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
>>+ * @vm: virtual address space in which vma needs to be looked for
>>+ * @va: starting addr of the vma
>>+ *
>>+ * retrieves the vma with a starting address from the vm's vma tree.
>>+ *
>>+ * Returns: returns vma on success, NULL on failure.
>>+ */
>>+struct i915_vma *
>>+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>>+{
>>+	lockdep_assert_held(&vm->vm_bind_lock);
>>+
>>+	return i915_vm_bind_it_iter_first(&vm->va, va, va);
>>+}
>>+
>>+/**
>>+ * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
>>+ * @vma: vma that needs to be removed
>>+ * @release_obj: object to be release or not
>>+ *
>>+ * Removes the vma from the vm's lists custom interval tree
>>+ */
>>+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>+{
>>+	lockdep_assert_held(&vma->vm->vm_bind_lock);
>>+
>>+	if (!list_empty(&vma->vm_bind_link)) {
>>+		list_del_init(&vma->vm_bind_link);
>>+		i915_vm_bind_it_remove(vma, &vma->vm->va);
>>+
>>+		/* Release object */
>>+		if (release_obj)
>>+			i915_gem_object_put(vma->obj);
>>+	}
>>+}
>>+
>>+static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
>>+				  struct i915_vma *vma,
>>+				  struct drm_i915_gem_vm_unbind *va)
>>+{
>>+	struct drm_i915_gem_object *obj;
>>+	int ret;
>>+
>>+	if (vma) {
>>+		obj = vma->obj;
>>+		i915_vma_destroy(vma);
>>+
>>+		goto exit;
>>+	}
>>+
>>+	if (!va)
>>+		return -EINVAL;
>>+
>>+	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
>>+	if (ret)
>>+		return ret;
>>+
>>+	va->start = gen8_noncanonical_addr(va->start);
>>+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>+
>>+	if (!vma)
>>+		ret = -ENOENT;
>>+	else if (vma->size != va->length)
>>+		ret = -EINVAL;
>>+
>>+	if (ret) {
>>+		mutex_unlock(&vm->vm_bind_lock);
>>+		return ret;
>>+	}
>>+
>>+	i915_gem_vm_bind_remove(vma, false);
>>+
>>+	mutex_unlock(&vm->vm_bind_lock);
>>+
>>+	/* Destroy vma and then release object */
>>+	obj = vma->obj;
>>+	ret = i915_gem_object_lock(obj, NULL);
>>+	if (ret)
>>+		return ret;
>>+
>>+	i915_vma_destroy(vma);
>>+	i915_gem_object_unlock(obj);
>>+
>>+exit:
>>+	i915_gem_object_put(obj);
>>+
>>+	return 0;
>>+}
>>+
>>+/**
>>+ * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
>>+ * @vm: Address spece from which vma binding needs to be removed
>>+ *
>>+ * Unbind all userspace requested object binding
>>+ */
>>+void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
>>+{
>>+	struct i915_vma *vma, *t;
>>+
>>+	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
>>+		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
>>+}
>>+
>>+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>>+					struct drm_i915_gem_object *obj,
>>+					struct drm_i915_gem_vm_bind *va)
>>+{
>>+	struct i915_ggtt_view view;
>
>Should that be renamed to i915_gtt_view? So all of this just works 
>with ppgtt insertion, as-is? I'm impressed.

Yah, Tvrtko also gave the same comment on RFC v1.
Changing it to i915_gtt_view requires updating a lot of places.
Probably we should do that in a separate patch and push it first.
Yes, it does work for ppgtt. We just had to remove a couple of BUG_ON checks.

>
>>+	struct i915_vma *vma;
>>+
>>+	va->start = gen8_noncanonical_addr(va->start);
>>+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>+	if (vma)
>>+		return ERR_PTR(-EEXIST);
>>+
>>+	view.type = I915_GGTT_VIEW_PARTIAL;
>>+	view.partial.offset = va->offset >> PAGE_SHIFT;
>>+	view.partial.size = va->length >> PAGE_SHIFT;
>>+	vma = i915_vma_instance(obj, vm, &view);
>>+	if (IS_ERR(vma))
>>+		return vma;
>>+
>>+	vma->start = va->start;
>>+	vma->last = va->start + va->length - 1;
>>+
>>+	return vma;
>>+}
>>+
>>+static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>+				struct drm_i915_gem_vm_bind *va,
>>+				struct drm_file *file)
>>+{
>>+	struct drm_i915_gem_object *obj;
>>+	struct i915_vma *vma = NULL;
>>+	struct i915_gem_ww_ctx ww;
>>+	u64 pin_flags;
>>+	int ret = 0;
>>+
>>+	if (!vm->vm_bind_mode)
>>+		return -EOPNOTSUPP;
>>+
>>+	obj = i915_gem_object_lookup(file, va->handle);
>
>AFAICT this doesn't have to be an object from gem_create/ext...
>
>>+	if (!obj)
>>+		return -ENOENT;
>>+
>>+	if (!va->length ||
>>+	    !IS_ALIGNED(va->offset | va->length,
>>+			i915_gem_object_max_page_size(obj->mm.placements,
>>+						      obj->mm.n_placements)) ||
>
>...and so here max_page_size() can BUG_ON() if n_placements = 0. Also 
>what should this return in that case?

Yah, in the v1 version, i915_gem_object_max_page_size() doesn't have the
BUG_ON() and it returns the 4K page size (as the default) if n_placements=0.
I think that would be the right way to go.
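Roughly, that fallback could look like the below (a hypothetical sketch,
not the actual v1 code; it assumes the existing I915_GTT_PAGE_SIZE_4K
define and intel_memory_region's min_page_size field):

	static u32
	i915_gem_object_max_page_size(struct intel_memory_region **placements,
				      unsigned int n_placements)
	{
		u32 max_page_size = 0;
		unsigned int i;

		/* No explicit placement list: default to the 4K page size */
		if (!n_placements)
			return I915_GTT_PAGE_SIZE_4K;

		for (i = 0; i < n_placements; i++)
			max_page_size = max_t(u32, max_page_size,
					      placements[i]->min_page_size);

		return max_page_size;
	}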

Niranjana


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
  2022-08-30 17:37   ` Matthew Auld
  2022-08-30 18:19   ` Matthew Auld
@ 2022-09-01  5:31   ` Dave Airlie
  2022-09-01 20:05     ` Niranjana Vishwanathapura
  2022-09-12 13:11   ` Jani Nikula
  3 siblings, 1 reply; 41+ messages in thread
From: Dave Airlie @ 2022-09-01  5:31 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Ramalingam C, Intel Graphics Development, dri-devel,
	Thomas Hellstrom, Matthew Auld, Andi Shyti,
	Niranjana Vishwanathapura

On Sun, 28 Aug 2022 at 05:45, Andi Shyti <andi.shyti@linux.intel.com> wrote:
>
> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
> Implement the bind and unbind of an object at the specified GPU virtual
> addresses.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |   1 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>  drivers/gpu/drm/i915/i915_driver.c            |   1 +
>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>  include/uapi/drm/i915_drm.h                   | 163 +++++++++
>  10 files changed, 543 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 522ef9b4aff32..4e1627e96c6e0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -165,6 +165,7 @@ gem-y += \
>         gem/i915_gem_ttm_move.o \
>         gem/i915_gem_ttm_pm.o \
>         gem/i915_gem_userptr.o \
> +       gem/i915_gem_vm_bind_object.o \
>         gem/i915_gem_wait.o \
>         gem/i915_gemfs.o
>  i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 0000000000000..ebc493b7dafc1
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include "i915_drv.h"
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
> +
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +                          struct drm_file *file);
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +                            struct drm_file *file);
> +
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 0000000000000..dadd1d4b1761b
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,322 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_vm_bind.h"
> +#include "gem/i915_gem_context.h"
> +#include "gt/gen8_engine_cs.h"
> +
> +#include "i915_drv.h"
> +#include "i915_gem_gtt.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +                    START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> + * specified address space (VM). Multiple mappings can map to the same physical
> + * pages of an object (aliasing). These mappings (also referred to as persistent
> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all required
> + * mappings during each submission (as required by older execbuf mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding do not need this lock.
> + *
> + * 2) The object's dma-resv lock will protect i915_vma state and needs
> + *    to be held while binding/unbinding a vma in the async worker and while
> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + * 3) Spinlock/s to protect some of the VM's lists like the list of
> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
> + */
> +
> +/**
> + * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
> + * @vm: virtual address space in which vma needs to be looked for
> + * @va: starting addr of the vma
> + *
> + * retrieves the vma with a starting address from the vm's vma tree.
> + *
> + * Returns: returns vma on success, NULL on failure.
> + */
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
> +{
> +       lockdep_assert_held(&vm->vm_bind_lock);
> +
> +       return i915_vm_bind_it_iter_first(&vm->va, va, va);
> +}
> +
> +/**
> + * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
> + * @vma: vma that needs to be removed
> + * @release_obj: object to be release or not
> + *
> + * Removes the vma from the vm's lists custom interval tree
> + */
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +       lockdep_assert_held(&vma->vm->vm_bind_lock);
> +
> +       if (!list_empty(&vma->vm_bind_link)) {
> +               list_del_init(&vma->vm_bind_link);
> +               i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +               /* Release object */
> +               if (release_obj)
> +                       i915_gem_object_put(vma->obj);
> +       }
> +}
> +
> +static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
> +                                 struct i915_vma *vma,
> +                                 struct drm_i915_gem_vm_unbind *va)
> +{
> +       struct drm_i915_gem_object *obj;
> +       int ret;
> +
> +       if (vma) {
> +               obj = vma->obj;
> +               i915_vma_destroy(vma);
> +
> +               goto exit;
> +       }
> +
> +       if (!va)
> +               return -EINVAL;
> +
> +       ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +       if (ret)
> +               return ret;
> +
> +       va->start = gen8_noncanonical_addr(va->start);
> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +
> +       if (!vma)
> +               ret = -ENOENT;
> +       else if (vma->size != va->length)
> +               ret = -EINVAL;
> +
> +       if (ret) {
> +               mutex_unlock(&vm->vm_bind_lock);
> +               return ret;
> +       }
> +
> +       i915_gem_vm_bind_remove(vma, false);
> +
> +       mutex_unlock(&vm->vm_bind_lock);
> +
> +       /* Destroy vma and then release object */
> +       obj = vma->obj;
> +       ret = i915_gem_object_lock(obj, NULL);
> +       if (ret)
> +               return ret;
> +
> +       i915_vma_destroy(vma);
> +       i915_gem_object_unlock(obj);
> +
> +exit:
> +       i915_gem_object_put(obj);
> +
> +       return 0;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
> + * @vm: Address spece from which vma binding needs to be removed
> + *
> + * Unbind all userspace requested object binding
> + */
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
> +{
> +       struct i915_vma *vma, *t;
> +
> +       list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
> +               WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
> +                                       struct drm_i915_gem_object *obj,
> +                                       struct drm_i915_gem_vm_bind *va)
> +{
> +       struct i915_ggtt_view view;
> +       struct i915_vma *vma;
> +
> +       va->start = gen8_noncanonical_addr(va->start);
> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +       if (vma)
> +               return ERR_PTR(-EEXIST);
> +
> +       view.type = I915_GGTT_VIEW_PARTIAL;
> +       view.partial.offset = va->offset >> PAGE_SHIFT;
> +       view.partial.size = va->length >> PAGE_SHIFT;
> +       vma = i915_vma_instance(obj, vm, &view);
> +       if (IS_ERR(vma))
> +               return vma;
> +
> +       vma->start = va->start;
> +       vma->last = va->start + va->length - 1;
> +
> +       return vma;
> +}
> +
> +static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +                               struct drm_i915_gem_vm_bind *va,
> +                               struct drm_file *file)
> +{
> +       struct drm_i915_gem_object *obj;
> +       struct i915_vma *vma = NULL;
> +       struct i915_gem_ww_ctx ww;
> +       u64 pin_flags;
> +       int ret = 0;
> +
> +       if (!vm->vm_bind_mode)
> +               return -EOPNOTSUPP;
> +
> +       obj = i915_gem_object_lookup(file, va->handle);
> +       if (!obj)
> +               return -ENOENT;
> +
> +       if (!va->length ||
> +           !IS_ALIGNED(va->offset | va->length,
> +                       i915_gem_object_max_page_size(obj->mm.placements,
> +                                                     obj->mm.n_placements)) ||
> +           range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
> +               ret = -EINVAL;
> +               goto put_obj;
> +       }
> +
> +       ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +       if (ret)
> +               goto put_obj;
> +
> +       vma = vm_bind_get_vma(vm, obj, va);
> +       if (IS_ERR(vma)) {
> +               ret = PTR_ERR(vma);
> +               goto unlock_vm;
> +       }
> +
> +       for_i915_gem_ww(&ww, ret, true) {
> +retry:
> +               ret = i915_gem_object_lock(vma->obj, &ww);
> +               if (ret)
> +                       goto out_ww;
> +
> +               ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> +               if (ret)
> +                       goto out_ww;
> +
> +               /* Make it evictable */
> +               __i915_vma_unpin(vma);
> +
> +               list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> +               i915_vm_bind_it_insert(vma, &vm->va);
> +
> +out_ww:
> +               if (ret == -EDEADLK) {
> +                       ret = i915_gem_ww_ctx_backoff(&ww);
> +                       if (!ret)
> +                               goto retry;
> +               } else {
> +                       /* Hold object reference until vm_unbind */
> +                       i915_gem_object_get(vma->obj);
> +               }
> +       }
> +
> +unlock_vm:
> +       mutex_unlock(&vm->vm_bind_lock);
> +
> +put_obj:
> +       i915_gem_object_put(obj);
> +
> +       return ret;
> +}
> +
> +/**
> + * i915_gem_vm_bind_ioctl() - ioctl function for binding an obj into
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the vm bind required
> + * @file: drm_file related to he ioctl
> + *
> + * Implements a function to bind the object into the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +                          struct drm_file *file)
> +{
> +       struct drm_i915_gem_vm_bind *args = data;
> +       struct i915_address_space *vm;
> +       int ret;
> +
> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +       if (unlikely(!vm))
> +               return -ENOENT;
> +
> +       ret = i915_gem_vm_bind_obj(vm, args, file);
> +
> +       i915_vm_put(vm);
> +       return ret;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an obj from
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the binding that needs to be unbinded
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to unbind the object from the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +                            struct drm_file *file)
> +{
> +       struct drm_i915_gem_vm_unbind *args = data;
> +       struct i915_address_space *vm;
> +       int ret;
> +
> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +       if (unlikely(!vm))
> +               return -ENOENT;
> +
> +       ret = i915_gem_vm_unbind_vma(vm, NULL, args);
> +
> +       i915_vm_put(vm);
> +       return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index b67831833c9a3..cb188377b7bd9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -12,6 +12,7 @@
>
>  #include "gem/i915_gem_internal.h"
>  #include "gem/i915_gem_lmem.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "i915_trace.h"
>  #include "i915_utils.h"
>  #include "intel_gt.h"
> @@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>  void i915_address_space_fini(struct i915_address_space *vm)
>  {
>         drm_mm_takedown(&vm->mm);
> +       GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> +       mutex_destroy(&vm->vm_bind_lock);
>  }
>
>  /**
> @@ -204,6 +207,8 @@ static void __i915_vm_release(struct work_struct *work)
>
>         __i915_vm_close(vm);
>
> +       i915_gem_vm_unbind_vma_all(vm);
> +
>         /* Synchronize async unbinds. */
>         i915_vma_resource_bind_dep_sync_all(vm);
>
> @@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>
>         INIT_LIST_HEAD(&vm->bound_list);
>         INIT_LIST_HEAD(&vm->unbound_list);
> +
> +       vm->va = RB_ROOT_CACHED;
> +       INIT_LIST_HEAD(&vm->vm_bind_list);
> +       INIT_LIST_HEAD(&vm->vm_bound_list);
> +       mutex_init(&vm->vm_bind_lock);
>  }
>
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index da21088890b3b..06a259475816b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -259,6 +259,15 @@ struct i915_address_space {
>          */
>         struct list_head unbound_list;
>
> +       /** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
> +       struct mutex vm_bind_lock;
> +       /** @vm_bind_list: List of vm_binding in process */
> +       struct list_head vm_bind_list;
> +       /** @vm_bound_list: List of vm_binding completed */
> +       struct list_head vm_bound_list;
> +       /* @va: tree of persistent vmas */
> +       struct rb_root_cached va;
> +
>         /* Global GTT */
>         bool is_ggtt:1;
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 1332c70370a68..9a9010fd9ecfa 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -68,6 +68,7 @@
>  #include "gem/i915_gem_ioctls.h"
>  #include "gem/i915_gem_mman.h"
>  #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_gt.h"
>  #include "gt/intel_gt_pm.h"
>  #include "gt/intel_rc6.h"
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 2603717164900..092ae4309d8a1 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -29,6 +29,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_tiling.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_engine.h"
>  #include "gt/intel_engine_heartbeat.h"
>  #include "gt/intel_gt.h"
> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>         spin_unlock(&obj->vma.lock);
>         mutex_unlock(&vm->mutex);
>
> +       INIT_LIST_HEAD(&vma->vm_bind_link);
>         return vma;
>
>  err_unlock:
> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>  {
>         struct i915_vma *vma;
>
> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>         GEM_BUG_ON(!kref_read(&vm->ref));
>
>         spin_lock(&obj->vma.lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 33a58f605d75c..15eac55a3e274 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>  {
>         ptrdiff_t cmp;
>
> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> -
>         cmp = ptrdiff(vma->vm, vm);
>         if (cmp)
>                 return cmp;
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index be6e028c3b57d..f746fecae85ed 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -289,6 +289,20 @@ struct i915_vma {
>         /** This object's place on the active/inactive lists */
>         struct list_head vm_link;
>
> +       /** @vm_bind_link: node for the vm_bind related lists of vm */
> +       struct list_head vm_bind_link;
> +
> +       /** Interval tree structures for persistent vma */
> +
> +       /** @rb: node for the interval tree of vm for persistent vmas */
> +       struct rb_node rb;
> +       /** @start: start endpoint of the rb node */
> +       u64 start;
> +       /** @last: Last endpoint of the rb node */
> +       u64 last;
> +       /** @__subtree_last: last in subtree */
> +       u64 __subtree_last;

Was a drm_mm node considered for this or was it overkill?

Dave.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Intel-gfx] [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-09-01  5:09     ` Niranjana Vishwanathapura
@ 2022-09-01  7:58       ` Tvrtko Ursulin
  2022-09-02  5:41         ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 41+ messages in thread
From: Tvrtko Ursulin @ 2022-09-01  7:58 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti



On 01/09/2022 06:09, Niranjana Vishwanathapura wrote:
> On Wed, Aug 31, 2022 at 08:38:48AM +0100, Tvrtko Ursulin wrote:
>>
>> On 27/08/2022 20:43, Andi Shyti wrote:
>>> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>>
>>> Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>>> works in vm_bind mode. The vm_bind mode only works with
>>> this new execbuf3 ioctl.
>>>
>>> The new execbuf3 ioctl will not have any list of objects to validate
>>> bind as all required objects binding would have been requested by the
>>> userspace before submitting the execbuf3.
>>>
>>> And the legacy support like relocations etc are removed.
>>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> ---

[snip]

>>> +static void signal_fence_array(const struct i915_execbuffer *eb,
>>> +                   struct dma_fence * const fence)
>>> +{
>>> +    unsigned int n;
>>> +
>>> +    for (n = 0; n < eb->num_fences; n++) {
>>> +        struct drm_syncobj *syncobj;
>>> +        unsigned int flags;
>>> +
>>> +        syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>> +        if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>>> +            continue;
>>> +
>>> +        if (eb->fences[n].chain_fence) {
>>> +            drm_syncobj_add_point(syncobj,
>>> +                          eb->fences[n].chain_fence,
>>> +                          fence,
>>> +                          eb->fences[n].value);
>>> +            /*
>>> +             * The chain's ownership is transferred to the
>>> +             * timeline.
>>> +             */
>>> +            eb->fences[n].chain_fence = NULL;
>>> +        } else {
>>> +            drm_syncobj_replace_fence(syncobj, fence);
>>> +        }
>>> +    }
>>> +}
>> Semi-random place to ask - how many of the code here is direct copy of 
>> existing functions from i915_gem_execbuffer.c? There seems to be some 
>> 100% copies at least. And then some more with small tweaks. Spend some 
>> time and try to figure out some code sharing?
>>
> 
> During VM_BIND design review, maintainers expressed thought on keeping
> execbuf3 completely separate and not touch the legacy execbuf path.

Got a link so this maintainer can see what exactly was said? Just to 
make sure there isn't any misunderstanding on what "completely separate" 
means to different people.

> I also think, execbuf3 should be fully separate. We can do some code
> sharing where is a close 100% copy (there is a TODO in cover letter).
> There are some changes like the timeline fence array handling here
> which looks similar, but the uapi is not exactly the same. Probably,
> we should keep them separate and not try to force code sharing at
> least at this point.

Okay, I did not spot that TODO in the cover letter. But fair enough,
since it is an RFC it can be unfinished.

I do however think it should be improved before considering the merge. 
Because looking at the patch, 100% copies are:

for_each_batch_create_order
for_each_batch_add_order
eb_throttle
eb_pin_timeline
eb_pin_engine
eb_put_engine
__free_fence_array
put_fence_array
await_fence_array
signal_fence_array
retire_requests
eb_request_add
eb_requests_get
eb_requests_put
eb_find_context

Quite a lot.

Then there is a bunch of almost-identical functions which could be
shared if there weren't two incompatible local struct i915_execbuffer's.
Especially given that, once the out fence TODO item gets handled, a
chunk more will also become 100% copies.

This could be done by having a common struct i915_execbuffer and then 
eb2 and eb3 specific parts which inherit from it. After that is done it 
should be easier to see if it makes sense to do something more and how.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-09-01  5:31   ` Dave Airlie
@ 2022-09-01 20:05     ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-01 20:05 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Andi Shyti, Ramalingam C, Intel Graphics Development, dri-devel,
	Thomas Hellstrom, Matthew Auld, Andi Shyti

On Thu, Sep 01, 2022 at 03:31:13PM +1000, Dave Airlie wrote:
>On Sun, 28 Aug 2022 at 05:45, Andi Shyti <andi.shyti@linux.intel.com> wrote:
>>
>> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>> Implement the bind and unbind of an object at the specified GPU virtual
>> addresses.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |   1 +
>>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>>  include/uapi/drm/i915_drm.h                   | 163 +++++++++
>>  10 files changed, 543 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>

<snip>

>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>> index be6e028c3b57d..f746fecae85ed 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -289,6 +289,20 @@ struct i915_vma {
>>         /** This object's place on the active/inactive lists */
>>         struct list_head vm_link;
>>
>> +       /** @vm_bind_link: node for the vm_bind related lists of vm */
>> +       struct list_head vm_bind_link;
>> +
>> +       /** Interval tree structures for persistent vma */
>> +
>> +       /** @rb: node for the interval tree of vm for persistent vmas */
>> +       struct rb_node rb;
>> +       /** @start: start endpoint of the rb node */
>> +       u64 start;
>> +       /** @last: Last endpoint of the rb node */
>> +       u64 last;
>> +       /** @__subtree_last: last in subtree */
>> +       u64 __subtree_last;
>
>Was a drm_mm node considered for this or was it overkill?
>

We already have a drm_mm node (i915_vma.node). But currently in the
i915 driver, VA management and the binding of vmas are tightly coupled.
Ideally we want to decouple them and then use the same drm_mm node for
persistent vma lookup as well, instead of this new interval tree.
But decoupling is not trivial; I think it needs to be done carefully
in a separate patch series to avoid causing any regression.

The new interval/rb tree here is an optimization for fast lookup of a
persistent vma (instead of a list walk), whether it is bound or not.
Eventually though, with the above cleanup, we should be able to use the
i915_vma.node for vma lookup (even when it is not bound).
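For comparison, the list walk this avoids would be something like the
below (illustrative only, over the bound list from this patch):

	struct i915_vma *vma;

	list_for_each_entry(vma, &vm->vm_bound_list, vm_bind_link)
		if (vma->start == va)
			return vma;

whereas the interval tree gives the same answer with a single
i915_vm_bind_it_iter_first(&vm->va, va, va) lookup.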

It was briefly discussed in an earlier version of this series (though
the topic was different there).
https://lists.freedesktop.org/archives/intel-gfx/2022-July/301159.html

Niranjana

>Dave.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Intel-gfx] [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-09-01  7:58       ` Tvrtko Ursulin
@ 2022-09-02  5:41         ` Niranjana Vishwanathapura
  2022-09-05 15:08           ` Tvrtko Ursulin
  0 siblings, 1 reply; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-02  5:41 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Thu, Sep 01, 2022 at 08:58:57AM +0100, Tvrtko Ursulin wrote:
>
>
>On 01/09/2022 06:09, Niranjana Vishwanathapura wrote:
>>On Wed, Aug 31, 2022 at 08:38:48AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 27/08/2022 20:43, Andi Shyti wrote:
>>>>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>>>
>>>>Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>>>>works in vm_bind mode. The vm_bind mode only works with
>>>>this new execbuf3 ioctl.
>>>>
>>>>The new execbuf3 ioctl will not have any list of objects to validate
>>>>bind as all required objects binding would have been requested by the
>>>>userspace before submitting the execbuf3.
>>>>
>>>>And the legacy support like relocations etc are removed.
>>>>
>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>><niranjana.vishwanathapura@intel.com>
>>>>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>---
>
>[snip]
>
>>>>+static void signal_fence_array(const struct i915_execbuffer *eb,
>>>>+                   struct dma_fence * const fence)
>>>>+{
>>>>+    unsigned int n;
>>>>+
>>>>+    for (n = 0; n < eb->num_fences; n++) {
>>>>+        struct drm_syncobj *syncobj;
>>>>+        unsigned int flags;
>>>>+
>>>>+        syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>>>+        if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>>>>+            continue;
>>>>+
>>>>+        if (eb->fences[n].chain_fence) {
>>>>+            drm_syncobj_add_point(syncobj,
>>>>+                          eb->fences[n].chain_fence,
>>>>+                          fence,
>>>>+                          eb->fences[n].value);
>>>>+            /*
>>>>+             * The chain's ownership is transferred to the
>>>>+             * timeline.
>>>>+             */
>>>>+            eb->fences[n].chain_fence = NULL;
>>>>+        } else {
>>>>+            drm_syncobj_replace_fence(syncobj, fence);
>>>>+        }
>>>>+    }
>>>>+}
>>>Semi-random place to ask - how many of the code here is direct 
>>>copy of existing functions from i915_gem_execbuffer.c? There seems 
>>>to be some 100% copies at least. And then some more with small 
>>>tweaks. Spend some time and try to figure out some code sharing?
>>>
>>
>>During VM_BIND design review, maintainers expressed thought on keeping
>>execbuf3 completely separate and not touch the legacy execbuf path.
>
>Got a link so this maintainer can see what exactly was said? Just to 
>make sure there isn't any misunderstanding on what "completely 
>separate" means to different people.

Here is one (search for copypaste/copy-paste)
https://patchwork.freedesktop.org/patch/486608/?series=93447&rev=3
It is hard to search for old discussion threads. Maybe the maintainers
can provide feedback here directly. Dave, Daniel? :)

>
>>I also think, execbuf3 should be fully separate. We can do some code
>>sharing where is a close 100% copy (there is a TODO in cover letter).
>>There are some changes like the timeline fence array handling here
>>which looks similar, but the uapi is not exactly the same. Probably,
>>we should keep them separate and not try to force code sharing at
>>least at this point.
>
>Okay did not spot that TODO in the cover. But fair since it is RFC to 
>be unfinished.
>
>I do however think it should be improved before considering the merge. 
>Because looking at the patch, 100% copies are:
>
>for_each_batch_create_order
>for_each_batch_add_order
>eb_throttle
>eb_pin_timeline
>eb_pin_engine
>eb_put_engine
>__free_fence_array
>put_fence_array
>await_fence_array
>signal_fence_array
>retire_requests
>eb_request_add
>eb_requests_get
>eb_requests_put
>eb_find_context
>
>Quite a lot.
>
>Then there is a bunch of almost same functions which could be shared 
>if there weren't two incompatible local struct i915_execbuffer's. 
>Especially given when the out fence TODO item gets handled a chunk 
>more will also become a 100% copy.
>

There are definitely a few which are 100% copies and hence should be
shared code.
But some are not. For example, the fence_array handling looks very
similar, but the uapi structures are different between execbuf3 and the
legacy execbuf. The internal flags are also different (e.g.,
__EXEC3_ENGINE_PINNED vs __EXEC_ENGINE_PINNED), which causes minor
differences, hence not a 100% copy.

So, I am not convinced it is worth carrying legacy stuff into the
execbuf3 code. I think we need to look at these on a case-by-case
basis and see whether abstracting common functionality into separate
shared code makes sense or whether it is better to keep the code
separate.

>This could be done by having a common struct i915_execbuffer and then 
>eb2 and eb3 specific parts which inherit from it. After that is done 
>it should be easier to see if it makes sense to do something more and 
>how.

I am not a big fan of it. I think we should not try to load the execbuf3
code with the legacy stuff.

Niranjana

>
>Regards,
>
>Tvrtko

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Intel-gfx] [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-09-02  5:41         ` Niranjana Vishwanathapura
@ 2022-09-05 15:08           ` Tvrtko Ursulin
  2022-09-21  7:18             ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 41+ messages in thread
From: Tvrtko Ursulin @ 2022-09-05 15:08 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti


On 02/09/2022 06:41, Niranjana Vishwanathapura wrote:
> On Thu, Sep 01, 2022 at 08:58:57AM +0100, Tvrtko Ursulin wrote:
>>
>>
>> On 01/09/2022 06:09, Niranjana Vishwanathapura wrote:
>>> On Wed, Aug 31, 2022 at 08:38:48AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 27/08/2022 20:43, Andi Shyti wrote:
>>>>> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>>>>
>>>>> Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>>>>> works in vm_bind mode. The vm_bind mode only works with
>>>>> this new execbuf3 ioctl.
>>>>>
>>>>> The new execbuf3 ioctl will not have any list of objects to validate
>>>>> bind as all required objects binding would have been requested by the
>>>>> userspace before submitting the execbuf3.
>>>>>
>>>>> And the legacy support like relocations etc are removed.
>>>>>
>>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>>> <niranjana.vishwanathapura@intel.com>
>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>> ---
>>
>> [snip]
>>
>>>>> +static void signal_fence_array(const struct i915_execbuffer *eb,
>>>>> +                   struct dma_fence * const fence)
>>>>> +{
>>>>> +    unsigned int n;
>>>>> +
>>>>> +    for (n = 0; n < eb->num_fences; n++) {
>>>>> +        struct drm_syncobj *syncobj;
>>>>> +        unsigned int flags;
>>>>> +
>>>>> +        syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>>>> +        if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>>>>> +            continue;
>>>>> +
>>>>> +        if (eb->fences[n].chain_fence) {
>>>>> +            drm_syncobj_add_point(syncobj,
>>>>> +                          eb->fences[n].chain_fence,
>>>>> +                          fence,
>>>>> +                          eb->fences[n].value);
>>>>> +            /*
>>>>> +             * The chain's ownership is transferred to the
>>>>> +             * timeline.
>>>>> +             */
>>>>> +            eb->fences[n].chain_fence = NULL;
>>>>> +        } else {
>>>>> +            drm_syncobj_replace_fence(syncobj, fence);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>> Semi-random place to ask - how many of the code here is direct copy 
>>>> of existing functions from i915_gem_execbuffer.c? There seems to be 
>>>> some 100% copies at least. And then some more with small tweaks. 
>>>> Spend some time and try to figure out some code sharing?
>>>>
>>>
>>> During VM_BIND design review, maintainers expressed thought on keeping
>>> execbuf3 completely separate and not touch the legacy execbuf path.
>>
>> Got a link so this maintainer can see what exactly was said? Just to 
>> make sure there isn't any misunderstanding on what "completely 
>> separate" means to different people.
> 
> Here is one (search for copypaste/copy-paste)
> https://patchwork.freedesktop.org/patch/486608/?series=93447&rev=3
> It is hard to search for old discussion threads. May be maintainers
> can provide feedback here directly. Dave, Daniel? :)

Thanks. I had a read and don't see a fundamental conflict with what I 
said. The conclusion seemed to be to go with a new ioctl and implement 
code sharing where it makes sense, which is what the TODO in the cover 
letter acknowledges, so there should be no disagreement really.

>>> I also think, execbuf3 should be fully separate. We can do some code
>>> sharing where is a close 100% copy (there is a TODO in cover letter).
>>> There are some changes like the timeline fence array handling here
>>> which looks similar, but the uapi is not exactly the same. Probably,
>>> we should keep them separate and not try to force code sharing at
>>> least at this point.
>>
>> Okay did not spot that TODO in the cover. But fair since it is RFC to 
>> be unfinished.
>>
>> I do however think it should be improved before considering the merge. 
>> Because looking at the patch, 100% copies are:
>>
>> for_each_batch_create_order
>> for_each_batch_add_order
>> eb_throttle
>> eb_pin_timeline
>> eb_pin_engine
>> eb_put_engine
>> __free_fence_array
>> put_fence_array
>> await_fence_array
>> signal_fence_array
>> retire_requests
>> eb_request_add
>> eb_requests_get
>> eb_requests_put
>> eb_find_context
>>
>> Quite a lot.
>>
>> Then there is a bunch of almost same functions which could be shared 
>> if there weren't two incompatible local struct i915_execbuffer's. 
>> Especially given when the out fence TODO item gets handled a chunk 
>> more will also become a 100% copy.
>>
> 
> There are difinitely a few which is 100% copies hence should have a
> shared code.
> But some are not. Like, fence_array stuff though looks very similar,
> the uapi structures are different between execbuf3 and legacy execbuf.
> The internal flags are also different (eg., __EXEC3_ENGINE_PINNED vs
> __EXEC_ENGINE_PINNED) which causes minor differences hence not a
> 100% copy.
> 
> So, I am not convinced if it is worth carrying legacy stuff into
> execbuf3 code. I think we need to look at these on a case by case
> basis and see if abstracting common functionality to a separate
> shared code makes sense or it is better to keep the code separate.

No one is suggesting to carry any legacy stuff into eb3. What I'd 
suggest is to start something like i915_gem_eb_common.h|c and stuff the 
100% copies from the above list in there.

Common struct eb with struct eb2 and eb3 inheriting from it should do 
the trick. Similarly eb->flags shouldn't be a hard problem to solve.

Then you see what remains and whether it makes sense to consolidate further.
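
As a rough sketch of that layering (all names illustrative only, not
actual code from either path):

	/* i915_gem_eb_common.h (sketch) */
	struct i915_eb_common {
		struct drm_i915_private *i915;
		struct i915_gem_context *gem_context;
		struct i915_request **requests;
		struct eb_fence *fences;
		unsigned long num_fences;
		u64 flags;	/* shared *_ENGINE_PINNED style bits live here */
	};

	/* the 100% copies move here and operate on the common part only, e.g. */
	void i915_eb_signal_fence_array(const struct i915_eb_common *eb,
					struct dma_fence * const fence);

	/* i915_gem_execbuffer.c and i915_gem_execbuffer3.c then wrap it */
	struct i915_execbuffer {
		struct i915_eb_common base;
		/* execbuf2-only state: relocations, exec list handling, ... */
	};

	struct i915_execbuffer3 {
		struct i915_eb_common base;
		/* execbuf3-only state: vm_bind address space handling, ... */
	};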

Regards,

Tvrtko

>> This could be done by having a common struct i915_execbuffer and then 
>> eb2 and eb3 specific parts which inherit from it. After that is done 
>> it should be easier to see if it makes sense to do something more and 
>> how.
> 
> I am not a big fan of it. I think we should not try to load the execbuf3
> code with the legacy stuff.
> 
> Niranjana
> 
>>
>> Regards,
>>
>> Tvrtko

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
                     ` (2 preceding siblings ...)
  2022-09-01  5:31   ` Dave Airlie
@ 2022-09-12 13:11   ` Jani Nikula
  2022-09-21  7:19     ` Niranjana Vishwanathapura
  3 siblings, 1 reply; 41+ messages in thread
From: Jani Nikula @ 2022-09-12 13:11 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Andi Shyti, Ramalingam C, Thomas Hellstrom, Matthew Auld,
	Andi Shyti, Niranjana Vishwanathapura

On Sat, 27 Aug 2022, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
> Implement the bind and unbind of an object at the specified GPU virtual
> addresses.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |   1 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>  drivers/gpu/drm/i915/i915_driver.c            |   1 +
>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>  include/uapi/drm/i915_drm.h                   | 163 +++++++++
>  10 files changed, 543 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 522ef9b4aff32..4e1627e96c6e0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -165,6 +165,7 @@ gem-y += \
>  	gem/i915_gem_ttm_move.o \
>  	gem/i915_gem_ttm_pm.o \
>  	gem/i915_gem_userptr.o \
> +	gem/i915_gem_vm_bind_object.o \
>  	gem/i915_gem_wait.o \
>  	gem/i915_gemfs.o
>  i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 0000000000000..ebc493b7dafc1
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include "i915_drv.h"

Please only include what you need. Here, you'll only need to include
<linux/types.h> for u64 and bool. For everything else, add forward
declarations.
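
i.e. something along these lines (sketch of the trimmed header):

	#ifndef __I915_GEM_VM_BIND_H
	#define __I915_GEM_VM_BIND_H

	#include <linux/types.h>

	struct drm_device;
	struct drm_file;
	struct i915_address_space;
	struct i915_vma;

	struct i915_vma *
	i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
	void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);

	int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
				   struct drm_file *file);
	int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
				     struct drm_file *file);

	void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);

	#endif /* __I915_GEM_VM_BIND_H */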

> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
> +
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file);
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);

Blank line here please.

> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 0000000000000..dadd1d4b1761b
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,322 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_vm_bind.h"
> +#include "gem/i915_gem_context.h"
> +#include "gt/gen8_engine_cs.h"
> +
> +#include "i915_drv.h"
> +#include "i915_gem_gtt.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +		     START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> + * specified address space (VM). Multiple mappings can map to the same physical
> + * pages of an object (aliasing). These mappings (also referred to as persistent
> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all required
> + * mappings during each submission (as required by older execbuf mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding do not need this lock.
> + *
> + * 2) The object's dma-resv lock will protect i915_vma state and needs
> + *    to be held while binding/unbinding a vma in the async worker and while
> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + * 3) Spinlock/s to protect some of the VM's lists like the list of
> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
> + */
> +
> +/**
> + * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
> + * @vm: virtual address space in which vma needs to be looked for
> + * @va: starting addr of the vma
> + *
> + * retrieves the vma with a starting address from the vm's vma tree.
> + *
> + * Returns: returns vma on success, NULL on failure.
> + */
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
> +{
> +	lockdep_assert_held(&vm->vm_bind_lock);
> +
> +	return i915_vm_bind_it_iter_first(&vm->va, va, va);
> +}
> +
> +/**
> + * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
> + * @vma: vma that needs to be removed
> + * @release_obj: object to be release or not
> + *
> + * Removes the vma from the vm's lists custom interval tree
> + */
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +	lockdep_assert_held(&vma->vm->vm_bind_lock);
> +
> +	if (!list_empty(&vma->vm_bind_link)) {
> +		list_del_init(&vma->vm_bind_link);
> +		i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +		/* Release object */
> +		if (release_obj)
> +			i915_gem_object_put(vma->obj);
> +	}
> +}
> +
> +static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
> +				  struct i915_vma *vma,
> +				  struct drm_i915_gem_vm_unbind *va)
> +{
> +	struct drm_i915_gem_object *obj;
> +	int ret;
> +
> +	if (vma) {
> +		obj = vma->obj;
> +		i915_vma_destroy(vma);
> +
> +		goto exit;
> +	}
> +
> +	if (!va)
> +		return -EINVAL;
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		return ret;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +
> +	if (!vma)
> +		ret = -ENOENT;
> +	else if (vma->size != va->length)
> +		ret = -EINVAL;
> +
> +	if (ret) {
> +		mutex_unlock(&vm->vm_bind_lock);
> +		return ret;
> +	}
> +
> +	i915_gem_vm_bind_remove(vma, false);
> +
> +	mutex_unlock(&vm->vm_bind_lock);
> +
> +	/* Destroy vma and then release object */
> +	obj = vma->obj;
> +	ret = i915_gem_object_lock(obj, NULL);
> +	if (ret)
> +		return ret;
> +
> +	i915_vma_destroy(vma);
> +	i915_gem_object_unlock(obj);
> +
> +exit:
> +	i915_gem_object_put(obj);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
> + * @vm: Address space from which vma bindings need to be removed
> + *
> + * Unbinds all object bindings requested by userspace
> + */
> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
> +{
> +	struct i915_vma *vma, *t;
> +
> +	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
> +		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
> +					struct drm_i915_gem_object *obj,
> +					struct drm_i915_gem_vm_bind *va)
> +{
> +	struct i915_ggtt_view view;
> +	struct i915_vma *vma;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +	if (vma)
> +		return ERR_PTR(-EEXIST);
> +
> +	view.type = I915_GGTT_VIEW_PARTIAL;
> +	view.partial.offset = va->offset >> PAGE_SHIFT;
> +	view.partial.size = va->length >> PAGE_SHIFT;
> +	vma = i915_vma_instance(obj, vm, &view);
> +	if (IS_ERR(vma))
> +		return vma;
> +
> +	vma->start = va->start;
> +	vma->last = va->start + va->length - 1;
> +
> +	return vma;
> +}
> +
> +static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +				struct drm_i915_gem_vm_bind *va,
> +				struct drm_file *file)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma = NULL;
> +	struct i915_gem_ww_ctx ww;
> +	u64 pin_flags;
> +	int ret = 0;
> +
> +	if (!vm->vm_bind_mode)
> +		return -EOPNOTSUPP;
> +
> +	obj = i915_gem_object_lookup(file, va->handle);
> +	if (!obj)
> +		return -ENOENT;
> +
> +	if (!va->length ||
> +	    !IS_ALIGNED(va->offset | va->length,
> +			i915_gem_object_max_page_size(obj->mm.placements,
> +						      obj->mm.n_placements)) ||
> +	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
> +		ret = -EINVAL;
> +		goto put_obj;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
> +	if (ret)
> +		goto put_obj;
> +
> +	vma = vm_bind_get_vma(vm, obj, va);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto unlock_vm;
> +	}
> +
> +	for_i915_gem_ww(&ww, ret, true) {
> +retry:
> +		ret = i915_gem_object_lock(vma->obj, &ww);
> +		if (ret)
> +			goto out_ww;
> +
> +		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> +		if (ret)
> +			goto out_ww;
> +
> +		/* Make it evictable */
> +		__i915_vma_unpin(vma);
> +
> +		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> +		i915_vm_bind_it_insert(vma, &vm->va);
> +
> +out_ww:
> +		if (ret == -EDEADLK) {
> +			ret = i915_gem_ww_ctx_backoff(&ww);
> +			if (!ret)
> +				goto retry;
> +		} else {
> +			/* Hold object reference until vm_unbind */
> +			i915_gem_object_get(vma->obj);
> +		}
> +	}
> +
> +unlock_vm:
> +	mutex_unlock(&vm->vm_bind_lock);
> +
> +put_obj:
> +	i915_gem_object_put(obj);
> +
> +	return ret;
> +}
> +
> +/**
> + * i915_gem_vm_bind_ioctl() - ioctl function for binding an obj into
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the vm bind required
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to bind the object into the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
> +			   struct drm_file *file)
> +{
> +	struct drm_i915_gem_vm_bind *args = data;
> +	struct i915_address_space *vm;
> +	int ret;
> +
> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +	if (unlikely(!vm))
> +		return -ENOENT;
> +
> +	ret = i915_gem_vm_bind_obj(vm, args, file);
> +
> +	i915_vm_put(vm);
> +	return ret;
> +}
> +
> +/**
> + * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an obj from
> + * virtual address
> + * @dev: drm device associated to the virtual address
> + * @data: data related to the binding that needs to be unbound
> + * @file: drm_file related to the ioctl
> + *
> + * Implements a function to unbind the object from the virtual address
> + *
> + * Returns 0 on success, error code on failure.
> + */
> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_gem_vm_unbind *args = data;
> +	struct i915_address_space *vm;
> +	int ret;
> +
> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +	if (unlikely(!vm))
> +		return -ENOENT;
> +
> +	ret = i915_gem_vm_unbind_vma(vm, NULL, args);
> +
> +	i915_vm_put(vm);
> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index b67831833c9a3..cb188377b7bd9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -12,6 +12,7 @@
>  
>  #include "gem/i915_gem_internal.h"
>  #include "gem/i915_gem_lmem.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "i915_trace.h"
>  #include "i915_utils.h"
>  #include "intel_gt.h"
> @@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>  void i915_address_space_fini(struct i915_address_space *vm)
>  {
>  	drm_mm_takedown(&vm->mm);
> +	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> +	mutex_destroy(&vm->vm_bind_lock);
>  }
>  
>  /**
> @@ -204,6 +207,8 @@ static void __i915_vm_release(struct work_struct *work)
>  
>  	__i915_vm_close(vm);
>  
> +	i915_gem_vm_unbind_vma_all(vm);
> +
>  	/* Synchronize async unbinds. */
>  	i915_vma_resource_bind_dep_sync_all(vm);
>  
> @@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>  
>  	INIT_LIST_HEAD(&vm->bound_list);
>  	INIT_LIST_HEAD(&vm->unbound_list);
> +
> +	vm->va = RB_ROOT_CACHED;
> +	INIT_LIST_HEAD(&vm->vm_bind_list);
> +	INIT_LIST_HEAD(&vm->vm_bound_list);
> +	mutex_init(&vm->vm_bind_lock);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index da21088890b3b..06a259475816b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -259,6 +259,15 @@ struct i915_address_space {
>  	 */
>  	struct list_head unbound_list;
>  
> +	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
> +	struct mutex vm_bind_lock;
> +	/** @vm_bind_list: List of vm_binding in process */
> +	struct list_head vm_bind_list;
> +	/** @vm_bound_list: List of vm_binding completed */
> +	struct list_head vm_bound_list;
> +	/* @va: tree of persistent vmas */
> +	struct rb_root_cached va;
> +
>  	/* Global GTT */
>  	bool is_ggtt:1;
>  
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 1332c70370a68..9a9010fd9ecfa 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -68,6 +68,7 @@
>  #include "gem/i915_gem_ioctls.h"
>  #include "gem/i915_gem_mman.h"
>  #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_vm_bind.h"

Why do you add this here if you don't use it for anything?

>  #include "gt/intel_gt.h"
>  #include "gt/intel_gt_pm.h"
>  #include "gt/intel_rc6.h"
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 2603717164900..092ae4309d8a1 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -29,6 +29,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_tiling.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_engine.h"
>  #include "gt/intel_engine_heartbeat.h"
>  #include "gt/intel_gt.h"
> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  	spin_unlock(&obj->vma.lock);
>  	mutex_unlock(&vm->mutex);
>  
> +	INIT_LIST_HEAD(&vma->vm_bind_link);
>  	return vma;
>  
>  err_unlock:
> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>  {
>  	struct i915_vma *vma;
>  
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>  	GEM_BUG_ON(!kref_read(&vm->ref));
>  
>  	spin_lock(&obj->vma.lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 33a58f605d75c..15eac55a3e274 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>  {
>  	ptrdiff_t cmp;
>  
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> -
>  	cmp = ptrdiff(vma->vm, vm);
>  	if (cmp)
>  		return cmp;
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index be6e028c3b57d..f746fecae85ed 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -289,6 +289,20 @@ struct i915_vma {
>  	/** This object's place on the active/inactive lists */
>  	struct list_head vm_link;
>  
> +	/** @vm_bind_link: node for the vm_bind related lists of vm */
> +	struct list_head vm_bind_link;
> +
> +	/** Interval tree structures for persistent vma */
> +
> +	/** @rb: node for the interval tree of vm for persistent vmas */
> +	struct rb_node rb;
> +	/** @start: start endpoint of the rb node */
> +	u64 start;
> +	/** @last: Last endpoint of the rb node */
> +	u64 last;
> +	/** @__subtree_last: last in subtree */
> +	u64 __subtree_last;
> +
>  	struct list_head obj_link; /* Link in the object's VMA list */
>  	struct rb_node obj_node;
>  	struct hlist_node obj_hash;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 12435db751eb8..3da0e07f84bbd 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1507,6 +1507,41 @@ struct drm_i915_gem_execbuffer2 {
>  #define i915_execbuffer2_get_context_id(eb2) \
>  	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>  
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
> +	__u32 handle;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_TIMELINE_FENCE_WAIT:
> +	 * Wait for the input fence before the operation.
> +	 *
> +	 * I915_TIMELINE_FENCE_SIGNAL:
> +	 * Return operation completion fence as output.
> +	 */
> +	__u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A value of 0 for a
> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
>  struct drm_i915_gem_pin {
>  	/** Handle of the buffer to be pinned. */
>  	__u32 handle;
> @@ -3718,6 +3753,134 @@ struct drm_i915_gem_create_ext_protected_content {
>  /* ID of the protected content session managed by i915 when PXP is active */
>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>  
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> + * virtual address (VA) range to the section of an object that should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (i.e., not currently bound) and can
> + * be mapped to the whole object or to a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However, DG2
> + * and XEHPSDV have a 64K page size for device local memory and a compact page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
> + * the local memory 64K page and the system memory 4K page bindings in the same
> + * 2M range.
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Currently reserved, MBZ.
> +	 *
> +	 * Note that @fence carries its own flags.
> +	 */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for bind completion signaling.
> +	 *
> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 *
> +	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
> +	 * is not requested and binding is completed synchronously.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to complete.
> + * It is the UMD's responsibility to ensure the mapping is no longer in use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to unbind from */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Currently reserved, MBZ.
> +	 *
> +	 * Note that @fence carries its own flags.
> +	 */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for unbind completion signaling.
> +	 *
> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 *
> +	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
> +	 * is not requested and unbinding is completed synchronously.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 41+ messages in thread
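
As an aside for readers of the uapi quoted above, a minimal userspace sketch of the
synchronous bind/unbind flow might look like the following. This is illustrative only
and not part of the series: the DRM_IOCTL_I915_GEM_VM_BIND and
DRM_IOCTL_I915_GEM_VM_UNBIND numbers are assumed from the later patch of this series
that enables the ioctls, drmIoctl() is libdrm's wrapper, and the VM is assumed to have
been created with I915_VM_CREATE_FLAGS_USE_VM_BIND.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "drm/i915_drm.h"

/*
 * Bind a whole BO at a fixed GPU VA, synchronously: no
 * I915_TIMELINE_FENCE_SIGNAL is requested, so the ioctl completes the
 * binding before returning.
 */
static int bind_bo(int fd, uint32_t vm_id, uint32_t handle,
		   uint64_t gpu_va, uint64_t size)
{
	struct drm_i915_gem_vm_bind bind;

	memset(&bind, 0, sizeof(bind));
	bind.vm_id = vm_id;
	bind.handle = handle;
	bind.start = gpu_va;	/* 4K aligned; 64K for lmem on DG2/XEHPSDV */
	bind.offset = 0;	/* map from the start of the object */
	bind.length = size;	/* map the whole object */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
}

/* Unbind the mapping again; start/length must match the bound range. */
static int unbind_bo(int fd, uint32_t vm_id, uint64_t gpu_va, uint64_t size)
{
	struct drm_i915_gem_vm_unbind unbind;

	memset(&unbind, 0, sizeof(unbind));
	unbind.vm_id = vm_id;
	unbind.start = gpu_va;
	unbind.length = size;

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);
}

An asynchronous variant would set fence.handle to a drm_syncobj handle and
I915_TIMELINE_FENCE_SIGNAL in fence.flags, then wait on the syncobj (or hand it to
execbuf3 as an in fence) before using the mapping.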

* Re: [Intel-gfx] [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas
  2022-08-27 19:43 ` [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas Andi Shyti
  2022-08-31  6:16   ` Niranjana Vishwanathapura
@ 2022-09-12 13:16   ` Jani Nikula
  2022-09-21  7:21     ` Niranjana Vishwanathapura
  1 sibling, 1 reply; 41+ messages in thread
From: Jani Nikula @ 2022-09-12 13:16 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Thomas Hellstrom, Matthew Auld, Ramalingam C

On Sat, 27 Aug 2022, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
> Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
> them during the request submission in the execbuf path.
>
> Support eviction by maintaining a list of evicted persistent vmas
> for rebinding during next submission.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  8 +++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
>  drivers/gpu/drm/i915/i915_gem_gtt.c           | 38 +++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 +
>  drivers/gpu/drm/i915/i915_vma.c               | 50 +++++++++++++++--
>  drivers/gpu/drm/i915/i915_vma.h               | 56 +++++++++++++++----
>  drivers/gpu/drm/i915/i915_vma_types.h         | 24 ++++++++
>  9 files changed, 169 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 389e9f157ca5e..825dce41f7113 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -38,6 +38,7 @@
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
>  #include "i915_gem_ttm.h"
> +#include "i915_gem_vm_bind.h"

Why do you add this here if you're not using anything from there?

>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 9ff929f187cfd..3b45529fe8d4c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -91,6 +91,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>  {
>  	lockdep_assert_held(&vma->vm->vm_bind_lock);
>  
> +	spin_lock(&vma->vm->vm_rebind_lock);
> +	if (!list_empty(&vma->vm_rebind_link))
> +		list_del_init(&vma->vm_rebind_link);
> +	i915_vma_set_purged(vma);
> +	i915_vma_set_freed(vma);
> +	spin_unlock(&vma->vm->vm_rebind_lock);
> +
>  	if (!list_empty(&vma->vm_bind_link)) {
>  		list_del_init(&vma->vm_bind_link);
>  		list_del_init(&vma->non_priv_vm_bind_link);
> @@ -190,6 +197,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>  
>  	vma->start = va->start;
>  	vma->last = va->start + va->length - 1;
> +	i915_vma_set_persistent(vma);
>  
>  	return vma;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index c4f75826213ae..97cd0089b516d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>  	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>  	GEM_BUG_ON(IS_ERR(vm->root_obj));
> +	INIT_LIST_HEAD(&vm->vm_rebind_list);
> +	spin_lock_init(&vm->vm_rebind_lock);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 9a2665e4ec2e5..1f3b1967ec175 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -265,6 +265,10 @@ struct i915_address_space {
>  	struct list_head vm_bind_list;
>  	/** @vm_bound_list: List of vm_binding completed */
>  	struct list_head vm_bound_list;
> +	/* @vm_rebind_list: list of vmas to be rebound */
> +	struct list_head vm_rebind_list;
> +	/* @vm_rebind_lock: protects vm_rebind_list */
> +	spinlock_t vm_rebind_lock;
>  	/* @va: tree of persistent vmas */
>  	struct rb_root_cached va;
>  	struct list_head non_priv_vm_bind_list;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 329ff75b80b97..f083724163deb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -25,6 +25,44 @@
>  #include "i915_trace.h"
>  #include "i915_vgpu.h"
>  
> +/**
> + * i915_vm_sync() - Wait for all requests on private vmas of a vm to be completed
> + * @vm: address space we need to wait for idle
> + *
> + * Waits until all requests on the VM's vm_bind private objects have completed.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int i915_vm_sync(struct i915_address_space *vm)
> +{
> +	int ret;
> +
> +	/* Wait for all requests under this vm to finish */
> +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> +				    DMA_RESV_USAGE_BOOKKEEP, false,
> +				    MAX_SCHEDULE_TIMEOUT);
> +	if (ret < 0)
> +		return ret;
> +	else if (ret > 0)
> +		return 0;
> +	else
> +		return -ETIMEDOUT;
> +}
> +
> +/**
> + * i915_vm_is_active() - Check for activeness of requests of vm
> + * @vm: address space targeted
> + *
> + * Check whether all requests on the VM's private vmas have completed
> + *
> + * Returns: True when requests have not completed yet, false otherwise.
> + */
> +bool i915_vm_is_active(const struct i915_address_space *vm)
> +{
> +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
> +				       DMA_RESV_USAGE_BOOKKEEP);
> +}
> +
>  int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
>  			       struct sg_table *pages)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8c2f57eb5ddaa..a5bbdc59d9dfb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
>  
>  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
>  
> +int i915_vm_sync(struct i915_address_space *vm);
> +bool i915_vm_is_active(const struct i915_address_space *vm);

Maybe I don't get the gem header structure, but why do you add these in
i915_gem_gtt.h but the implementation in i915_vma.c?

In general, declarations for stuff in i915_foo.c should be in
i915_foo.h.

BR,
Jani.

> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 239346e0c07f2..0eb7727d62a6f 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  
>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> +	INIT_LIST_HEAD(&vma->vm_rebind_link);
>  	return vma;
>  
>  err_unlock:
> @@ -387,8 +388,31 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
>  	return err;
>  }
>  
> -#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> -static int i915_vma_verify_bind_complete(struct i915_vma *vma)
> +/**
> + * i915_vma_sync() - Wait for the vma to be idle
> + * @vma: vma to be tested
> + *
> + * Returns 0 on success and error code on failure
> + */
> +int i915_vma_sync(struct i915_vma *vma)
> +{
> +	int ret;
> +
> +	/* Wait for the asynchronous bindings and pending GPU reads */
> +	ret = i915_active_wait(&vma->active);
> +	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
> +		return ret;
> +
> +	return i915_vm_sync(vma->vm);
> +}
> +
> +/**
> + * i915_vma_verify_bind_complete() - Check for the vm_bind completion of the vma
> + * @vma: vma submitted for vm_bind
> + *
> + * Returns: 0 if the vm_bind is completed. Error code otherwise.
> + */
> +int i915_vma_verify_bind_complete(struct i915_vma *vma)
>  {
>  	struct dma_fence *fence = i915_active_fence_get(&vma->active.excl);
>  	int err;
> @@ -405,9 +429,6 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>  
>  	return err;
>  }
> -#else
> -#define i915_vma_verify_bind_complete(_vma) 0
> -#endif
>  
>  I915_SELFTEST_EXPORT void
>  i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
> @@ -1654,6 +1675,13 @@ static void force_unbind(struct i915_vma *vma)
>  	if (!drm_mm_node_allocated(&vma->node))
>  		return;
>  
> +	/*
> +	 * Mark persistent vma as purged to avoid it waiting
> +	 * for VM to be released.
> +	 */
> +	if (i915_vma_is_persistent(vma))
> +		i915_vma_set_purged(vma);
> +
>  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>  	WARN_ON(__i915_vma_unbind(vma));
>  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> @@ -1846,6 +1874,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
>  	int err;
>  
>  	assert_object_held(obj);
> +	if (i915_vma_is_persistent(vma))
> +		return -EINVAL;
>  
>  	GEM_BUG_ON(!vma->pages);
>  
> @@ -2014,6 +2044,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>  	__i915_vma_evict(vma, false);
>  
>  	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
> +
> +	if (i915_vma_is_persistent(vma)) {
> +		spin_lock(&vma->vm->vm_rebind_lock);
> +		if (list_empty(&vma->vm_rebind_link) &&
> +		    !i915_vma_is_purged(vma))
> +			list_add_tail(&vma->vm_rebind_link,
> +				      &vma->vm->vm_rebind_list);
> +		spin_unlock(&vma->vm->vm_rebind_lock);
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 15eac55a3e274..bf0b5b4abd919 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>  
>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
>  #define I915_VMA_RELEASE_MAP BIT(0)
> -
> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
> -{
> -	return !i915_active_is_idle(&vma->active);
> -}
> -
>  /* do not reserve memory to prevent deadlocks */
>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>  
> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
>  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
>  }
>  
> +static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
> +{
> +	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
> +{
> +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
> +{
> +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_purged(struct i915_vma *vma)
> +{
> +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_freed(const struct i915_vma *vma)
> +{
> +	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_freed(struct i915_vma *vma)
> +{
> +	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
> +{
> +	if (i915_vma_is_persistent(vma)) {
> +		if (i915_vma_is_purged(vma))
> +			return false;
> +
> +		return i915_vm_is_active(vma->vm);
> +	}
> +
> +	return !i915_active_is_idle(&vma->active);
> +}
> +
>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>  {
>  	i915_gem_object_get(vma->obj);
> @@ -406,12 +442,8 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
>  void i915_vma_make_purgeable(struct i915_vma *vma);
>  
>  int i915_vma_wait_for_bind(struct i915_vma *vma);
> -
> -static inline int i915_vma_sync(struct i915_vma *vma)
> -{
> -	/* Wait for the asynchronous bindings and pending GPU reads */
> -	return i915_active_wait(&vma->active);
> -}
> +int i915_vma_verify_bind_complete(struct i915_vma *vma);
> +int i915_vma_sync(struct i915_vma *vma);
>  
>  /**
>   * i915_vma_get_current_resource - Get the current resource of the vma
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index de5534d518cdd..5483ccf0c82c7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -264,6 +264,28 @@ struct i915_vma {
>  #define I915_VMA_SCANOUT_BIT	17
>  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
>  
> +  /**
> +   * I915_VMA_PERSISTENT_BIT:
> +   * The vma is persistent (created with VM_BIND call).
> +   *
> +   * I915_VMA_PURGED_BIT:
> +   * The persistent vma is force unbound either due to VM_UNBIND call
> +   * from UMD or VM is released. Do not check/wait for VM activeness
> +   * in i915_vma_is_active() and i915_vma_sync() calls.
> +   *
> +   * I915_VMA_FREED_BIT:
> +   * The persistent vma is being released by UMD via VM_UNBIND call.
> +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
> +   * already holds the lock.
> +   */
> +#define I915_VMA_PERSISTENT_BIT	19
> +#define I915_VMA_PURGED_BIT	20
> +#define I915_VMA_FREED_BIT	21
> +
> +#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
> +#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
> +#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
> +
>  	struct i915_active active;
>  
>  #define I915_VMA_PAGES_BIAS 24
> @@ -293,6 +315,8 @@ struct i915_vma {
>  	struct list_head vm_bind_link;
>  	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
>  	struct list_head non_priv_vm_bind_link;
> +	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
> +	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>  
>  	/** Interval tree structures for persistent vma */

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Intel-gfx] [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
  2022-09-05 15:08           ` Tvrtko Ursulin
@ 2022-09-21  7:18             ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:18 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Mon, Sep 05, 2022 at 04:08:57PM +0100, Tvrtko Ursulin wrote:
>
>On 02/09/2022 06:41, Niranjana Vishwanathapura wrote:
>>On Thu, Sep 01, 2022 at 08:58:57AM +0100, Tvrtko Ursulin wrote:
>>>
>>>
>>>On 01/09/2022 06:09, Niranjana Vishwanathapura wrote:
>>>>On Wed, Aug 31, 2022 at 08:38:48AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>On 27/08/2022 20:43, Andi Shyti wrote:
>>>>>>From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>>>>>
>>>>>>Implement new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>>>>>>works in vm_bind mode. The vm_bind mode only works with
>>>>>>this new execbuf3 ioctl.
>>>>>>
>>>>>>The new execbuf3 ioctl will not have any list of objects to validate
>>>>>>bind as all required objects binding would have been requested by the
>>>>>>userspace before submitting the execbuf3.
>>>>>>
>>>>>>And the legacy support like relocations etc are removed.
>>>>>>
>>>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>>>><niranjana.vishwanathapura@intel.com>
>>>>>>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>>>---
>>>
>>>[snip]
>>>
>>>>>>+static void signal_fence_array(const struct i915_execbuffer *eb,
>>>>>>+                   struct dma_fence * const fence)
>>>>>>+{
>>>>>>+    unsigned int n;
>>>>>>+
>>>>>>+    for (n = 0; n < eb->num_fences; n++) {
>>>>>>+        struct drm_syncobj *syncobj;
>>>>>>+        unsigned int flags;
>>>>>>+
>>>>>>+        syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
>>>>>>+        if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>>>>>>+            continue;
>>>>>>+
>>>>>>+        if (eb->fences[n].chain_fence) {
>>>>>>+            drm_syncobj_add_point(syncobj,
>>>>>>+                          eb->fences[n].chain_fence,
>>>>>>+                          fence,
>>>>>>+                          eb->fences[n].value);
>>>>>>+            /*
>>>>>>+             * The chain's ownership is transferred to the
>>>>>>+             * timeline.
>>>>>>+             */
>>>>>>+            eb->fences[n].chain_fence = NULL;
>>>>>>+        } else {
>>>>>>+            drm_syncobj_replace_fence(syncobj, fence);
>>>>>>+        }
>>>>>>+    }
>>>>>>+}
>>>>>Semi-random place to ask - how many of the code here is direct 
>>>>>copy of existing functions from i915_gem_execbuffer.c? There 
>>>>>seems to be some 100% copies at least. And then some more with 
>>>>>small tweaks. Spend some time and try to figure out some code 
>>>>>sharing?
>>>>>
>>>>
>>>>During VM_BIND design review, maintainers expressed thought on keeping
>>>>execbuf3 completely separate and not touch the legacy execbuf path.
>>>
>>>Got a link so this maintainer can see what exactly was said? Just 
>>>to make sure there isn't any misunderstanding on what "completely 
>>>separate" means to different people.
>>
>>Here is one (search for copypaste/copy-paste)
>>https://patchwork.freedesktop.org/patch/486608/?series=93447&rev=3
>>It is hard to search for old discussion threads. May be maintainers
>>can provide feedback here directly. Dave, Daniel? :)
>
>Thanks. I had a read and don't see a fundamental conflict with what I 
>said. Conclusion seemed to be to go with a new ioctl and implement 
>code sharing where it makes sense. Which is what TODO in the cover 
>letter acknowledges so there should be no disagreement really.
>
>>>>I also think, execbuf3 should be fully separate. We can do some code
>>>>sharing where there is a close 100% copy (there is a TODO in the cover letter).
>>>>There are some changes like the timeline fence array handling here
>>>>which looks similar, but the uapi is not exactly the same. Probably,
>>>>we should keep them separate and not try to force code sharing at
>>>>least at this point.
>>>
>>>Okay did not spot that TODO in the cover. But fair since it is RFC 
>>>to be unfinished.
>>>
>>>I do however think it should be improved before considering the 
>>>merge. Because looking at the patch, 100% copies are:
>>>
>>>for_each_batch_create_order
>>>for_each_batch_add_order
>>>eb_throttle
>>>eb_pin_timeline
>>>eb_pin_engine
>>>eb_put_engine
>>>__free_fence_array
>>>put_fence_array
>>>await_fence_array
>>>signal_fence_array
>>>retire_requests
>>>eb_request_add
>>>eb_requests_get
>>>eb_requests_put
>>>eb_find_context
>>>
>>>Quite a lot.
>>>
>>>Then there is a bunch of almost same functions which could be 
>>>shared if there weren't two incompatible local struct 
>>>i915_execbuffer's. Especially given when the out fence TODO item 
>>>gets handled a chunk more will also become a 100% copy.
>>>
>>
>>There are definitely a few which are 100% copies and hence should be
>>shared code.
>>But some are not. For example, the fence_array handling looks very similar,
>>but the uapi structures are different between execbuf3 and legacy execbuf.
>>The internal flags are also different (e.g., __EXEC3_ENGINE_PINNED vs
>>__EXEC_ENGINE_PINNED), which causes minor differences, hence not a
>>100% copy.
>>
>>So, I am not convinced if it is worth carrying legacy stuff into
>>execbuf3 code. I think we need to look at these on a case by case
>>basis and see if abstracting common functionality to a separate
>>shared code makes sense or it is better to keep the code separate.
>
>No one is suggesting to carry any legacy stuff into eb3. What I'd 
>suggest is to start something like i915_gem_eb_common.h|c and stuff 
>the 100% copies from the above list in there.
>
>Common struct eb with struct eb2 and eb3 inheriting from it should do 
>the trick. Similarly eb->flags shouldn't be a hard problem to solve.
>
>Then you see what remains and whether it makes sense to consolidate further.
>

Tvrtko,

I have posted the vm_bind v4 RFC series with some code sharing between the two
execbuf paths. My approach was to avoid having to touch the legacy execbuf path
all over the place.

Regards,
Niranjana

>Regards,
>
>Tvrtko
>
>>>This could be done by having a common struct i915_execbuffer and 
>>>then eb2 and eb3 specific parts which inherit from it. After that 
>>>is done it should be easier to see if it makes sense to do 
>>>something more and how.
>>
>>I am not a big fan of it. I think we should not try to load the execbuf3
>>code with the legacy stuff.
>>
>>Niranjana
>>
>>>
>>>Regards,
>>>
>>>Tvrtko

^ permalink raw reply	[flat|nested] 41+ messages in thread
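
To make the shared-base idea discussed above a bit more concrete, one rough shape
(purely illustrative; none of these type or helper names exist in the series) could be:

/*
 * Hypothetical common base carrying the state the duplicated helpers operate
 * on, embedded by the execbuf2 and execbuf3 specific contexts.
 */
struct i915_eb_common {
	struct drm_i915_private *i915;
	struct i915_gem_context *gem_context;
	struct intel_context *context;
	struct eb_fence *fences;
	unsigned long num_fences;
	u64 flags;			/* shared __EXEC*_ENGINE_PINNED style bits */
};

struct i915_execbuffer2 {		/* legacy execbuf2 state */
	struct i915_eb_common base;
	struct drm_i915_gem_exec_object2 *exec;
	/* relocation/execlist handling ... */
};

struct i915_execbuffer3 {		/* vm_bind execbuf3 state */
	struct i915_eb_common base;
	u64 *batch_addresses;
	/* ... */
};

/* The 100% copies listed above would then take the base type, for example: */
int eb_pin_engine(struct i915_eb_common *eb, bool throttle);
void signal_fence_array(const struct i915_eb_common *eb,
			struct dma_fence * const fence);

Whether the remaining near-copies are worth folding in on top of such a base is the
case-by-case judgement both sides agree on above.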

* Re: [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object
  2022-09-12 13:11   ` Jani Nikula
@ 2022-09-21  7:19     ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:19 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Andi Shyti, Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Mon, Sep 12, 2022 at 04:11:54PM +0300, Jani Nikula wrote:
>On Sat, 27 Aug 2022, Andi Shyti <andi.shyti@linux.intel.com> wrote:
>> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>> Implement the bind and unbind of an object at the specified GPU virtual
>> addresses.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  21 ++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 322 ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  10 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |   1 +
>>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  14 +
>>  include/uapi/drm/i915_drm.h                   | 163 +++++++++
>>  10 files changed, 543 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>> index 522ef9b4aff32..4e1627e96c6e0 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -165,6 +165,7 @@ gem-y += \
>>  	gem/i915_gem_ttm_move.o \
>>  	gem/i915_gem_ttm_pm.o \
>>  	gem/i915_gem_userptr.o \
>> +	gem/i915_gem_vm_bind_object.o \
>>  	gem/i915_gem_wait.o \
>>  	gem/i915_gemfs.o
>>  i915-y += \
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> new file mode 100644
>> index 0000000000000..ebc493b7dafc1
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> @@ -0,0 +1,21 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#ifndef __I915_GEM_VM_BIND_H
>> +#define __I915_GEM_VM_BIND_H
>> +
>> +#include "i915_drv.h"
>
>Please only include what you need. Here, you'll only need to include
><linux/types.h> for u64 and bool. For everything else, add forward
>declarations.

Jani,

Thanks. I have posted vm_bind v4 rfc series with your comments
addressed.

Regards,
Niranjana

>
>> +
>> +struct i915_vma *
>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>> +
>> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>> +			   struct drm_file *file);
>> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>> +			     struct drm_file *file);
>> +
>> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm);
>
>Blank line here please.
>
>> +#endif /* __I915_GEM_VM_BIND_H */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> new file mode 100644
>> index 0000000000000..dadd1d4b1761b
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -0,0 +1,322 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#include <linux/interval_tree_generic.h>
>> +
>> +#include "gem/i915_gem_vm_bind.h"
>> +#include "gem/i915_gem_context.h"
>> +#include "gt/gen8_engine_cs.h"
>> +
>> +#include "i915_drv.h"
>> +#include "i915_gem_gtt.h"
>> +
>> +#define START(node) ((node)->start)
>> +#define LAST(node) ((node)->last)
>> +
>> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>> +		     START, LAST, static inline, i915_vm_bind_it)
>> +
>> +#undef START
>> +#undef LAST
>> +
>> +/**
>> + * DOC: VM_BIND/UNBIND ioctls
>> + *
>> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>> + * specified address space (VM). Multiple mappings can map to the same physical
>> + * pages of an object (aliasing). These mappings (also referred to as persistent
>> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
>> + * issued by the UMD, without user having to provide a list of all required
>> + * mappings during each submission (as required by older execbuf mode).
>> + *
>> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
>> + * signaling the completion of bind/unbind operation.
>> + *
>> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
>> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
>> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
>> + * done asynchronously, when valid out fence is specified.
>> + *
>> + * VM_BIND locking order is as below.
>> + *
>> + * 1) vm_bind_lock mutex will protect vm_bind lists. This lock is taken in
>> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>> + *    mapping.
>> + *
>> + *    In future, when GPU page faults are supported, we can potentially use a
>> + *    rwsem instead, so that multiple page fault handlers can take the read
>> + *    side lock to lookup the mapping and hence can run in parallel.
>> + *    The older execbuf mode of binding does not need this lock.
>> + *
>> + * 2) The object's dma-resv lock will protect i915_vma state and needs
>> + *    to be held while binding/unbinding a vma in the async worker and while
>> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
>> + *    will all share a dma-resv object.
>> + *
>> + * 3) Spinlock/s to protect some of the VM's lists like the list of
>> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
>> + */
>> +
>> +/**
>> + * i915_gem_vm_bind_lookup_vma() - lookup for the vma with a starting addr
>> + * @vm: virtual address space in which vma needs to be looked for
>> + * @va: starting addr of the vma
>> + *
>> + * Retrieves the vma with the given starting address from the vm's vma tree.
>> + *
>> + * Returns: the vma on success, NULL on failure.
>> + */
>> +struct i915_vma *
>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>> +{
>> +	lockdep_assert_held(&vm->vm_bind_lock);
>> +
>> +	return i915_vm_bind_it_iter_first(&vm->va, va, va);
>> +}
>> +
>> +/**
>> + * i915_gem_vm_bind_remove() - Remove vma from the vm bind list
>> + * @vma: vma that needs to be removed
>> + * @release_obj: whether to also release the object reference
>> + *
>> + * Removes the vma from the vm's bind list and from the custom interval tree
>> + */
>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>> +{
>> +	lockdep_assert_held(&vma->vm->vm_bind_lock);
>> +
>> +	if (!list_empty(&vma->vm_bind_link)) {
>> +		list_del_init(&vma->vm_bind_link);
>> +		i915_vm_bind_it_remove(vma, &vma->vm->va);
>> +
>> +		/* Release object */
>> +		if (release_obj)
>> +			i915_gem_object_put(vma->obj);
>> +	}
>> +}
>> +
>> +static int i915_gem_vm_unbind_vma(struct i915_address_space *vm,
>> +				  struct i915_vma *vma,
>> +				  struct drm_i915_gem_vm_unbind *va)
>> +{
>> +	struct drm_i915_gem_object *obj;
>> +	int ret;
>> +
>> +	if (vma) {
>> +		obj = vma->obj;
>> +		i915_vma_destroy(vma);
>> +
>> +		goto exit;
>> +	}
>> +
>> +	if (!va)
>> +		return -EINVAL;
>> +
>> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
>> +	if (ret)
>> +		return ret;
>> +
>> +	va->start = gen8_noncanonical_addr(va->start);
>> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> +
>> +	if (!vma)
>> +		ret = -ENOENT;
>> +	else if (vma->size != va->length)
>> +		ret = -EINVAL;
>> +
>> +	if (ret) {
>> +		mutex_unlock(&vm->vm_bind_lock);
>> +		return ret;
>> +	}
>> +
>> +	i915_gem_vm_bind_remove(vma, false);
>> +
>> +	mutex_unlock(&vm->vm_bind_lock);
>> +
>> +	/* Destroy vma and then release object */
>> +	obj = vma->obj;
>> +	ret = i915_gem_object_lock(obj, NULL);
>> +	if (ret)
>> +		return ret;
>> +
>> +	i915_vma_destroy(vma);
>> +	i915_gem_object_unlock(obj);
>> +
>> +exit:
>> +	i915_gem_object_put(obj);
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * i915_gem_vm_unbind_vma_all() - Unbind all vmas from an address space
>> + * @vm: Address space from which vma bindings need to be removed
>> + *
>> + * Unbinds all object bindings requested by userspace
>> + */
>> +void i915_gem_vm_unbind_vma_all(struct i915_address_space *vm)
>> +{
>> +	struct i915_vma *vma, *t;
>> +
>> +	list_for_each_entry_safe(vma, t, &vm->vm_bound_list, vm_bind_link)
>> +		WARN_ON(i915_gem_vm_unbind_vma(vm, vma, NULL));
>> +}
>> +
>> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>> +					struct drm_i915_gem_object *obj,
>> +					struct drm_i915_gem_vm_bind *va)
>> +{
>> +	struct i915_ggtt_view view;
>> +	struct i915_vma *vma;
>> +
>> +	va->start = gen8_noncanonical_addr(va->start);
>> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> +	if (vma)
>> +		return ERR_PTR(-EEXIST);
>> +
>> +	view.type = I915_GGTT_VIEW_PARTIAL;
>> +	view.partial.offset = va->offset >> PAGE_SHIFT;
>> +	view.partial.size = va->length >> PAGE_SHIFT;
>> +	vma = i915_vma_instance(obj, vm, &view);
>> +	if (IS_ERR(vma))
>> +		return vma;
>> +
>> +	vma->start = va->start;
>> +	vma->last = va->start + va->length - 1;
>> +
>> +	return vma;
>> +}
>> +
>> +static int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> +				struct drm_i915_gem_vm_bind *va,
>> +				struct drm_file *file)
>> +{
>> +	struct drm_i915_gem_object *obj;
>> +	struct i915_vma *vma = NULL;
>> +	struct i915_gem_ww_ctx ww;
>> +	u64 pin_flags;
>> +	int ret = 0;
>> +
>> +	if (!vm->vm_bind_mode)
>> +		return -EOPNOTSUPP;
>> +
>> +	obj = i915_gem_object_lookup(file, va->handle);
>> +	if (!obj)
>> +		return -ENOENT;
>> +
>> +	if (!va->length ||
>> +	    !IS_ALIGNED(va->offset | va->length,
>> +			i915_gem_object_max_page_size(obj->mm.placements,
>> +						      obj->mm.n_placements)) ||
>> +	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
>> +		ret = -EINVAL;
>> +		goto put_obj;
>> +	}
>> +
>> +	ret = mutex_lock_interruptible(&vm->vm_bind_lock);
>> +	if (ret)
>> +		goto put_obj;
>> +
>> +	vma = vm_bind_get_vma(vm, obj, va);
>> +	if (IS_ERR(vma)) {
>> +		ret = PTR_ERR(vma);
>> +		goto unlock_vm;
>> +	}
>> +
>> +	for_i915_gem_ww(&ww, ret, true) {
>> +retry:
>> +		ret = i915_gem_object_lock(vma->obj, &ww);
>> +		if (ret)
>> +			goto out_ww;
>> +
>> +		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>> +		if (ret)
>> +			goto out_ww;
>> +
>> +		/* Make it evictable */
>> +		__i915_vma_unpin(vma);
>> +
>> +		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>> +		i915_vm_bind_it_insert(vma, &vm->va);
>> +
>> +out_ww:
>> +		if (ret == -EDEADLK) {
>> +			ret = i915_gem_ww_ctx_backoff(&ww);
>> +			if (!ret)
>> +				goto retry;
>> +		} else {
>> +			/* Hold object reference until vm_unbind */
>> +			i915_gem_object_get(vma->obj);
>> +		}
>> +	}
>> +
>> +unlock_vm:
>> +	mutex_unlock(&vm->vm_bind_lock);
>> +
>> +put_obj:
>> +	i915_gem_object_put(obj);
>> +
>> +	return ret;
>> +}
>> +
>> +/**
>> + * i915_gem_vm_bind_ioctl() - ioctl function for binding an obj into
>> + * virtual address
>> + * @dev: drm device associated to the virtual address
>> + * @data: data related to the vm bind required
>> + * @file: drm_file related to the ioctl
>> + *
>> + * Implements a function to bind the object into the virtual address
>> + *
>> + * Returns 0 on success, error code on failure.
>> + */
>> +int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>> +			   struct drm_file *file)
>> +{
>> +	struct drm_i915_gem_vm_bind *args = data;
>> +	struct i915_address_space *vm;
>> +	int ret;
>> +
>> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>> +	if (unlikely(!vm))
>> +		return -ENOENT;
>> +
>> +	ret = i915_gem_vm_bind_obj(vm, args, file);
>> +
>> +	i915_vm_put(vm);
>> +	return ret;
>> +}
>> +
>> +/**
>> + * i915_gem_vm_unbind_ioctl() - ioctl function for unbinding an obj from
>> + * virtual address
>> + * @dev: drm device associated to the virtual address
>> + * @data: data related to the binding that needs to be unbound
>> + * @file: drm_file related to the ioctl
>> + *
>> + * Implements a function to unbind the object from the virtual address
>> + *
>> + * Returns 0 on success, error code on failure.
>> + */
>> +int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>> +			     struct drm_file *file)
>> +{
>> +	struct drm_i915_gem_vm_unbind *args = data;
>> +	struct i915_address_space *vm;
>> +	int ret;
>> +
>> +	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>> +	if (unlikely(!vm))
>> +		return -ENOENT;
>> +
>> +	ret = i915_gem_vm_unbind_vma(vm, NULL, args);
>> +
>> +	i915_vm_put(vm);
>> +	return ret;
>> +}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index b67831833c9a3..cb188377b7bd9 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -12,6 +12,7 @@
>>
>>  #include "gem/i915_gem_internal.h"
>>  #include "gem/i915_gem_lmem.h"
>> +#include "gem/i915_gem_vm_bind.h"
>>  #include "i915_trace.h"
>>  #include "i915_utils.h"
>>  #include "intel_gt.h"
>> @@ -176,6 +177,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>  	drm_mm_takedown(&vm->mm);
>> +	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>> +	mutex_destroy(&vm->vm_bind_lock);
>>  }
>>
>>  /**
>> @@ -204,6 +207,8 @@ static void __i915_vm_release(struct work_struct *work)
>>
>>  	__i915_vm_close(vm);
>>
>> +	i915_gem_vm_unbind_vma_all(vm);
>> +
>>  	/* Synchronize async unbinds. */
>>  	i915_vma_resource_bind_dep_sync_all(vm);
>>
>> @@ -282,6 +287,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>
>>  	INIT_LIST_HEAD(&vm->bound_list);
>>  	INIT_LIST_HEAD(&vm->unbound_list);
>> +
>> +	vm->va = RB_ROOT_CACHED;
>> +	INIT_LIST_HEAD(&vm->vm_bind_list);
>> +	INIT_LIST_HEAD(&vm->vm_bound_list);
>> +	mutex_init(&vm->vm_bind_lock);
>>  }
>>
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index da21088890b3b..06a259475816b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -259,6 +259,15 @@ struct i915_address_space {
>>  	 */
>>  	struct list_head unbound_list;
>>
>> +	/** @vm_bind_lock: Mutex to protect @vm_bind_list and @vm_bound_list */
>> +	struct mutex vm_bind_lock;
>> +	/** @vm_bind_list: List of vm_binding in process */
>> +	struct list_head vm_bind_list;
>> +	/** @vm_bound_list: List of vm_binding completed */
>> +	struct list_head vm_bound_list;
>> +	/* @va: tree of persistent vmas */
>> +	struct rb_root_cached va;
>> +
>>  	/* Global GTT */
>>  	bool is_ggtt:1;
>>
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index 1332c70370a68..9a9010fd9ecfa 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -68,6 +68,7 @@
>>  #include "gem/i915_gem_ioctls.h"
>>  #include "gem/i915_gem_mman.h"
>>  #include "gem/i915_gem_pm.h"
>> +#include "gem/i915_gem_vm_bind.h"
>
>Why do you add this here if you don't use it for anything?
>
>>  #include "gt/intel_gt.h"
>>  #include "gt/intel_gt_pm.h"
>>  #include "gt/intel_rc6.h"
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 2603717164900..092ae4309d8a1 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -29,6 +29,7 @@
>>  #include "display/intel_frontbuffer.h"
>>  #include "gem/i915_gem_lmem.h"
>>  #include "gem/i915_gem_tiling.h"
>> +#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_engine.h"
>>  #include "gt/intel_engine_heartbeat.h"
>>  #include "gt/intel_gt.h"
>> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>  	spin_unlock(&obj->vma.lock);
>>  	mutex_unlock(&vm->mutex);
>>
>> +	INIT_LIST_HEAD(&vma->vm_bind_link);
>>  	return vma;
>>
>>  err_unlock:
>> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>>  {
>>  	struct i915_vma *vma;
>>
>> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>  	GEM_BUG_ON(!kref_read(&vm->ref));
>>
>>  	spin_lock(&obj->vma.lock);
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>> index 33a58f605d75c..15eac55a3e274 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>>  {
>>  	ptrdiff_t cmp;
>>
>> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>> -
>>  	cmp = ptrdiff(vma->vm, vm);
>>  	if (cmp)
>>  		return cmp;
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>> index be6e028c3b57d..f746fecae85ed 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -289,6 +289,20 @@ struct i915_vma {
>>  	/** This object's place on the active/inactive lists */
>>  	struct list_head vm_link;
>>
>> +	/** @vm_bind_link: node for the vm_bind related lists of vm */
>> +	struct list_head vm_bind_link;
>> +
>> +	/** Interval tree structures for persistent vma */
>> +
>> +	/** @rb: node for the interval tree of vm for persistent vmas */
>> +	struct rb_node rb;
>> +	/** @start: start endpoint of the rb node */
>> +	u64 start;
>> +	/** @last: Last endpoint of the rb node */
>> +	u64 last;
>> +	/** @__subtree_last: last in subtree */
>> +	u64 __subtree_last;
>> +
>>  	struct list_head obj_link; /* Link in the object's VMA list */
>>  	struct rb_node obj_node;
>>  	struct hlist_node obj_hash;
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 12435db751eb8..3da0e07f84bbd 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1507,6 +1507,41 @@ struct drm_i915_gem_execbuffer2 {
>>  #define i915_execbuffer2_get_context_id(eb2) \
>>  	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>>
>> +/**
>> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> + *
>> + * The operation will wait for input fence to signal.
>> + *
>> + * The returned output fence will be signaled after the completion of the
>> + * operation.
>> + */
>> +struct drm_i915_gem_timeline_fence {
>> +	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> +	__u32 handle;
>> +
>> +	/**
>> +	 * @flags: Supported flags are:
>> +	 *
>> +	 * I915_TIMELINE_FENCE_WAIT:
>> +	 * Wait for the input fence before the operation.
>> +	 *
>> +	 * I915_TIMELINE_FENCE_SIGNAL:
>> +	 * Return operation completion fence as output.
>> +	 */
>> +	__u32 flags;
>> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> +
>> +	/**
>> +	 * @value: A point in the timeline.
>> +	 * Value must be 0 for a binary drm_syncobj. A value of 0 for a
>> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> +	 * binary one.
>> +	 */
>> +	__u64 value;
>> +};
>> +
>>  struct drm_i915_gem_pin {
>>  	/** Handle of the buffer to be pinned. */
>>  	__u32 handle;
>> @@ -3718,6 +3753,134 @@ struct drm_i915_gem_create_ext_protected_content {
>>  /* ID of the protected content session managed by i915 when PXP is active */
>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>
>> +/**
>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> + *
>> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> + * virtual address (VA) range to the section of an object that should be bound
>> + * in the device page table of the specified address space (VM).
>> + * The VA range specified must be unique (i.e., not currently bound) and can
>> + * be mapped to the whole object or a section of the object (partial binding).
>> + * Multiple VA mappings can be created to the same section of the object
>> + * (aliasing).
>> + *
>> + * The @start, @offset and @length must be 4K page aligned. However, DG2
>> + * and XEHPSDV have a 64K page size for device local memory and a compact page
>> + * table. On those platforms, for binding device local-memory objects, the
>> + * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
>> + * the local memory 64K page and the system memory 4K page bindings in the same
>> + * 2M range.
>> + *
>> + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> + * asynchronously, if a valid @fence is specified.
>> + */
>> +struct drm_i915_gem_vm_bind {
>> +	/** @vm_id: VM (address space) id to bind */
>> +	__u32 vm_id;
>> +
>> +	/** @handle: Object handle */
>> +	__u32 handle;
>> +
>> +	/** @start: Virtual Address start to bind */
>> +	__u64 start;
>> +
>> +	/** @offset: Offset in object to bind */
>> +	__u64 offset;
>> +
>> +	/** @length: Length of mapping to bind */
>> +	__u64 length;
>> +
>> +	/**
>> +	 * @flags: Currently reserved, MBZ.
>> +	 *
>> +	 * Note that @fence carries its own flags.
>> +	 */
>> +	__u64 flags;
>> +
>> +	/**
>> +	 * @fence: Timeline fence for bind completion signaling.
>> +	 *
>> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
>> +	 *
>> +	 * It is an out fence, hence using the I915_TIMELINE_FENCE_WAIT flag
>> +	 * is invalid, and an error will be returned.
>> +	 *
>> +	 * If the I915_TIMELINE_FENCE_SIGNAL flag is not set, no out fence
>> +	 * is requested and the binding completes synchronously.
>> +	 */
>> +	struct drm_i915_gem_timeline_fence fence;
>> +
>> +	/**
>> +	 * @extensions: Zero-terminated chain of extensions.
>> +	 *
>> +	 * For future extensions. See struct i915_user_extension.
>> +	 */
>> +	__u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> + *
>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> + * address (VA) range that should be unbound from the device page table of the
>> + * specified address space (VM). VM_UNBIND will force unbind the specified
>> + * range from the device page table without waiting for any GPU job to complete.
>> + * It is the UMD's responsibility to ensure the mapping is no longer in use before
>> + * calling VM_UNBIND.
>> + *
>> + * If the specified mapping is not found, the ioctl will simply return without
>> + * any error.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> + * asynchronously, if a valid @fence is specified.
>> + */
>> +struct drm_i915_gem_vm_unbind {
>> +	/** @vm_id: VM (address space) id to unbind from */
>> +	__u32 vm_id;
>> +
>> +	/** @rsvd: Reserved, MBZ */
>> +	__u32 rsvd;
>> +
>> +	/** @start: Virtual Address start to unbind */
>> +	__u64 start;
>> +
>> +	/** @length: Length of mapping to unbind */
>> +	__u64 length;
>> +
>> +	/**
>> +	 * @flags: Currently reserved, MBZ.
>> +	 *
>> +	 * Note that @fence carries its own flags.
>> +	 */
>> +	__u64 flags;
>> +
>> +	/**
>> +	 * @fence: Timeline fence for unbind completion signaling.
>> +	 *
>> +	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
>> +	 *
>> +	 * It is an out fence, hence using the I915_TIMELINE_FENCE_WAIT flag
>> +	 * is invalid, and an error will be returned.
>> +	 *
>> +	 * If the I915_TIMELINE_FENCE_SIGNAL flag is not set, no out fence
>> +	 * is requested and the unbinding completes synchronously.
>> +	 */
>> +	struct drm_i915_gem_timeline_fence fence;
>> +
>> +	/**
>> +	 * @extensions: Zero-terminated chain of extensions.
>> +	 *
>> +	 * For future extensions. See struct i915_user_extension.
>> +	 */
>> +	__u64 extensions;
>> +};
>> +
>>  #if defined(__cplusplus)
>>  }
>>  #endif
>
>-- 
>Jani Nikula, Intel Open Source Graphics Center
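
For context, a minimal userspace sketch of driving the uapi quoted above. The
DRM_IOCTL_I915_GEM_VM_BIND / DRM_IOCTL_I915_GEM_VM_UNBIND request macros and
the availability of these structs in <drm/i915_drm.h> are assumptions here
(only the payload structs are shown in the hunk); note that
__I915_TIMELINE_FENCE_UNKNOWN_FLAGS rejects any @flags bit above
I915_TIMELINE_FENCE_SIGNAL.

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Synchronous bind: I915_TIMELINE_FENCE_SIGNAL not set, so no out fence. */
    static int vm_bind_sync(int fd, uint32_t vm_id, uint32_t handle,
                            uint64_t va, uint64_t size)
    {
            struct drm_i915_gem_vm_bind bind;

            memset(&bind, 0, sizeof(bind)); /* flags/extensions are MBZ */
            bind.vm_id = vm_id;
            bind.handle = handle;
            bind.start = va;        /* 4K aligned; 64K for DG2/XEHPSDV lmem */
            bind.offset = 0;        /* map from the start of the object */
            bind.length = size;

            return ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
    }

    /* Unbind the same range; the UMD must ensure the mapping is idle first. */
    static int vm_unbind_sync(int fd, uint32_t vm_id, uint64_t va, uint64_t size)
    {
            struct drm_i915_gem_vm_unbind unbind;

            memset(&unbind, 0, sizeof(unbind));
            unbind.vm_id = vm_id;
            unbind.start = va;
            unbind.length = size;

            return ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);
    }

An asynchronous bind would additionally set fence.handle to a drm_syncobj,
fence.flags to I915_TIMELINE_FENCE_SIGNAL and, for a timeline syncobj, a
non-zero fence.value, then wait on that syncobj before submitting work that
touches the new mapping.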

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Intel-gfx] [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas
  2022-09-12 13:16   ` [Intel-gfx] " Jani Nikula
@ 2022-09-21  7:21     ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 41+ messages in thread
From: Niranjana Vishwanathapura @ 2022-09-21  7:21 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Ramalingam C, intel-gfx, dri-devel, Thomas Hellstrom,
	Matthew Auld, Andi Shyti

On Mon, Sep 12, 2022 at 04:16:06PM +0300, Jani Nikula wrote:
>On Sat, 27 Aug 2022, Andi Shyti <andi.shyti@linux.intel.com> wrote:
>> From: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>> Treat VM_BIND vmas as persistent across execbuf ioctl calls and handle
>> them during the request submission in the execbuf path.
>>
>> Support eviction by maintaining a list of evicted persistent vmas
>> for rebinding during the next submission.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  8 +++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 ++
>>  drivers/gpu/drm/i915/i915_gem_gtt.c           | 38 +++++++++++++
>>  drivers/gpu/drm/i915/i915_gem_gtt.h           |  3 +
>>  drivers/gpu/drm/i915/i915_vma.c               | 50 +++++++++++++++--
>>  drivers/gpu/drm/i915/i915_vma.h               | 56 +++++++++++++++----
>>  drivers/gpu/drm/i915/i915_vma_types.h         | 24 ++++++++
>>  9 files changed, 169 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 389e9f157ca5e..825dce41f7113 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -38,6 +38,7 @@
>>  #include "i915_gem_mman.h"
>>  #include "i915_gem_object.h"
>>  #include "i915_gem_ttm.h"
>> +#include "i915_gem_vm_bind.h"
>
>Why do you add this here if you're not using anything from there?

Addressed in v4 rfc series.

>
>>  #include "i915_memcpy.h"
>>  #include "i915_trace.h"
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> index 9ff929f187cfd..3b45529fe8d4c 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -91,6 +91,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>  {
>>  	lockdep_assert_held(&vma->vm->vm_bind_lock);
>>
>> +	spin_lock(&vma->vm->vm_rebind_lock);
>> +	if (!list_empty(&vma->vm_rebind_link))
>> +		list_del_init(&vma->vm_rebind_link);
>> +	i915_vma_set_purged(vma);
>> +	i915_vma_set_freed(vma);
>> +	spin_unlock(&vma->vm->vm_rebind_lock);
>> +
>>  	if (!list_empty(&vma->vm_bind_link)) {
>>  		list_del_init(&vma->vm_bind_link);
>>  		list_del_init(&vma->non_priv_vm_bind_link);
>> @@ -190,6 +197,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>>
>>  	vma->start = va->start;
>>  	vma->last = va->start + va->length - 1;
>> +	i915_vma_set_persistent(vma);
>>
>>  	return vma;
>>  }
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index c4f75826213ae..97cd0089b516d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -296,6 +296,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>  	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>>  	GEM_BUG_ON(IS_ERR(vm->root_obj));
>> +	INIT_LIST_HEAD(&vm->vm_rebind_list);
>> +	spin_lock_init(&vm->vm_rebind_lock);
>>  }
>>
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 9a2665e4ec2e5..1f3b1967ec175 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -265,6 +265,10 @@ struct i915_address_space {
>>  	struct list_head vm_bind_list;
>>  	/** @vm_bound_list: List of vm_binding completed */
>>  	struct list_head vm_bound_list;
>> +	/* @vm_rebind_list: list of vmas to be rebound */
>> +	struct list_head vm_rebind_list;
>> +	/* @vm_rebind_lock: protects vm_rebind_list */
>> +	spinlock_t vm_rebind_lock;
>>  	/* @va: tree of persistent vmas */
>>  	struct rb_root_cached va;
>>  	struct list_head non_priv_vm_bind_list;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 329ff75b80b97..f083724163deb 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -25,6 +25,44 @@
>>  #include "i915_trace.h"
>>  #include "i915_vgpu.h"
>>
>> +/**
>> + * i915_vm_sync() - Wait for all requests on private vmas of a vm to be completed
>> + * @vm: address space we need to wait for idle
>> + *
>> + * Waits until all requests on the vm_bound private objects are completed.
>> + *
>> + * Returns: 0 on success, -ve error code on failure
>> + */
>> +int i915_vm_sync(struct i915_address_space *vm)
>> +{
>> +	int ret;
>> +
>> +	/* Wait for all requests under this vm to finish */
>> +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
>> +				    DMA_RESV_USAGE_BOOKKEEP, false,
>> +				    MAX_SCHEDULE_TIMEOUT);
>> +	if (ret < 0)
>> +		return ret;
>> +	else if (ret > 0)
>> +		return 0;
>> +	else
>> +		return -ETIMEDOUT;
>> +}
>> +
>> +/**
>> + * i915_vm_is_active() - Check for activeness of requests of vm
>> + * @vm: address space targeted
>> + *
>> + * Check whether all requests on the related private vmas have completed
>> + *
>> + * Returns: True when requests are not completed yet, false otherwise.
>> + */
>> +bool i915_vm_is_active(const struct i915_address_space *vm)
>> +{
>> +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
>> +				       DMA_RESV_USAGE_BOOKKEEP);
>> +}
>> +
>>  int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
>>  			       struct sg_table *pages)
>>  {
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 8c2f57eb5ddaa..a5bbdc59d9dfb 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -51,4 +51,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
>>
>>  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
>>
>> +int i915_vm_sync(struct i915_address_space *vm);
>> +bool i915_vm_is_active(const struct i915_address_space *vm);
>
>Maybe I don't get the gem header structure, but why do you add these in
>i915_gem_gtt.h but the implementation in i915_vma.c?
>
>In general, declarations for stuff in i915_foo.c should be in
>i915_foo.h.

These are _vm_ functions (not _vma_ functions), hence they are here.

Niranjana

>
>BR,
>Jani.
>
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 239346e0c07f2..0eb7727d62a6f 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>
>>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>>  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>> +	INIT_LIST_HEAD(&vma->vm_rebind_link);
>>  	return vma;
>>
>>  err_unlock:
>> @@ -387,8 +388,31 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
>>  	return err;
>>  }
>>
>> -#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>> -static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>> +/**
>> + * i915_vma_sync() - Wait for the vma to be idle
>> + * @vma: vma to be tested
>> + *
>> + * Returns 0 on success and error code on failure
>> + */
>> +int i915_vma_sync(struct i915_vma *vma)
>> +{
>> +	int ret;
>> +
>> +	/* Wait for the asynchronous bindings and pending GPU reads */
>> +	ret = i915_active_wait(&vma->active);
>> +	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
>> +		return ret;
>> +
>> +	return i915_vm_sync(vma->vm);
>> +}
>> +
>> +/**
>> + * i915_vma_verify_bind_complete() - Check for the vm_bind completion of the vma
>> + * @vma: vma submitted for vm_bind
>> + *
>> + * Returns: 0 if the vm_bind is completed. Error code otherwise.
>> + */
>> +int i915_vma_verify_bind_complete(struct i915_vma *vma)
>>  {
>>  	struct dma_fence *fence = i915_active_fence_get(&vma->active.excl);
>>  	int err;
>> @@ -405,9 +429,6 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma)
>>
>>  	return err;
>>  }
>> -#else
>> -#define i915_vma_verify_bind_complete(_vma) 0
>> -#endif
>>
>>  I915_SELFTEST_EXPORT void
>>  i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>> @@ -1654,6 +1675,13 @@ static void force_unbind(struct i915_vma *vma)
>>  	if (!drm_mm_node_allocated(&vma->node))
>>  		return;
>>
>> +	/*
>> +	 * Mark the persistent vma as purged to avoid having it wait
>> +	 * for the VM to be released.
>> +	 */
>> +	if (i915_vma_is_persistent(vma))
>> +		i915_vma_set_purged(vma);
>> +
>>  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>>  	WARN_ON(__i915_vma_unbind(vma));
>>  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
>> @@ -1846,6 +1874,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
>>  	int err;
>>
>>  	assert_object_held(obj);
>> +	if (i915_vma_is_persistent(vma))
>> +		return -EINVAL;
>>
>>  	GEM_BUG_ON(!vma->pages);
>>
>> @@ -2014,6 +2044,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>>  	__i915_vma_evict(vma, false);
>>
>>  	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
>> +
>> +	if (i915_vma_is_persistent(vma)) {
>> +		spin_lock(&vma->vm->vm_rebind_lock);
>> +		if (list_empty(&vma->vm_rebind_link) &&
>> +		    !i915_vma_is_purged(vma))
>> +			list_add_tail(&vma->vm_rebind_link,
>> +				      &vma->vm->vm_rebind_list);
>> +		spin_unlock(&vma->vm->vm_rebind_lock);
>> +	}
>> +
>>  	return 0;
>>  }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>> index 15eac55a3e274..bf0b5b4abd919 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>>
>>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
>>  #define I915_VMA_RELEASE_MAP BIT(0)
>> -
>> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
>> -{
>> -	return !i915_active_is_idle(&vma->active);
>> -}
>> -
>>  /* do not reserve memory to prevent deadlocks */
>>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>>
>> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
>>  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
>>  }
>>
>> +static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
>> +{
>> +	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
>> +{
>> +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
>> +{
>> +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_purged(struct i915_vma *vma)
>> +{
>> +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_freed(const struct i915_vma *vma)
>> +{
>> +	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_freed(struct i915_vma *vma)
>> +{
>> +	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
>> +{
>> +	if (i915_vma_is_persistent(vma)) {
>> +		if (i915_vma_is_purged(vma))
>> +			return false;
>> +
>> +		return i915_vm_is_active(vma->vm);
>> +	}
>> +
>> +	return !i915_active_is_idle(&vma->active);
>> +}
>> +
>>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>>  {
>>  	i915_gem_object_get(vma->obj);
>> @@ -406,12 +442,8 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
>>  void i915_vma_make_purgeable(struct i915_vma *vma);
>>
>>  int i915_vma_wait_for_bind(struct i915_vma *vma);
>> -
>> -static inline int i915_vma_sync(struct i915_vma *vma)
>> -{
>> -	/* Wait for the asynchronous bindings and pending GPU reads */
>> -	return i915_active_wait(&vma->active);
>> -}
>> +int i915_vma_verify_bind_complete(struct i915_vma *vma);
>> +int i915_vma_sync(struct i915_vma *vma);
>>
>>  /**
>>   * i915_vma_get_current_resource - Get the current resource of the vma
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>> index de5534d518cdd..5483ccf0c82c7 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -264,6 +264,28 @@ struct i915_vma {
>>  #define I915_VMA_SCANOUT_BIT	17
>>  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
>>
>> +  /**
>> +   * I915_VMA_PERSISTENT_BIT:
>> +   * The vma is persistent (created with VM_BIND call).
>> +   *
>> +   * I915_VMA_PURGED_BIT:
>> +   * The persistent vma is force unbound, either due to a VM_UNBIND call
>> +   * from the UMD or because the VM is released. Do not check/wait for VM activeness
>> +   * in i915_vma_is_active() and i915_vma_sync() calls.
>> +   *
>> +   * I915_VMA_FREED_BIT:
>> +   * The persistent vma is being released by the UMD via a VM_UNBIND call.
>> +   * While releasing the vma, do not take the vm_bind lock, as the VM_UNBIND
>> +   * call already holds it.
>> +   */
>> +#define I915_VMA_PERSISTENT_BIT	19
>> +#define I915_VMA_PURGED_BIT	20
>> +#define I915_VMA_FREED_BIT	21
>> +
>> +#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
>> +#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
>> +#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
>> +
>>  	struct i915_active active;
>>
>>  #define I915_VMA_PAGES_BIAS 24
>> @@ -293,6 +315,8 @@ struct i915_vma {
>>  	struct list_head vm_bind_link;
>>  	/* @non_priv_vm_bind_link: Link in non-private persistent VMA list */
>>  	struct list_head non_priv_vm_bind_link;
>> +	/* @vm_rebind_link: link to vm_rebind_list and protected by vm_rebind_lock */
>> +	struct list_head vm_rebind_link;
>>
>>  	/** Interval tree structures for persistent vma */
>
>-- 
>Jani Nikula, Intel Open Source Graphics Center
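
The @rb/@start/@last/@__subtree_last fields on the vma, together with the
vm->va rb_root_cached quoted above, follow the kernel's generic interval tree
pattern (include/linux/interval_tree_generic.h). Below is a sketch of how such
a tree would be instantiated and queried for persistent vmas; the
i915_vm_bind_it_* prefix and the lookup helper name are made up here for
illustration and may not match the series:

    #include <linux/interval_tree_generic.h>

    #define VMA_START(v)    ((v)->start)
    /* 'last' is inclusive: start + length - 1, as set in vm_bind_get_vma() */
    #define VMA_LAST(v)     ((v)->last)

    /* Generates i915_vm_bind_it_insert/_remove/_iter_first/_iter_next() */
    INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
                         VMA_START, VMA_LAST, static, i915_vm_bind_it)

    /* Return a persistent vma overlapping [start, last] in this vm, or NULL */
    static struct i915_vma *
    vm_bind_lookup_vma(struct i915_address_space *vm, u64 start, u64 last)
    {
            return i915_vm_bind_it_iter_first(&vm->va, start, last);
    }

A freshly bound persistent vma would be inserted with
i915_vm_bind_it_insert(vma, &vm->va) under the vm_bind_lock, so VM_UNBIND can
locate the mapping for a given VA range directly from the vm (cf. the later
"Skip vma_lookup for persistent vmas" patch in this series).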

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2022-09-21  7:22 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-27 19:43 [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 01/17] drm/i915: Expose vm_lookup in i915_gem_context.h Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 02/17] drm/i915: Mark vm for vm_bind usage at creation Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 03/17] drm/i915/gem: expose i915_gem_object_max_page_size() in i915_gem_object.h Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 04/17] drm/i915: Implement bind and unbind of object Andi Shyti
2022-08-30 17:37   ` Matthew Auld
2022-08-31  6:10     ` Niranjana Vishwanathapura
2022-08-30 18:19   ` Matthew Auld
2022-08-31  7:28     ` [Intel-gfx] " Tvrtko Ursulin
2022-09-01  5:18     ` Niranjana Vishwanathapura
2022-09-01  5:31   ` Dave Airlie
2022-09-01 20:05     ` Niranjana Vishwanathapura
2022-09-12 13:11   ` Jani Nikula
2022-09-21  7:19     ` Niranjana Vishwanathapura
2022-08-27 19:43 ` [RFC PATCH v3 05/17] drm/i915: Support for VM private BOs Andi Shyti
2022-08-31  6:13   ` Niranjana Vishwanathapura
2022-08-27 19:43 ` [RFC PATCH v3 06/17] drm/i915/dmabuf: Deny the dmabuf export " Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 07/17] drm/i915/vm_bind: Handle persistent vmas Andi Shyti
2022-08-31  6:16   ` Niranjana Vishwanathapura
2022-09-12 13:16   ` [Intel-gfx] " Jani Nikula
2022-09-21  7:21     ` Niranjana Vishwanathapura
2022-08-27 19:43 ` [RFC PATCH v3 08/17] drm/i915/vm_bind: Add out fence support Andi Shyti
2022-08-31  6:22   ` Niranjana Vishwanathapura
2022-08-27 19:43 ` [RFC PATCH v3 09/17] drm/i915: Do not support vm_bind mode in execbuf2 Andi Shyti
2022-08-31  5:45   ` Niranjana Vishwanathapura
2022-08-27 19:43 ` [RFC PATCH v3 10/17] drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl Andi Shyti
2022-08-31  7:38   ` [Intel-gfx] " Tvrtko Ursulin
2022-09-01  5:09     ` Niranjana Vishwanathapura
2022-09-01  7:58       ` Tvrtko Ursulin
2022-09-02  5:41         ` Niranjana Vishwanathapura
2022-09-05 15:08           ` Tvrtko Ursulin
2022-09-21  7:18             ` Niranjana Vishwanathapura
2022-08-27 19:43 ` [RFC PATCH v3 11/17] drm/i915: Add i915_vma_is_bind_complete() Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 12/17] drm/i915/vm_bind: Handle persistent vmas in execbuf3 Andi Shyti
2022-08-27 19:43 ` [RFC PATCH v3 13/17] drm/i915/vm_bind: userptr dma-resv changes Andi Shyti
2022-08-31  6:45   ` Niranjana Vishwanathapura
2022-08-27 19:44 ` [RFC PATCH v3 14/17] drm/i915/vm_bind: Skip vma_lookup for persistent vmas Andi Shyti
2022-08-27 19:44 ` [RFC PATCH v3 15/17] drm/i915: Extend getparm for VM_BIND capability Andi Shyti
2022-08-27 19:44 ` [RFC PATCH v3 16/17] drm/i915/ioctl: Enable the vm_bind/unbind ioctls Andi Shyti
2022-08-27 19:44 ` [RFC PATCH v3 17/17] drm/i915: Enable execbuf3 ioctl for vm_bind Andi Shyti
2022-08-31  7:33 ` [Intel-gfx] [RFC PATCH v3 00/17] drm/i915/vm_bind: Add VM_BIND functionality Tvrtko Ursulin
