* [RFC 00/10] drm/i915/vm_bind: Add VM_BIND functionality
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow the UMD to bind/unbind GEM
buffer objects (BOs), or sections of a BO, at specified GPU virtual
addresses in a specified address space (VM). Multiple mappings can map
to the same physical pages of an object (aliasing). These mappings
(also referred to as persistent mappings) persist across multiple GPU
submissions (execbuf calls) issued by the UMD, without the user having
to provide a list of all required mappings during each submission (as
required by the older execbuf mode).
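
For illustration only, below is a minimal, hedged sketch of what a
single bind could look like from userspace with this uapi. The fd,
vm_id, bo_handle and bo_size identifiers are assumed to be set up
elsewhere, the GPU VA is an arbitrary example, and error handling is
omitted:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>

  /* Bind a whole BO at a fixed GPU VA in a vm_bind-mode VM (sketch). */
  static void bind_bo(int fd, __u32 vm_id, __u32 bo_handle, __u64 bo_size)
  {
          struct drm_i915_gem_vm_bind bind;

          memset(&bind, 0, sizeof(bind));
          bind.vm_id  = vm_id;      /* VM created in vm_bind mode */
          bind.handle = bo_handle;  /* GEM object to map */
          bind.start  = 0x100000;   /* GPU VA of the mapping (page aligned) */
          bind.offset = 0;          /* offset into the object (page aligned) */
          bind.length = bo_size;    /* length of the mapping (page aligned) */
          /* fence left zeroed: no out fence, the bind completes synchronously */
          ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
  }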

This patch series supports VM_BIND version 1, as described by the
I915_PARAM_VM_BIND_VERSION param.

Add a new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which works only in
vm_bind mode, and vm_bind mode works only with this new execbuf3 ioctl.
The new execbuf3 ioctl has no execlist support, and all the legacy
features such as relocations are removed.

TODOs:
* Support out fence for VM_UNBIND ioctl.
* Async VM_UNBIND support.
* Share code between execbuf2 and execbuf3 where possible.
* Cleanups and optimizations.

NOTEs:
* It is based on the VM_BIND design + uapi patch series below:
  https://lists.freedesktop.org/archives/intel-gfx/2022-July/300760.html

* The IGT RFC series is posted as:
  [RFC 0/5] vm_bind: Add VM_BIND validation support

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana Vishwanathapura (10):
  drm/i915/vm_bind: Introduce VM_BIND ioctl
  drm/i915/vm_bind: Bind and unbind mappings
  drm/i915/vm_bind: Support private and shared BOs
  drm/i915/vm_bind: Add out fence support
  drm/i915/vm_bind: Handle persistent vmas
  drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  drm/i915/vm_bind: Handle persistent vmas in execbuf3
  drm/i915/vm_bind: userptr dma-resv changes
  drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting

 drivers/gpu/drm/i915/Makefile                 |    2 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   20 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |   15 +
 drivers/gpu/drm/i915/gem/i915_gem_create.c    |   51 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |    6 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1270 +++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |    2 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |    3 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |    3 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |   54 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  342 +++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   37 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   20 +
 drivers/gpu/drm/i915/i915_driver.c            |   38 +
 drivers/gpu/drm/i915/i915_gem_gtt.h           |   22 +
 drivers/gpu/drm/i915/i915_getparam.c          |    3 +
 drivers/gpu/drm/i915/i915_vma.c               |   59 +-
 drivers/gpu/drm/i915/i915_vma.h               |   80 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |   40 +
 include/uapi/drm/i915_drm.h                   |  289 +++-
 23 files changed, 2316 insertions(+), 48 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c

-- 
2.21.0.rc0.32.g243a4c7e27


* [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add VM_BIND and VM_UNBIND ioctls to bind/unbind sections of an
object at specified GPU virtual addresses.

Add I915_PARAM_VM_BIND_VERSION to indicate the version of the VM_BIND
feature supported, and I915_VM_CREATE_FLAGS_USE_VM_BIND for UMDs to
opt in to the vm_bind mode of binding.
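
As a rough illustration of the intended opt-in flow (not part of this
patch; the drm fd setup and error checking are assumed/omitted), a UMD
could query the feature and create a VM in vm_bind mode as below. Per
this patch, a reported version of 0 means VM_BIND is not supported:

  struct drm_i915_getparam gp = { 0 };
  struct drm_i915_gem_vm_control ctl = { 0 };
  int version = 0;

  gp.param = I915_PARAM_VM_BIND_VERSION;
  gp.value = &version;
  ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);      /* 0 means no VM_BIND */

  if (version >= 1) {
          ctl.flags = I915_VM_CREATE_FLAGS_USE_VM_BIND;
          ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl);
          /* ctl.vm_id now names a VM that accepts only vm_bind binding */
  }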

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  20 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h |  15 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h         |   6 +
 drivers/gpu/drm/i915/i915_driver.c          |  30 +++
 drivers/gpu/drm/i915/i915_getparam.c        |   3 +
 include/uapi/drm/i915_drm.h                 | 192 +++++++++++++++++++-
 6 files changed, 248 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dabdfe09f5e5..e3f5fbf2ac05 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -81,7 +81,6 @@
 
 #include "pxp/intel_pxp.h"
 
-#include "i915_file_private.h"
 #include "i915_gem_context.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
@@ -346,20 +345,6 @@ static int proto_context_register(struct drm_i915_file_private *fpriv,
 	return ret;
 }
 
-static struct i915_address_space *
-i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
-{
-	struct i915_address_space *vm;
-
-	xa_lock(&file_priv->vm_xa);
-	vm = xa_load(&file_priv->vm_xa, id);
-	if (vm)
-		kref_get(&vm->ref);
-	xa_unlock(&file_priv->vm_xa);
-
-	return vm;
-}
-
 static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
 			    struct i915_gem_proto_context *pc,
 			    const struct drm_i915_gem_context_param *args)
@@ -1799,7 +1784,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (!HAS_FULL_PPGTT(i915))
 		return -ENODEV;
 
-	if (args->flags)
+	if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
 		return -EINVAL;
 
 	ppgtt = i915_ppgtt_create(to_gt(i915), 0);
@@ -1819,6 +1804,9 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		goto err_put;
 
+	if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
+		ppgtt->vm.vm_bind_mode = true;
+
 	GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt */
 	args->vm_id = id;
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index e5b0f66ea1fe..723bf446c934 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -12,6 +12,7 @@
 #include "gt/intel_context.h"
 
 #include "i915_drv.h"
+#include "i915_file_private.h"
 #include "i915_gem.h"
 #include "i915_scheduler.h"
 #include "intel_device_info.h"
@@ -139,6 +140,20 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
 
+static inline struct i915_address_space *
+i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
+{
+	struct i915_address_space *vm;
+
+	xa_lock(&file_priv->vm_xa);
+	vm = xa_load(&file_priv->vm_xa, id);
+	if (vm)
+		kref_get(&vm->ref);
+	xa_unlock(&file_priv->vm_xa);
+
+	return vm;
+}
+
 struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index e639434e97fd..c812aa9708ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -271,6 +271,12 @@ struct i915_address_space {
 	/* Skip pte rewrite on unbind for suspend. Protected by @mutex */
 	bool skip_pte_rewrite:1;
 
+	/**
+	 * true: allow only the vm_bind method of binding.
+	 * false: allow only the legacy execbuf method of binding.
+	 */
+	bool vm_bind_mode:1;
+
 	u8 top;
 	u8 pd_shift;
 	u8 scratch_order;
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index deb8a8b76965..ccf990dfd99b 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1778,6 +1778,34 @@ i915_gem_reject_pin_ioctl(struct drm_device *dev, void *data,
 	return -ENODEV;
 }
 
+static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_i915_gem_vm_bind *args = data;
+	struct i915_address_space *vm;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	i915_vm_put(vm);
+	return -EINVAL;
+}
+
+static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_gem_vm_unbind *args = data;
+	struct i915_address_space *vm;
+
+	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
+	if (unlikely(!vm))
+		return -ENOENT;
+
+	i915_vm_put(vm);
+	return -EINVAL;
+}
+
 static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_FLUSH, drm_noop, DRM_AUTH),
@@ -1838,6 +1866,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND, i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 6fd15b39570c..c1d53febc5de 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_VM_BIND_VERSION:
+		value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3e78a00220ea..26cca49717f8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_VM_CREATE		0x3a
 #define DRM_I915_GEM_VM_DESTROY		0x3b
 #define DRM_I915_GEM_CREATE_EXT		0x3c
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
 #define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -749,6 +753,25 @@ typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * VM_BIND feature version supported.
+ *
+ * The following versions of VM_BIND have been defined:
+ *
+ * 0: No VM_BIND support.
+ *
+ * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
+ *    previously with VM_BIND; the ioctl will not support unbinding multiple
+ *    mappings or splitting them. Similarly, VM_BIND calls will not replace
+ *    any existing mappings.
+ *
+ * 2: The restrictions on unbinding partial or multiple mappings are
+ *    lifted. Similarly, binding will replace any mappings in the given range.
+ *
+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
+ */
+#define I915_PARAM_VM_BIND_VERSION	57
+
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1441,6 +1464,41 @@ struct drm_i915_gem_execbuffer2 {
 #define i915_execbuffer2_get_context_id(eb2) \
 	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
 
+/**
+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
+ *
+ * The operation will wait for input fence to signal.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * operation.
+ */
+struct drm_i915_gem_timeline_fence {
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
+	__u32 handle;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_TIMELINE_FENCE_WAIT:
+	 * Wait for the input fence before the operation.
+	 *
+	 * I915_TIMELINE_FENCE_SIGNAL:
+	 * Return operation completion fence as output.
+	 */
+	__u32 flags;
+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
@@ -2397,8 +2455,6 @@ struct drm_i915_gem_context_destroy {
  * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
  * returned in the outparam @id.
  *
- * No flags are defined, with all bits reserved and must be zero.
- *
  * An extension chain maybe provided, starting with @extensions, and terminated
  * by the @next_extension being 0. Currently, no extensions are defined.
  *
@@ -2410,6 +2466,10 @@ struct drm_i915_gem_context_destroy {
  */
 struct drm_i915_gem_vm_control {
 	__u64 extensions;
+
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1u << 0)
+#define I915_VM_CREATE_FLAGS_UNKNOWN \
+	(-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
 	__u32 flags;
 	__u32 vm_id;
 };
@@ -3602,6 +3662,134 @@ struct drm_i915_gem_create_ext_protected_content {
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length must be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local memory and a compact
+ * page table. On those platforms, for binding device local-memory objects,
+ * @start, @offset and @length must be 64K aligned. Also, UMDs should not mix
+ * the local memory 64K page and the system memory 4K page bindings in the
+ * same 2M range.
+ *
+ * Error code -EINVAL will be returned if @start, @offset and @length are not
+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
+ * -ENOSPC will be returned if the VA range specified can't be reserved.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for bind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and binding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). VM_UNBIND will force unbind the specified
+ * range from the device page table without waiting for any GPU job to
+ * complete. It is the UMD's responsibility to ensure the mapping is no
+ * longer in use before calling VM_UNBIND.
+ *
+ * If the specified mapping is not found, the ioctl will simply return without
+ * any error.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/**
+	 * @flags: Currently reserved, MBZ.
+	 *
+	 * Note that @fence carries its own flags.
+	 */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for unbind completion signaling.
+	 *
+	 * Timeline fence is of format struct drm_i915_gem_timeline_fence.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 *
+	 * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out fence
+	 * is not requested and unbinding is completed synchronously.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.21.0.rc0.32.g243a4c7e27


* [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
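
For illustration only (fd, vm_id, va and length below are assumed to
come from a prior VM_BIND call; error handling omitted), tearing such a
mapping back down would look roughly as below. Note that with version 1
the range must exactly match the mapping that was bound, which is what
the length check in this patch enforces:

  struct drm_i915_gem_vm_unbind unbind = { 0 };

  unbind.vm_id  = vm_id;
  unbind.start  = va;       /* must match the VA the mapping was bound at */
  unbind.length = length;   /* must match the bound length exactly */
  ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);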

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
 drivers/gpu/drm/i915/i915_driver.c            |  11 +-
 drivers/gpu/drm/i915/i915_vma.c               |   7 +-
 drivers/gpu/drm/i915/i915_vma.h               |   2 -
 drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
 11 files changed, 318 insertions(+), 10 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 522ef9b4aff3..4e1627e96c6e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -165,6 +165,7 @@ gem-y += \
 	gem/i915_gem_ttm_move.o \
 	gem/i915_gem_ttm_pm.o \
 	gem/i915_gem_userptr.o \
+	gem/i915_gem_vm_bind_object.o \
 	gem/i915_gem_wait.o \
 	gem/i915_gemfs.o
 i915-y += \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 33673fe7ee0a..927a87e5ec59 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -15,10 +15,10 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
-static u32 object_max_page_size(struct intel_memory_region **placements,
-				unsigned int n_placements)
+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
+				  unsigned int n_placements)
 {
-	u32 max_page_size = 0;
+	u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
 	int i;
 
 	for (i = 0; i < n_placements; i++) {
@@ -28,7 +28,6 @@ static u32 object_max_page_size(struct intel_memory_region **placements,
 		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
 	}
 
-	GEM_BUG_ON(!max_page_size);
 	return max_page_size;
 }
 
@@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
 
 	i915_gem_flush_free_objects(i915);
 
-	size = round_up(size, object_max_page_size(placements, n_placements));
+	size = round_up(size, i915_gem_object_max_page_size(placements,
+							    n_placements));
 	if (size == 0)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 6f0a3ce35567..650de2224843 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
 }
 
 void i915_gem_init__objects(struct drm_i915_private *i915);
+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
+				  unsigned int n_placements);
 
 void i915_objects_module_exit(void);
 int i915_objects_module_init(void);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
new file mode 100644
index 000000000000..642cdb559f17
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __I915_GEM_VM_BIND_H
+#define __I915_GEM_VM_BIND_H
+
+#include "i915_drv.h"
+
+#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
+
+static inline void i915_gem_vm_bind_lock(struct i915_address_space *vm)
+{
+	mutex_lock(&vm->vm_bind_lock);
+}
+
+static inline int
+i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
+{
+	return mutex_lock_interruptible(&vm->vm_bind_lock);
+}
+
+static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
+{
+	mutex_unlock(&vm->vm_bind_lock);
+}
+
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
+int i915_gem_vm_bind_obj(struct i915_address_space *vm,
+			 struct drm_i915_gem_vm_bind *va,
+			 struct drm_file *file);
+int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
+			   struct drm_i915_gem_vm_unbind *va);
+
+#endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
new file mode 100644
index 000000000000..43ceb4dcca6c
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/interval_tree_generic.h>
+
+#include "gem/i915_gem_vm_bind.h"
+#include "gt/gen8_engine_cs.h"
+
+#include "i915_drv.h"
+#include "i915_gem_gtt.h"
+
+#define START(node) ((node)->start)
+#define LAST(node) ((node)->last)
+
+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
+		     START, LAST, static inline, i915_vm_bind_it)
+
+#undef START
+#undef LAST
+
+/**
+ * DOC: VM_BIND/UNBIND ioctls
+ *
+ * The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow the UMD to bind/unbind GEM
+ * buffer objects (BOs), or sections of a BO, at specified GPU virtual
+ * addresses in a specified address space (VM). Multiple mappings can map to
+ * the same physical pages of an object (aliasing). These mappings (persistent
+ * mappings) persist across multiple GPU submissions (execbuf calls) issued by
+ * the UMD, without the user having to provide a list of all required mappings
+ * during each submission (as required by the older execbuf mode).
+ *
+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+ * signaling the completion of bind/unbind operation.
+ *
+ * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
+ * User has to opt-in for VM_BIND mode of binding for an address space (VM)
+ * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
+ * done asynchronously, when valid out fence is specified.
+ *
+ * VM_BIND locking order is as below.
+ *
+ * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+ *    mapping.
+ *
+ *    In future, when GPU page faults are supported, we can potentially use a
+ *    rwsem instead, so that multiple page fault handlers can take the read
+ *    side lock to lookup the mapping and hence can run in parallel.
+ *    The older execbuf mode of binding does not need this lock.
+ *
+ * 2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs
+ *    to be held while binding/unbinding a vma in the async worker and while
+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
+ *    will all share a dma-resv object.
+ *
+ *    The future system allocator support will use the HMM prescribed locking
+ *    instead.
+ *
+ * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
+ */
+
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
+{
+	struct i915_vma *vma, *temp;
+
+	assert_vm_bind_held(vm);
+
+	vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
+	/* Working around compiler error, remove later */
+	if (vma)
+		temp = i915_vm_bind_it_iter_next(vma, va + vma->size, -1);
+	return vma;
+}
+
+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
+{
+	assert_vm_bind_held(vma->vm);
+
+	if (!list_empty(&vma->vm_bind_link)) {
+		list_del_init(&vma->vm_bind_link);
+		i915_vm_bind_it_remove(vma, &vma->vm->va);
+
+		/* Release object */
+		if (release_obj)
+			i915_vma_put(vma);
+	}
+}
+
+int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
+			   struct drm_i915_gem_vm_unbind *va)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	ret = i915_gem_vm_bind_lock_interruptible(vm);
+	if (ret)
+		return ret;
+
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (!vma) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	if (vma->size != va->length)
+		ret = -EINVAL;
+	else
+		i915_gem_vm_bind_remove(vma, false);
+
+out_unlock:
+	i915_gem_vm_bind_unlock(vm);
+	if (ret || !vma)
+		return ret;
+
+	/* Destroy vma and then release object */
+	obj = vma->obj;
+	ret = i915_gem_object_lock(obj, NULL);
+	if (ret)
+		return ret;
+
+	i915_vma_destroy(vma);
+	i915_gem_object_unlock(obj);
+	i915_gem_object_put(obj);
+
+	return 0;
+}
+
+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
+					struct drm_i915_gem_object *obj,
+					struct drm_i915_gem_vm_bind *va)
+{
+	struct i915_ggtt_view view;
+	struct i915_vma *vma;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (vma)
+		return ERR_PTR(-EEXIST);
+
+	view.type = I915_GGTT_VIEW_PARTIAL;
+	view.partial.offset = va->offset >> PAGE_SHIFT;
+	view.partial.size = va->length >> PAGE_SHIFT;
+	vma = i915_vma_instance(obj, vm, &view);
+	if (IS_ERR(vma))
+		return vma;
+
+	vma->start = va->start;
+	vma->last = va->start + va->length - 1;
+
+	return vma;
+}
+
+int i915_gem_vm_bind_obj(struct i915_address_space *vm,
+			 struct drm_i915_gem_vm_bind *va,
+			 struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma = NULL;
+	struct i915_gem_ww_ctx ww;
+	u64 pin_flags;
+	int ret = 0;
+
+	if (!vm->vm_bind_mode)
+		return -EOPNOTSUPP;
+
+	obj = i915_gem_object_lookup(file, va->handle);
+	if (!obj)
+		return -ENOENT;
+
+	if (!va->length ||
+	    !IS_ALIGNED(va->offset | va->length,
+			i915_gem_object_max_page_size(obj->mm.placements,
+						      obj->mm.n_placements)) ||
+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
+	ret = i915_gem_vm_bind_lock_interruptible(vm);
+	if (ret)
+		goto put_obj;
+
+	vma = vm_bind_get_vma(vm, obj, va);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto unlock_vm;
+	}
+
+	i915_gem_ww_ctx_init(&ww, true);
+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
+retry:
+	ret = i915_gem_object_lock(vma->obj, &ww);
+	if (ret)
+		goto out_ww;
+
+	ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
+	if (ret)
+		goto out_ww;
+
+	/* Make it evictable */
+	__i915_vma_unpin(vma);
+
+	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+	i915_vm_bind_it_insert(vma, &vm->va);
+
+	/* Hold object reference until vm_unbind */
+	i915_gem_object_get(vma->obj);
+out_ww:
+	if (ret == -EDEADLK) {
+		ret = i915_gem_ww_ctx_backoff(&ww);
+		if (!ret)
+			goto retry;
+	}
+
+	if (ret)
+		i915_vma_destroy(vma);
+
+	i915_gem_ww_ctx_fini(&ww);
+unlock_vm:
+	i915_gem_vm_bind_unlock(vm);
+put_obj:
+	i915_gem_object_put(obj);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index b67831833c9a..135dc4a76724 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
+	mutex_destroy(&vm->vm_bind_lock);
 }
 
 /**
@@ -282,6 +284,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	INIT_LIST_HEAD(&vm->bound_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+
+	vm->va = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&vm->vm_bind_list);
+	INIT_LIST_HEAD(&vm->vm_bound_list);
+	mutex_init(&vm->vm_bind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index c812aa9708ae..d4a6ce65251d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -259,6 +259,15 @@ struct i915_address_space {
 	 */
 	struct list_head unbound_list;
 
+	/**
+	 * List of VM_BIND objects.
+	 */
+	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
+	struct list_head vm_bind_list;
+	struct list_head vm_bound_list;
+	/* va tree of persistent vmas */
+	struct rb_root_cached va;
+
 	/* Global GTT */
 	bool is_ggtt:1;
 
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index ccf990dfd99b..776ab7844f60 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -68,6 +68,7 @@
 #include "gem/i915_gem_ioctls.h"
 #include "gem/i915_gem_mman.h"
 #include "gem/i915_gem_pm.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_rc6.h"
@@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_vm_bind *args = data;
 	struct i915_address_space *vm;
+	int ret;
 
 	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
 	if (unlikely(!vm))
 		return -ENOENT;
 
+	ret = i915_gem_vm_bind_obj(vm, args, file);
+
 	i915_vm_put(vm);
-	return -EINVAL;
+	return ret;
 }
 
 static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
@@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_vm_unbind *args = data;
 	struct i915_address_space *vm;
+	int ret;
 
 	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
 	if (unlikely(!vm))
 		return -ENOENT;
 
+	ret = i915_gem_vm_unbind_obj(vm, args);
+
 	i915_vm_put(vm);
-	return -EINVAL;
+	return ret;
 }
 
 static const struct drm_ioctl_desc i915_ioctls[] = {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 43339ecabd73..d324e29cef0a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -29,6 +29,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_tiling.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	spin_unlock(&obj->vma.lock);
 	mutex_unlock(&vm->mutex);
 
+	INIT_LIST_HEAD(&vma->vm_bind_link);
 	return vma;
 
 err_unlock:
@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
 	spin_lock(&obj->vma.lock);
@@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
 
 	spin_unlock(&obj->vma.lock);
 
+	i915_gem_vm_bind_lock(vma->vm);
+	i915_gem_vm_bind_remove(vma, true);
+	i915_gem_vm_bind_unlock(vma->vm);
+
 	spin_lock_irq(&gt->closed_lock);
 	__i915_vma_remove_closed(vma);
 	spin_unlock_irq(&gt->closed_lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..dcb49f79ff7e 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
 {
 	ptrdiff_t cmp;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
-
 	cmp = ptrdiff(vma->vm, vm);
 	if (cmp)
 		return cmp;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index be6e028c3b57..b6d179bdbfa0 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -289,6 +289,14 @@ struct i915_vma {
 	/** This object's place on the active/inactive lists */
 	struct list_head vm_link;
 
+	struct list_head vm_bind_link; /* Link in persistent VMA list */
+
+	/** Interval tree structures for persistent vma */
+	struct rb_node rb;
+	u64 start;
+	u64 last;
+	u64 __subtree_last;
+
 	struct list_head obj_link; /* Link in the object's VMA list */
 	struct rb_node obj_node;
 	struct hlist_node obj_hash;
-- 
2.21.0.rc0.32.g243a4c7e27


@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/interval_tree_generic.h>
+
+#include "gem/i915_gem_vm_bind.h"
+#include "gt/gen8_engine_cs.h"
+
+#include "i915_drv.h"
+#include "i915_gem_gtt.h"
+
+#define START(node) ((node)->start)
+#define LAST(node) ((node)->last)
+
+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
+		     START, LAST, static inline, i915_vm_bind_it)
+
+#undef START
+#undef LAST
+
+/**
+ * DOC: VM_BIND/UNBIND ioctls
+ *
+ * The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow the UMD to bind/unbind GEM
+ * buffer objects (BOs) or sections of a BO at specified GPU virtual addresses
+ * on a specified address space (VM). Multiple mappings can map to the same
+ * physical pages of an object (aliasing). These mappings (also referred to as
+ * persistent mappings) will be persistent across multiple GPU submissions
+ * (execbuf calls) issued by the UMD, without the user having to provide a list
+ * of all required mappings during each submission (as in the older execbuf mode).
+ *
+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+ * signaling the completion of the bind/unbind operation.
+ *
+ * The VM_BIND feature is advertised to the user via I915_PARAM_VM_BIND_VERSION.
+ * The user has to opt in to the VM_BIND mode of binding for an address space
+ * (VM) at VM creation time via the I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
+ * done asynchronously, when a valid out fence is specified.
+ *
+ * VM_BIND locking order is as below.
+ *
+ * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+ *    mapping.
+ *
+ *    In the future, when GPU page faults are supported, we can potentially use
+ *    a rwsem instead, so that multiple page fault handlers can take the read
+ *    side lock to look up the mapping and hence run in parallel.
+ *    The older execbuf mode of binding does not need this lock.
+ *
+ * 2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs
+ *    to be held while binding/unbinding a vma in the async worker and while
+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
+ *    will all share a dma-resv object.
+ *
+ *    The future system allocator support will use the HMM prescribed locking
+ *    instead.
+ *
+ * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
+ */
+
+struct i915_vma *
+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
+{
+	struct i915_vma *vma, *temp;
+
+	assert_vm_bind_held(vm);
+
+	vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
+	/* Working around compiler error, remove later */
+	if (vma)
+		temp = i915_vm_bind_it_iter_next(vma, va + vma->size, -1);
+	return vma;
+}
+
+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
+{
+	assert_vm_bind_held(vma->vm);
+
+	if (!list_empty(&vma->vm_bind_link)) {
+		list_del_init(&vma->vm_bind_link);
+		i915_vm_bind_it_remove(vma, &vma->vm->va);
+
+		/* Release object */
+		if (release_obj)
+			i915_vma_put(vma);
+	}
+}
+
+int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
+			   struct drm_i915_gem_vm_unbind *va)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	ret = i915_gem_vm_bind_lock_interruptible(vm);
+	if (ret)
+		return ret;
+
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (!vma) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	if (vma->size != va->length)
+		ret = -EINVAL;
+	else
+		i915_gem_vm_bind_remove(vma, false);
+
+out_unlock:
+	i915_gem_vm_bind_unlock(vm);
+	if (ret || !vma)
+		return ret;
+
+	/* Destroy vma and then release object */
+	obj = vma->obj;
+	ret = i915_gem_object_lock(obj, NULL);
+	if (ret)
+		return ret;
+
+	i915_vma_destroy(vma);
+	i915_gem_object_unlock(obj);
+	i915_gem_object_put(obj);
+
+	return 0;
+}
+
+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
+					struct drm_i915_gem_object *obj,
+					struct drm_i915_gem_vm_bind *va)
+{
+	struct i915_ggtt_view view;
+	struct i915_vma *vma;
+
+	va->start = gen8_noncanonical_addr(va->start);
+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
+	if (vma)
+		return ERR_PTR(-EEXIST);
+
+	view.type = I915_GGTT_VIEW_PARTIAL;
+	view.partial.offset = va->offset >> PAGE_SHIFT;
+	view.partial.size = va->length >> PAGE_SHIFT;
+	vma = i915_vma_instance(obj, vm, &view);
+	if (IS_ERR(vma))
+		return vma;
+
+	vma->start = va->start;
+	vma->last = va->start + va->length - 1;
+
+	return vma;
+}
+
+int i915_gem_vm_bind_obj(struct i915_address_space *vm,
+			 struct drm_i915_gem_vm_bind *va,
+			 struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma = NULL;
+	struct i915_gem_ww_ctx ww;
+	u64 pin_flags;
+	int ret = 0;
+
+	if (!vm->vm_bind_mode)
+		return -EOPNOTSUPP;
+
+	obj = i915_gem_object_lookup(file, va->handle);
+	if (!obj)
+		return -ENOENT;
+
+	if (!va->length ||
+	    !IS_ALIGNED(va->offset | va->length,
+			i915_gem_object_max_page_size(obj->mm.placements,
+						      obj->mm.n_placements)) ||
+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
+	ret = i915_gem_vm_bind_lock_interruptible(vm);
+	if (ret)
+		goto put_obj;
+
+	vma = vm_bind_get_vma(vm, obj, va);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto unlock_vm;
+	}
+
+	i915_gem_ww_ctx_init(&ww, true);
+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
+retry:
+	ret = i915_gem_object_lock(vma->obj, &ww);
+	if (ret)
+		goto out_ww;
+
+	ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
+	if (ret)
+		goto out_ww;
+
+	/* Make it evictable */
+	__i915_vma_unpin(vma);
+
+	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+	i915_vm_bind_it_insert(vma, &vm->va);
+
+	/* Hold object reference until vm_unbind */
+	i915_gem_object_get(vma->obj);
+out_ww:
+	if (ret == -EDEADLK) {
+		ret = i915_gem_ww_ctx_backoff(&ww);
+		if (!ret)
+			goto retry;
+	}
+
+	if (ret)
+		i915_vma_destroy(vma);
+
+	i915_gem_ww_ctx_fini(&ww);
+unlock_vm:
+	i915_gem_vm_bind_unlock(vm);
+put_obj:
+	i915_gem_object_put(obj);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index b67831833c9a..135dc4a76724 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
+	mutex_destroy(&vm->vm_bind_lock);
 }
 
 /**
@@ -282,6 +284,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	INIT_LIST_HEAD(&vm->bound_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+
+	vm->va = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&vm->vm_bind_list);
+	INIT_LIST_HEAD(&vm->vm_bound_list);
+	mutex_init(&vm->vm_bind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index c812aa9708ae..d4a6ce65251d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -259,6 +259,15 @@ struct i915_address_space {
 	 */
 	struct list_head unbound_list;
 
+	/**
+	 * List of VM_BIND objects.
+	 */
+	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
+	struct list_head vm_bind_list;
+	struct list_head vm_bound_list;
+	/* va tree of persistent vmas */
+	struct rb_root_cached va;
+
 	/* Global GTT */
 	bool is_ggtt:1;
 
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index ccf990dfd99b..776ab7844f60 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -68,6 +68,7 @@
 #include "gem/i915_gem_ioctls.h"
 #include "gem/i915_gem_mman.h"
 #include "gem/i915_gem_pm.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_rc6.h"
@@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_vm_bind *args = data;
 	struct i915_address_space *vm;
+	int ret;
 
 	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
 	if (unlikely(!vm))
 		return -ENOENT;
 
+	ret = i915_gem_vm_bind_obj(vm, args, file);
+
 	i915_vm_put(vm);
-	return -EINVAL;
+	return ret;
 }
 
 static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
@@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_vm_unbind *args = data;
 	struct i915_address_space *vm;
+	int ret;
 
 	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
 	if (unlikely(!vm))
 		return -ENOENT;
 
+	ret = i915_gem_vm_unbind_obj(vm, args);
+
 	i915_vm_put(vm);
-	return -EINVAL;
+	return ret;
 }
 
 static const struct drm_ioctl_desc i915_ioctls[] = {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 43339ecabd73..d324e29cef0a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -29,6 +29,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_tiling.h"
+#include "gem/i915_gem_vm_bind.h"
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	spin_unlock(&obj->vma.lock);
 	mutex_unlock(&vm->mutex);
 
+	INIT_LIST_HEAD(&vma->vm_bind_link);
 	return vma;
 
 err_unlock:
@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
 	spin_lock(&obj->vma.lock);
@@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
 
 	spin_unlock(&obj->vma.lock);
 
+	i915_gem_vm_bind_lock(vma->vm);
+	i915_gem_vm_bind_remove(vma, true);
+	i915_gem_vm_bind_unlock(vma->vm);
+
 	spin_lock_irq(&gt->closed_lock);
 	__i915_vma_remove_closed(vma);
 	spin_unlock_irq(&gt->closed_lock);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..dcb49f79ff7e 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
 {
 	ptrdiff_t cmp;
 
-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
-
 	cmp = ptrdiff(vma->vm, vm);
 	if (cmp)
 		return cmp;
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index be6e028c3b57..b6d179bdbfa0 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -289,6 +289,14 @@ struct i915_vma {
 	/** This object's place on the active/inactive lists */
 	struct list_head vm_link;
 
+	struct list_head vm_bind_link; /* Link in persistent VMA list */
+
+	/** Interval tree structures for persistent vma */
+	struct rb_node rb;
+	u64 start;
+	u64 last;
+	u64 __subtree_last;
+
 	struct list_head obj_link; /* Link in the object's VMA list */
 	struct rb_node obj_node;
 	struct hlist_node obj_hash;
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread
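
As a usage illustration for the interface described in the kernel-doc above
(editorial sketch, not part of the patch): the UMD opts a VM in to vm_bind
mode at creation time, binds a BO at a user-chosen GPU virtual address, and
later unbinds it. The DRM_IOCTL_I915_GEM_VM_BIND/VM_UNBIND names, the
I915_VM_CREATE_FLAGS_USE_VM_BIND flag and the struct layouts are taken from
the separately posted VM_BIND design+uapi series and are assumptions here.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Editorial sketch only; uapi names follow the VM_BIND uapi RFC. */
static int bind_and_unbind(int fd, uint32_t bo_handle, uint64_t bo_size)
{
	struct drm_i915_gem_vm_control ctl = {
		.flags = I915_VM_CREATE_FLAGS_USE_VM_BIND, /* opt in to vm_bind mode */
	};
	struct drm_i915_gem_vm_bind bind;
	struct drm_i915_gem_vm_unbind unbind;

	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl))
		return -1;

	bind = (struct drm_i915_gem_vm_bind) {
		.vm_id = ctl.vm_id,
		.handle = bo_handle,	/* GEM BO (or a section of it) to map */
		.start = 0x100000,	/* user-managed GPU virtual address */
		.offset = 0,		/* offset into the BO */
		.length = bo_size,	/* aligned to the BO's max page size */
	};
	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind))
		return -1;

	/* ... submit GPU work through execbuf3 using the bound VA ... */

	unbind = (struct drm_i915_gem_vm_unbind) {
		.vm_id = ctl.vm_id,
		.start = 0x100000,
		.length = bo_size,	/* must match the mapped size (-EINVAL otherwise) */
	};
	return ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);
}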

* [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add uapi allowing the user to specify a BO as private to a specified VM
during BO creation.
VM private BOs can only be mapped on the specified VM and can't be
dma_buf exported. VM private BOs share a single common dma_resv object,
and hence have a performance advantage: only a single dma_resv object
needs to be updated in the execbuf path, compared to non-private (shared) BOs.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
 drivers/gpu/drm/i915/i915_vma.c               |  1 +
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
 include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
 11 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 927a87e5ec59..7e264566b51f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -11,6 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_gem_context.h"
 #include "i915_gem_create.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
@@ -243,6 +244,7 @@ struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	u32 vm_id;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -392,9 +394,24 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_vm_private(struct i915_user_extension __user *base,
+			      void *data)
+{
+	struct drm_i915_gem_create_ext_vm_private ext;
+	struct create_ext *ext_data = data;
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	ext_data->vm_id = ext.vm_id;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
 };
 
 /**
@@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_create_ext *args = data;
 	struct create_ext ext_data = { .i915 = i915 };
+	struct i915_address_space *vm = NULL;
 	struct drm_i915_gem_object *obj;
 	int ret;
 
@@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
+	if (ext_data.vm_id) {
+		vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
+		if (unlikely(!vm))
+			return -ENOENT;
+	}
+
 	if (!ext_data.n_placements) {
 		ext_data.placements[0] =
 			intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
@@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 						ext_data.placements,
 						ext_data.n_placements,
 						ext_data.flags);
-	if (IS_ERR(obj))
-		return PTR_ERR(obj);
+	if (IS_ERR(obj)) {
+		ret = PTR_ERR(obj);
+		goto vm_put;
+	}
+
+	if (vm) {
+		obj->base.resv = vm->root_obj->base.resv;
+		obj->priv_root = i915_gem_object_get(vm->root_obj);
+		i915_vm_put(vm);
+	}
 
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
+vm_put:
+	if (vm)
+		i915_vm_put(vm);
+
+	return ret;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..6433173c3e84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags)
 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 
+	if (obj->priv_root) {
+		drm_dbg(obj->base.dev,
+			"Exporting VM private objects is not allowed\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	exp_info.ops = &i915_dmabuf_ops;
 	exp_info.size = gem_obj->size;
 	exp_info.flags = flags;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5cf36a130061..9fe3395ad4d9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
 
 	const struct drm_i915_gem_object_ops *ops;
 
+	/* Shared root object if this BO is private to a VM; NULL otherwise */
+	struct drm_i915_gem_object *priv_root;
+
 	struct {
 		/**
 		 * @vma.lock: protect the list/tree of vmas
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 7e1f8b83077f..f1912b12db00 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
 	i915_gem_object_release_memory_region(obj);
 	mutex_destroy(&obj->ttm.get_io_page.lock);
 
+	if (obj->priv_root)
+		i915_gem_object_put(obj->priv_root);
+
 	if (obj->ttm.created) {
 		/*
 		 * We freely manage the shrinker LRU outide of the mm.pages life
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
index 642cdb559f17..ee6e4c52e80e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
 	mutex_unlock(&vm->vm_bind_lock);
 }
 
+static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
+					struct i915_gem_ww_ctx *ww)
+{
+	return i915_gem_object_lock(vm->root_obj, ww);
+}
+
+static inline void i915_gem_vm_priv_unlock(struct i915_address_space *vm)
+{
+	i915_gem_object_unlock(vm->root_obj);
+}
+
 struct i915_vma *
 i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
 void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 43ceb4dcca6c..3201204c8e74 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 
 	if (!list_empty(&vma->vm_bind_link)) {
 		list_del_init(&vma->vm_bind_link);
+		list_del_init(&vma->non_priv_vm_bind_link);
 		i915_vm_bind_it_remove(vma, &vma->vm->va);
 
 		/* Release object */
@@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (obj->priv_root && obj->priv_root != vm->root_obj) {
+		ret = -EINVAL;
+		goto put_obj;
+	}
+
 	ret = i915_gem_vm_bind_lock_interruptible(vm);
 	if (ret)
 		goto put_obj;
@@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 
 	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 	i915_vm_bind_it_insert(vma, &vm->va);
+	if (!obj->priv_root)
+		list_add_tail(&vma->non_priv_vm_bind_link,
+			      &vm->non_priv_vm_bind_list);
 
 	/* Hold object reference until vm_unbind */
 	i915_gem_object_get(vma->obj);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 135dc4a76724..df0a8459c3c6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
 void i915_address_space_fini(struct i915_address_space *vm)
 {
 	drm_mm_takedown(&vm->mm);
+	i915_gem_object_put(vm->root_obj);
 	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
 	mutex_destroy(&vm->vm_bind_lock);
 }
@@ -289,6 +290,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->vm_bind_list);
 	INIT_LIST_HEAD(&vm->vm_bound_list);
 	mutex_init(&vm->vm_bind_lock);
+	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
+	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
+	GEM_BUG_ON(IS_ERR(vm->root_obj));
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index d4a6ce65251d..f538ce9115c9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -267,6 +267,8 @@ struct i915_address_space {
 	struct list_head vm_bound_list;
 	/* va tree of persistent vmas */
 	struct rb_root_cached va;
+	struct list_head non_priv_vm_bind_list;
+	struct drm_i915_gem_object *root_obj;
 
 	/* Global GTT */
 	bool is_ggtt:1;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d324e29cef0a..f0226581d342 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	mutex_unlock(&vm->mutex);
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
+	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
 	return vma;
 
 err_unlock:
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index b6d179bdbfa0..2298b3d6b7c4 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -290,6 +290,8 @@ struct i915_vma {
 	struct list_head vm_link;
 
 	struct list_head vm_bind_link; /* Link in persistent VMA list */
+	/* Link in non-private persistent VMA list */
+	struct list_head non_priv_vm_bind_link;
 
 	/** Interval tree structures for persistent vma */
 	struct rb_node rb;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 26cca49717f8..ce1c6592b0d7 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
+	 * struct drm_i915_gem_create_ext_vm_private.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
 	__u64 extensions;
 };
 
@@ -3662,6 +3666,32 @@ struct drm_i915_gem_create_ext_protected_content {
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
+/**
+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
+ * private to the specified VM.
+ *
+ * See struct drm_i915_gem_create_ext.
+ *
+ * By default, BOs can be mapped on multiple VMs and can also be dma-buf
+ * exported. Hence these BOs are referred to as Shared BOs.
+ * During each execbuf3 submission, the request fence must be added to the
+ * dma-resv fence list of all shared BOs mapped on the VM.
+ *
+ * Unlike shared BOs, VM private BOs can only be mapped on the VM they are
+ * private to and can't be dma-buf exported. All private BOs of a VM share
+ * the same dma-resv object. Hence, during each execbuf3 submission, only one
+ * dma-resv fence list needs to be updated. Thus, the fast path (where the
+ * required mappings are already bound) submission latency is O(1) w.r.t. the
+ * number of VM private BOs.
+ */
+struct drm_i915_gem_create_ext_vm_private {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @vm_id: Id of the VM to which the object is private */
+	__u32 vm_id;
+};
+
 /**
  * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
  *
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread
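
As a usage illustration (editorial, not part of the patch): creating a
VM-private BO only requires chaining the new I915_GEM_CREATE_EXT_VM_PRIVATE
extension into the existing GEM_CREATE_EXT ioctl. The sketch below uses the
struct drm_i915_gem_create_ext_vm_private layout added to i915_drm.h in this
patch; DRM_IOCTL_I915_GEM_CREATE_EXT and struct drm_i915_gem_create_ext are
existing upstream uapi.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int create_vm_private_bo(int fd, uint32_t vm_id, uint64_t size,
				uint32_t *handle)
{
	struct drm_i915_gem_create_ext_vm_private vm_priv = {
		.base.name = I915_GEM_CREATE_EXT_VM_PRIVATE,
		.vm_id = vm_id,		/* VM the new BO will be private to */
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&vm_priv,
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
		return -1;

	/*
	 * The BO can only be vm_bound on vm_id and is rejected by dma-buf
	 * export (see the i915_gem_prime_export() check in this patch).
	 */
	*handle = create.handle;
	return 0;
}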

* [RFC 04/10] drm/i915/vm_bind: Add out fence support
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add support for handling the out fence of the vm_bind call.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  2 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 74 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c               |  6 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  7 ++
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
index ee6e4c52e80e..849bf3c1061e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -45,5 +45,7 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 			 struct drm_file *file);
 int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
 			   struct drm_i915_gem_vm_unbind *va);
+void i915_vm_bind_signal_fence(struct i915_vma *vma,
+			       struct dma_fence * const fence);
 
 #endif /* __I915_GEM_VM_BIND_H */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 3201204c8e74..96f139cc8060 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -5,6 +5,8 @@
 
 #include <linux/interval_tree_generic.h>
 
+#include <drm/drm_syncobj.h>
+
 #include "gem/i915_gem_vm_bind.h"
 #include "gt/gen8_engine_cs.h"
 
@@ -94,6 +96,68 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 	}
 }
 
+static int i915_vm_bind_add_fence(struct drm_file *file, struct i915_vma *vma,
+				  u32 handle, u64 point)
+{
+	struct drm_syncobj *syncobj;
+
+	syncobj = drm_syncobj_find(file, handle);
+	if (!syncobj) {
+		DRM_DEBUG("Invalid syncobj handle provided\n");
+		return -ENOENT;
+	}
+
+	/*
+	 * For timeline syncobjs we need to preallocate chains for
+	 * later signaling.
+	 */
+	if (point) {
+		vma->vm_bind_fence.chain_fence = dma_fence_chain_alloc();
+		if (!vma->vm_bind_fence.chain_fence) {
+			drm_syncobj_put(syncobj);
+			return -ENOMEM;
+		}
+	} else {
+		vma->vm_bind_fence.chain_fence = NULL;
+	}
+
+	vma->vm_bind_fence.syncobj = syncobj;
+	vma->vm_bind_fence.value = point;
+
+	return 0;
+}
+
+static void i915_vm_bind_put_fence(struct i915_vma *vma)
+{
+	if (!vma->vm_bind_fence.syncobj)
+		return;
+
+	drm_syncobj_put(vma->vm_bind_fence.syncobj);
+	dma_fence_chain_free(vma->vm_bind_fence.chain_fence);
+}
+
+void i915_vm_bind_signal_fence(struct i915_vma *vma,
+			       struct dma_fence * const fence)
+{
+	struct drm_syncobj *syncobj = vma->vm_bind_fence.syncobj;
+
+	if (!syncobj)
+		return;
+
+	if (vma->vm_bind_fence.chain_fence) {
+		drm_syncobj_add_point(syncobj,
+				      vma->vm_bind_fence.chain_fence,
+				      fence, vma->vm_bind_fence.value);
+		/*
+		 * The chain's ownership is transferred to the
+		 * timeline.
+		 */
+		vma->vm_bind_fence.chain_fence = NULL;
+	} else {
+		drm_syncobj_replace_fence(syncobj, fence);
+	}
+}
+
 int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
 			   struct drm_i915_gem_vm_unbind *va)
 {
@@ -202,6 +266,14 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 	}
 
 	i915_gem_ww_ctx_init(&ww, true);
+
+	if (va->fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
+		ret = i915_vm_bind_add_fence(file, vma, va->fence.handle,
+					     va->fence.value);
+		if (ret)
+			goto put_vma;
+	}
+
 	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
 retry:
 	ret = i915_gem_object_lock(vma->obj, &ww);
@@ -230,6 +302,8 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 			goto retry;
 	}
 
+	i915_vm_bind_put_fence(vma);
+put_vma:
 	if (ret)
 		i915_vma_destroy(vma);
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index f0226581d342..6737236b7884 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1510,8 +1510,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 err_vma_res:
 	i915_vma_resource_free(vma_res);
 err_fence:
-	if (work)
+	if (work) {
+		if (i915_vma_is_persistent(vma))
+			i915_vm_bind_signal_fence(vma, &work->base.dma);
+
 		dma_fence_work_commit_imm(&work->base);
+	}
 err_rpm:
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 2298b3d6b7c4..7d830a6a0b51 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -293,6 +293,13 @@ struct i915_vma {
 	/* Link in non-private persistent VMA list */
 	struct list_head non_priv_vm_bind_link;
 
+	/** Timeline fence for vm_bind completion notification */
+	struct {
+		struct drm_syncobj *syncobj;
+		u64 value;
+		struct dma_fence_chain *chain_fence;
+	} vm_bind_fence;
+
 	/** Interval tree structures for persistent vma */
 	struct rb_node rb;
 	u64 start;
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread
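
To show how the out fence is meant to be consumed (editorial sketch, not part
of the patch): userspace passes a drm_syncobj handle and a timeline point in
the bind request, then waits on that point to learn that the binding has
completed. The syncobj ioctls are existing upstream uapi; the fence member of
struct drm_i915_gem_vm_bind and I915_TIMELINE_FENCE_SIGNAL follow the VM_BIND
uapi RFC and are assumptions here.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int bind_with_out_fence(int fd, uint32_t vm_id, uint32_t bo_handle,
			       uint64_t va, uint64_t length)
{
	struct drm_syncobj_create sc = { 0 };
	struct drm_syncobj_timeline_wait wait;
	struct drm_i915_gem_vm_bind bind;
	uint64_t point = 1;

	if (ioctl(fd, DRM_IOCTL_SYNCOBJ_CREATE, &sc))
		return -1;

	bind = (struct drm_i915_gem_vm_bind) {
		.vm_id = vm_id,
		.handle = bo_handle,
		.start = va,
		.length = length,
		.fence = {
			.handle = sc.handle,
			.value = point,	/* timeline point to signal on completion */
			.flags = I915_TIMELINE_FENCE_SIGNAL,
		},
	};
	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind))
		return -1;

	/* Block until the bind completion fence signals the requested point. */
	wait = (struct drm_syncobj_timeline_wait) {
		.handles = (uintptr_t)&sc.handle,
		.points = (uintptr_t)&point,
		.timeout_nsec = INT64_MAX,
		.count_handles = 1,
		.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
	};
	return ioctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &wait);
}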

* [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Treat VM_BIND vmas as persistent and handle them during
request submission in the execbuf path.

Support eviction by maintaining a list of evicted persistent vmas
for rebinding during the next submission.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
 drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
 drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
 9 files changed, 163 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index ccec4055fde3..5121f02ba95c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -38,6 +38,7 @@
 #include "i915_gem_mman.h"
 #include "i915_gem_object.h"
 #include "i915_gem_ttm.h"
+#include "i915_gem_vm_bind.h"
 #include "i915_memcpy.h"
 #include "i915_trace.h"
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
index 849bf3c1061e..eaadf5a6ab09 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -6,6 +6,7 @@
 #ifndef __I915_GEM_VM_BIND_H
 #define __I915_GEM_VM_BIND_H
 
+#include <linux/dma-resv.h>
 #include "i915_drv.h"
 
 #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
@@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
 	mutex_unlock(&vm->vm_bind_lock);
 }
 
+#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
+
 static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
 					struct i915_gem_ww_ctx *ww)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 96f139cc8060..1a8efa83547f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 {
 	assert_vm_bind_held(vma->vm);
 
+	spin_lock(&vma->vm->vm_rebind_lock);
+	if (!list_empty(&vma->vm_rebind_link))
+		list_del_init(&vma->vm_rebind_link);
+	i915_vma_set_purged(vma);
+	i915_vma_set_freed(vma);
+	spin_unlock(&vma->vm->vm_rebind_lock);
+
 	if (!list_empty(&vma->vm_bind_link)) {
 		list_del_init(&vma->vm_bind_link);
 		list_del_init(&vma->non_priv_vm_bind_link);
@@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 
 	vma->start = va->start;
 	vma->last = va->start + va->length - 1;
+	i915_vma_set_persistent(vma);
 
 	return vma;
 }
@@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 
 	i915_vm_bind_put_fence(vma);
 put_vma:
-	if (ret)
+	if (ret) {
+		i915_vma_set_freed(vma);
 		i915_vma_destroy(vma);
+	}
 
 	i915_gem_ww_ctx_fini(&ww);
 unlock_vm:
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index df0a8459c3c6..55d5389b2c6c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -293,6 +293,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
 	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
+	INIT_LIST_HEAD(&vm->vm_rebind_list);
+	spin_lock_init(&vm->vm_rebind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index f538ce9115c9..fe5485c4a1cd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -265,6 +265,8 @@ struct i915_address_space {
 	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
 	struct list_head vm_bind_list;
 	struct list_head vm_bound_list;
+	struct list_head vm_rebind_list;
+	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
 	/* va tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8c2f57eb5dda..09b89d1913fc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 
 #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
 
+static inline int i915_vm_sync(struct i915_address_space *vm)
+{
+	int ret;
+
+	/* Wait for all requests under this vm to finish */
+	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
+				    DMA_RESV_USAGE_BOOKKEEP, false,
+				    MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+	else if (ret > 0)
+		return 0;
+	else
+		return -ETIMEDOUT;
+}
+
+static inline bool i915_vm_is_active(const struct i915_address_space *vm)
+{
+	return !dma_resv_test_signaled(vm->root_obj->base.resv,
+				       DMA_RESV_USAGE_BOOKKEEP);
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 6737236b7884..6adb013579be 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
 	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
+	INIT_LIST_HEAD(&vma->vm_rebind_link);
 	return vma;
 
 err_unlock:
@@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
 	if (atomic_dec_and_lock_irqsave(&vma->open_count,
 					&gt->closed_lock,
 					flags)) {
-		__vma_close(vma, gt);
+		if (!i915_vma_is_persistent(vma))
+			__vma_close(vma, gt);
 		spin_unlock_irqrestore(&gt->closed_lock, flags);
 	}
 }
@@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
 	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
+	/*
+	 * Mark persistent vma as purged to avoid it waiting
+	 * for VM to be released.
+	 */
+	if (i915_vma_is_persistent(vma))
+		i915_vma_set_purged(vma);
+
 	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
 	WARN_ON(__i915_vma_unbind(vma));
 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
@@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
 
 	spin_unlock(&obj->vma.lock);
 
-	i915_gem_vm_bind_lock(vma->vm);
-	i915_gem_vm_bind_remove(vma, true);
-	i915_gem_vm_bind_unlock(vma->vm);
+	if (i915_vma_is_persistent(vma) &&
+	    !i915_vma_is_freed(vma)) {
+		i915_gem_vm_bind_lock(vma->vm);
+		i915_gem_vm_bind_remove(vma, true);
+		i915_gem_vm_bind_unlock(vma->vm);
+	}
 
 	spin_lock_irq(&gt->closed_lock);
 	__i915_vma_remove_closed(vma);
@@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
 	int err;
 
 	assert_object_held(obj);
+	if (i915_vma_is_persistent(vma))
+		return -EINVAL;
 
 	GEM_BUG_ON(!vma->pages);
 
@@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
 	__i915_vma_evict(vma, false);
 
 	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
+
+	if (i915_vma_is_persistent(vma)) {
+		spin_lock(&vma->vm->vm_rebind_lock);
+		if (list_empty(&vma->vm_rebind_link) &&
+		    !i915_vma_is_purged(vma))
+			list_add_tail(&vma->vm_rebind_link,
+				      &vma->vm->vm_rebind_list);
+		spin_unlock(&vma->vm->vm_rebind_lock);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index dcb49f79ff7e..6c1369a40e03 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
-
-static inline bool i915_vma_is_active(const struct i915_vma *vma)
-{
-	return !i915_active_is_idle(&vma->active);
-}
-
 /* do not reserve memory to prevent deadlocks */
 #define __EXEC_OBJECT_NO_RESERVE BIT(31)
 
@@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 	return i915_vm_to_ggtt(vma->vm)->pin_bias;
 }
 
+static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_persistent(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_purged(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_purged(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_freed(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_freed(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
+{
+	if (i915_vma_is_persistent(vma)) {
+		if (i915_vma_is_purged(vma))
+			return false;
+
+		return i915_vm_is_active(vma->vm);
+	}
+
+	return !i915_active_is_idle(&vma->active);
+}
+
 static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
 {
 	i915_gem_object_get(vma->obj);
@@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma *vma);
 
 static inline int i915_vma_sync(struct i915_vma *vma)
 {
+	int ret;
+
 	/* Wait for the asynchronous bindings and pending GPU reads */
-	return i915_active_wait(&vma->active);
+	ret = i915_active_wait(&vma->active);
+	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
+		return ret;
+
+	return i915_vm_sync(vma->vm);
+}
+
+static inline bool i915_vma_is_bind_complete(struct i915_vma *vma)
+{
+	/* Ensure vma bind is initiated */
+	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
+		return false;
+
+	/* Ensure any binding started is complete */
+	if (rcu_access_pointer(vma->active.excl.fence)) {
+		struct dma_fence *fence;
+
+		rcu_read_lock();
+		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
+		rcu_read_unlock();
+		if (fence) {
+			dma_fence_put(fence);
+			return false;
+		}
+	}
+
+	return true;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 7d830a6a0b51..405c82e1bc30 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -264,6 +264,28 @@ struct i915_vma {
 #define I915_VMA_SCANOUT_BIT	17
 #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
 
+  /**
+   * I915_VMA_PERSISTENT_BIT:
+   * The vma is persistent (created with VM_BIND call).
+   *
+   * I915_VMA_PURGED_BIT:
+   * The persistent vma is force unbound either due to VM_UNBIND call
+   * from UMD or VM is released. Do not check/wait for VM activeness
+   * in i915_vma_is_active() and i915_vma_sync() calls.
+   *
+   * I915_VMA_FREED_BIT:
+   * The persistent vma is being released by UMD via VM_UNBIND call.
+   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
+   * already holds the lock.
+   */
+#define I915_VMA_PERSISTENT_BIT	19
+#define I915_VMA_PURGED_BIT	20
+#define I915_VMA_FREED_BIT	21
+
+#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
+#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
+#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
+
 	struct i915_active active;
 
 #define I915_VMA_PAGES_BIAS 24
@@ -292,6 +314,7 @@ struct i915_vma {
 	struct list_head vm_bind_link; /* Link in persistent VMA list */
 	/* Link in non-private persistent VMA list */
 	struct list_head non_priv_vm_bind_link;
+	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
 
 	/** Timeline fence for vm_bind completion notification */
 	struct {
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

Treat VM_BIND vmas as persistent and handle them during
request submission in the execbuf path.

Support eviction by maintaining a list of evicted persistent vmas
for rebinding during the next submission.
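
A rough sketch (not part of this patch) of how a later submission path
could drain the vm_rebind_list added here; the pin flags, locking and
error handling are simplified for illustration:

	static int rebind_evicted_vmas(struct i915_address_space *vm)
	{
		struct i915_vma *vma, *next;
		LIST_HEAD(rebind);
		int err = 0;

		/* Detach the evicted persistent vmas under the spinlock */
		spin_lock(&vm->vm_rebind_lock);
		list_splice_init(&vm->vm_rebind_list, &rebind);
		spin_unlock(&vm->vm_rebind_lock);

		list_for_each_entry_safe(vma, next, &rebind, vm_rebind_link) {
			/* Re-insert the mapping at its user-assigned VA */
			err = i915_vma_pin(vma, 0, 0, vma->start |
					   PIN_OFFSET_FIXED | PIN_USER);
			if (err)
				break;

			list_del_init(&vma->vm_rebind_link);
		}

		/* Anything not rebound goes back for the next attempt */
		spin_lock(&vm->vm_rebind_lock);
		list_splice(&rebind, &vm->vm_rebind_list);
		spin_unlock(&vm->vm_rebind_lock);

		return err;
	}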

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
 drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
 drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
 9 files changed, 163 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index ccec4055fde3..5121f02ba95c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -38,6 +38,7 @@
 #include "i915_gem_mman.h"
 #include "i915_gem_object.h"
 #include "i915_gem_ttm.h"
+#include "i915_gem_vm_bind.h"
 #include "i915_memcpy.h"
 #include "i915_trace.h"
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
index 849bf3c1061e..eaadf5a6ab09 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
@@ -6,6 +6,7 @@
 #ifndef __I915_GEM_VM_BIND_H
 #define __I915_GEM_VM_BIND_H
 
+#include <linux/dma-resv.h>
 #include "i915_drv.h"
 
 #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
@@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
 	mutex_unlock(&vm->vm_bind_lock);
 }
 
+#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
+
 static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
 					struct i915_gem_ww_ctx *ww)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 96f139cc8060..1a8efa83547f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
 {
 	assert_vm_bind_held(vma->vm);
 
+	spin_lock(&vma->vm->vm_rebind_lock);
+	if (!list_empty(&vma->vm_rebind_link))
+		list_del_init(&vma->vm_rebind_link);
+	i915_vma_set_purged(vma);
+	i915_vma_set_freed(vma);
+	spin_unlock(&vma->vm->vm_rebind_lock);
+
 	if (!list_empty(&vma->vm_bind_link)) {
 		list_del_init(&vma->vm_bind_link);
 		list_del_init(&vma->non_priv_vm_bind_link);
@@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
 
 	vma->start = va->start;
 	vma->last = va->start + va->length - 1;
+	i915_vma_set_persistent(vma);
 
 	return vma;
 }
@@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 
 	i915_vm_bind_put_fence(vma);
 put_vma:
-	if (ret)
+	if (ret) {
+		i915_vma_set_freed(vma);
 		i915_vma_destroy(vma);
+	}
 
 	i915_gem_ww_ctx_fini(&ww);
 unlock_vm:
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index df0a8459c3c6..55d5389b2c6c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -293,6 +293,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
 	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
+	INIT_LIST_HEAD(&vm->vm_rebind_list);
+	spin_lock_init(&vm->vm_rebind_lock);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index f538ce9115c9..fe5485c4a1cd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -265,6 +265,8 @@ struct i915_address_space {
 	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
 	struct list_head vm_bind_list;
 	struct list_head vm_bound_list;
+	struct list_head vm_rebind_list;
+	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
 	/* va tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8c2f57eb5dda..09b89d1913fc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 
 #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
 
+static inline int i915_vm_sync(struct i915_address_space *vm)
+{
+	int ret;
+
+	/* Wait for all requests under this vm to finish */
+	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
+				    DMA_RESV_USAGE_BOOKKEEP, false,
+				    MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+	else if (ret > 0)
+		return 0;
+	else
+		return -ETIMEDOUT;
+}
+
+static inline bool i915_vm_is_active(const struct i915_address_space *vm)
+{
+	return !dma_resv_test_signaled(vm->root_obj->base.resv,
+				       DMA_RESV_USAGE_BOOKKEEP);
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 6737236b7884..6adb013579be 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vm_bind_link);
 	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
+	INIT_LIST_HEAD(&vma->vm_rebind_link);
 	return vma;
 
 err_unlock:
@@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
 	if (atomic_dec_and_lock_irqsave(&vma->open_count,
 					&gt->closed_lock,
 					flags)) {
-		__vma_close(vma, gt);
+		if (!i915_vma_is_persistent(vma))
+			__vma_close(vma, gt);
 		spin_unlock_irqrestore(&gt->closed_lock, flags);
 	}
 }
@@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
 	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
+	/*
+	 * Mark persistent vma as purged to avoid it waiting
+	 * for VM to be released.
+	 */
+	if (i915_vma_is_persistent(vma))
+		i915_vma_set_purged(vma);
+
 	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
 	WARN_ON(__i915_vma_unbind(vma));
 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
@@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
 
 	spin_unlock(&obj->vma.lock);
 
-	i915_gem_vm_bind_lock(vma->vm);
-	i915_gem_vm_bind_remove(vma, true);
-	i915_gem_vm_bind_unlock(vma->vm);
+	if (i915_vma_is_persistent(vma) &&
+	    !i915_vma_is_freed(vma)) {
+		i915_gem_vm_bind_lock(vma->vm);
+		i915_gem_vm_bind_remove(vma, true);
+		i915_gem_vm_bind_unlock(vma->vm);
+	}
 
 	spin_lock_irq(&gt->closed_lock);
 	__i915_vma_remove_closed(vma);
@@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
 	int err;
 
 	assert_object_held(obj);
+	if (i915_vma_is_persistent(vma))
+		return -EINVAL;
 
 	GEM_BUG_ON(!vma->pages);
 
@@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
 	__i915_vma_evict(vma, false);
 
 	drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */
+
+	if (i915_vma_is_persistent(vma)) {
+		spin_lock(&vma->vm->vm_rebind_lock);
+		if (list_empty(&vma->vm_rebind_link) &&
+		    !i915_vma_is_purged(vma))
+			list_add_tail(&vma->vm_rebind_link,
+				      &vma->vm->vm_rebind_list);
+		spin_unlock(&vma->vm->vm_rebind_lock);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index dcb49f79ff7e..6c1369a40e03 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
-
-static inline bool i915_vma_is_active(const struct i915_vma *vma)
-{
-	return !i915_active_is_idle(&vma->active);
-}
-
 /* do not reserve memory to prevent deadlocks */
 #define __EXEC_OBJECT_NO_RESERVE BIT(31)
 
@@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 	return i915_vm_to_ggtt(vma->vm)->pin_bias;
 }
 
+static inline bool i915_vma_is_persistent(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_persistent(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_purged(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_purged(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_freed(const struct i915_vma *vma)
+{
+	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
+}
+
+static inline void i915_vma_set_freed(struct i915_vma *vma)
+{
+	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
+}
+
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
+{
+	if (i915_vma_is_persistent(vma)) {
+		if (i915_vma_is_purged(vma))
+			return false;
+
+		return i915_vm_is_active(vma->vm);
+	}
+
+	return !i915_active_is_idle(&vma->active);
+}
+
 static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
 {
 	i915_gem_object_get(vma->obj);
@@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma *vma);
 
 static inline int i915_vma_sync(struct i915_vma *vma)
 {
+	int ret;
+
 	/* Wait for the asynchronous bindings and pending GPU reads */
-	return i915_active_wait(&vma->active);
+	ret = i915_active_wait(&vma->active);
+	if (ret || !i915_vma_is_persistent(vma) || i915_vma_is_purged(vma))
+		return ret;
+
+	return i915_vm_sync(vma->vm);
+}
+
+static inline bool i915_vma_is_bind_complete(struct i915_vma *vma)
+{
+	/* Ensure vma bind is initiated */
+	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
+		return false;
+
+	/* Ensure any binding started is complete */
+	if (rcu_access_pointer(vma->active.excl.fence)) {
+		struct dma_fence *fence;
+
+		rcu_read_lock();
+		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
+		rcu_read_unlock();
+		if (fence) {
+			dma_fence_put(fence);
+			return false;
+		}
+	}
+
+	return true;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 7d830a6a0b51..405c82e1bc30 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -264,6 +264,28 @@ struct i915_vma {
 #define I915_VMA_SCANOUT_BIT	17
 #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
 
+  /**
+   * I915_VMA_PERSISTENT_BIT:
+   * The vma is persistent (created with VM_BIND call).
+   *
+   * I915_VMA_PURGED_BIT:
+   * The persistent vma is force unbound either due to VM_UNBIND call
+   * from UMD or VM is released. Do not check/wait for VM activeness
+   * in i915_vma_is_active() and i915_vma_sync() calls.
+   *
+   * I915_VMA_FREED_BIT:
+   * The persistent vma is being released by UMD via VM_UNBIND call.
+   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
+   * already holds the lock.
+   */
+#define I915_VMA_PERSISTENT_BIT	19
+#define I915_VMA_PURGED_BIT	20
+#define I915_VMA_FREED_BIT	21
+
+#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
+#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
+#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
+
 	struct i915_active active;
 
 #define I915_VMA_PAGES_BIAS 24
@@ -292,6 +314,7 @@ struct i915_vma {
 	struct list_head vm_bind_link; /* Link in persistent VMA list */
 	/* Link in non-private persistent VMA list */
 	struct list_head non_priv_vm_bind_link;
+	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
 
 	/** Timeline fence for vm_bind completion notification */
 	struct {
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
works in vm_bind mode. The vm_bind mode only works with
this new execbuf3 ioctl.

The new execbuf3 ioctl will not have any execlist support,
and all the legacy features such as relocations are removed.
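
For reference, a minimal hypothetical userspace invocation of the new
ioctl could look like the sketch below; fd, ctx_id, engine_idx and
batch_va are placeholders, the batch VA is assumed to be already bound
via the VM_BIND ioctl, and error handling is omitted:

	struct drm_i915_gem_execbuffer3 exec = {
		.ctx_id        = ctx_id,     /* context with a user engine map */
		.engine_idx    = engine_idx, /* index into that engine map */
		.batch_address = batch_va,   /* GPU VA bound earlier via VM_BIND */
		.fence_count   = 0,          /* no explicit fences in this example */
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &exec))
		err(1, "execbuffer3");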

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |    1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029 +++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
 drivers/gpu/drm/i915/i915_driver.c            |    1 +
 include/uapi/drm/i915_drm.h                   |   67 +-
 6 files changed, 1104 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4e1627e96c6e..38cd1c5bc1a5 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -148,6 +148,7 @@ gem-y += \
 	gem/i915_gem_dmabuf.o \
 	gem/i915_gem_domain.o \
 	gem/i915_gem_execbuffer.o \
+	gem/i915_gem_execbuffer3.o \
 	gem/i915_gem_internal.o \
 	gem/i915_gem_object.o \
 	gem/i915_gem_lmem.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b7b2c14fd9e1..37bb1383ab8f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -782,6 +782,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	if (unlikely(IS_ERR(ctx)))
 		return PTR_ERR(ctx);
 
+	if (ctx->vm->vm_bind_mode) {
+		i915_gem_context_put(ctx);
+		return -EOPNOTSUPP;
+	}
+
 	eb->gem_context = ctx;
 	if (i915_gem_context_has_full_ppgtt(ctx))
 		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
new file mode 100644
index 000000000000..13121df72e3d
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -0,0 +1,1029 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/dma-resv.h>
+#include <linux/sync_file.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_syncobj.h>
+
+#include "gt/intel_context.h"
+#include "gt/intel_gpu_commands.h"
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+#include "gt/intel_ring.h"
+
+#include "i915_drv.h"
+#include "i915_file_private.h"
+#include "i915_gem_context.h"
+#include "i915_gem_ioctls.h"
+#include "i915_gem_vm_bind.h"
+#include "i915_trace.h"
+
+#define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
+#define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
+
+/* Catch emission of unexpected errors for CI! */
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
+#undef EINVAL
+#define EINVAL ({ \
+	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
+	22; \
+})
+#endif
+
+/**
+ * DOC: User command execution with execbuf3 ioctl
+ *
+ * A VM in VM_BIND mode will not support older execbuf mode of binding.
+ * The execbuf ioctl handling in VM_BIND mode differs significantly from the
+ * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+ * Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+ * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+ * execlist. Hence, no support for implicit sync.
+ *
+ * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
+ * works with execbuf3 ioctl for submission.
+ *
+ * The execbuf3 ioctl directly specifies the batch addresses instead of the
+ * object handles used in the execbuf2 ioctl. The execbuf3 ioctl will also not
+ * support many of the older features like in/out/submit fences, fence array,
+ * default gem context etc. (See struct drm_i915_gem_execbuffer3).
+ *
+ * In VM_BIND mode, VA allocation is completely managed by the user instead of
+ * the i915 driver. Hence, VA assignment and eviction are not applicable in
+ * VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+ * be using the i915_vma active reference tracking. It will instead check the
+ * dma-resv object's fence list for that.
+ *
+ * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
+ * vma lookup table, implicit sync, vma active reference tracking etc., is not
+ * applicable for execbuf3 ioctl.
+ */
+
+struct eb_fence {
+	struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
+	struct dma_fence *dma_fence;
+	u64 value;
+	struct dma_fence_chain *chain_fence;
+};
+
+struct i915_execbuffer {
+	struct drm_i915_private *i915; /** i915 backpointer */
+	struct drm_file *file; /** per-file lookup tables and limits */
+	struct drm_i915_gem_execbuffer3 *args; /** ioctl parameters */
+
+	struct intel_gt *gt; /* gt for the execbuf */
+	struct intel_context *context; /* logical state for the request */
+	struct i915_gem_context *gem_context; /** caller's context */
+
+	/** our requests to build */
+	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+
+	/** used for excl fence in dma_resv objects when > 1 BB submitted */
+	struct dma_fence *composite_fence;
+
+	struct i915_gem_ww_ctx ww;
+
+	/* number of batches in execbuf IOCTL */
+	unsigned int num_batches;
+
+	u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
+	/** identity of the batch obj/vma */
+	struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
+
+	struct eb_fence *fences;
+	unsigned long num_fences;
+};
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
+static void eb_unpin_engine(struct i915_execbuffer *eb);
+
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->ctx_id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	eb->gem_context = ctx;
+	return 0;
+}
+
+static struct i915_vma *
+eb_find_vma(struct i915_address_space *vm, u64 addr)
+{
+	u64 va;
+
+	assert_vm_bind_held(vm);
+
+	va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
+	return i915_gem_vm_bind_lookup_vma(vm, va);
+}
+
+static int eb_lookup_vmas(struct i915_execbuffer *eb)
+{
+	unsigned int i, current_batch = 0;
+	struct i915_vma *vma;
+
+	for (i = 0; i < eb->num_batches; i++) {
+		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
+		if (!vma)
+			return -EINVAL;
+
+		eb->batches[current_batch] = vma;
+		++current_batch;
+	}
+
+	return 0;
+}
+
+static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
+{
+}
+
+static int eb_validate_vmas(struct i915_execbuffer *eb)
+{
+	int err;
+	bool throttle = true;
+
+retry:
+	err = eb_pin_engine(eb, throttle);
+	if (err) {
+		if (err != -EDEADLK)
+			return err;
+
+		goto err;
+	}
+
+	/* only throttle once, even if we didn't need to throttle */
+	throttle = false;
+
+err:
+	if (err == -EDEADLK) {
+		err = i915_gem_ww_ctx_backoff(&eb->ww);
+		if (!err)
+			goto retry;
+	}
+
+	return err;
+}
+
+/*
+ * Using two helper loops for the order of which requests / batches are created
+ * and added to the backend. Requests are created in order from the parent to
+ * the last child. Requests are added in the reverse order, from the last child
+ * to parent. This is done for locking reasons as the timeline lock is acquired
+ * during request creation and released when the request is added to the
+ * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
+ * the ordering.
+ */
+#define for_each_batch_create_order(_eb, _i) \
+	for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
+#define for_each_batch_add_order(_eb, _i) \
+	BUILD_BUG_ON(!typecheck(int, _i)); \
+	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
+
+static int eb_move_to_gpu(struct i915_execbuffer *eb)
+{
+	/* Unconditionally flush any chipset caches (for streaming writes). */
+	intel_gt_chipset_flush(eb->gt);
+
+	return 0;
+}
+
+static int eb_request_submit(struct i915_execbuffer *eb,
+			     struct i915_request *rq,
+			     struct i915_vma *batch,
+			     u64 batch_len)
+{
+	int err;
+
+	if (intel_context_nopreempt(rq->context))
+		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);
+
+	/*
+	 * After we completed waiting for other engines (using HW semaphores)
+	 * then we can signal that this request/batch is ready to run. This
+	 * allows us to determine if the batch is still waiting on the GPU
+	 * or actually running by checking the breadcrumb.
+	 */
+	if (rq->context->engine->emit_init_breadcrumb) {
+		err = rq->context->engine->emit_init_breadcrumb(rq);
+		if (err)
+			return err;
+	}
+
+	err = rq->context->engine->emit_bb_start(rq,
+						 batch->node.start,
+						 batch_len, 0);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int eb_submit(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+	int err;
+
+	err = eb_move_to_gpu(eb);
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		trace_i915_request_queue(eb->requests[i], 0);
+		if (!err)
+			err = eb_request_submit(eb, eb->requests[i],
+						eb->batches[i],
+						eb->batches[i]->size);
+	}
+
+	return err;
+}
+
+static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
+{
+	struct intel_ring *ring = ce->ring;
+	struct intel_timeline *tl = ce->timeline;
+	struct i915_request *rq;
+
+	/*
+	 * Completely unscientific finger-in-the-air estimates for suitable
+	 * maximum user request size (to avoid blocking) and then backoff.
+	 */
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request that after waiting upon, there will be at least half
+	 * the ring available. The hysteresis allows us to compete for the
+	 * shared ring and should mean that we sleep less often prior to
+	 * claiming our resources, but not so long that the ring completely
+	 * drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &tl->requests, link) {
+		if (rq->ring != ring)
+			continue;
+
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->link == &tl->requests)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
+			   bool throttle)
+{
+	struct intel_timeline *tl;
+	struct i915_request *rq = NULL;
+
+	/*
+	 * Take a local wakeref for preparing to dispatch the execbuf as
+	 * we expect to access the hardware fairly frequently in the
+	 * process, and require the engine to be kept awake between accesses.
+	 * Upon dispatch, we acquire another prolonged wakeref that we hold
+	 * until the timeline is idle, which in turn releases the wakeref
+	 * taken on the engine, and the parent device.
+	 */
+	tl = intel_context_timeline_lock(ce);
+	if (IS_ERR(tl))
+		return PTR_ERR(tl);
+
+	intel_context_enter(ce);
+	if (throttle)
+		rq = eb_throttle(eb, ce);
+	intel_context_timeline_unlock(tl);
+
+	if (rq) {
+		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
+		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
+
+		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
+				      timeout) < 0) {
+			i915_request_put(rq);
+
+			/*
+			 * Error path, cannot use intel_context_timeline_lock as
+			 * that is user interruptable and this clean up step
+			 * must be done.
+			 */
+			mutex_lock(&ce->timeline->mutex);
+			intel_context_exit(ce);
+			mutex_unlock(&ce->timeline->mutex);
+
+			if (nonblock)
+				return -EWOULDBLOCK;
+			else
+				return -EINTR;
+		}
+		i915_request_put(rq);
+	}
+
+	return 0;
+}
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
+{
+	struct intel_context *ce = eb->context, *child;
+	int err;
+	int i = 0, j = 0;
+
+	GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
+
+	if (unlikely(intel_context_is_banned(ce)))
+		return -EIO;
+
+	/*
+	 * Pinning the contexts may generate requests in order to acquire
+	 * GGTT space, so do this first before we reserve a seqno for
+	 * ourselves.
+	 */
+	err = intel_context_pin_ww(ce, &eb->ww);
+	if (err)
+		return err;
+	for_each_child(ce, child) {
+		err = intel_context_pin_ww(child, &eb->ww);
+		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
+	}
+
+	for_each_child(ce, child) {
+		err = eb_pin_timeline(eb, child, throttle);
+		if (err)
+			goto unwind;
+		++i;
+	}
+	err = eb_pin_timeline(eb, ce, throttle);
+	if (err)
+		goto unwind;
+
+	eb->args->flags |= __EXEC3_ENGINE_PINNED;
+	return 0;
+
+unwind:
+	for_each_child(ce, child) {
+		if (j++ < i) {
+			mutex_lock(&child->timeline->mutex);
+			intel_context_exit(child);
+			mutex_unlock(&child->timeline->mutex);
+		}
+	}
+	for_each_child(ce, child)
+		intel_context_unpin(child);
+	intel_context_unpin(ce);
+	return err;
+}
+
+static void eb_unpin_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce = eb->context, *child;
+
+	if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
+		return;
+
+	eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
+
+	for_each_child(ce, child) {
+		mutex_lock(&child->timeline->mutex);
+		intel_context_exit(child);
+		mutex_unlock(&child->timeline->mutex);
+
+		intel_context_unpin(child);
+	}
+
+	mutex_lock(&ce->timeline->mutex);
+	intel_context_exit(ce);
+	mutex_unlock(&ce->timeline->mutex);
+
+	intel_context_unpin(ce);
+}
+
+static int
+eb_select_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce, *child;
+	unsigned int idx;
+	int err;
+
+	if (!i915_gem_context_user_engines(eb->gem_context))
+		return -EINVAL;
+
+	idx = eb->args->engine_idx;
+	ce = i915_gem_context_get_engine(eb->gem_context, idx);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	eb->num_batches = ce->parallel.number_children + 1;
+
+	for_each_child(ce, child)
+		intel_context_get(child);
+	intel_gt_pm_get(ce->engine->gt);
+
+	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
+		err = intel_context_alloc_state(ce);
+		if (err)
+			goto err;
+	}
+	for_each_child(ce, child) {
+		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
+			err = intel_context_alloc_state(child);
+			if (err)
+				goto err;
+		}
+	}
+
+	/*
+	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged.
+	 */
+	err = intel_gt_terminally_wedged(ce->engine->gt);
+	if (err)
+		goto err;
+
+	if (!i915_vm_tryget(ce->vm)) {
+		err = -ENOENT;
+		goto err;
+	}
+
+	eb->context = ce;
+	eb->gt = ce->engine->gt;
+
+	/*
+	 * Make sure engine pool stays alive even if we call intel_context_put
+	 * during ww handling. The pool is destroyed when last pm reference
+	 * is dropped, which breaks our -EDEADLK handling.
+	 */
+	return err;
+
+err:
+	intel_gt_pm_put(ce->engine->gt);
+	for_each_child(ce, child)
+		intel_context_put(child);
+	intel_context_put(ce);
+	return err;
+}
+
+static void
+eb_put_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *child;
+
+	i915_vm_put(eb->context->vm);
+	intel_gt_pm_put(eb->gt);
+	for_each_child(eb->context, child)
+		intel_context_put(child);
+	intel_context_put(eb->context);
+}
+
+static void
+__free_fence_array(struct eb_fence *fences, unsigned int n)
+{
+	while (n--) {
+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
+		dma_fence_put(fences[n].dma_fence);
+		dma_fence_chain_free(fences[n].chain_fence);
+	}
+	kvfree(fences);
+}
+
+static int add_timeline_fence_array(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_timeline_fence __user *user_fences;
+	struct eb_fence *f;
+	u64 nfences;
+	int err = 0;
+
+	nfences = eb->args->fence_count;
+	if (!nfences)
+		return 0;
+
+	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
+	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
+	if (nfences > min_t(unsigned long,
+			    ULONG_MAX / sizeof(*user_fences),
+			    SIZE_MAX / sizeof(*f)) - eb->num_fences)
+		return -EINVAL;
+
+	user_fences = u64_to_user_ptr(eb->args->timeline_fences);
+	if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
+		return -EFAULT;
+
+	f = krealloc(eb->fences,
+		     (eb->num_fences + nfences) * sizeof(*f),
+		     __GFP_NOWARN | GFP_KERNEL);
+	if (!f)
+		return -ENOMEM;
+
+	eb->fences = f;
+	f += eb->num_fences;
+
+	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
+		     ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
+
+	while (nfences--) {
+		struct drm_i915_gem_timeline_fence user_fence;
+		struct drm_syncobj *syncobj;
+		struct dma_fence *fence = NULL;
+		u64 point;
+
+		if (__copy_from_user(&user_fence,
+				     user_fences++,
+				     sizeof(user_fence)))
+			return -EFAULT;
+
+		if (user_fence.flags & __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
+			return -EINVAL;
+
+		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
+		if (!syncobj) {
+			DRM_DEBUG("Invalid syncobj handle provided\n");
+			return -ENOENT;
+		}
+
+		fence = drm_syncobj_fence_get(syncobj);
+
+		if (!fence && user_fence.flags &&
+		    !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			DRM_DEBUG("Syncobj handle has no fence\n");
+			drm_syncobj_put(syncobj);
+			return -EINVAL;
+		}
+
+		point = user_fence.value;
+		if (fence)
+			err = dma_fence_chain_find_seqno(&fence, point);
+
+		if (err && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
+			dma_fence_put(fence);
+			drm_syncobj_put(syncobj);
+			return err;
+		}
+
+		/*
+		 * A point might have been signaled already and
+		 * garbage collected from the timeline. In this case
+		 * just ignore the point and carry on.
+		 */
+		if (!fence && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			drm_syncobj_put(syncobj);
+			continue;
+		}
+
+		/*
+		 * For timeline syncobjs we need to preallocate chains for
+		 * later signaling.
+		 */
+		if (point != 0 && user_fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
+			/*
+			 * Waiting and signaling the same point (when point !=
+			 * 0) would break the timeline.
+			 */
+			if (user_fence.flags & I915_TIMELINE_FENCE_WAIT) {
+				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
+				dma_fence_put(fence);
+				drm_syncobj_put(syncobj);
+				return -EINVAL;
+			}
+
+			f->chain_fence = dma_fence_chain_alloc();
+			if (!f->chain_fence) {
+				drm_syncobj_put(syncobj);
+				dma_fence_put(fence);
+				return -ENOMEM;
+			}
+		} else {
+			f->chain_fence = NULL;
+		}
+
+		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
+		f->dma_fence = fence;
+		f->value = point;
+		f++;
+		eb->num_fences++;
+	}
+
+	return 0;
+}
+
+static void put_fence_array(struct eb_fence *fences, int num_fences)
+{
+	if (fences)
+		__free_fence_array(fences, num_fences);
+}
+
+static int
+await_fence_array(struct i915_execbuffer *eb,
+		  struct i915_request *rq)
+{
+	unsigned int n;
+	int err;
+
+	for (n = 0; n < eb->num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
+
+		if (!eb->fences[n].dma_fence)
+			continue;
+
+		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+static void signal_fence_array(const struct i915_execbuffer *eb,
+			       struct dma_fence * const fence)
+{
+	unsigned int n;
+
+	for (n = 0; n < eb->num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
+		if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
+			continue;
+
+		if (eb->fences[n].chain_fence) {
+			drm_syncobj_add_point(syncobj,
+					      eb->fences[n].chain_fence,
+					      fence,
+					      eb->fences[n].value);
+			/*
+			 * The chain's ownership is transferred to the
+			 * timeline.
+			 */
+			eb->fences[n].chain_fence = NULL;
+		} else {
+			drm_syncobj_replace_fence(syncobj, fence);
+		}
+	}
+}
+
+static int parse_timeline_fences(struct i915_execbuffer *eb)
+{
+	return add_timeline_fence_array(eb);
+}
+
+static int parse_batch_addresses(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_execbuffer3 *args = eb->args;
+	u64 __user *batch_addr = u64_to_user_ptr(args->batch_address);
+
+	if (copy_from_user(eb->batch_addresses, batch_addr,
+			   sizeof(batch_addr[0]) * eb->num_batches))
+		return -EFAULT;
+
+	return 0;
+}
+
+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
+{
+	struct i915_request *rq, *rn;
+
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
+		if (rq == end || !i915_request_retire(rq))
+			break;
+}
+
+static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
+			  int err, bool last_parallel)
+{
+	struct intel_timeline * const tl = i915_request_timeline(rq);
+	struct i915_sched_attr attr = {};
+	struct i915_request *prev;
+
+	lockdep_assert_held(&tl->mutex);
+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
+
+	trace_i915_request_add(rq);
+
+	prev = __i915_request_commit(rq);
+
+	/* Check that the context wasn't destroyed before submission */
+	if (likely(!intel_context_is_closed(eb->context))) {
+		attr = eb->gem_context->sched;
+	} else {
+		/* Serialise with context_close via the add_to_timeline */
+		i915_request_set_error_once(rq, -ENOENT);
+		__i915_request_skip(rq);
+		err = -ENOENT; /* override any transient errors */
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		if (err) {
+			__i915_request_skip(rq);
+			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
+				&rq->fence.flags);
+		}
+		if (last_parallel)
+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+				&rq->fence.flags);
+	}
+
+	__i915_request_queue(rq, &attr);
+
+	/* Try to clean up the client's timeline after submitting the request */
+	if (prev)
+		retire_requests(tl, prev);
+
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
+static int eb_requests_add(struct i915_execbuffer *eb, int err)
+{
+	int i;
+
+	/*
+	 * We iterate in reverse order of creation to release timeline mutexes in
+	 * same order.
+	 */
+	for_each_batch_add_order(eb, i) {
+		struct i915_request *rq = eb->requests[i];
+
+		if (!rq)
+			continue;
+		err |= eb_request_add(eb, rq, err, i == 0);
+	}
+
+	return err;
+}
+
+static void eb_requests_get(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_get(eb->requests[i]);
+	}
+}
+
+static void eb_requests_put(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_put(eb->requests[i]);
+	}
+}
+
+static int
+eb_composite_fence_create(struct i915_execbuffer *eb)
+{
+	struct dma_fence_array *fence_array;
+	struct dma_fence **fences;
+	unsigned int i;
+
+	GEM_BUG_ON(!intel_context_is_parent(eb->context));
+
+	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
+	if (!fences)
+		return -ENOMEM;
+
+	for_each_batch_create_order(eb, i) {
+		fences[i] = &eb->requests[i]->fence;
+		__set_bit(I915_FENCE_FLAG_COMPOSITE,
+			  &eb->requests[i]->fence.flags);
+	}
+
+	fence_array = dma_fence_array_create(eb->num_batches,
+					     fences,
+					     eb->context->parallel.fence_context,
+					     eb->context->parallel.seqno++,
+					     false);
+	if (!fence_array) {
+		kfree(fences);
+		return -ENOMEM;
+	}
+
+	/* Move ownership to the dma_fence_array created above */
+	for_each_batch_create_order(eb, i)
+		dma_fence_get(fences[i]);
+
+	eb->composite_fence = &fence_array->base;
+
+	return 0;
+}
+
+static int
+eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
+{
+	int err;
+
+	if (unlikely(eb->gem_context->syncobj)) {
+		struct dma_fence *fence;
+
+		fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
+		err = i915_request_await_dma_fence(rq, fence);
+		dma_fence_put(fence);
+		if (err)
+			return err;
+	}
+
+	if (eb->fences) {
+		err = await_fence_array(eb, rq);
+		if (err)
+			return err;
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		err = eb_composite_fence_create(eb);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static struct intel_context *
+eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
+{
+	struct intel_context *child;
+
+	if (likely(context_number == 0))
+		return eb->context;
+
+	for_each_child(eb->context, child)
+		if (!--context_number)
+			return child;
+
+	GEM_BUG_ON("Context not found");
+
+	return NULL;
+}
+
+static int eb_requests_create(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+	int err;
+
+	for_each_batch_create_order(eb, i) {
+		/* Allocate a request for this batch buffer nice and early. */
+		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
+		if (IS_ERR(eb->requests[i])) {
+			err = PTR_ERR(eb->requests[i]);
+			eb->requests[i] = NULL;
+			return err;
+		}
+
+		/*
+		 * Only the first request added (committed to backend) has to
+		 * take the in fences into account as all subsequent requests
+		 * will have fences inserted in between them.
+		 */
+		if (i + 1 == eb->num_batches) {
+			err = eb_fences_add(eb, eb->requests[i]);
+			if (err)
+				return err;
+		}
+
+		/*
+		 * Not really on stack, but we don't want to call
+		 * kfree on the batch_snapshot when we put it, so use the
+		 * _onstack interface.
+		 */
+		if (eb->batches[i])
+			eb->requests[i]->batch_res =
+				i915_vma_resource_get(eb->batches[i]->resource);
+	}
+
+	return 0;
+}
+
+static int
+i915_gem_do_execbuffer(struct drm_device *dev,
+		       struct drm_file *file,
+		       struct drm_i915_gem_execbuffer3 *args)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct i915_execbuffer eb;
+	int err;
+
+	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
+
+	eb.i915 = i915;
+	eb.file = file;
+	eb.args = args;
+
+	eb.fences = NULL;
+	eb.num_fences = 0;
+
+	memset(eb.requests, 0, sizeof(struct i915_request *) *
+	       ARRAY_SIZE(eb.requests));
+	eb.composite_fence = NULL;
+
+	err = parse_timeline_fences(&eb);
+	if (err)
+		return err;
+
+	err = eb_select_context(&eb);
+	if (unlikely(err))
+		goto err_fences;
+
+	err = eb_select_engine(&eb);
+	if (unlikely(err))
+		goto err_context;
+
+	err = parse_batch_addresses(&eb);
+	if (unlikely(err))
+		goto err_engine;
+
+	i915_gem_vm_bind_lock(eb.context->vm);
+
+	err = eb_lookup_vmas(&eb);
+	if (err) {
+		eb_release_vmas(&eb, true);
+		goto err_vm_bind_lock;
+	}
+
+	i915_gem_ww_ctx_init(&eb.ww, true);
+
+	err = eb_validate_vmas(&eb);
+	if (err)
+		goto err_vma;
+
+	ww_acquire_done(&eb.ww.ctx);
+
+	err = eb_requests_create(&eb);
+	if (err) {
+		if (eb.requests[0])
+			goto err_request;
+		else
+			goto err_vma;
+	}
+
+	err = eb_submit(&eb);
+
+err_request:
+	eb_requests_get(&eb);
+	err = eb_requests_add(&eb, err);
+
+	if (eb.fences)
+		signal_fence_array(&eb, eb.composite_fence ?
+				   eb.composite_fence :
+				   &eb.requests[0]->fence);
+
+	if (unlikely(eb.gem_context->syncobj)) {
+		drm_syncobj_replace_fence(eb.gem_context->syncobj,
+					  eb.composite_fence ?
+					  eb.composite_fence :
+					  &eb.requests[0]->fence);
+	}
+
+	if (eb.composite_fence)
+		dma_fence_put(eb.composite_fence);
+
+	eb_requests_put(&eb);
+
+err_vma:
+	eb_release_vmas(&eb, true);
+	WARN_ON(err == -EDEADLK);
+	i915_gem_ww_ctx_fini(&eb.ww);
+err_vm_bind_lock:
+	i915_gem_vm_bind_unlock(eb.context->vm);
+err_engine:
+	eb_put_engine(&eb);
+err_context:
+	i915_gem_context_put(eb.gem_context);
+err_fences:
+	put_fence_array(eb.fences, eb.num_fences);
+	return err;
+}
+
+int
+i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_execbuffer3 *args = data;
+	int err;
+
+	if (args->flags & __I915_EXEC3_UNKNOWN_FLAGS)
+		return -EINVAL;
+
+	err = i915_gem_do_execbuffer(dev, file, args);
+
+	args->flags &= ~__I915_EXEC3_UNKNOWN_FLAGS;
+	return err;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
index 28d6526e32ab..b7a1e9725a84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
@@ -18,6 +18,8 @@ int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file);
 int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
+int i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *file);
 int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 776ab7844f60..4c13628d8663 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1834,6 +1834,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER, drm_invalid_op, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER3, i915_gem_execbuffer3_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_RENDER_ALLOW),
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ce1c6592b0d7..45cc97f9a424 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -472,6 +472,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_CREATE_EXT		0x3c
 #define DRM_I915_GEM_VM_BIND		0x3d
 #define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -538,6 +539,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
 #define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1277,7 +1279,8 @@ struct drm_i915_gem_exec_fence {
 /*
  * See drm_i915_gem_execbuffer_ext_timeline_fences.
  */
-#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
+#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES  0
+#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES 0
 
 /*
  * This structure describes an array of drm_syncobj and associated points for
@@ -1499,6 +1502,68 @@ struct drm_i915_gem_timeline_fence {
 	__u64 value;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/**
+	 * @batch_address: Batch gpu virtual address/es.
+	 *
+	 * For normal submission, it is the gpu virtual address of the batch
+	 * buffer. For parallel submission, it is a pointer to an array of
+	 * batch buffer gpu virtual addresses with array size equal to the
+	 * number of (parallel) engines involved in that submission (See
+	 * struct i915_context_engines_parallel_submit).
+	 */
+	__u64 batch_address;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+#define __I915_EXEC3_UNKNOWN_FLAGS (~0)
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/** @fence_count: Number of fences in @timeline_fences array. */
+	__u32 fence_count;
+
+	/**
+	 * @timeline_fences: Pointer to an array of timeline fences.
+	 *
+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
+	 */
+	__u64 timeline_fences;
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [Intel-gfx] [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
works in vm_bind mode. The vm_bind mode only works with
this new execbuf3 ioctl.

The new execbuf3 ioctl will not have any execlist support,
and all the legacy features such as relocations are removed.
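
As a sketch of the explicit sync usage (hypothetical syncobj handles
and points, error handling omitted), timeline fences would be passed
as an array referenced by the timeline_fences field:

	struct drm_i915_gem_timeline_fence fences[] = {
		{ .handle = in_syncobj,  .flags = I915_TIMELINE_FENCE_WAIT,   .value = in_point },
		{ .handle = out_syncobj, .flags = I915_TIMELINE_FENCE_SIGNAL, .value = out_point },
	};
	struct drm_i915_gem_execbuffer3 exec = {
		.ctx_id          = ctx_id,
		.engine_idx      = engine_idx,
		.batch_address   = batch_va,
		.fence_count     = 2,
		.timeline_fences = (__u64)(uintptr_t)fences,
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &exec))
		err(1, "execbuffer3");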

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |    1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029 +++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
 drivers/gpu/drm/i915/i915_driver.c            |    1 +
 include/uapi/drm/i915_drm.h                   |   67 +-
 6 files changed, 1104 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4e1627e96c6e..38cd1c5bc1a5 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -148,6 +148,7 @@ gem-y += \
 	gem/i915_gem_dmabuf.o \
 	gem/i915_gem_domain.o \
 	gem/i915_gem_execbuffer.o \
+	gem/i915_gem_execbuffer3.o \
 	gem/i915_gem_internal.o \
 	gem/i915_gem_object.o \
 	gem/i915_gem_lmem.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b7b2c14fd9e1..37bb1383ab8f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -782,6 +782,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	if (unlikely(IS_ERR(ctx)))
 		return PTR_ERR(ctx);
 
+	if (ctx->vm->vm_bind_mode) {
+		i915_gem_context_put(ctx);
+		return -EOPNOTSUPP;
+	}
+
 	eb->gem_context = ctx;
 	if (i915_gem_context_has_full_ppgtt(ctx))
 		eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
new file mode 100644
index 000000000000..13121df72e3d
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -0,0 +1,1029 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <linux/dma-resv.h>
+#include <linux/sync_file.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_syncobj.h>
+
+#include "gt/intel_context.h"
+#include "gt/intel_gpu_commands.h"
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+#include "gt/intel_ring.h"
+
+#include "i915_drv.h"
+#include "i915_file_private.h"
+#include "i915_gem_context.h"
+#include "i915_gem_ioctls.h"
+#include "i915_gem_vm_bind.h"
+#include "i915_trace.h"
+
+#define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
+#define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
+
+/* Catch emission of unexpected errors for CI! */
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
+#undef EINVAL
+#define EINVAL ({ \
+	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
+	22; \
+})
+#endif
+
+/**
+ * DOC: User command execution with execbuf3 ioctl
+ *
+ * A VM in VM_BIND mode will not support older execbuf mode of binding.
+ * The execbuf ioctl handling in VM_BIND mode differs significantly from the
+ * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+ * Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+ * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+ * execlist. Hence, no support for implicit sync.
+ *
+ * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
+ * works with execbuf3 ioctl for submission.
+ *
+ * The execbuf3 ioctl directly specifies the batch addresses instead of the
+ * object handles used in the execbuf2 ioctl. The execbuf3 ioctl will also not
+ * support many of the older features like in/out/submit fences, fence array,
+ * default gem context etc. (See struct drm_i915_gem_execbuffer3).
+ *
+ * In VM_BIND mode, VA allocation is completely managed by the user instead of
+ * the i915 driver. Hence, VA assignment and eviction are not applicable in
+ * VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+ * be using the i915_vma active reference tracking. It will instead check the
+ * dma-resv object's fence list for that.
+ *
+ * So, a lot of code supporting execbuf2 ioctl, like relocations, VA evictions,
+ * vma lookup table, implicit sync, vma active reference tracking etc., is not
+ * applicable for execbuf3 ioctl.
+ */
+
+struct eb_fence {
+	struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
+	struct dma_fence *dma_fence;
+	u64 value;
+	struct dma_fence_chain *chain_fence;
+};
+
+struct i915_execbuffer {
+	struct drm_i915_private *i915; /** i915 backpointer */
+	struct drm_file *file; /** per-file lookup tables and limits */
+	struct drm_i915_gem_execbuffer3 *args; /** ioctl parameters */
+
+	struct intel_gt *gt; /* gt for the execbuf */
+	struct intel_context *context; /* logical state for the request */
+	struct i915_gem_context *gem_context; /** caller's context */
+
+	/** our requests to build */
+	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+
+	/** used for excl fence in dma_resv objects when > 1 BB submitted */
+	struct dma_fence *composite_fence;
+
+	struct i915_gem_ww_ctx ww;
+
+	/* number of batches in execbuf IOCTL */
+	unsigned int num_batches;
+
+	u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
+	/** identity of the batch obj/vma */
+	struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
+
+	struct eb_fence *fences;
+	unsigned long num_fences;
+};
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
+static void eb_unpin_engine(struct i915_execbuffer *eb);
+
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->ctx_id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	eb->gem_context = ctx;
+	return 0;
+}
+
+static struct i915_vma *
+eb_find_vma(struct i915_address_space *vm, u64 addr)
+{
+	u64 va;
+
+	assert_vm_bind_held(vm);
+
+	va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
+	return i915_gem_vm_bind_lookup_vma(vm, va);
+}
+
+static int eb_lookup_vmas(struct i915_execbuffer *eb)
+{
+	unsigned int i, current_batch = 0;
+	struct i915_vma *vma;
+
+	for (i = 0; i < eb->num_batches; i++) {
+		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
+		if (!vma)
+			return -EINVAL;
+
+		eb->batches[current_batch] = vma;
+		++current_batch;
+	}
+
+	return 0;
+}
+
+static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
+{
+}
+
+static int eb_validate_vmas(struct i915_execbuffer *eb)
+{
+	int err;
+	bool throttle = true;
+
+retry:
+	err = eb_pin_engine(eb, throttle);
+	if (err) {
+		if (err != -EDEADLK)
+			return err;
+
+		goto err;
+	}
+
+	/* only throttle once, even if we didn't need to throttle */
+	throttle = false;
+
+err:
+	if (err == -EDEADLK) {
+		err = i915_gem_ww_ctx_backoff(&eb->ww);
+		if (!err)
+			goto retry;
+	}
+
+	return err;
+}
+
+/*
+ * Using two helper loops for the order of which requests / batches are created
+ * and added to the backend. Requests are created in order from the parent to
+ * the last child. Requests are added in the reverse order, from the last child
+ * to parent. This is done for locking reasons as the timeline lock is acquired
+ * during request creation and released when the request is added to the
+ * backend. To make lockdep happy (see intel_context_timeline_lock) this must be
+ * the ordering.
+ */
+#define for_each_batch_create_order(_eb, _i) \
+	for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
+#define for_each_batch_add_order(_eb, _i) \
+	BUILD_BUG_ON(!typecheck(int, _i)); \
+	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
+
+static int eb_move_to_gpu(struct i915_execbuffer *eb)
+{
+	/* Unconditionally flush any chipset caches (for streaming writes). */
+	intel_gt_chipset_flush(eb->gt);
+
+	return 0;
+}
+
+static int eb_request_submit(struct i915_execbuffer *eb,
+			     struct i915_request *rq,
+			     struct i915_vma *batch,
+			     u64 batch_len)
+{
+	int err;
+
+	if (intel_context_nopreempt(rq->context))
+		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);
+
+	/*
+	 * After we completed waiting for other engines (using HW semaphores)
+	 * then we can signal that this request/batch is ready to run. This
+	 * allows us to determine if the batch is still waiting on the GPU
+	 * or actually running by checking the breadcrumb.
+	 */
+	if (rq->context->engine->emit_init_breadcrumb) {
+		err = rq->context->engine->emit_init_breadcrumb(rq);
+		if (err)
+			return err;
+	}
+
+	err = rq->context->engine->emit_bb_start(rq,
+						 batch->node.start,
+						 batch_len, 0);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int eb_submit(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+	int err;
+
+	err = eb_move_to_gpu(eb);
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		trace_i915_request_queue(eb->requests[i], 0);
+		if (!err)
+			err = eb_request_submit(eb, eb->requests[i],
+						eb->batches[i],
+						eb->batches[i]->size);
+	}
+
+	return err;
+}
+
+static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel_context *ce)
+{
+	struct intel_ring *ring = ce->ring;
+	struct intel_timeline *tl = ce->timeline;
+	struct i915_request *rq;
+
+	/*
+	 * Completely unscientific finger-in-the-air estimates for suitable
+	 * maximum user request size (to avoid blocking) and then backoff.
+	 */
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request that after waiting upon, there will be at least half
+	 * the ring available. The hysteresis allows us to compete for the
+	 * shared ring and should mean that we sleep less often prior to
+	 * claiming our resources, but not so long that the ring completely
+	 * drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &tl->requests, link) {
+		if (rq->ring != ring)
+			continue;
+
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->link == &tl->requests)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
+			   bool throttle)
+{
+	struct intel_timeline *tl;
+	struct i915_request *rq = NULL;
+
+	/*
+	 * Take a local wakeref for preparing to dispatch the execbuf as
+	 * we expect to access the hardware fairly frequently in the
+	 * process, and require the engine to be kept awake between accesses.
+	 * Upon dispatch, we acquire another prolonged wakeref that we hold
+	 * until the timeline is idle, which in turn releases the wakeref
+	 * taken on the engine, and the parent device.
+	 */
+	tl = intel_context_timeline_lock(ce);
+	if (IS_ERR(tl))
+		return PTR_ERR(tl);
+
+	intel_context_enter(ce);
+	if (throttle)
+		rq = eb_throttle(eb, ce);
+	intel_context_timeline_unlock(tl);
+
+	if (rq) {
+		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
+		long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
+
+		if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
+				      timeout) < 0) {
+			i915_request_put(rq);
+
+			/*
+			 * Error path, cannot use intel_context_timeline_lock as
+			 * that is user interruptible and this cleanup step
+			 * must be done.
+			 */
+			mutex_lock(&ce->timeline->mutex);
+			intel_context_exit(ce);
+			mutex_unlock(&ce->timeline->mutex);
+
+			if (nonblock)
+				return -EWOULDBLOCK;
+			else
+				return -EINTR;
+		}
+		i915_request_put(rq);
+	}
+
+	return 0;
+}
+
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
+{
+	struct intel_context *ce = eb->context, *child;
+	int err;
+	int i = 0, j = 0;
+
+	GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
+
+	if (unlikely(intel_context_is_banned(ce)))
+		return -EIO;
+
+	/*
+	 * Pinning the contexts may generate requests in order to acquire
+	 * GGTT space, so do this first before we reserve a seqno for
+	 * ourselves.
+	 */
+	err = intel_context_pin_ww(ce, &eb->ww);
+	if (err)
+		return err;
+	for_each_child(ce, child) {
+		err = intel_context_pin_ww(child, &eb->ww);
+		GEM_BUG_ON(err);	/* perma-pinned should incr a counter */
+	}
+
+	for_each_child(ce, child) {
+		err = eb_pin_timeline(eb, child, throttle);
+		if (err)
+			goto unwind;
+		++i;
+	}
+	err = eb_pin_timeline(eb, ce, throttle);
+	if (err)
+		goto unwind;
+
+	eb->args->flags |= __EXEC3_ENGINE_PINNED;
+	return 0;
+
+unwind:
+	for_each_child(ce, child) {
+		if (j++ < i) {
+			mutex_lock(&child->timeline->mutex);
+			intel_context_exit(child);
+			mutex_unlock(&child->timeline->mutex);
+		}
+	}
+	for_each_child(ce, child)
+		intel_context_unpin(child);
+	intel_context_unpin(ce);
+	return err;
+}
+
+static void eb_unpin_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce = eb->context, *child;
+
+	if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
+		return;
+
+	eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
+
+	for_each_child(ce, child) {
+		mutex_lock(&child->timeline->mutex);
+		intel_context_exit(child);
+		mutex_unlock(&child->timeline->mutex);
+
+		intel_context_unpin(child);
+	}
+
+	mutex_lock(&ce->timeline->mutex);
+	intel_context_exit(ce);
+	mutex_unlock(&ce->timeline->mutex);
+
+	intel_context_unpin(ce);
+}
+
+static int
+eb_select_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce, *child;
+	unsigned int idx;
+	int err;
+
+	if (!i915_gem_context_user_engines(eb->gem_context))
+		return -EINVAL;
+
+	idx = eb->args->engine_idx;
+	ce = i915_gem_context_get_engine(eb->gem_context, idx);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	eb->num_batches = ce->parallel.number_children + 1;
+
+	for_each_child(ce, child)
+		intel_context_get(child);
+	intel_gt_pm_get(ce->engine->gt);
+
+	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
+		err = intel_context_alloc_state(ce);
+		if (err)
+			goto err;
+	}
+	for_each_child(ce, child) {
+		if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
+			err = intel_context_alloc_state(child);
+			if (err)
+				goto err;
+		}
+	}
+
+	/*
+	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged.
+	 */
+	err = intel_gt_terminally_wedged(ce->engine->gt);
+	if (err)
+		goto err;
+
+	if (!i915_vm_tryget(ce->vm)) {
+		err = -ENOENT;
+		goto err;
+	}
+
+	eb->context = ce;
+	eb->gt = ce->engine->gt;
+
+	/*
+	 * Make sure engine pool stays alive even if we call intel_context_put
+	 * during ww handling. The pool is destroyed when last pm reference
+	 * is dropped, which breaks our -EDEADLK handling.
+	 */
+	return err;
+
+err:
+	intel_gt_pm_put(ce->engine->gt);
+	for_each_child(ce, child)
+		intel_context_put(child);
+	intel_context_put(ce);
+	return err;
+}
+
+static void
+eb_put_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *child;
+
+	i915_vm_put(eb->context->vm);
+	intel_gt_pm_put(eb->gt);
+	for_each_child(eb->context, child)
+		intel_context_put(child);
+	intel_context_put(eb->context);
+}
+
+static void
+__free_fence_array(struct eb_fence *fences, unsigned int n)
+{
+	while (n--) {
+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
+		dma_fence_put(fences[n].dma_fence);
+		dma_fence_chain_free(fences[n].chain_fence);
+	}
+	kvfree(fences);
+}
+
+static int add_timeline_fence_array(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_timeline_fence __user *user_fences;
+	struct eb_fence *f;
+	u64 nfences;
+	int err = 0;
+
+	nfences = eb->args->fence_count;
+	if (!nfences)
+		return 0;
+
+	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
+	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
+	if (nfences > min_t(unsigned long,
+			    ULONG_MAX / sizeof(*user_fences),
+			    SIZE_MAX / sizeof(*f)) - eb->num_fences)
+		return -EINVAL;
+
+	user_fences = u64_to_user_ptr(eb->args->timeline_fences);
+	if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
+		return -EFAULT;
+
+	f = krealloc(eb->fences,
+		     (eb->num_fences + nfences) * sizeof(*f),
+		     __GFP_NOWARN | GFP_KERNEL);
+	if (!f)
+		return -ENOMEM;
+
+	eb->fences = f;
+	f += eb->num_fences;
+
+	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
+		     ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
+
+	while (nfences--) {
+		struct drm_i915_gem_timeline_fence user_fence;
+		struct drm_syncobj *syncobj;
+		struct dma_fence *fence = NULL;
+		u64 point;
+
+		if (__copy_from_user(&user_fence,
+				     user_fences++,
+				     sizeof(user_fence)))
+			return -EFAULT;
+
+		if (user_fence.flags & __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
+			return -EINVAL;
+
+		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
+		if (!syncobj) {
+			DRM_DEBUG("Invalid syncobj handle provided\n");
+			return -ENOENT;
+		}
+
+		fence = drm_syncobj_fence_get(syncobj);
+
+		if (!fence && user_fence.flags &&
+		    !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			DRM_DEBUG("Syncobj handle has no fence\n");
+			drm_syncobj_put(syncobj);
+			return -EINVAL;
+		}
+
+		point = user_fence.value;
+		if (fence)
+			err = dma_fence_chain_find_seqno(&fence, point);
+
+		if (err && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			DRM_DEBUG("Syncobj handle missing requested point %llu\n", point);
+			dma_fence_put(fence);
+			drm_syncobj_put(syncobj);
+			return err;
+		}
+
+		/*
+		 * A point might have been signaled already and
+		 * garbage collected from the timeline. In this case
+		 * just ignore the point and carry on.
+		 */
+		if (!fence && !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL)) {
+			drm_syncobj_put(syncobj);
+			continue;
+		}
+
+		/*
+		 * For timeline syncobjs we need to preallocate chains for
+		 * later signaling.
+		 */
+		if (point != 0 && user_fence.flags & I915_TIMELINE_FENCE_SIGNAL) {
+			/*
+			 * Waiting and signaling the same point (when point !=
+			 * 0) would break the timeline.
+			 */
+			if (user_fence.flags & I915_TIMELINE_FENCE_WAIT) {
+				DRM_DEBUG("Trying to wait & signal the same timeline point.\n");
+				dma_fence_put(fence);
+				drm_syncobj_put(syncobj);
+				return -EINVAL;
+			}
+
+			f->chain_fence = dma_fence_chain_alloc();
+			if (!f->chain_fence) {
+				drm_syncobj_put(syncobj);
+				dma_fence_put(fence);
+				return -ENOMEM;
+			}
+		} else {
+			f->chain_fence = NULL;
+		}
+
+		f->syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
+		f->dma_fence = fence;
+		f->value = point;
+		f++;
+		eb->num_fences++;
+	}
+
+	return 0;
+}
+
+static void put_fence_array(struct eb_fence *fences, int num_fences)
+{
+	if (fences)
+		__free_fence_array(fences, num_fences);
+}
+
+static int
+await_fence_array(struct i915_execbuffer *eb,
+		  struct i915_request *rq)
+{
+	unsigned int n;
+	int err;
+
+	for (n = 0; n < eb->num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
+
+		if (!eb->fences[n].dma_fence)
+			continue;
+
+		err = i915_request_await_dma_fence(rq, eb->fences[n].dma_fence);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+static void signal_fence_array(const struct i915_execbuffer *eb,
+			       struct dma_fence * const fence)
+{
+	unsigned int n;
+
+	for (n = 0; n < eb->num_fences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2);
+		if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
+			continue;
+
+		if (eb->fences[n].chain_fence) {
+			drm_syncobj_add_point(syncobj,
+					      eb->fences[n].chain_fence,
+					      fence,
+					      eb->fences[n].value);
+			/*
+			 * The chain's ownership is transferred to the
+			 * timeline.
+			 */
+			eb->fences[n].chain_fence = NULL;
+		} else {
+			drm_syncobj_replace_fence(syncobj, fence);
+		}
+	}
+}
+
+static int parse_timeline_fences(struct i915_execbuffer *eb)
+{
+	return add_timeline_fence_array(eb);
+}
+
+static int parse_batch_addresses(struct i915_execbuffer *eb)
+{
+	struct drm_i915_gem_execbuffer3 *args = eb->args;
+	u64 __user *batch_addr = u64_to_user_ptr(args->batch_address);
+
+	if (copy_from_user(eb->batch_addresses, batch_addr,
+			   sizeof(batch_addr[0]) * eb->num_batches))
+		return -EFAULT;
+
+	return 0;
+}
+
+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
+{
+	struct i915_request *rq, *rn;
+
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
+		if (rq == end || !i915_request_retire(rq))
+			break;
+}
+
+static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
+			  int err, bool last_parallel)
+{
+	struct intel_timeline * const tl = i915_request_timeline(rq);
+	struct i915_sched_attr attr = {};
+	struct i915_request *prev;
+
+	lockdep_assert_held(&tl->mutex);
+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
+
+	trace_i915_request_add(rq);
+
+	prev = __i915_request_commit(rq);
+
+	/* Check that the context wasn't destroyed before submission */
+	if (likely(!intel_context_is_closed(eb->context))) {
+		attr = eb->gem_context->sched;
+	} else {
+		/* Serialise with context_close via the add_to_timeline */
+		i915_request_set_error_once(rq, -ENOENT);
+		__i915_request_skip(rq);
+		err = -ENOENT; /* override any transient errors */
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		if (err) {
+			__i915_request_skip(rq);
+			set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
+				&rq->fence.flags);
+		}
+		if (last_parallel)
+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+				&rq->fence.flags);
+	}
+
+	__i915_request_queue(rq, &attr);
+
+	/* Try to clean up the client's timeline after submitting the request */
+	if (prev)
+		retire_requests(tl, prev);
+
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
+static int eb_requests_add(struct i915_execbuffer *eb, int err)
+{
+	int i;
+
+	/*
+	 * We iterate in reverse order of creation to release timeline mutexes in
+	 * the same order.
+	 */
+	for_each_batch_add_order(eb, i) {
+		struct i915_request *rq = eb->requests[i];
+
+		if (!rq)
+			continue;
+		err |= eb_request_add(eb, rq, err, i == 0);
+	}
+
+	return err;
+}
+
+static void eb_requests_get(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_get(eb->requests[i]);
+	}
+}
+
+static void eb_requests_put(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+
+	for_each_batch_create_order(eb, i) {
+		if (!eb->requests[i])
+			break;
+
+		i915_request_put(eb->requests[i]);
+	}
+}
+
+static int
+eb_composite_fence_create(struct i915_execbuffer *eb)
+{
+	struct dma_fence_array *fence_array;
+	struct dma_fence **fences;
+	unsigned int i;
+
+	GEM_BUG_ON(!intel_context_is_parent(eb->context));
+
+	fences = kmalloc_array(eb->num_batches, sizeof(*fences), GFP_KERNEL);
+	if (!fences)
+		return -ENOMEM;
+
+	for_each_batch_create_order(eb, i) {
+		fences[i] = &eb->requests[i]->fence;
+		__set_bit(I915_FENCE_FLAG_COMPOSITE,
+			  &eb->requests[i]->fence.flags);
+	}
+
+	fence_array = dma_fence_array_create(eb->num_batches,
+					     fences,
+					     eb->context->parallel.fence_context,
+					     eb->context->parallel.seqno++,
+					     false);
+	if (!fence_array) {
+		kfree(fences);
+		return -ENOMEM;
+	}
+
+	/* Move ownership to the dma_fence_array created above */
+	for_each_batch_create_order(eb, i)
+		dma_fence_get(fences[i]);
+
+	eb->composite_fence = &fence_array->base;
+
+	return 0;
+}
+
+static int
+eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
+{
+	int err;
+
+	if (unlikely(eb->gem_context->syncobj)) {
+		struct dma_fence *fence;
+
+		fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
+		err = i915_request_await_dma_fence(rq, fence);
+		dma_fence_put(fence);
+		if (err)
+			return err;
+	}
+
+	if (eb->fences) {
+		err = await_fence_array(eb, rq);
+		if (err)
+			return err;
+	}
+
+	if (intel_context_is_parallel(eb->context)) {
+		err = eb_composite_fence_create(eb);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static struct intel_context *
+eb_find_context(struct i915_execbuffer *eb, unsigned int context_number)
+{
+	struct intel_context *child;
+
+	if (likely(context_number == 0))
+		return eb->context;
+
+	for_each_child(eb->context, child)
+		if (!--context_number)
+			return child;
+
+	GEM_BUG_ON("Context not found");
+
+	return NULL;
+}
+
+static int eb_requests_create(struct i915_execbuffer *eb)
+{
+	unsigned int i;
+	int err;
+
+	for_each_batch_create_order(eb, i) {
+		/* Allocate a request for this batch buffer nice and early. */
+		eb->requests[i] = i915_request_create(eb_find_context(eb, i));
+		if (IS_ERR(eb->requests[i])) {
+			err = PTR_ERR(eb->requests[i]);
+			eb->requests[i] = NULL;
+			return err;
+		}
+
+		/*
+		 * Only the first request added (committed to backend) has to
+		 * take the in fences into account as all subsequent requests
+		 * will have fences inserted in between them.
+		 */
+		if (i + 1 == eb->num_batches) {
+			err = eb_fences_add(eb, eb->requests[i]);
+			if (err)
+				return err;
+		}
+
+		/*
+		 * Not really on stack, but we don't want to call
+		 * kfree on the batch_snapshot when we put it, so use the
+		 * _onstack interface.
+		 */
+		if (eb->batches[i])
+			eb->requests[i]->batch_res =
+				i915_vma_resource_get(eb->batches[i]->resource);
+	}
+
+	return 0;
+}
+
+static int
+i915_gem_do_execbuffer(struct drm_device *dev,
+		       struct drm_file *file,
+		       struct drm_i915_gem_execbuffer3 *args)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct i915_execbuffer eb;
+	int err;
+
+	BUILD_BUG_ON(__EXEC3_INTERNAL_FLAGS & ~__I915_EXEC3_UNKNOWN_FLAGS);
+
+	eb.i915 = i915;
+	eb.file = file;
+	eb.args = args;
+
+	eb.fences = NULL;
+	eb.num_fences = 0;
+
+	memset(eb.requests, 0, sizeof(struct i915_request *) *
+	       ARRAY_SIZE(eb.requests));
+	eb.composite_fence = NULL;
+
+	err = parse_timeline_fences(&eb);
+	if (err)
+		return err;
+
+	err = eb_select_context(&eb);
+	if (unlikely(err))
+		goto err_fences;
+
+	err = eb_select_engine(&eb);
+	if (unlikely(err))
+		goto err_context;
+
+	err = parse_batch_addresses(&eb);
+	if (unlikely(err))
+		goto err_engine;
+
+	i915_gem_vm_bind_lock(eb.context->vm);
+
+	err = eb_lookup_vmas(&eb);
+	if (err) {
+		eb_release_vmas(&eb, true);
+		goto err_vm_bind_lock;
+	}
+
+	i915_gem_ww_ctx_init(&eb.ww, true);
+
+	err = eb_validate_vmas(&eb);
+	if (err)
+		goto err_vma;
+
+	ww_acquire_done(&eb.ww.ctx);
+
+	err = eb_requests_create(&eb);
+	if (err) {
+		if (eb.requests[0])
+			goto err_request;
+		else
+			goto err_vma;
+	}
+
+	err = eb_submit(&eb);
+
+err_request:
+	eb_requests_get(&eb);
+	err = eb_requests_add(&eb, err);
+
+	if (eb.fences)
+		signal_fence_array(&eb, eb.composite_fence ?
+				   eb.composite_fence :
+				   &eb.requests[0]->fence);
+
+	if (unlikely(eb.gem_context->syncobj)) {
+		drm_syncobj_replace_fence(eb.gem_context->syncobj,
+					  eb.composite_fence ?
+					  eb.composite_fence :
+					  &eb.requests[0]->fence);
+	}
+
+	if (eb.composite_fence)
+		dma_fence_put(eb.composite_fence);
+
+	eb_requests_put(&eb);
+
+err_vma:
+	eb_release_vmas(&eb, true);
+	WARN_ON(err == -EDEADLK);
+	i915_gem_ww_ctx_fini(&eb.ww);
+err_vm_bind_lock:
+	i915_gem_vm_bind_unlock(eb.context->vm);
+err_engine:
+	eb_put_engine(&eb);
+err_context:
+	i915_gem_context_put(eb.gem_context);
+err_fences:
+	put_fence_array(eb.fences, eb.num_fences);
+	return err;
+}
+
+int
+i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_execbuffer3 *args = data;
+	int err;
+
+	if (args->flags & __I915_EXEC3_UNKNOWN_FLAGS)
+		return -EINVAL;
+
+	err = i915_gem_do_execbuffer(dev, file, args);
+
+	args->flags &= ~__I915_EXEC3_UNKNOWN_FLAGS;
+	return err;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
index 28d6526e32ab..b7a1e9725a84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
@@ -18,6 +18,8 @@ int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file);
 int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
+int i915_gem_execbuffer3_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *file);
 int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 776ab7844f60..4c13628d8663 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1834,6 +1834,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER, drm_invalid_op, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER3, i915_gem_execbuffer3_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_RENDER_ALLOW),
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ce1c6592b0d7..45cc97f9a424 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -472,6 +472,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_CREATE_EXT		0x3c
 #define DRM_I915_GEM_VM_BIND		0x3d
 #define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -538,6 +539,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 #define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
 #define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1277,7 +1279,8 @@ struct drm_i915_gem_exec_fence {
 /*
  * See drm_i915_gem_execbuffer_ext_timeline_fences.
  */
-#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
+#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES  0
+#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES 0
 
 /*
  * This structure describes an array of drm_syncobj and associated points for
@@ -1499,6 +1502,68 @@ struct drm_i915_gem_timeline_fence {
 	__u64 value;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with a user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/**
+	 * @batch_address: Batch gpu virtual address(es).
+	 *
+	 * For normal submission, it is the gpu virtual address of the batch
+	 * buffer. For parallel submission, it is a pointer to an array of
+	 * batch buffer gpu virtual addresses with array size equal to the
+	 * number of (parallel) engines involved in that submission (See
+	 * struct i915_context_engines_parallel_submit).
+	 */
+	__u64 batch_address;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+#define __I915_EXEC3_UNKNOWN_FLAGS (~0)
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/** @fence_count: Number of fences in @timeline_fences array. */
+	__u32 fence_count;
+
+	/**
+	 * @timeline_fences: Pointer to an array of timeline fences.
+	 *
+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
+	 */
+	__u64 timeline_fences;
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
-- 
2.21.0.rc0.32.g243a4c7e27
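
For illustration, a minimal userspace submission with the new ioctl could
look as follows. This is a sketch only: it assumes the VM was already
created with I915_VM_CREATE_FLAGS_USE_VM_BIND, that the batch buffer was
mapped at batch_va with VM_BIND beforehand, and that a uapi header
carrying the definitions added above is available; fd, ctx_id and
batch_va are hypothetical placeholders and error handling is omitted.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>	/* assumed to carry the execbuf3 uapi from this series */

/* Sketch: submit one batch at a VM_BIND mapped GPU virtual address. */
static int submit_batch(int fd, uint32_t ctx_id, uint64_t batch_va)
{
	struct drm_i915_gem_execbuffer3 execbuf;

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.ctx_id = ctx_id;		/* context with a user engine map */
	execbuf.engine_idx = 0;			/* index into that engine map */
	execbuf.batch_address = batch_va;	/* GPU VA, not an object handle */

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf);
}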


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [RFC 07/10] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

Handle persistent (VM_BIND) mappings during the request submission
in the execbuf3 path.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 176 +++++++++++++++++-
 1 file changed, 175 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index 13121df72e3d..2079f5ca9010 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -22,6 +22,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
 
@@ -45,7 +46,9 @@
  * execlist. Hence, no support for implicit sync.
  *
  * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
- * works with execbuf3 ioctl for submission.
+ * works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
+ * a VM_BIND call) at the time of the execbuf3 call are deemed required for
+ * that submission.
  *
  * The execbuf3 ioctl directly specifies the batch addresses instead of the
  * object handles used by the execbuf2 ioctl. The execbuf3 ioctl also does not
@@ -61,6 +64,13 @@
  * So, a lot of the code supporting the execbuf2 ioctl, like relocations, VA
  * evictions, the vma lookup table, implicit sync, vma active reference
  * tracking etc., is not applicable to the execbuf3 ioctl.
+ *
+ * During each execbuf submission, the request fence is added to all VM_BIND
+ * mapped objects with DMA_RESV_USAGE_BOOKKEEP. The DMA_RESV_USAGE_BOOKKEEP
+ * usage prevents over-synchronization (See enum dma_resv_usage). Note that the
+ * DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
+ * DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end-of-batch
+ * checks. Instead, the execbuf3 timeline out fence should be used for that.
  */
 
 struct eb_fence {
@@ -124,6 +134,19 @@ eb_find_vma(struct i915_address_space *vm, u64 addr)
 	return i915_gem_vm_bind_lookup_vma(vm, va);
 }
 
+static void eb_scoop_unbound_vmas(struct i915_address_space *vm)
+{
+	struct i915_vma *vma, *vn;
+
+	spin_lock(&vm->vm_rebind_lock);
+	list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list, vm_rebind_link) {
+		list_del_init(&vma->vm_rebind_link);
+		if (!list_empty(&vma->vm_bind_link))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bind_list);
+	}
+	spin_unlock(&vm->vm_rebind_lock);
+}
+
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
@@ -138,11 +161,118 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 		++current_batch;
 	}
 
+	eb_scoop_unbound_vmas(eb->context->vm);
+
+	return 0;
+}
+
+static int eb_lock_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma;
+	int err;
+
+	err = i915_gem_vm_priv_lock(eb->context->vm, &eb->ww);
+	if (err)
+		return err;
+
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link) {
+		err = i915_gem_object_lock(vma->obj, &eb->ww);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
+static void eb_release_persistent_vmas(struct i915_execbuffer *eb, bool final)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma, *vn;
+
+	assert_vm_bind_held(vm);
+
+	if (!(eb->args->flags & __EXEC3_HAS_PIN))
+		return;
+
+	assert_vm_priv_held(vm);
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
+		__i915_vma_unpin(vma);
+
+	eb->args->flags &= ~__EXEC3_HAS_PIN;
+	if (!final)
+		return;
+
+	list_for_each_entry_safe(vma, vn, &vm->vm_bind_list, vm_bind_link)
+		if (i915_vma_is_bind_complete(vma))
+			list_move_tail(&vma->vm_bind_link, &vm->vm_bound_list);
+}
+
 static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
 {
+	eb_release_persistent_vmas(eb, final);
+	eb_unpin_engine(eb);
+}
+
+static int eb_reserve_fence_for_persistent_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma;
+	int ret;
+
+	ret = dma_resv_reserve_fences(vm->root_obj->base.resv, 1);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link) {
+		ret = dma_resv_reserve_fences(vma->obj->base.resv, 1);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int eb_validate_persistent_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *vma, *last_pinned_vma = NULL;
+	int ret = 0;
+
+	assert_vm_bind_held(vm);
+	assert_vm_priv_held(vm);
+
+	ret = eb_reserve_fence_for_persistent_vmas(eb);
+	if (ret)
+		return ret;
+
+	if (list_empty(&vm->vm_bind_list))
+		return 0;
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+		u64 pin_flags = vma->start | PIN_OFFSET_FIXED | PIN_USER;
+
+		ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
+		if (ret)
+			break;
+
+		last_pinned_vma = vma;
+	}
+
+	if (ret && last_pinned_vma) {
+		list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+			__i915_vma_unpin(vma);
+			if (vma == last_pinned_vma)
+				break;
+		}
+	} else if (last_pinned_vma) {
+		eb->args->flags |= __EXEC3_HAS_PIN;
+	}
+
+	return ret;
 }
 
 static int eb_validate_vmas(struct i915_execbuffer *eb)
@@ -162,8 +292,17 @@ static int eb_validate_vmas(struct i915_execbuffer *eb)
 	/* only throttle once, even if we didn't need to throttle */
 	throttle = false;
 
+	err = eb_lock_vmas(eb);
+	if (err)
+		goto err;
+
+	err = eb_validate_persistent_vmas(eb);
+	if (err)
+		goto err;
+
 err:
 	if (err == -EDEADLK) {
+		eb_release_vmas(eb, false);
 		err = i915_gem_ww_ctx_backoff(&eb->ww);
 		if (!err)
 			goto retry;
@@ -187,8 +326,43 @@ static int eb_validate_vmas(struct i915_execbuffer *eb)
 	BUILD_BUG_ON(!typecheck(int, _i)); \
 	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
 
+static void __eb_persistent_add_shared_fence(struct drm_i915_gem_object *obj,
+					     struct dma_fence *fence)
+{
+	dma_resv_add_fence(obj->base.resv, fence, DMA_RESV_USAGE_BOOKKEEP);
+	obj->write_domain = 0;
+	obj->read_domains |= I915_GEM_GPU_DOMAINS;
+	obj->mm.dirty = true;
+}
+
+static void eb_persistent_add_shared_fence(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct dma_fence *fence;
+	struct i915_vma *vma;
+
+	fence = eb->composite_fence ? eb->composite_fence :
+		&eb->requests[0]->fence;
+
+	__eb_persistent_add_shared_fence(vm->root_obj, fence);
+	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
+			    non_priv_vm_bind_link)
+		__eb_persistent_add_shared_fence(vma->obj, fence);
+}
+
+static void eb_persistent_vmas_move_to_active(struct i915_execbuffer *eb)
+{
+	/* Add fence to BOs dma-resv fence list */
+	eb_persistent_add_shared_fence(eb);
+}
+
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	assert_vm_bind_held(eb->context->vm);
+	assert_vm_priv_held(eb->context->vm);
+
+	eb_persistent_vmas_move_to_active(eb);
+
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->gt);
 
-- 
2.21.0.rc0.32.g243a4c7e27
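
Since BOOKKEEP fences are not visible to the GEM_WAIT/BUSY ioctls, the
end-of-batch check has to go through the execbuf3 timeline out fence. A
minimal sketch of that flow is below; it assumes the
drm_i915_gem_timeline_fence layout used by this series and the libdrm
syncobj helpers, with fd, ctx_id and batch_va as hypothetical
placeholders.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>	/* assumed to carry the execbuf3/timeline fence uapi from this series */

/* Sketch: submit with a timeline out fence and wait on it for completion. */
static int submit_and_wait(int fd, uint32_t ctx_id, uint64_t batch_va)
{
	struct drm_i915_gem_timeline_fence out_fence;
	struct drm_i915_gem_execbuffer3 execbuf;
	uint64_t point = 1;
	uint32_t syncobj;
	int ret;

	ret = drmSyncobjCreate(fd, 0, &syncobj);
	if (ret)
		return ret;

	memset(&out_fence, 0, sizeof(out_fence));
	out_fence.handle = syncobj;
	out_fence.flags = I915_TIMELINE_FENCE_SIGNAL;	/* out fence only */
	out_fence.value = point;

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.ctx_id = ctx_id;
	execbuf.batch_address = batch_va;
	execbuf.fence_count = 1;
	execbuf.timeline_fences = (uintptr_t)&out_fence;

	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf);
	if (ret)
		goto out;

	/* End-of-batch check: wait on the timeline point, not GEM_WAIT/BUSY. */
	ret = drmSyncobjTimelineWait(fd, &syncobj, &point, 1, INT64_MAX,
				     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
out:
	drmSyncobjDestroy(fd, syncobj);
	return ret;
}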


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

For persistent (vm_bind) vmas of userptr BOs, handle the user page
pinning by using the i915_gem_object_userptr_submit_init() and
i915_gem_object_userptr_submit_done() functions.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 67 +++++++++++++++++++
 .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++++
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  1 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  1 +
 4 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
index 2079f5ca9010..bf13dd6d642e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
@@ -22,6 +22,7 @@
 #include "i915_gem_vm_bind.h"
 #include "i915_trace.h"
 
+#define __EXEC3_USERPTR_USED		BIT_ULL(34)
 #define __EXEC3_HAS_PIN			BIT_ULL(33)
 #define __EXEC3_ENGINE_PINNED		BIT_ULL(32)
 #define __EXEC3_INTERNAL_FLAGS		(~0ull << 32)
@@ -147,10 +148,36 @@ static void eb_scoop_unbound_vmas(struct i915_address_space *vm)
 	spin_unlock(&vm->vm_rebind_lock);
 }
 
+static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer *eb)
+{
+	struct i915_address_space *vm = eb->context->vm;
+	struct i915_vma *last_vma = NULL;
+	struct i915_vma *vma;
+	int err;
+
+	assert_vm_bind_held(vm);
+
+	list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
+		if (i915_gem_object_is_userptr(vma->obj)) {
+			err = i915_gem_object_userptr_submit_init(vma->obj);
+			if (err)
+				return err;
+
+			last_vma = vma;
+		}
+	}
+
+	if (last_vma)
+		eb->args->flags |= __EXEC3_USERPTR_USED;
+
+	return 0;
+}
+
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	unsigned int i, current_batch = 0;
 	struct i915_vma *vma;
+	int err = 0;
 
 	for (i = 0; i < eb->num_batches; i++) {
 		vma = eb_find_vma(eb->context->vm, eb->batch_addresses[i]);
@@ -163,6 +190,10 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 
 	eb_scoop_unbound_vmas(eb->context->vm);
 
+	err = eb_lookup_persistent_userptr_vmas(eb);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -358,15 +389,51 @@ static void eb_persistent_vmas_move_to_active(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
+	int err = 0, j;
+
 	assert_vm_bind_held(eb->context->vm);
 	assert_vm_priv_held(eb->context->vm);
 
 	eb_persistent_vmas_move_to_active(eb);
 
+#ifdef CONFIG_MMU_NOTIFIER
+	if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
+		struct i915_vma *vma;
+
+		assert_vm_bind_held(eb->context->vm);
+		assert_vm_priv_held(eb->context->vm);
+
+		read_lock(&eb->i915->mm.notifier_lock);
+		list_for_each_entry(vma, &eb->context->vm->vm_bind_list,
+				    vm_bind_link) {
+			if (!i915_gem_object_is_userptr(vma->obj))
+				continue;
+
+			err = i915_gem_object_userptr_submit_done(vma->obj);
+			if (err)
+				break;
+		}
+
+		read_unlock(&eb->i915->mm.notifier_lock);
+	}
+#endif
+
+	if (unlikely(err))
+		goto err_skip;
+
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->gt);
 
 	return 0;
+
+err_skip:
+	for_each_batch_create_order(eb, j) {
+		if (!eb->requests[j])
+			break;
+
+		i915_request_set_error_once(eb->requests[j], err);
+	}
+	return err;
 }
 
 static int eb_request_submit(struct i915_execbuffer *eb,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
index 1a8efa83547f..cae282b91618 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
@@ -263,6 +263,12 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 		goto put_obj;
 	}
 
+	if (i915_gem_object_is_userptr(obj)) {
+		ret = i915_gem_object_userptr_submit_init(obj);
+		if (ret)
+			goto put_obj;
+	}
+
 	ret = i915_gem_vm_bind_lock_interruptible(vm);
 	if (ret)
 		goto put_obj;
@@ -295,6 +301,16 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
 	/* Make it evictable */
 	__i915_vma_unpin(vma);
 
+#ifdef CONFIG_MMU_NOTIFIER
+	if (i915_gem_object_is_userptr(obj)) {
+		write_lock(&vm->i915->mm.notifier_lock);
+		ret = i915_gem_object_userptr_submit_done(obj);
+		write_unlock(&vm->i915->mm.notifier_lock);
+		if (ret)
+			goto out_ww;
+	}
+#endif
+
 	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
 	i915_vm_bind_it_insert(vma, &vm->va);
 	if (!obj->priv_root)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 55d5389b2c6c..4ab3bda644ff 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -295,6 +295,7 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	GEM_BUG_ON(IS_ERR(vm->root_obj));
 	INIT_LIST_HEAD(&vm->vm_rebind_list);
 	spin_lock_init(&vm->vm_rebind_lock);
+	INIT_LIST_HEAD(&vm->invalidate_link);
 }
 
 void *__px_vaddr(struct drm_i915_gem_object *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index fe5485c4a1cd..f9edf11c144f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -267,6 +267,7 @@ struct i915_address_space {
 	struct list_head vm_bound_list;
 	struct list_head vm_rebind_list;
 	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
+	struct list_head invalidate_link;
 	/* va tree of persistent vmas */
 	struct rb_root_cached va;
 	struct list_head non_priv_vm_bind_list;
-- 
2.21.0.rc0.32.g243a4c7e27
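
The two-phase userptr handling the patch relies on can be condensed as
below. This is an illustrative sketch of the calling contract only, not
the actual driver code (and the notifier-lock part is under
CONFIG_MMU_NOTIFIER in the patch): submit_init() may fault and pin the
user pages, so it runs before any fences are published, while
submit_done() re-checks for invalidations under mm.notifier_lock once
the submission is committed.

/* Sketch (driver context assumed): the userptr submit_init/done contract. */
static int userptr_submit_sketch(struct i915_execbuffer *eb,
				 struct drm_i915_gem_object *obj)
{
	int err;

	/* Phase 1: pin the user pages; may fault, no notifier lock held. */
	err = i915_gem_object_userptr_submit_init(obj);
	if (err)
		return err;

	/* ... reserve fences, pin the vma, build the request ... */

	/* Phase 2: confirm the pages were not invalidated meanwhile. */
	read_lock(&eb->i915->mm.notifier_lock);
	err = i915_gem_object_userptr_submit_done(obj);
	read_unlock(&eb->i915->mm.notifier_lock);

	return err;
}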


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [RFC 09/10] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

vma_lookup is tied to a segment of the object instead of a section
of the VA space. Hence, it does not support aliasing (i.e., multiple
bindings to the same section of the object).
Skip vma_lookup for persistent vmas, as they support aliasing.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 6adb013579be..9aa38b772b5b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -197,6 +197,10 @@ vma_create(struct drm_i915_gem_object *obj,
 		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
 	}
 
+	if (!i915_vma_is_ggtt(vma) &&
+	    (view && view->type == I915_GGTT_VIEW_PARTIAL))
+		goto skip_rb_insert;
+
 	rb = NULL;
 	p = &obj->vma.tree.rb_node;
 	while (*p) {
@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma.tree);
 
+skip_rb_insert:
 	if (i915_vma_is_ggtt(vma))
 		/*
 		 * We put the GGTT vma at the start of the vma-list, followed
@@ -292,13 +297,16 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
 		  const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma;
+	struct i915_vma *vma = NULL;
 
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
-	spin_lock(&obj->vma.lock);
-	vma = i915_vma_lookup(obj, vm, view);
-	spin_unlock(&obj->vma.lock);
+	if (i915_is_ggtt(vm) || !view ||
+	    view->type != I915_GGTT_VIEW_PARTIAL) {
+		spin_lock(&obj->vma.lock);
+		vma = i915_vma_lookup(obj, vm, view);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	/* vma_create() will resolve the race if another creates the vma */
 	if (unlikely(!vma))
@@ -1670,7 +1678,8 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
 
 	spin_lock(&obj->vma.lock);
 	list_del(&vma->obj_link);
-	if (!RB_EMPTY_NODE(&vma->obj_node))
+	if (!i915_vma_is_persistent(vma) &&
+	    !RB_EMPTY_NODE(&vma->obj_node))
 		rb_erase(&vma->obj_node, &obj->vma.tree);
 
 	spin_unlock(&obj->vma.lock);
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [Intel-gfx] [RFC 09/10] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

vma_lookup is tied to a segment of the object rather than a section of
the VA space. Hence, it does not support aliasing (i.e., multiple
bindings to the same section of an object). Skip vma_lookup for
persistent vmas, which do support aliasing.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 6adb013579be..9aa38b772b5b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -197,6 +197,10 @@ vma_create(struct drm_i915_gem_object *obj,
 		__set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
 	}
 
+	if (!i915_vma_is_ggtt(vma) &&
+	    (view && view->type == I915_GGTT_VIEW_PARTIAL))
+		goto skip_rb_insert;
+
 	rb = NULL;
 	p = &obj->vma.tree.rb_node;
 	while (*p) {
@@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma.tree);
 
+skip_rb_insert:
 	if (i915_vma_is_ggtt(vma))
 		/*
 		 * We put the GGTT vma at the start of the vma-list, followed
@@ -292,13 +297,16 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
 		  const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma;
+	struct i915_vma *vma = NULL;
 
 	GEM_BUG_ON(!kref_read(&vm->ref));
 
-	spin_lock(&obj->vma.lock);
-	vma = i915_vma_lookup(obj, vm, view);
-	spin_unlock(&obj->vma.lock);
+	if (i915_is_ggtt(vm) || !view ||
+	    view->type != I915_GGTT_VIEW_PARTIAL) {
+		spin_lock(&obj->vma.lock);
+		vma = i915_vma_lookup(obj, vm, view);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	/* vma_create() will resolve the race if another creates the vma */
 	if (unlikely(!vma))
@@ -1670,7 +1678,8 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
 
 	spin_lock(&obj->vma.lock);
 	list_del(&vma->obj_link);
-	if (!RB_EMPTY_NODE(&vma->obj_node))
+	if (!i915_vma_is_persistent(vma) &&
+	    !RB_EMPTY_NODE(&vma->obj_node))
 		rb_erase(&vma->obj_node, &obj->vma.tree);
 
 	spin_unlock(&obj->vma.lock);
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter, christian.koenig

The VM_BIND functionality maintains the invariant that vm->vm_bind_mutex
is never taken while holding vm->mutex.
However, while closing a 'vm', vmas are destroyed while holding
vm->mutex, and releasing a vma needs to take vm->vm_bind_mutex in order
to delete it from the vm_bind_list. To avoid this lock inversion,
destroy the vmas outside vm->mutex while closing the 'vm'.
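
For clarity, a rough sketch of the inverted ordering this patch avoids;
the call chain below is abridged from earlier patches in this series:

	/*
	 * Documented order: vm->vm_bind_mutex may be taken and then
	 * vm->mutex, never the other way around.
	 *
	 * Offending path on vm close (before this patch):
	 *
	 *   __i915_vm_close()
	 *     mutex_lock(&vm->mutex)
	 *       clear_vm_list()
	 *         i915_vma_destroy_locked(vma)
	 *           release_references(vma)
	 *             i915_gem_vm_bind_lock(vma->vm)  <-- vm_bind_mutex
	 *                                                 inside vm->mutex
	 */

Gathering the vmas on a local list under vm->mutex and destroying them
only after dropping it keeps the documented order intact.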

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 4ab3bda644ff..4f707d0eb3ef 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -109,7 +109,8 @@ int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object
 	return 0;
 }
 
-static void clear_vm_list(struct list_head *list)
+static void clear_vm_list(struct list_head *list,
+			  struct list_head *destroy_list)
 {
 	struct i915_vma *vma, *vn;
 
@@ -138,8 +139,7 @@ static void clear_vm_list(struct list_head *list)
 			i915_vm_resv_get(vma->vm);
 			vma->vm_ddestroy = true;
 		} else {
-			i915_vma_destroy_locked(vma);
-			i915_gem_object_put(obj);
+			list_move_tail(&vma->vm_link, destroy_list);
 		}
 
 	}
@@ -147,16 +147,29 @@ static void clear_vm_list(struct list_head *list)
 
 static void __i915_vm_close(struct i915_address_space *vm)
 {
+	struct i915_vma *vma, *vn;
+	struct list_head list;
+
+	INIT_LIST_HEAD(&list);
+
 	mutex_lock(&vm->mutex);
 
-	clear_vm_list(&vm->bound_list);
-	clear_vm_list(&vm->unbound_list);
+	clear_vm_list(&vm->bound_list, &list);
+	clear_vm_list(&vm->unbound_list, &list);
 
 	/* Check for must-fix unanticipated side-effects */
 	GEM_BUG_ON(!list_empty(&vm->bound_list));
 	GEM_BUG_ON(!list_empty(&vm->unbound_list));
 
 	mutex_unlock(&vm->mutex);
+
+	/* Destroy vmas outside vm->mutex */
+	list_for_each_entry_safe(vma, vn, &list, vm_link) {
+		struct drm_i915_gem_object *obj = vma->obj;
+
+		i915_vma_destroy(vma);
+		i915_gem_object_put(obj);
+	}
 }
 
 /* lock the vm into the current ww, if we lock one, we lock all */
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [Intel-gfx] [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
@ 2022-07-01 22:50   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-01 22:50 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, thomas.hellstrom, matthew.auld, daniel.vetter,
	christian.koenig

The VM_BIND functionality maintains the invariant that vm->vm_bind_mutex
is never taken while holding vm->mutex.
However, while closing a 'vm', vmas are destroyed while holding
vm->mutex, and releasing a vma needs to take vm->vm_bind_mutex in order
to delete it from the vm_bind_list. To avoid this lock inversion,
destroy the vmas outside vm->mutex while closing the 'vm'.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 4ab3bda644ff..4f707d0eb3ef 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -109,7 +109,8 @@ int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object
 	return 0;
 }
 
-static void clear_vm_list(struct list_head *list)
+static void clear_vm_list(struct list_head *list,
+			  struct list_head *destroy_list)
 {
 	struct i915_vma *vma, *vn;
 
@@ -138,8 +139,7 @@ static void clear_vm_list(struct list_head *list)
 			i915_vm_resv_get(vma->vm);
 			vma->vm_ddestroy = true;
 		} else {
-			i915_vma_destroy_locked(vma);
-			i915_gem_object_put(obj);
+			list_move_tail(&vma->vm_link, destroy_list);
 		}
 
 	}
@@ -147,16 +147,29 @@ static void clear_vm_list(struct list_head *list)
 
 static void __i915_vm_close(struct i915_address_space *vm)
 {
+	struct i915_vma *vma, *vn;
+	struct list_head list;
+
+	INIT_LIST_HEAD(&list);
+
 	mutex_lock(&vm->mutex);
 
-	clear_vm_list(&vm->bound_list);
-	clear_vm_list(&vm->unbound_list);
+	clear_vm_list(&vm->bound_list, &list);
+	clear_vm_list(&vm->unbound_list, &list);
 
 	/* Check for must-fix unanticipated side-effects */
 	GEM_BUG_ON(!list_empty(&vm->bound_list));
 	GEM_BUG_ON(!list_empty(&vm->unbound_list));
 
 	mutex_unlock(&vm->mutex);
+
+	/* Destroy vmas outside vm->mutex */
+	list_for_each_entry_safe(vma, vn, &list, vm_link) {
+		struct drm_i915_gem_object *obj = vma->obj;
+
+		i915_vma_destroy(vma);
+		i915_gem_object_put(obj);
+	}
 }
 
 /* lock the vm into the current ww, if we lock one, we lock all */
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/vm_bind: Add VM_BIND functionality
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (10 preceding siblings ...)
  (?)
@ 2022-07-01 23:19 ` Patchwork
  -1 siblings, 0 replies; 121+ messages in thread
From: Patchwork @ 2022-07-01 23:19 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/vm_bind: Add VM_BIND functionality
URL   : https://patchwork.freedesktop.org/series/105879/
State : warning

== Summary ==

Error: dim checkpatch failed
9a618f815c5e drm/i915/vm_bind: Introduce VM_BIND ioctl
-:196: WARNING:LONG_LINE: line length of 118 exceeds 100 columns
#196: FILE: include/uapi/drm/i915_drm.h:539:
+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)

-:197: WARNING:LONG_LINE: line length of 122 exceeds 100 columns
#197: FILE: include/uapi/drm/i915_drm.h:540:
+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)

total: 0 errors, 2 warnings, 0 checks, 368 lines checked
b021c9cf1fd7 drm/i915/vm_bind: Bind and unbind mappings
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:73: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#73: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 443 lines checked
6375255bdb22 drm/i915/vm_bind: Support private and shared BOs
bc88aa1c4e77 drm/i915/vm_bind: Add out fence support
150c21055b0e drm/i915/vm_bind: Handle persistent vmas
bd2c34d240da drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:44: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#44: 
new file mode 100644

-:232: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects?
#232: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c:184:
+#define for_each_batch_create_order(_eb, _i) \
+	for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))

-:234: ERROR:MULTISTATEMENT_MACRO_USE_DO_WHILE: Macros with multiple statements should be enclosed in a do - while loop
#234: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c:186:
+#define for_each_batch_add_order(_eb, _i) \
+	BUILD_BUG_ON(!typecheck(int, _i)); \
+	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))

-:234: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects?
#234: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c:186:
+#define for_each_batch_add_order(_eb, _i) \
+	BUILD_BUG_ON(!typecheck(int, _i)); \
+	for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))

-:1119: WARNING:LONG_LINE: line length of 126 exceeds 100 columns
#1119: FILE: include/uapi/drm/i915_drm.h:542:
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

total: 1 errors, 2 warnings, 2 checks, 1153 lines checked
05108f6c5754 drm/i915/vm_bind: Handle persistent vmas in execbuf3
481d633a73ce drm/i915/vm_bind: userptr dma-resv changes
aaafeac528d6 drm/i915/vm_bind: Skip vma_lookup for persistent vmas
27b31d00e6b0 drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting



^ permalink raw reply	[flat|nested] 121+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/vm_bind: Add VM_BIND functionality
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (11 preceding siblings ...)
  (?)
@ 2022-07-01 23:19 ` Patchwork
  -1 siblings, 0 replies; 121+ messages in thread
From: Patchwork @ 2022-07-01 23:19 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/vm_bind: Add VM_BIND functionality
URL   : https://patchwork.freedesktop.org/series/105879/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 121+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/vm_bind: Add VM_BIND functionality
  2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (12 preceding siblings ...)
  (?)
@ 2022-07-01 23:40 ` Patchwork
  -1 siblings, 0 replies; 121+ messages in thread
From: Patchwork @ 2022-07-01 23:40 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 12716 bytes --]

== Series Details ==

Series: drm/i915/vm_bind: Add VM_BIND functionality
URL   : https://patchwork.freedesktop.org/series/105879/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11841 -> Patchwork_105879v1
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_105879v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_105879v1, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/index.html

Participating hosts (41 -> 40)
------------------------------

  Additional (2): fi-hsw-4770 bat-jsl-1 
  Missing    (3): bat-dg2-8 fi-icl-u2 fi-tgl-u2 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105879v1:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_module_load@load:
    - fi-ilk-650:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-ilk-650/igt@i915_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-ilk-650/igt@i915_module_load@load.html
    - fi-blb-e6850:       [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-blb-e6850/igt@i915_module_load@load.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-blb-e6850/igt@i915_module_load@load.html
    - fi-pnv-d510:        [PASS][5] -> [INCOMPLETE][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-pnv-d510/igt@i915_module_load@load.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-pnv-d510/igt@i915_module_load@load.html
    - fi-snb-2520m:       [PASS][7] -> [INCOMPLETE][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-snb-2520m/igt@i915_module_load@load.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-snb-2520m/igt@i915_module_load@load.html
    - fi-elk-e7500:       [PASS][9] -> [INCOMPLETE][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-elk-e7500/igt@i915_module_load@load.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-elk-e7500/igt@i915_module_load@load.html
    - fi-hsw-g3258:       [PASS][11] -> [INCOMPLETE][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-hsw-g3258/igt@i915_module_load@load.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-hsw-g3258/igt@i915_module_load@load.html
    - fi-hsw-4770:        NOTRUN -> [INCOMPLETE][13]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-hsw-4770/igt@i915_module_load@load.html
    - fi-ivb-3770:        [PASS][14] -> [INCOMPLETE][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-ivb-3770/igt@i915_module_load@load.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-ivb-3770/igt@i915_module_load@load.html
    - fi-snb-2600:        [PASS][16] -> [INCOMPLETE][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-snb-2600/igt@i915_module_load@load.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-snb-2600/igt@i915_module_load@load.html

  * igt@i915_pm_rpm@module-reload:
    - fi-cfl-8109u:       [PASS][18] -> [INCOMPLETE][19]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html

  * igt@kms_addfb_basic@addfb25-bad-modifier:
    - fi-kbl-soraka:      [PASS][20] -> [INCOMPLETE][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-kbl-soraka/igt@kms_addfb_basic@addfb25-bad-modifier.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-kbl-soraka/igt@kms_addfb_basic@addfb25-bad-modifier.html

  
Known issues
------------

  Here are the changes found in Patchwork_105879v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_lmem_swapping@basic:
    - fi-apl-guc:         NOTRUN -> [SKIP][22] ([fdo#109271] / [i915#4613]) +3 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-apl-guc/igt@gem_lmem_swapping@basic.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-kbl-8809g:       [PASS][23] -> [DMESG-FAIL][24] ([i915#5334])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-kbl-8809g/igt@i915_selftest@live@gt_heartbeat.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-kbl-8809g/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - fi-bdw-gvtdvm:      NOTRUN -> [INCOMPLETE][25] ([i915#4817])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-bdw-gvtdvm/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-apl-guc:         NOTRUN -> [SKIP][26] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-apl-guc/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-bdw-gvtdvm:      NOTRUN -> [SKIP][27] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-bdw-gvtdvm/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_flip@basic-flip-vs-modeset@b-edp1:
    - bat-adlp-4:         [PASS][28] -> [DMESG-WARN][29] ([i915#3576]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html

  * igt@kms_flip@basic-plain-flip:
    - fi-bdw-gvtdvm:      NOTRUN -> [SKIP][30] ([fdo#109271]) +31 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-bdw-gvtdvm/igt@kms_flip@basic-plain-flip.html

  * igt@kms_force_connector_basic@force-connector-state:
    - fi-apl-guc:         NOTRUN -> [SKIP][31] ([fdo#109271]) +11 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-apl-guc/igt@kms_force_connector_basic@force-connector-state.html

  * igt@runner@aborted:
    - fi-hsw-4770:        NOTRUN -> [FAIL][32] ([i915#4312] / [i915#5594] / [i915#6246])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-hsw-4770/igt@runner@aborted.html
    - fi-ivb-3770:        NOTRUN -> [FAIL][33] ([i915#4312] / [i915#6219])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-ivb-3770/igt@runner@aborted.html
    - fi-elk-e7500:       NOTRUN -> [FAIL][34] ([i915#4312])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-elk-e7500/igt@runner@aborted.html
    - fi-snb-2600:        NOTRUN -> [FAIL][35] ([i915#4312])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-snb-2600/igt@runner@aborted.html
    - fi-ilk-650:         NOTRUN -> [FAIL][36] ([i915#4312])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-ilk-650/igt@runner@aborted.html
    - fi-snb-2520m:       NOTRUN -> [FAIL][37] ([i915#4312])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-snb-2520m/igt@runner@aborted.html
    - fi-hsw-g3258:       NOTRUN -> [FAIL][38] ([i915#4312] / [i915#6246])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-hsw-g3258/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_exec_parallel@engines@fds:
    - fi-bdw-gvtdvm:      [INCOMPLETE][39] -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-bdw-gvtdvm/igt@gem_exec_parallel@engines@fds.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-bdw-gvtdvm/igt@gem_exec_parallel@engines@fds.html

  * igt@gem_render_tiled_blits@basic:
    - fi-apl-guc:         [INCOMPLETE][41] ([i915#6274]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-apl-guc/igt@gem_render_tiled_blits@basic.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-apl-guc/igt@gem_render_tiled_blits@basic.html

  * igt@i915_module_load@reload:
    - {bat-adln-1}:       [DMESG-WARN][43] ([i915#6297]) -> [PASS][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/bat-adln-1/igt@i915_module_load@reload.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/bat-adln-1/igt@i915_module_load@reload.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-5:          [DMESG-FAIL][45] ([i915#4494] / [i915#4957]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/bat-dg1-5/igt@i915_selftest@live@hangcheck.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/bat-dg1-5/igt@i915_selftest@live@hangcheck.html

  
#### Warnings ####

  * igt@runner@aborted:
    - fi-blb-e6850:       [FAIL][47] ([fdo#109271] / [i915#2403] / [i915#4312]) -> [FAIL][48] ([i915#2403] / [i915#4312])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-blb-e6850/igt@runner@aborted.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-blb-e6850/igt@runner@aborted.html
    - fi-pnv-d510:        [FAIL][49] ([fdo#109271] / [i915#2403] / [i915#4312]) -> [FAIL][50] ([i915#2403] / [i915#4312])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11841/fi-pnv-d510/igt@runner@aborted.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/fi-pnv-d510/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#3003]: https://gitlab.freedesktop.org/drm/intel/issues/3003
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4817]: https://gitlab.freedesktop.org/drm/intel/issues/4817
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#5594]: https://gitlab.freedesktop.org/drm/intel/issues/5594
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6219]: https://gitlab.freedesktop.org/drm/intel/issues/6219
  [i915#6246]: https://gitlab.freedesktop.org/drm/intel/issues/6246
  [i915#6274]: https://gitlab.freedesktop.org/drm/intel/issues/6274
  [i915#6297]: https://gitlab.freedesktop.org/drm/intel/issues/6297


Build changes
-------------

  * Linux: CI_DRM_11841 -> Patchwork_105879v1

  CI-20190529: 20190529
  CI_DRM_11841: 5ad538f9740c12845ca64cdf7148c2b3e6d993e9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6554: 2824470eeed46d448ccc8111f96736da3abe66b5 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105879v1: 5ad538f9740c12845ca64cdf7148c2b3e6d993e9 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

6f21ff9e08ce drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
e6705f1e5068 drm/i915/vm_bind: Skip vma_lookup for persistent vmas
6dabb2eae5ad drm/i915/vm_bind: userptr dma-resv changes
db981be3485e drm/i915/vm_bind: Handle persistent vmas in execbuf3
87dbb471a109 drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
344a84111151 drm/i915/vm_bind: Handle persistent vmas
ee410225efb8 drm/i915/vm_bind: Add out fence support
0690e7045848 drm/i915/vm_bind: Support private and shared BOs
7b5cc6153270 drm/i915/vm_bind: Bind and unbind mappings
e746d4c4a6c9 drm/i915/vm_bind: Introduce VM_BIND ioctl

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105879v1/index.html

[-- Attachment #2: Type: text/html, Size: 14894 bytes --]

^ permalink raw reply	[flat|nested] 121+ messages in thread

* RE: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-04 17:05     ` Zeng, Oak
  -1 siblings, 0 replies; 121+ messages in thread
From: Zeng, Oak @ 2022-07-04 17:05 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana, intel-gfx, dri-devel
  Cc: Vetter, Daniel, christian.koenig, Hellstrom, Thomas, Zanoni,
	Paulo R, Auld, Matthew



Thanks,
Oak

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Niranjana Vishwanathapura
> Sent: July 1, 2022 6:51 PM
> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
> <thomas.hellstrom@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
> Vetter, Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com
> Subject: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
> 
> Treat VM_BIND vmas as persistent and handle them during the request
> submission in the execbuff path.

Hi Niranjana,

Does "persistent" above mean persistent across all subsequent execbuf ioctls?

Thanks,
Oak 

> 
> Support eviction by maintaining a list of evicted persistent vmas for rebinding
> during next submission.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
>  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
>  drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
>  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
>  9 files changed, 163 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index ccec4055fde3..5121f02ba95c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -38,6 +38,7 @@
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
>  #include "i915_gem_ttm.h"
> +#include "i915_gem_vm_bind.h"
>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 849bf3c1061e..eaadf5a6ab09 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -6,6 +6,7 @@
>  #ifndef __I915_GEM_VM_BIND_H
>  #define __I915_GEM_VM_BIND_H
> 
> +#include <linux/dma-resv.h>
>  #include "i915_drv.h"
> 
>  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> >vm_bind_lock)
> @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
> i915_address_space *vm)
>  	mutex_unlock(&vm->vm_bind_lock);
>  }
> 
> +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
> +
>  static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>  					struct i915_gem_ww_ctx *ww)
>  {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 96f139cc8060..1a8efa83547f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma
> *vma, bool release_obj)  {
>  	assert_vm_bind_held(vma->vm);
> 
> +	spin_lock(&vma->vm->vm_rebind_lock);
> +	if (!list_empty(&vma->vm_rebind_link))
> +		list_del_init(&vma->vm_rebind_link);
> +	i915_vma_set_purged(vma);
> +	i915_vma_set_freed(vma);
> +	spin_unlock(&vma->vm->vm_rebind_lock);
> +
>  	if (!list_empty(&vma->vm_bind_link)) {
>  		list_del_init(&vma->vm_bind_link);
>  		list_del_init(&vma->non_priv_vm_bind_link);
> @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> i915_address_space *vm,
> 
>  	vma->start = va->start;
>  	vma->last = va->start + va->length - 1;
> +	i915_vma_set_persistent(vma);
> 
>  	return vma;
>  }
> @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
> 
>  	i915_vm_bind_put_fence(vma);
>  put_vma:
> -	if (ret)
> +	if (ret) {
> +		i915_vma_set_freed(vma);
>  		i915_vma_destroy(vma);
> +	}
> 
>  	i915_gem_ww_ctx_fini(&ww);
>  unlock_vm:
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index df0a8459c3c6..55d5389b2c6c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -293,6 +293,8 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>  	vm->root_obj = i915_gem_object_create_internal(vm->i915,
> PAGE_SIZE);
>  	GEM_BUG_ON(IS_ERR(vm->root_obj));
> +	INIT_LIST_HEAD(&vm->vm_rebind_list);
> +	spin_lock_init(&vm->vm_rebind_lock);
>  }
> 
>  void *__px_vaddr(struct drm_i915_gem_object *p) diff --git
> a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index f538ce9115c9..fe5485c4a1cd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -265,6 +265,8 @@ struct i915_address_space {
>  	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>  	struct list_head vm_bind_list;
>  	struct list_head vm_bound_list;
> +	struct list_head vm_rebind_list;
> +	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>  	/* va tree of persistent vmas */
>  	struct rb_root_cached va;
>  	struct list_head non_priv_vm_bind_list; diff --git
> a/drivers/gpu/drm/i915/i915_gem_gtt.h
> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8c2f57eb5dda..09b89d1913fc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
> *vm,
> 
>  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
> 
> +static inline int i915_vm_sync(struct i915_address_space *vm) {
> +	int ret;
> +
> +	/* Wait for all requests under this vm to finish */
> +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> +				    DMA_RESV_USAGE_BOOKKEEP, false,
> +				    MAX_SCHEDULE_TIMEOUT);
> +	if (ret < 0)
> +		return ret;
> +	else if (ret > 0)
> +		return 0;
> +	else
> +		return -ETIMEDOUT;
> +}
> +
> +static inline bool i915_vm_is_active(const struct i915_address_space
> +*vm) {
> +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
> +				       DMA_RESV_USAGE_BOOKKEEP);
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c index 6737236b7884..6adb013579be
> 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
> 
>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> +	INIT_LIST_HEAD(&vma->vm_rebind_link);
>  	return vma;
> 
>  err_unlock:
> @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
>  	if (atomic_dec_and_lock_irqsave(&vma->open_count,
>  					&gt->closed_lock,
>  					flags)) {
> -		__vma_close(vma, gt);
> +		if (!i915_vma_is_persistent(vma))
> +			__vma_close(vma, gt);
>  		spin_unlock_irqrestore(&gt->closed_lock, flags);
>  	}
>  }
> @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
>  	if (!drm_mm_node_allocated(&vma->node))
>  		return;
> 
> +	/*
> +	 * Mark persistent vma as purged to avoid it waiting
> +	 * for VM to be released.
> +	 */
> +	if (i915_vma_is_persistent(vma))
> +		i915_vma_set_purged(vma);
> +
>  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>  	WARN_ON(__i915_vma_unbind(vma));
>  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> @@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma
> *vma, bool vm_ddestroy)
> 
>  	spin_unlock(&obj->vma.lock);
> 
> -	i915_gem_vm_bind_lock(vma->vm);
> -	i915_gem_vm_bind_remove(vma, true);
> -	i915_gem_vm_bind_unlock(vma->vm);
> +	if (i915_vma_is_persistent(vma) &&
> +	    !i915_vma_is_freed(vma)) {
> +		i915_gem_vm_bind_lock(vma->vm);
> +		i915_gem_vm_bind_remove(vma, true);
> +		i915_gem_vm_bind_unlock(vma->vm);
> +	}
> 
>  	spin_lock_irq(&gt->closed_lock);
>  	__i915_vma_remove_closed(vma);
> @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma
> *vma,
>  	int err;
> 
>  	assert_object_held(obj);
> +	if (i915_vma_is_persistent(vma))
> +		return -EINVAL;
> 
>  	GEM_BUG_ON(!vma->pages);
> 
> @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>  	__i915_vma_evict(vma, false);
> 
>  	drm_mm_remove_node(&vma->node); /* pairs with
> i915_vma_release() */
> +
> +	if (i915_vma_is_persistent(vma)) {
> +		spin_lock(&vma->vm->vm_rebind_lock);
> +		if (list_empty(&vma->vm_rebind_link) &&
> +		    !i915_vma_is_purged(vma))
> +			list_add_tail(&vma->vm_rebind_link,
> +				      &vma->vm->vm_rebind_list);
> +		spin_unlock(&vma->vm->vm_rebind_lock);
> +	}
> +
>  	return 0;
>  }
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.h
> b/drivers/gpu/drm/i915/i915_vma.h index dcb49f79ff7e..6c1369a40e03
> 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
> 
>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int
> flags);  #define I915_VMA_RELEASE_MAP BIT(0)
> -
> -static inline bool i915_vma_is_active(const struct i915_vma *vma) -{
> -	return !i915_active_is_idle(&vma->active);
> -}
> -
>  /* do not reserve memory to prevent deadlocks */  #define
> __EXEC_OBJECT_NO_RESERVE BIT(31)
> 
> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma
> *vma)
>  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
>  }
> 
> +static inline bool i915_vma_is_persistent(const struct i915_vma *vma) {
> +	return test_bit(I915_VMA_PERSISTENT_BIT,
> __i915_vma_flags(vma)); }
> +
> +static inline void i915_vma_set_persistent(struct i915_vma *vma) {
> +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma)); }
> +
> +static inline bool i915_vma_is_purged(const struct i915_vma *vma) {
> +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline void i915_vma_set_purged(struct i915_vma *vma) {
> +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline bool i915_vma_is_freed(const struct i915_vma *vma) {
> +	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline void i915_vma_set_freed(struct i915_vma *vma) {
> +	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline bool i915_vma_is_active(const struct i915_vma *vma) {
> +	if (i915_vma_is_persistent(vma)) {
> +		if (i915_vma_is_purged(vma))
> +			return false;
> +
> +		return i915_vm_is_active(vma->vm);
> +	}
> +
> +	return !i915_active_is_idle(&vma->active);
> +}
> +
>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)  {
>  	i915_gem_object_get(vma->obj);
> @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma *vma);
> 
>  static inline int i915_vma_sync(struct i915_vma *vma)  {
> +	int ret;
> +
>  	/* Wait for the asynchronous bindings and pending GPU reads */
> -	return i915_active_wait(&vma->active);
> +	ret = i915_active_wait(&vma->active);
> +	if (ret || !i915_vma_is_persistent(vma) ||
> i915_vma_is_purged(vma))
> +		return ret;
> +
> +	return i915_vm_sync(vma->vm);
> +}
> +
> +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma) {
> +	/* Ensure vma bind is initiated */
> +	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
> +		return false;
> +
> +	/* Ensure any binding started is complete */
> +	if (rcu_access_pointer(vma->active.excl.fence)) {
> +		struct dma_fence *fence;
> +
> +		rcu_read_lock();
> +		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
> +		rcu_read_unlock();
> +		if (fence) {
> +			dma_fence_put(fence);
> +			return false;
> +		}
> +	}
> +
> +	return true;
>  }
> 
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index 7d830a6a0b51..405c82e1bc30 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -264,6 +264,28 @@ struct i915_vma {
>  #define I915_VMA_SCANOUT_BIT	17
>  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
> 
> +  /**
> +   * I915_VMA_PERSISTENT_BIT:
> +   * The vma is persistent (created with VM_BIND call).
> +   *
> +   * I915_VMA_PURGED_BIT:
> +   * The persistent vma is force unbound either due to VM_UNBIND call
> +   * from UMD or VM is released. Do not check/wait for VM activeness
> +   * in i915_vma_is_active() and i915_vma_sync() calls.
> +   *
> +   * I915_VMA_FREED_BIT:
> +   * The persistent vma is being released by UMD via VM_UNBIND call.
> +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
> +   * already holds the lock.
> +   */
> +#define I915_VMA_PERSISTENT_BIT	19
> +#define I915_VMA_PURGED_BIT	20
> +#define I915_VMA_FREED_BIT	21
> +
> +#define I915_VMA_PERSISTENT
> 	((int)BIT(I915_VMA_PERSISTENT_BIT))
> +#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
> +#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
> +
>  	struct i915_active active;
> 
>  #define I915_VMA_PAGES_BIAS 24
> @@ -292,6 +314,7 @@ struct i915_vma {
>  	struct list_head vm_bind_link; /* Link in persistent VMA list */
>  	/* Link in non-private persistent VMA list */
>  	struct list_head non_priv_vm_bind_link;
> +	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
> 
>  	/** Timeline fence for vm_bind completion notification */
>  	struct {
> --
> 2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
@ 2022-07-04 17:05     ` Zeng, Oak
  0 siblings, 0 replies; 121+ messages in thread
From: Zeng, Oak @ 2022-07-04 17:05 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana, intel-gfx, dri-devel
  Cc: Vetter, Daniel, christian.koenig, Hellstrom, Thomas, Zanoni,
	Paulo R, Auld, Matthew



Thanks,
Oak

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Niranjana Vishwanathapura
> Sent: July 1, 2022 6:51 PM
> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
> <thomas.hellstrom@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
> Vetter, Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com
> Subject: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
> 
> Treat VM_BIND vmas as persistent and handle them during the request
> submission in the execbuff path.

Hi Niranjana,

Does "persistent" above mean persistent across all subsequent execbuf ioctls?

Thanks,
Oak 

> 
> Support eviction by maintaining a list of evicted persistent vmas for rebinding
> during next submission.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
>  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
>  drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
>  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
>  9 files changed, 163 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index ccec4055fde3..5121f02ba95c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -38,6 +38,7 @@
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
>  #include "i915_gem_ttm.h"
> +#include "i915_gem_vm_bind.h"
>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 849bf3c1061e..eaadf5a6ab09 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -6,6 +6,7 @@
>  #ifndef __I915_GEM_VM_BIND_H
>  #define __I915_GEM_VM_BIND_H
> 
> +#include <linux/dma-resv.h>
>  #include "i915_drv.h"
> 
>  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> >vm_bind_lock)
> @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
> i915_address_space *vm)
>  	mutex_unlock(&vm->vm_bind_lock);
>  }
> 
> +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
> +
>  static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>  					struct i915_gem_ww_ctx *ww)
>  {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 96f139cc8060..1a8efa83547f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma
> *vma, bool release_obj)  {
>  	assert_vm_bind_held(vma->vm);
> 
> +	spin_lock(&vma->vm->vm_rebind_lock);
> +	if (!list_empty(&vma->vm_rebind_link))
> +		list_del_init(&vma->vm_rebind_link);
> +	i915_vma_set_purged(vma);
> +	i915_vma_set_freed(vma);
> +	spin_unlock(&vma->vm->vm_rebind_lock);
> +
>  	if (!list_empty(&vma->vm_bind_link)) {
>  		list_del_init(&vma->vm_bind_link);
>  		list_del_init(&vma->non_priv_vm_bind_link);
> @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> i915_address_space *vm,
> 
>  	vma->start = va->start;
>  	vma->last = va->start + va->length - 1;
> +	i915_vma_set_persistent(vma);
> 
>  	return vma;
>  }
> @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
> 
>  	i915_vm_bind_put_fence(vma);
>  put_vma:
> -	if (ret)
> +	if (ret) {
> +		i915_vma_set_freed(vma);
>  		i915_vma_destroy(vma);
> +	}
> 
>  	i915_gem_ww_ctx_fini(&ww);
>  unlock_vm:
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index df0a8459c3c6..55d5389b2c6c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -293,6 +293,8 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>  	vm->root_obj = i915_gem_object_create_internal(vm->i915,
> PAGE_SIZE);
>  	GEM_BUG_ON(IS_ERR(vm->root_obj));
> +	INIT_LIST_HEAD(&vm->vm_rebind_list);
> +	spin_lock_init(&vm->vm_rebind_lock);
>  }
> 
>  void *__px_vaddr(struct drm_i915_gem_object *p) diff --git
> a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index f538ce9115c9..fe5485c4a1cd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -265,6 +265,8 @@ struct i915_address_space {
>  	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>  	struct list_head vm_bind_list;
>  	struct list_head vm_bound_list;
> +	struct list_head vm_rebind_list;
> +	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>  	/* va tree of persistent vmas */
>  	struct rb_root_cached va;
>  	struct list_head non_priv_vm_bind_list; diff --git
> a/drivers/gpu/drm/i915/i915_gem_gtt.h
> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8c2f57eb5dda..09b89d1913fc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
> *vm,
> 
>  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
> 
> +static inline int i915_vm_sync(struct i915_address_space *vm) {
> +	int ret;
> +
> +	/* Wait for all requests under this vm to finish */
> +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> +				    DMA_RESV_USAGE_BOOKKEEP, false,
> +				    MAX_SCHEDULE_TIMEOUT);
> +	if (ret < 0)
> +		return ret;
> +	else if (ret > 0)
> +		return 0;
> +	else
> +		return -ETIMEDOUT;
> +}
> +
> +static inline bool i915_vm_is_active(const struct i915_address_space
> +*vm) {
> +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
> +				       DMA_RESV_USAGE_BOOKKEEP);
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c index 6737236b7884..6adb013579be
> 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
> 
>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> +	INIT_LIST_HEAD(&vma->vm_rebind_link);
>  	return vma;
> 
>  err_unlock:
> @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
>  	if (atomic_dec_and_lock_irqsave(&vma->open_count,
>  					&gt->closed_lock,
>  					flags)) {
> -		__vma_close(vma, gt);
> +		if (!i915_vma_is_persistent(vma))
> +			__vma_close(vma, gt);
>  		spin_unlock_irqrestore(&gt->closed_lock, flags);
>  	}
>  }
> @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
>  	if (!drm_mm_node_allocated(&vma->node))
>  		return;
> 
> +	/*
> +	 * Mark persistent vma as purged to avoid it waiting
> +	 * for VM to be released.
> +	 */
> +	if (i915_vma_is_persistent(vma))
> +		i915_vma_set_purged(vma);
> +
>  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>  	WARN_ON(__i915_vma_unbind(vma));
>  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> @@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma
> *vma, bool vm_ddestroy)
> 
>  	spin_unlock(&obj->vma.lock);
> 
> -	i915_gem_vm_bind_lock(vma->vm);
> -	i915_gem_vm_bind_remove(vma, true);
> -	i915_gem_vm_bind_unlock(vma->vm);
> +	if (i915_vma_is_persistent(vma) &&
> +	    !i915_vma_is_freed(vma)) {
> +		i915_gem_vm_bind_lock(vma->vm);
> +		i915_gem_vm_bind_remove(vma, true);
> +		i915_gem_vm_bind_unlock(vma->vm);
> +	}
> 
>  	spin_lock_irq(&gt->closed_lock);
>  	__i915_vma_remove_closed(vma);
> @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma
> *vma,
>  	int err;
> 
>  	assert_object_held(obj);
> +	if (i915_vma_is_persistent(vma))
> +		return -EINVAL;
> 
>  	GEM_BUG_ON(!vma->pages);
> 
> @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>  	__i915_vma_evict(vma, false);
> 
>  	drm_mm_remove_node(&vma->node); /* pairs with
> i915_vma_release() */
> +
> +	if (i915_vma_is_persistent(vma)) {
> +		spin_lock(&vma->vm->vm_rebind_lock);
> +		if (list_empty(&vma->vm_rebind_link) &&
> +		    !i915_vma_is_purged(vma))
> +			list_add_tail(&vma->vm_rebind_link,
> +				      &vma->vm->vm_rebind_list);
> +		spin_unlock(&vma->vm->vm_rebind_lock);
> +	}
> +
>  	return 0;
>  }
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.h
> b/drivers/gpu/drm/i915/i915_vma.h index dcb49f79ff7e..6c1369a40e03
> 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
> 
>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int
> flags);  #define I915_VMA_RELEASE_MAP BIT(0)
> -
> -static inline bool i915_vma_is_active(const struct i915_vma *vma) -{
> -	return !i915_active_is_idle(&vma->active);
> -}
> -
>  /* do not reserve memory to prevent deadlocks */  #define
> __EXEC_OBJECT_NO_RESERVE BIT(31)
> 
> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma
> *vma)
>  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
>  }
> 
> +static inline bool i915_vma_is_persistent(const struct i915_vma *vma) {
> +	return test_bit(I915_VMA_PERSISTENT_BIT,
> __i915_vma_flags(vma)); }
> +
> +static inline void i915_vma_set_persistent(struct i915_vma *vma) {
> +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma)); }
> +
> +static inline bool i915_vma_is_purged(const struct i915_vma *vma) {
> +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline void i915_vma_set_purged(struct i915_vma *vma) {
> +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline bool i915_vma_is_freed(const struct i915_vma *vma) {
> +	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline void i915_vma_set_freed(struct i915_vma *vma) {
> +	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> +
> +static inline bool i915_vma_is_active(const struct i915_vma *vma) {
> +	if (i915_vma_is_persistent(vma)) {
> +		if (i915_vma_is_purged(vma))
> +			return false;
> +
> +		return i915_vm_is_active(vma->vm);
> +	}
> +
> +	return !i915_active_is_idle(&vma->active);
> +}
> +
>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)  {
>  	i915_gem_object_get(vma->obj);
> @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma *vma);
> 
>  static inline int i915_vma_sync(struct i915_vma *vma)  {
> +	int ret;
> +
>  	/* Wait for the asynchronous bindings and pending GPU reads */
> -	return i915_active_wait(&vma->active);
> +	ret = i915_active_wait(&vma->active);
> +	if (ret || !i915_vma_is_persistent(vma) ||
> i915_vma_is_purged(vma))
> +		return ret;
> +
> +	return i915_vm_sync(vma->vm);
> +}
> +
> +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma) {
> +	/* Ensure vma bind is initiated */
> +	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
> +		return false;
> +
> +	/* Ensure any binding started is complete */
> +	if (rcu_access_pointer(vma->active.excl.fence)) {
> +		struct dma_fence *fence;
> +
> +		rcu_read_lock();
> +		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
> +		rcu_read_unlock();
> +		if (fence) {
> +			dma_fence_put(fence);
> +			return false;
> +		}
> +	}
> +
> +	return true;
>  }
> 
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index 7d830a6a0b51..405c82e1bc30 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -264,6 +264,28 @@ struct i915_vma {
>  #define I915_VMA_SCANOUT_BIT	17
>  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
> 
> +  /**
> +   * I915_VMA_PERSISTENT_BIT:
> +   * The vma is persistent (created with VM_BIND call).
> +   *
> +   * I915_VMA_PURGED_BIT:
> +   * The persistent vma is force unbound either due to VM_UNBIND call
> +   * from UMD or VM is released. Do not check/wait for VM activeness
> +   * in i915_vma_is_active() and i915_vma_sync() calls.
> +   *
> +   * I915_VMA_FREED_BIT:
> +   * The persistent vma is being released by UMD via VM_UNBIND call.
> +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
> +   * already holds the lock.
> +   */
> +#define I915_VMA_PERSISTENT_BIT	19
> +#define I915_VMA_PURGED_BIT	20
> +#define I915_VMA_FREED_BIT	21
> +
> +#define I915_VMA_PERSISTENT
> 	((int)BIT(I915_VMA_PERSISTENT_BIT))
> +#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
> +#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
> +
>  	struct i915_active active;
> 
>  #define I915_VMA_PAGES_BIAS 24
> @@ -292,6 +314,7 @@ struct i915_vma {
>  	struct list_head vm_bind_link; /* Link in persistent VMA list */
>  	/* Link in non-private persistent VMA list */
>  	struct list_head non_priv_vm_bind_link;
> +	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
> 
>  	/** Timeline fence for vm_bind completion notification */
>  	struct {
> --
> 2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-05  8:40     ` Thomas Hellström
  -1 siblings, 0 replies; 121+ messages in thread
From: Thomas Hellström @ 2022-07-05  8:40 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, matthew.auld, jason, daniel.vetter,
	christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> VM_BIND functionality maintain that vm->vm_bind_mutex will never be
> taken
> while holding vm->mutex.
> However, while closing 'vm', vma is destroyed while holding vm-
> >mutex.
> But vma releasing needs to take vm->vm_bind_mutex in order to delete
> vma
> from the vm_bind_list. To avoid this, destroy the vma outside vm-
> >mutex
> while closing the 'vm'.
> 
> Signed-off-by: Niranjana Vishwanathapura

First, when introducing a new feature like this, we should not need to
end the series with "Fix.." patches like this; rather, whatever needs to
be fixed should be fixed where the code was introduced.

Second, by analogy with the Linux kernel CPU mapping, could we instead
think of the vm_bind_lock as being similar to the mmap_lock, and the
vm_mutex as being similar to the i_mmap_lock, with the former used for
VA manipulation and the latter when attaching / removing the backing
store from the VA?

Then we would not need to take the vm_bind_lock from vma destruction,
since the VA would already have been reclaimed at that point. For vm
destruction here, we'd loop over all relevant vm_bind VAs under the
vm_bind lock and call vm_unbind. Would that work?
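
(A rough sketch of that idea, purely illustrative; the unbind helper
below is hypothetical and not part of this series:)

	/* vm destruction: reclaim all vm_bind VAs under the vm_bind lock
	 * first, so vma destruction later never needs vm_bind_mutex.
	 */
	i915_gem_vm_bind_lock(vm);
	list_for_each_entry_safe(vma, vn, &vm->vm_bound_list, vm_bind_link)
		i915_gem_vm_unbind_vma(vm, vma);	/* hypothetical helper */
	i915_gem_vm_bind_unlock(vm);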

/Thomas


> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gtt.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 4ab3bda644ff..4f707d0eb3ef 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -109,7 +109,8 @@ int map_pt_dma_locked(struct i915_address_space
> *vm, struct drm_i915_gem_object
>         return 0;
>  }
>  
> -static void clear_vm_list(struct list_head *list)
> +static void clear_vm_list(struct list_head *list,
> +                         struct list_head *destroy_list)
>  {
>         struct i915_vma *vma, *vn;
>  
> @@ -138,8 +139,7 @@ static void clear_vm_list(struct list_head *list)
>                         i915_vm_resv_get(vma->vm);
>                         vma->vm_ddestroy = true;
>                 } else {
> -                       i915_vma_destroy_locked(vma);
> -                       i915_gem_object_put(obj);
> +                       list_move_tail(&vma->vm_link, destroy_list);
>                 }
>  
>         }
> @@ -147,16 +147,29 @@ static void clear_vm_list(struct list_head
> *list)
>  
>  static void __i915_vm_close(struct i915_address_space *vm)
>  {
> +       struct i915_vma *vma, *vn;
> +       struct list_head list;
> +
> +       INIT_LIST_HEAD(&list);
> +
>         mutex_lock(&vm->mutex);
>  
> -       clear_vm_list(&vm->bound_list);
> -       clear_vm_list(&vm->unbound_list);
> +       clear_vm_list(&vm->bound_list, &list);
> +       clear_vm_list(&vm->unbound_list, &list);
>  
>         /* Check for must-fix unanticipated side-effects */
>         GEM_BUG_ON(!list_empty(&vm->bound_list));
>         GEM_BUG_ON(!list_empty(&vm->unbound_list));
>  
>         mutex_unlock(&vm->mutex);
> +
> +       /* Destroy vmas outside vm->mutex */
> +       list_for_each_entry_safe(vma, vn, &list, vm_link) {
> +               struct drm_i915_gem_object *obj = vma->obj;
> +
> +               i915_vma_destroy(vma);
> +               i915_gem_object_put(obj);
> +       }
>  }
>  
>  /* lock the vm into the current ww, if we lock one, we lock all */


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 09/10] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-05  8:57     ` Thomas Hellström
  -1 siblings, 0 replies; 121+ messages in thread
From: Thomas Hellström @ 2022-07-05  8:57 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: paulo.r.zanoni, matthew.auld, daniel.vetter, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> vma_lookup is tied to a segment of the object instead of a section
> of the VA space. Hence, it does not support aliasing (i.e., multiple
> bindings to the same section of the object).
> Skip vma_lookup for persistent vmas, which support aliasing.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_vma.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c
> index 6adb013579be..9aa38b772b5b 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -197,6 +197,10 @@ vma_create(struct drm_i915_gem_object *obj,
>                 __set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>         }
>  
> +       if (!i915_vma_is_ggtt(vma) &&
> +           (view && view->type == I915_GGTT_VIEW_PARTIAL))
> +               goto skip_rb_insert;
> +

Rather than guessing that a vma with this signature is a persistent
vma, which is confusing to the reader, could we have an argument saying
we want to create a persistent vma?
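
I.e. something like the below (untested sketch, only to illustrate the
suggested explicit argument; vma_create() would grow the same
parameter):

	struct i915_vma *
	i915_vma_instance(struct drm_i915_gem_object *obj,
			  struct i915_address_space *vm,
			  const struct i915_ggtt_view *view,
			  bool persistent)
	{
		struct i915_vma *vma = NULL;

		GEM_BUG_ON(!kref_read(&vm->ref));

		/* Persistent vmas allow aliasing, so skip the per-object lookup. */
		if (!persistent) {
			spin_lock(&obj->vma.lock);
			vma = i915_vma_lookup(obj, vm, view);
			spin_unlock(&obj->vma.lock);
		}

		/* vma_create() will resolve the race if another creates the vma */
		if (unlikely(!vma))
			vma = vma_create(obj, vm, view, persistent);

		return vma;
	}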

>         rb = NULL;
>         p = &obj->vma.tree.rb_node;
>         while (*p) {
> @@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
>         rb_link_node(&vma->obj_node, rb, p);
>         rb_insert_color(&vma->obj_node, &obj->vma.tree);
>  
> +skip_rb_insert:
>         if (i915_vma_is_ggtt(vma))
>                 /*
>                  * We put the GGTT vma at the start of the vma-list,
> followed
> @@ -292,13 +297,16 @@ i915_vma_instance(struct drm_i915_gem_object
> *obj,
>                   struct i915_address_space *vm,
>                   const struct i915_ggtt_view *view)
>  {
> -       struct i915_vma *vma;
> +       struct i915_vma *vma = NULL;
>  
>         GEM_BUG_ON(!kref_read(&vm->ref));
>  
> -       spin_lock(&obj->vma.lock);
> -       vma = i915_vma_lookup(obj, vm, view);
> -       spin_unlock(&obj->vma.lock);
> +       if (i915_is_ggtt(vm) || !view ||
> +           view->type != I915_GGTT_VIEW_PARTIAL) {

Same here?

/Thomas


> +               spin_lock(&obj->vma.lock);
> +               vma = i915_vma_lookup(obj, vm, view);
> +               spin_unlock(&obj->vma.lock);
> +       }
>  
>         /* vma_create() will resolve the race if another creates the
> vma */
>         if (unlikely(!vma))
> @@ -1670,7 +1678,8 @@ static void release_references(struct i915_vma
> *vma, bool vm_ddestroy)
>  
>         spin_lock(&obj->vma.lock);
>         list_del(&vma->obj_link);
> -       if (!RB_EMPTY_NODE(&vma->obj_node))
> +       if (!i915_vma_is_persistent(vma) &&
> +           !RB_EMPTY_NODE(&vma->obj_node))
>                 rb_erase(&vma->obj_node, &obj->vma.tree);
>  
>         spin_unlock(&obj->vma.lock);


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-04 17:05     ` Zeng, Oak
@ 2022-07-05  9:20       ` Ramalingam C
  -1 siblings, 0 replies; 121+ messages in thread
From: Ramalingam C @ 2022-07-05  9:20 UTC (permalink / raw)
  To: Zeng, Oak
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, Vishwanathapura, Niranjana,
	christian.koenig

On 2022-07-04 at 17:05:38 +0000, Zeng, Oak wrote:
> 
> 
> Thanks,
> Oak
> 
> > -----Original Message-----
> > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of
> > Niranjana Vishwanathapura
> > Sent: July 1, 2022 6:51 PM
> > To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> > Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
> > <thomas.hellstrom@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
> > Vetter, Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com
> > Subject: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
> > 
> > Treat VM_BIND vmas as persistent and handle them during the request
> > submission in the execbuff path.
> 
> Hi Niranjana,
> 
> Does "persistent" above mean persistent across all subsequent execbuf ioctls?

Yes, Oak, that's correct: persistent across multiple execbuf ioctls.
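
To illustrate (a rough UMD-side sketch against the uapi in this series;
submit_execbuf3() is just a made-up wrapper for the new execbuf3
ioctl):

	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,
		.handle = bo_handle,
		.start  = gpu_va,
		.offset = 0,
		.length = bo_size,
	};

	/* Bind once (synchronously, since no out fence is requested)... */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);

	/* ...and the mapping stays valid across any number of submissions,
	 * without listing the BO in each execbuf.
	 */
	for (unsigned int i = 0; i < num_batches; i++)
		submit_execbuf3(fd, vm_id, batch_va[i]);	/* made-up wrapper */

	/* Unbind only once the UMD knows the mapping is no longer in use. */
	struct drm_i915_gem_vm_unbind unbind = {
		.vm_id  = vm_id,
		.start  = gpu_va,
		.length = bo_size,
	};
	drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);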

Regards,
Ram.
> 
> Thanks,
> Oak 
> 
> > 
> > Support eviction by maintaining a list of evicted persistent vmas for rebinding
> > during next submission.
> > 
> > Signed-off-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
> >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
> >  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
> >  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
> >  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
> >  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
> >  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
> >  drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
> >  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
> >  9 files changed, 163 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > index ccec4055fde3..5121f02ba95c 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > @@ -38,6 +38,7 @@
> >  #include "i915_gem_mman.h"
> >  #include "i915_gem_object.h"
> >  #include "i915_gem_ttm.h"
> > +#include "i915_gem_vm_bind.h"
> >  #include "i915_memcpy.h"
> >  #include "i915_trace.h"
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > index 849bf3c1061e..eaadf5a6ab09 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > @@ -6,6 +6,7 @@
> >  #ifndef __I915_GEM_VM_BIND_H
> >  #define __I915_GEM_VM_BIND_H
> > 
> > +#include <linux/dma-resv.h>
> >  #include "i915_drv.h"
> > 
> >  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> > >vm_bind_lock)
> > @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
> > i915_address_space *vm)
> >  	mutex_unlock(&vm->vm_bind_lock);
> >  }
> > 
> > +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
> > +
> >  static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
> >  					struct i915_gem_ww_ctx *ww)
> >  {
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > index 96f139cc8060..1a8efa83547f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma
> > *vma, bool release_obj)  {
> >  	assert_vm_bind_held(vma->vm);
> > 
> > +	spin_lock(&vma->vm->vm_rebind_lock);
> > +	if (!list_empty(&vma->vm_rebind_link))
> > +		list_del_init(&vma->vm_rebind_link);
> > +	i915_vma_set_purged(vma);
> > +	i915_vma_set_freed(vma);
> > +	spin_unlock(&vma->vm->vm_rebind_lock);
> > +
> >  	if (!list_empty(&vma->vm_bind_link)) {
> >  		list_del_init(&vma->vm_bind_link);
> >  		list_del_init(&vma->non_priv_vm_bind_link);
> > @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> > i915_address_space *vm,
> > 
> >  	vma->start = va->start;
> >  	vma->last = va->start + va->length - 1;
> > +	i915_vma_set_persistent(vma);
> > 
> >  	return vma;
> >  }
> > @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
> > i915_address_space *vm,
> > 
> >  	i915_vm_bind_put_fence(vma);
> >  put_vma:
> > -	if (ret)
> > +	if (ret) {
> > +		i915_vma_set_freed(vma);
> >  		i915_vma_destroy(vma);
> > +	}
> > 
> >  	i915_gem_ww_ctx_fini(&ww);
> >  unlock_vm:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index df0a8459c3c6..55d5389b2c6c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -293,6 +293,8 @@ void i915_address_space_init(struct
> > i915_address_space *vm, int subclass)
> >  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> >  	vm->root_obj = i915_gem_object_create_internal(vm->i915,
> > PAGE_SIZE);
> >  	GEM_BUG_ON(IS_ERR(vm->root_obj));
> > +	INIT_LIST_HEAD(&vm->vm_rebind_list);
> > +	spin_lock_init(&vm->vm_rebind_lock);
> >  }
> > 
> >  void *__px_vaddr(struct drm_i915_gem_object *p) diff --git
> > a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > index f538ce9115c9..fe5485c4a1cd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > @@ -265,6 +265,8 @@ struct i915_address_space {
> >  	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
> >  	struct list_head vm_bind_list;
> >  	struct list_head vm_bound_list;
> > +	struct list_head vm_rebind_list;
> > +	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
> >  	/* va tree of persistent vmas */
> >  	struct rb_root_cached va;
> >  	struct list_head non_priv_vm_bind_list; diff --git
> > a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > index 8c2f57eb5dda..09b89d1913fc 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
> > *vm,
> > 
> >  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
> > 
> > +static inline int i915_vm_sync(struct i915_address_space *vm) {
> > +	int ret;
> > +
> > +	/* Wait for all requests under this vm to finish */
> > +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> > +				    DMA_RESV_USAGE_BOOKKEEP, false,
> > +				    MAX_SCHEDULE_TIMEOUT);
> > +	if (ret < 0)
> > +		return ret;
> > +	else if (ret > 0)
> > +		return 0;
> > +	else
> > +		return -ETIMEDOUT;
> > +}
> > +
> > +static inline bool i915_vm_is_active(const struct i915_address_space
> > +*vm) {
> > +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
> > +				       DMA_RESV_USAGE_BOOKKEEP);
> > +}
> > +
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > b/drivers/gpu/drm/i915/i915_vma.c index 6737236b7884..6adb013579be
> > 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
> > 
> >  	INIT_LIST_HEAD(&vma->vm_bind_link);
> >  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> > +	INIT_LIST_HEAD(&vma->vm_rebind_link);
> >  	return vma;
> > 
> >  err_unlock:
> > @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
> >  	if (atomic_dec_and_lock_irqsave(&vma->open_count,
> >  					&gt->closed_lock,
> >  					flags)) {
> > -		__vma_close(vma, gt);
> > +		if (!i915_vma_is_persistent(vma))
> > +			__vma_close(vma, gt);
> >  		spin_unlock_irqrestore(&gt->closed_lock, flags);
> >  	}
> >  }
> > @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
> >  	if (!drm_mm_node_allocated(&vma->node))
> >  		return;
> > 
> > +	/*
> > +	 * Mark persistent vma as purged to avoid it waiting
> > +	 * for VM to be released.
> > +	 */
> > +	if (i915_vma_is_persistent(vma))
> > +		i915_vma_set_purged(vma);
> > +
> >  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> >  	WARN_ON(__i915_vma_unbind(vma));
> >  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> > @@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma
> > *vma, bool vm_ddestroy)
> > 
> >  	spin_unlock(&obj->vma.lock);
> > 
> > -	i915_gem_vm_bind_lock(vma->vm);
> > -	i915_gem_vm_bind_remove(vma, true);
> > -	i915_gem_vm_bind_unlock(vma->vm);
> > +	if (i915_vma_is_persistent(vma) &&
> > +	    !i915_vma_is_freed(vma)) {
> > +		i915_gem_vm_bind_lock(vma->vm);
> > +		i915_gem_vm_bind_remove(vma, true);
> > +		i915_gem_vm_bind_unlock(vma->vm);
> > +	}
> > 
> >  	spin_lock_irq(&gt->closed_lock);
> >  	__i915_vma_remove_closed(vma);
> > @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma
> > *vma,
> >  	int err;
> > 
> >  	assert_object_held(obj);
> > +	if (i915_vma_is_persistent(vma))
> > +		return -EINVAL;
> > 
> >  	GEM_BUG_ON(!vma->pages);
> > 
> > @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
> >  	__i915_vma_evict(vma, false);
> > 
> >  	drm_mm_remove_node(&vma->node); /* pairs with
> > i915_vma_release() */
> > +
> > +	if (i915_vma_is_persistent(vma)) {
> > +		spin_lock(&vma->vm->vm_rebind_lock);
> > +		if (list_empty(&vma->vm_rebind_link) &&
> > +		    !i915_vma_is_purged(vma))
> > +			list_add_tail(&vma->vm_rebind_link,
> > +				      &vma->vm->vm_rebind_list);
> > +		spin_unlock(&vma->vm->vm_rebind_lock);
> > +	}
> > +
> >  	return 0;
> >  }
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_vma.h
> > b/drivers/gpu/drm/i915/i915_vma.h index dcb49f79ff7e..6c1369a40e03
> > 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.h
> > +++ b/drivers/gpu/drm/i915/i915_vma.h
> > @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
> > 
> >  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int
> > flags);  #define I915_VMA_RELEASE_MAP BIT(0)
> > -
> > -static inline bool i915_vma_is_active(const struct i915_vma *vma) -{
> > -	return !i915_active_is_idle(&vma->active);
> > -}
> > -
> >  /* do not reserve memory to prevent deadlocks */  #define
> > __EXEC_OBJECT_NO_RESERVE BIT(31)
> > 
> > @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma
> > *vma)
> >  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
> >  }
> > 
> > +static inline bool i915_vma_is_persistent(const struct i915_vma *vma) {
> > +	return test_bit(I915_VMA_PERSISTENT_BIT,
> > __i915_vma_flags(vma)); }
> > +
> > +static inline void i915_vma_set_persistent(struct i915_vma *vma) {
> > +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma)); }
> > +
> > +static inline bool i915_vma_is_purged(const struct i915_vma *vma) {
> > +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> > +
> > +static inline void i915_vma_set_purged(struct i915_vma *vma) {
> > +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> > +
> > +static inline bool i915_vma_is_freed(const struct i915_vma *vma) {
> > +	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> > +
> > +static inline void i915_vma_set_freed(struct i915_vma *vma) {
> > +	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> > +
> > +static inline bool i915_vma_is_active(const struct i915_vma *vma) {
> > +	if (i915_vma_is_persistent(vma)) {
> > +		if (i915_vma_is_purged(vma))
> > +			return false;
> > +
> > +		return i915_vm_is_active(vma->vm);
> > +	}
> > +
> > +	return !i915_active_is_idle(&vma->active);
> > +}
> > +
> >  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)  {
> >  	i915_gem_object_get(vma->obj);
> > @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma *vma);
> > 
> >  static inline int i915_vma_sync(struct i915_vma *vma)  {
> > +	int ret;
> > +
> >  	/* Wait for the asynchronous bindings and pending GPU reads */
> > -	return i915_active_wait(&vma->active);
> > +	ret = i915_active_wait(&vma->active);
> > +	if (ret || !i915_vma_is_persistent(vma) ||
> > i915_vma_is_purged(vma))
> > +		return ret;
> > +
> > +	return i915_vm_sync(vma->vm);
> > +}
> > +
> > +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma) {
> > +	/* Ensure vma bind is initiated */
> > +	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
> > +		return false;
> > +
> > +	/* Ensure any binding started is complete */
> > +	if (rcu_access_pointer(vma->active.excl.fence)) {
> > +		struct dma_fence *fence;
> > +
> > +		rcu_read_lock();
> > +		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
> > +		rcu_read_unlock();
> > +		if (fence) {
> > +			dma_fence_put(fence);
> > +			return false;
> > +		}
> > +	}
> > +
> > +	return true;
> >  }
> > 
> >  /**
> > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> > b/drivers/gpu/drm/i915/i915_vma_types.h
> > index 7d830a6a0b51..405c82e1bc30 100644
> > --- a/drivers/gpu/drm/i915/i915_vma_types.h
> > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> > @@ -264,6 +264,28 @@ struct i915_vma {
> >  #define I915_VMA_SCANOUT_BIT	17
> >  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
> > 
> > +  /**
> > +   * I915_VMA_PERSISTENT_BIT:
> > +   * The vma is persistent (created with VM_BIND call).
> > +   *
> > +   * I915_VMA_PURGED_BIT:
> > +   * The persistent vma is force unbound either due to VM_UNBIND call
> > +   * from UMD or VM is released. Do not check/wait for VM activeness
> > +   * in i915_vma_is_active() and i915_vma_sync() calls.
> > +   *
> > +   * I915_VMA_FREED_BIT:
> > +   * The persistent vma is being released by UMD via VM_UNBIND call.
> > +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND call
> > +   * already holds the lock.
> > +   */
> > +#define I915_VMA_PERSISTENT_BIT	19
> > +#define I915_VMA_PURGED_BIT	20
> > +#define I915_VMA_FREED_BIT	21
> > +
> > +#define I915_VMA_PERSISTENT	((int)BIT(I915_VMA_PERSISTENT_BIT))
> > +#define I915_VMA_PURGED		((int)BIT(I915_VMA_PURGED_BIT))
> > +#define I915_VMA_FREED		((int)BIT(I915_VMA_FREED_BIT))
> > +
> >  	struct i915_active active;
> > 
> >  #define I915_VMA_PAGES_BIAS 24
> > @@ -292,6 +314,7 @@ struct i915_vma {
> >  	struct list_head vm_bind_link; /* Link in persistent VMA list */
> >  	/* Link in non-private persistent VMA list */
> >  	struct list_head non_priv_vm_bind_link;
> > +	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
> > 
> >  	/** Timeline fence for vm_bind completion notification */
> >  	struct {
> > --
> > 2.21.0.rc0.32.g243a4c7e27
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-05  9:59     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-05  9:59 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Zanoni, Paulo R, Auld, Matthew, Vetter, Daniel, christian.koenig

Hi,


On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Add VM_BIND and VM_UNBIND ioctls to bind/unbind a section of an
> object at the specified GPU virtual addresses.
> 
> Add I915_PARAM_VM_BIND_VERSION to indicate version of VM_BIND feature
> supported and I915_VM_CREATE_FLAGS_USE_VM_BIND for UMDs to select the
> vm_bind mode of binding.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>

Some comments on patch ordering. In order to ease reviews and to not
introduce unwanted surprises, could we

1) Add patches that introduce needed internal functionality /
refactoring / helpers.
2) Add patches that enable the intended user-space functionality, with
any yet-unsupported functionality disabled.
3) Add patches that introduce additional internal functionality /
refactoring / helpers.
4) Add patches that enable that additional functionality.

Fixes that are known at series submission time should be squashed in
before the feature is enabled.


> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c |  20 +-
>  drivers/gpu/drm/i915/gem/i915_gem_context.h |  15 ++
>  drivers/gpu/drm/i915/gt/intel_gtt.h         |   6 +
>  drivers/gpu/drm/i915/i915_driver.c          |  30 +++
>  drivers/gpu/drm/i915/i915_getparam.c        |   3 +
>  include/uapi/drm/i915_drm.h                 | 192
> +++++++++++++++++++-
>  6 files changed, 248 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index dabdfe09f5e5..e3f5fbf2ac05 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -81,7 +81,6 @@
>  
>  #include "pxp/intel_pxp.h"
>  
> -#include "i915_file_private.h"
>  #include "i915_gem_context.h"
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
> @@ -346,20 +345,6 @@ static int proto_context_register(struct
> drm_i915_file_private *fpriv,
>         return ret;
>  }
>  
> -static struct i915_address_space *
> -i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
> -{
> -       struct i915_address_space *vm;
> -
> -       xa_lock(&file_priv->vm_xa);
> -       vm = xa_load(&file_priv->vm_xa, id);
> -       if (vm)
> -               kref_get(&vm->ref);
> -       xa_unlock(&file_priv->vm_xa);
> -
> -       return vm;
> -}
> -
>  static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
>                             struct i915_gem_proto_context *pc,
>                             const struct drm_i915_gem_context_param
> *args)
> @@ -1799,7 +1784,7 @@ int i915_gem_vm_create_ioctl(struct drm_device
> *dev, void *data,
>         if (!HAS_FULL_PPGTT(i915))
>                 return -ENODEV;
>  
> -       if (args->flags)
> +       if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
>                 return -EINVAL;
>  
>         ppgtt = i915_ppgtt_create(to_gt(i915), 0);
> @@ -1819,6 +1804,9 @@ int i915_gem_vm_create_ioctl(struct drm_device
> *dev, void *data,
>         if (err)
>                 goto err_put;
>  
> +       if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
> +               ppgtt->vm.vm_bind_mode = true;
> +
>         GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt
> */
>         args->vm_id = id;
>         return 0;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index e5b0f66ea1fe..723bf446c934 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -12,6 +12,7 @@
>  #include "gt/intel_context.h"
>  
>  #include "i915_drv.h"
> +#include "i915_file_private.h"
>  #include "i915_gem.h"
>  #include "i915_scheduler.h"
>  #include "intel_device_info.h"
> @@ -139,6 +140,20 @@ int i915_gem_context_setparam_ioctl(struct
> drm_device *dev, void *data,
>  int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void
> *data,
>                                        struct drm_file *file);
>  
> +static inline struct i915_address_space *
> +i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
> +{
> +       struct i915_address_space *vm;
> +
> +       xa_lock(&file_priv->vm_xa);
> +       vm = xa_load(&file_priv->vm_xa, id);
> +       if (vm)
> +               kref_get(&vm->ref);
> +       xa_unlock(&file_priv->vm_xa);
> +
> +       return vm;
> +}

Does this really need to be inlined?

> +
>  struct i915_gem_context *
>  i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32
> id);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index e639434e97fd..c812aa9708ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -271,6 +271,12 @@ struct i915_address_space {
>         /* Skip pte rewrite on unbind for suspend. Protected by
> @mutex */
>         bool skip_pte_rewrite:1;
>  
> +       /**
> +        * true: allow only vm_bind method of binding.
> +        * false: allow only legacy execbuff method of binding.
> +        */

Use proper kerneldoc. (The same holds for structure documentation
across the series.)
Also, please follow the internal locking guidelines when documenting
members that need protection with locks.
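
For example, something like below (the lifetime / locking note is only
my guess at what applies here):

	/**
	 * @vm_bind_mode: Only allow VM_BIND method of binding.
	 *
	 * True: allow only VM_BIND method of binding.
	 * False: allow only legacy execbuff method of binding.
	 *
	 * Set once at vm creation from I915_VM_CREATE_FLAGS_USE_VM_BIND
	 * and never changed afterwards, so readers would need no extra
	 * locking.
	 */
	bool vm_bind_mode:1;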

> +       bool vm_bind_mode:1;
> +
>         u8 top;
>         u8 pd_shift;
>         u8 scratch_order;
> diff --git a/drivers/gpu/drm/i915/i915_driver.c
> b/drivers/gpu/drm/i915/i915_driver.c
> index deb8a8b76965..ccf990dfd99b 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -1778,6 +1778,34 @@ i915_gem_reject_pin_ioctl(struct drm_device
> *dev, void *data,
>         return -ENODEV;
>  }
>  
> +static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void
> *data,
> +                                 struct drm_file *file)
> +{
> +       struct drm_i915_gem_vm_bind *args = data;
> +       struct i915_address_space *vm;
> +
> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +       if (unlikely(!vm))
> +               return -ENOENT;
> +
> +       i915_vm_put(vm);
> +       return -EINVAL;
> +}
> +
> +static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
> *data,
> +                                   struct drm_file *file)
> +{
> +       struct drm_i915_gem_vm_unbind *args = data;
> +       struct i915_address_space *vm;
> +
> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +       if (unlikely(!vm))
> +               return -ENOENT;
> +
> +       i915_vm_put(vm);
> +       return -EINVAL;
> +}
> +

Move these functions to the file of the actual implementation?

>  static const struct drm_ioctl_desc i915_ioctls[] = {
>         DRM_IOCTL_DEF_DRV(I915_INIT, drm_noop,
> DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
>         DRM_IOCTL_DEF_DRV(I915_FLUSH, drm_noop, DRM_AUTH),
> @@ -1838,6 +1866,8 @@ static const struct drm_ioctl_desc
> i915_ioctls[] = {
>         DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl,
> DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE,
> i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY,
> i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl,
> DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND,
> i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
>  };
>  
>  /*
> diff --git a/drivers/gpu/drm/i915/i915_getparam.c
> b/drivers/gpu/drm/i915/i915_getparam.c
> index 6fd15b39570c..c1d53febc5de 100644
> --- a/drivers/gpu/drm/i915/i915_getparam.c
> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> @@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev,
> void *data,
>         case I915_PARAM_PERF_REVISION:
>                 value = i915_perf_ioctl_version();
>                 break;
> +       case I915_PARAM_VM_BIND_VERSION:
> +               value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
> +               break;
>         default:
>                 DRM_DEBUG("Unknown parameter %d\n", param->param);
>                 return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h
> b/include/uapi/drm/i915_drm.h
> index 3e78a00220ea..26cca49717f8 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_VM_CREATE         0x3a
>  #define DRM_I915_GEM_VM_DESTROY                0x3b
>  #define DRM_I915_GEM_CREATE_EXT                0x3c
> +#define DRM_I915_GEM_VM_BIND           0x3d
> +#define DRM_I915_GEM_VM_UNBIND         0x3e
>  /* Must be kept compact -- no holes */
>  
>  #define DRM_IOCTL_I915_INIT            DRM_IOW( DRM_COMMAND_BASE +
> DRM_I915_INIT, drm_i915_init_t)
> @@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
>  #define
> DRM_IOCTL_I915_QUERY                   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_QUERY, struct drm_i915_query)
>  #define DRM_IOCTL_I915_GEM_VM_CREATE   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
>  #define DRM_IOCTL_I915_GEM_VM_DESTROY  DRM_IOW (DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_BIND     DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware,
> relying
>   * on the security mechanisms provided by hardware.
> @@ -749,6 +753,25 @@ typedef struct drm_i915_irq_wait {
>  /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
>  #define I915_PARAM_HAS_USERPTR_PROBE 56
>  
> +/*
> + * VM_BIND feature version supported.
> + *
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> created
> + *    previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the
> given range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct
> drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION     57

Perhaps clarify that new versions are always backwards compatible?
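
E.g., so that a UMD probe like the (rough, untested) sketch below keeps
working on newer kernels; use_vm_bind() / use_legacy_execbuf2() are
made-up placeholders:

	int vm_bind_version = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_VM_BIND_VERSION,
		.value = &vm_bind_version,
	};

	/* Older kernels reject the unknown param; treat that as version 0. */
	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		vm_bind_version = 0;

	if (vm_bind_version >= 1)
		use_vm_bind(fd);
	else
		use_legacy_execbuf2(fd);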

> +
>  /* Must be kept compact -- no holes and well documented */
>  
>  typedef struct drm_i915_getparam {
> @@ -1441,6 +1464,41 @@ struct drm_i915_gem_execbuffer2 {
>  #define i915_execbuffer2_get_context_id(eb2) \
>         ((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>  
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline
> fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion
> of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +       /** @handle: User's handle for a drm_syncobj to wait on or
> signal. */
> +       __u32 handle;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_TIMELINE_FENCE_WAIT:
> +        * Wait for the input fence before the operation.
> +        *
> +        * I915_TIMELINE_FENCE_SIGNAL:
> +        * Return operation completion fence as output.
> +        */
> +       __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-
> (I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +       /**
> +        * @value: A point in the timeline.
> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for
> a
> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
> into a
> +        * binary one.
> +        */
> +       __u64 value;
> +};
> +
>  struct drm_i915_gem_pin {
>         /** Handle of the buffer to be pinned. */
>         __u32 handle;
> @@ -2397,8 +2455,6 @@ struct drm_i915_gem_context_destroy {
>   * The id of new VM (bound to the fd) for use with
> I915_CONTEXT_PARAM_VM is
>   * returned in the outparam @id.
>   *
> - * No flags are defined, with all bits reserved and must be zero.
> - *
>   * An extension chain maybe provided, starting with @extensions, and
> terminated
>   * by the @next_extension being 0. Currently, no extensions are
> defined.
>   *
> @@ -2410,6 +2466,10 @@ struct drm_i915_gem_context_destroy {
>   */
>  struct drm_i915_gem_vm_control {
>         __u64 extensions;
> +
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1u << 0)
> +#define I915_VM_CREATE_FLAGS_UNKNOWN \
> +       (-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
>         __u32 flags;
>         __u32 vm_id;
>  };
> @@ -3602,6 +3662,134 @@ struct
> drm_i915_gem_create_ext_protected_content {
>  /* ID of the protected content session managed by i915 when PXP is
> active */
>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>  
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the
> mapping of GPU
> + * virtual address (VA) range to the section of an object that
> should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound)
> and can
> + * be mapped to whole object or a section of the object (partial
> binding).
> + * Multiple VA mappings can be created to the same section of the
> object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However
> the DG2
> + * and XEHPSDV has 64K page size for device local memory and has
> compact page
> + * table. On those platforms, for binding device local-memory
> objects, the
> + * @start, @offset and @length must be 64K aligned. Also, UMDs
> should not mix
> + * the local memory 64K page and the system memory 4K page bindings
> in the same
> + * 2M range.
> + *
> + * Error code -EINVAL will be returned if @start, @offset and
> @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
> error code
> + * -ENOSPC will be returned if the VA range specified can't be
> reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can
> be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @handle: Object handle */
> +       __u32 handle;
> +
> +       /** @start: Virtual Address start to bind */
> +       __u64 start;
> +
> +       /** @offset: Offset in object to bind */
> +       __u64 offset;
> +
> +       /** @length: Length of mapping to bind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Currently reserved, MBZ.
> +        *
> +        * Note that @fence carries its own flags.
> +        */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for bind completion signaling.
> +        *
> +        * Timeline fence is of format struct
> drm_i915_gem_timeline_fence.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> +        * is invalid, and an error will be returned.
> +        *
> +        * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out
> fence
> +        * is not requested and binding is completed synchronously.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
> virtual
> + * address (VA) range that should be unbound from the device page
> table of the
> + * specified address space (VM). VM_UNBIND will force unbind the
> specified
> + * range from device page table without waiting for any GPU job to
> complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in
> use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply
> return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation
> can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @rsvd: Reserved, MBZ */
> +       __u32 rsvd;
> +
> +       /** @start: Virtual Address start to unbind */
> +       __u64 start;
> +
> +       /** @length: Length of mapping to unbind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Currently reserved, MBZ.
> +        *
> +        * Note that @fence carries its own flags.
> +        */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for unbind completion signaling.
> +        *
> +        * Timeline fence is of format struct
> drm_i915_gem_timeline_fence.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> +        * is invalid, and an error will be returned.
> +        *
> +        * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out
> fence
> +        * is not requested and unbinding is completed synchronously.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif
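
For reference, here is a minimal user-space sketch of driving the two new
ioctls and the timeline out-fence defined above. All handles, the VA and the
size below are made up for the example and error handling is mostly elided;
this only illustrates the intended call flow, not a real UMD implementation.

#include <stdbool.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/*
 * Sketch only: fd is an open i915 render node, vm_id comes from
 * GEM_VM_CREATE with I915_VM_CREATE_FLAGS_USE_VM_BIND, bo is a GEM handle
 * and syncobj is a binary drm_syncobj handle created earlier.
 */
static bool bind_and_unbind_example(int fd, __u32 vm_id, __u32 bo, __u32 syncobj)
{
	struct drm_i915_gem_vm_bind bind = {
		.vm_id = vm_id,
		.handle = bo,
		.start = 0x1000000,	/* 4K aligned (64K for lmem on DG2/XEHPSDV) */
		.offset = 0,
		.length = 0x10000,
		.fence = {
			.handle = syncobj,
			.flags = I915_TIMELINE_FENCE_SIGNAL,	/* request an out fence */
			.value = 0,				/* binary syncobj */
		},
	};
	struct drm_i915_gem_vm_unbind unbind = {
		.vm_id = vm_id,
		.start = 0x1000000,
		.length = 0x10000,	/* must match the bind exactly in version 1 */
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind))
		return false;	/* e.g. -EINVAL (alignment) or -ENOSPC (VA in use) */

	/*
	 * Wait for the out fence (the drm_syncobj above) to signal before any
	 * GPU work relies on the mapping, then submit work using the VA.
	 * The UMD must also ensure the mapping is idle before unbinding.
	 */

	/* No I915_TIMELINE_FENCE_SIGNAL here, so the unbind is synchronous. */
	return ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind) == 0;
}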


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
@ 2022-07-05  9:59     ` Hellstrom, Thomas
  0 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-05  9:59 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

Hi,


On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Add VM_BIND and VM_UNBIND ioctls to bind/unbind a section of an
> object at the specified GPU virtual addresses.
> 
> Add I915_PARAM_VM_BIND_VERSION to indicate version of VM_BIND feature
> supported and I915_VM_CREATE_FLAGS_USE_VM_BIND for UMDs to select the
> vm_bind mode of binding.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>

Some comments on patch ordering. In order to ease review and to not
introduce unwanted surprises, could we

1) Add patches that introduce the needed internal functionality /
refactoring / helpers.
2) Add patches that enable the intended user-space functionality, with
any as-yet-unsupported functionality disabled.
3) Add patches that introduce additional internal functionality /
refactoring / helpers.
4) Add patches that enable that additional functionality.

Fixes that are known at series submission time should be squashed in
before the feature is enabled.


> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c |  20 +-
>  drivers/gpu/drm/i915/gem/i915_gem_context.h |  15 ++
>  drivers/gpu/drm/i915/gt/intel_gtt.h         |   6 +
>  drivers/gpu/drm/i915/i915_driver.c          |  30 +++
>  drivers/gpu/drm/i915/i915_getparam.c        |   3 +
>  include/uapi/drm/i915_drm.h                 | 192
> +++++++++++++++++++-
>  6 files changed, 248 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index dabdfe09f5e5..e3f5fbf2ac05 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -81,7 +81,6 @@
>  
>  #include "pxp/intel_pxp.h"
>  
> -#include "i915_file_private.h"
>  #include "i915_gem_context.h"
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
> @@ -346,20 +345,6 @@ static int proto_context_register(struct
> drm_i915_file_private *fpriv,
>         return ret;
>  }
>  
> -static struct i915_address_space *
> -i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
> -{
> -       struct i915_address_space *vm;
> -
> -       xa_lock(&file_priv->vm_xa);
> -       vm = xa_load(&file_priv->vm_xa, id);
> -       if (vm)
> -               kref_get(&vm->ref);
> -       xa_unlock(&file_priv->vm_xa);
> -
> -       return vm;
> -}
> -
>  static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
>                             struct i915_gem_proto_context *pc,
>                             const struct drm_i915_gem_context_param
> *args)
> @@ -1799,7 +1784,7 @@ int i915_gem_vm_create_ioctl(struct drm_device
> *dev, void *data,
>         if (!HAS_FULL_PPGTT(i915))
>                 return -ENODEV;
>  
> -       if (args->flags)
> +       if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
>                 return -EINVAL;
>  
>         ppgtt = i915_ppgtt_create(to_gt(i915), 0);
> @@ -1819,6 +1804,9 @@ int i915_gem_vm_create_ioctl(struct drm_device
> *dev, void *data,
>         if (err)
>                 goto err_put;
>  
> +       if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
> +               ppgtt->vm.vm_bind_mode = true;
> +
>         GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt
> */
>         args->vm_id = id;
>         return 0;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index e5b0f66ea1fe..723bf446c934 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -12,6 +12,7 @@
>  #include "gt/intel_context.h"
>  
>  #include "i915_drv.h"
> +#include "i915_file_private.h"
>  #include "i915_gem.h"
>  #include "i915_scheduler.h"
>  #include "intel_device_info.h"
> @@ -139,6 +140,20 @@ int i915_gem_context_setparam_ioctl(struct
> drm_device *dev, void *data,
>  int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void
> *data,
>                                        struct drm_file *file);
>  
> +static inline struct i915_address_space *
> +i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
> +{
> +       struct i915_address_space *vm;
> +
> +       xa_lock(&file_priv->vm_xa);
> +       vm = xa_load(&file_priv->vm_xa, id);
> +       if (vm)
> +               kref_get(&vm->ref);
> +       xa_unlock(&file_priv->vm_xa);
> +
> +       return vm;
> +}

Does this really need to be inlined?

> +
>  struct i915_gem_context *
>  i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32
> id);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index e639434e97fd..c812aa9708ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -271,6 +271,12 @@ struct i915_address_space {
>         /* Skip pte rewrite on unbind for suspend. Protected by
> @mutex */
>         bool skip_pte_rewrite:1;
>  
> +       /**
> +        * true: allow only vm_bind method of binding.
> +        * false: allow only legacy execbuff method of binding.
> +        */

Use proper kerneldoc (the same holds for structure documentation across
the series).
Also, please follow the internal locking guidelines when documenting
members that need protection by locks.

> +       bool vm_bind_mode:1;
> +
>         u8 top;
>         u8 pd_shift;
>         u8 scratch_order;
> diff --git a/drivers/gpu/drm/i915/i915_driver.c
> b/drivers/gpu/drm/i915/i915_driver.c
> index deb8a8b76965..ccf990dfd99b 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -1778,6 +1778,34 @@ i915_gem_reject_pin_ioctl(struct drm_device
> *dev, void *data,
>         return -ENODEV;
>  }
>  
> +static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void
> *data,
> +                                 struct drm_file *file)
> +{
> +       struct drm_i915_gem_vm_bind *args = data;
> +       struct i915_address_space *vm;
> +
> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +       if (unlikely(!vm))
> +               return -ENOENT;
> +
> +       i915_vm_put(vm);
> +       return -EINVAL;
> +}
> +
> +static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
> *data,
> +                                   struct drm_file *file)
> +{
> +       struct drm_i915_gem_vm_unbind *args = data;
> +       struct i915_address_space *vm;
> +
> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> +       if (unlikely(!vm))
> +               return -ENOENT;
> +
> +       i915_vm_put(vm);
> +       return -EINVAL;
> +}
> +

Move these functions to the file of the actual implementation?

>  static const struct drm_ioctl_desc i915_ioctls[] = {
>         DRM_IOCTL_DEF_DRV(I915_INIT, drm_noop,
> DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
>         DRM_IOCTL_DEF_DRV(I915_FLUSH, drm_noop, DRM_AUTH),
> @@ -1838,6 +1866,8 @@ static const struct drm_ioctl_desc
> i915_ioctls[] = {
>         DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl,
> DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE,
> i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY,
> i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl,
> DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND,
> i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
>  };
>  
>  /*
> diff --git a/drivers/gpu/drm/i915/i915_getparam.c
> b/drivers/gpu/drm/i915/i915_getparam.c
> index 6fd15b39570c..c1d53febc5de 100644
> --- a/drivers/gpu/drm/i915/i915_getparam.c
> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> @@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev,
> void *data,
>         case I915_PARAM_PERF_REVISION:
>                 value = i915_perf_ioctl_version();
>                 break;
> +       case I915_PARAM_VM_BIND_VERSION:
> +               value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
> +               break;
>         default:
>                 DRM_DEBUG("Unknown parameter %d\n", param->param);
>                 return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h
> b/include/uapi/drm/i915_drm.h
> index 3e78a00220ea..26cca49717f8 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_VM_CREATE         0x3a
>  #define DRM_I915_GEM_VM_DESTROY                0x3b
>  #define DRM_I915_GEM_CREATE_EXT                0x3c
> +#define DRM_I915_GEM_VM_BIND           0x3d
> +#define DRM_I915_GEM_VM_UNBIND         0x3e
>  /* Must be kept compact -- no holes */
>  
>  #define DRM_IOCTL_I915_INIT            DRM_IOW( DRM_COMMAND_BASE +
> DRM_I915_INIT, drm_i915_init_t)
> @@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
>  #define
> DRM_IOCTL_I915_QUERY                   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_QUERY, struct drm_i915_query)
>  #define DRM_IOCTL_I915_GEM_VM_CREATE   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
>  #define DRM_IOCTL_I915_GEM_VM_DESTROY  DRM_IOW (DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_BIND     DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware,
> relying
>   * on the security mechanisms provided by hardware.
> @@ -749,6 +753,25 @@ typedef struct drm_i915_irq_wait {
>  /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
>  #define I915_PARAM_HAS_USERPTR_PROBE 56
>  
> +/*
> + * VM_BIND feature version supported.
> + *
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> created
> + *    previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the
> given range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct
> drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION     57

Perhaps clarify that new versions are always backwards compatible?
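
For context, user space would typically probe this via the existing
getparam ioctl before opting in at VM creation time. A minimal sketch
(fd is assumed to be an open i915 render node):

	int version = 0;
	drm_i915_getparam_t gp = {
		.param = I915_PARAM_VM_BIND_VERSION,
		.value = &version,
	};

	/* Older kernels reject unknown params; treat any error as version 0. */
	if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 && version >= 1) {
		/* OK to create the VM with I915_VM_CREATE_FLAGS_USE_VM_BIND */
	}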

> +
>  /* Must be kept compact -- no holes and well documented */
>  
>  typedef struct drm_i915_getparam {
> @@ -1441,6 +1464,41 @@ struct drm_i915_gem_execbuffer2 {
>  #define i915_execbuffer2_get_context_id(eb2) \
>         ((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>  
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline
> fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion
> of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +       /** @handle: User's handle for a drm_syncobj to wait on or
> signal. */
> +       __u32 handle;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_TIMELINE_FENCE_WAIT:
> +        * Wait for the input fence before the operation.
> +        *
> +        * I915_TIMELINE_FENCE_SIGNAL:
> +        * Return operation completion fence as output.
> +        */
> +       __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-
> (I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +       /**
> +        * @value: A point in the timeline.
> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for
> a
> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
> into a
> +        * binary one.
> +        */
> +       __u64 value;
> +};
> +
>  struct drm_i915_gem_pin {
>         /** Handle of the buffer to be pinned. */
>         __u32 handle;
> @@ -2397,8 +2455,6 @@ struct drm_i915_gem_context_destroy {
>   * The id of new VM (bound to the fd) for use with
> I915_CONTEXT_PARAM_VM is
>   * returned in the outparam @id.
>   *
> - * No flags are defined, with all bits reserved and must be zero.
> - *
>   * An extension chain maybe provided, starting with @extensions, and
> terminated
>   * by the @next_extension being 0. Currently, no extensions are
> defined.
>   *
> @@ -2410,6 +2466,10 @@ struct drm_i915_gem_context_destroy {
>   */
>  struct drm_i915_gem_vm_control {
>         __u64 extensions;
> +
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1u << 0)
> +#define I915_VM_CREATE_FLAGS_UNKNOWN \
> +       (-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
>         __u32 flags;
>         __u32 vm_id;
>  };
> @@ -3602,6 +3662,134 @@ struct
> drm_i915_gem_create_ext_protected_content {
>  /* ID of the protected content session managed by i915 when PXP is
> active */
>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>  
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the
> mapping of GPU
> + * virtual address (VA) range to the section of an object that
> should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound)
> and can
> + * be mapped to whole object or a section of the object (partial
> binding).
> + * Multiple VA mappings can be created to the same section of the
> object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However
> the DG2
> + * and XEHPSDV has 64K page size for device local memory and has
> compact page
> + * table. On those platforms, for binding device local-memory
> objects, the
> + * @start, @offset and @length must be 64K aligned. Also, UMDs
> should not mix
> + * the local memory 64K page and the system memory 4K page bindings
> in the same
> + * 2M range.
> + *
> + * Error code -EINVAL will be returned if @start, @offset and
> @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
> error code
> + * -ENOSPC will be returned if the VA range specified can't be
> reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can
> be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @handle: Object handle */
> +       __u32 handle;
> +
> +       /** @start: Virtual Address start to bind */
> +       __u64 start;
> +
> +       /** @offset: Offset in object to bind */
> +       __u64 offset;
> +
> +       /** @length: Length of mapping to bind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Currently reserved, MBZ.
> +        *
> +        * Note that @fence carries its own flags.
> +        */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for bind completion signaling.
> +        *
> +        * Timeline fence is of format struct
> drm_i915_gem_timeline_fence.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> +        * is invalid, and an error will be returned.
> +        *
> +        * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out
> fence
> +        * is not requested and binding is completed synchronously.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
> virtual
> + * address (VA) range that should be unbound from the device page
> table of the
> + * specified address space (VM). VM_UNBIND will force unbind the
> specified
> + * range from device page table without waiting for any GPU job to
> complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in
> use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply
> return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation
> can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @rsvd: Reserved, MBZ */
> +       __u32 rsvd;
> +
> +       /** @start: Virtual Address start to unbind */
> +       __u64 start;
> +
> +       /** @length: Length of mapping to unbind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Currently reserved, MBZ.
> +        *
> +        * Note that @fence carries its own flags.
> +        */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for unbind completion signaling.
> +        *
> +        * Timeline fence is of format struct
> drm_i915_gem_timeline_fence.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> +        * is invalid, and an error will be returned.
> +        *
> +        * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out
> fence
> +        * is not requested and unbinding is completed synchronously.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif


^ permalink raw reply	[flat|nested] 121+ messages in thread

* RE: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-05  9:20       ` Ramalingam C
@ 2022-07-05 13:50         ` Zeng, Oak
  -1 siblings, 0 replies; 121+ messages in thread
From: Zeng, Oak @ 2022-07-05 13:50 UTC (permalink / raw)
  To: C, Ramalingam
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter,  Daniel, Vishwanathapura, Niranjana,
	christian.koenig



Thanks,
Oak

> -----Original Message-----
> From: C, Ramalingam <ramalingam.c@intel.com>
> Sent: July 5, 2022 5:20 AM
> To: Zeng, Oak <oak.zeng@intel.com>
> Cc: Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>;
> intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Vetter,
> Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com; Hellstrom,
> Thomas <thomas.hellstrom@intel.com>; Zanoni, Paulo R
> <paulo.r.zanoni@intel.com>; Auld, Matthew <matthew.auld@intel.com>
> Subject: Re: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent
> vmas
> 
> On 2022-07-04 at 17:05:38 +0000, Zeng, Oak wrote:
> >
> >
> > Thanks,
> > Oak
> >
> > > -----Original Message-----
> > > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf
> > > Of Niranjana Vishwanathapura
> > > Sent: July 1, 2022 6:51 PM
> > > To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> > > Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
> > > <thomas.hellstrom@intel.com>; Auld, Matthew
> > > <matthew.auld@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
> > > christian.koenig@amd.com
> > > Subject: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent
> > > vmas
> > >
> > > Treat VM_BIND vmas as persistent and handle them during the request
> > > submission in the execbuff path.
> >
> > Hi Niranjana,
> >
> > Is the meaning of "persistent" above persistent across all the subsequent
> execbuf ioctls?
> 
> Yes oak. Thats correct. persistent across multiple execbuf ioctls.

Thank you, Ram. Maybe we can add that to the commit message: "Treat VM_BIND
vmas as persistent across multiple execbuf ioctls"? I think this is in
contrast to the old execbuf mode, where we bind in the execbuf ioctl and the
bindings are only valid for that execbuf. For those who don't have that
background, the meaning of "persistent" is hard to guess; the sketch below
may help.
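
To make that concrete, the difference could be sketched like this
(execbuf2_submit()/vm_bind()/execbuf3_submit()/vm_unbind() are hypothetical
UMD wrappers around the respective ioctls, shown only for illustration):

	/*
	 * Legacy mode: bindings are (re)established per execbuf2 call via the
	 * object list and are only guaranteed valid for that one submission.
	 */
	for (i = 0; i < n; i++)
		execbuf2_submit(fd, ctx, batch[i], objects, n_objects);

	/*
	 * vm_bind mode: bind once; the mapping persists across any number of
	 * submissions until the UMD calls VM_UNBIND (or the VM is destroyed).
	 */
	vm_bind(fd, vm_id, bo, va, 0 /* offset */, size);
	for (i = 0; i < n; i++)
		execbuf3_submit(fd, vm_id, batch_va[i]);	/* no object list */
	vm_unbind(fd, vm_id, va, size);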

Thanks,
Oak
> 
> Regards,
> Ram.
> >
> > Thanks,
> > Oak
> >
> > >
> > > Support eviction by maintaining a list of evicted persistent vmas
> > > for rebinding during next submission.
> > >
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
> > >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
> > >  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
> > >  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
> > >  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
> > >  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
> > >  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
> > >  drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
> > >  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
> > >  9 files changed, 163 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index ccec4055fde3..5121f02ba95c 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -38,6 +38,7 @@
> > >  #include "i915_gem_mman.h"
> > >  #include "i915_gem_object.h"
> > >  #include "i915_gem_ttm.h"
> > > +#include "i915_gem_vm_bind.h"
> > >  #include "i915_memcpy.h"
> > >  #include "i915_trace.h"
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > index 849bf3c1061e..eaadf5a6ab09 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > @@ -6,6 +6,7 @@
> > >  #ifndef __I915_GEM_VM_BIND_H
> > >  #define __I915_GEM_VM_BIND_H
> > >
> > > +#include <linux/dma-resv.h>
> > >  #include "i915_drv.h"
> > >
> > >  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> > > >vm_bind_lock)
> > > @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
> > > i915_address_space *vm)
> > >  	mutex_unlock(&vm->vm_bind_lock);
> > >  }
> > >
> > > +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
> > > +
> > >  static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
> > >  					struct i915_gem_ww_ctx *ww)
> > >  {
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > index 96f139cc8060..1a8efa83547f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma
> > > *vma, bool release_obj)  {
> > >  	assert_vm_bind_held(vma->vm);
> > >
> > > +	spin_lock(&vma->vm->vm_rebind_lock);
> > > +	if (!list_empty(&vma->vm_rebind_link))
> > > +		list_del_init(&vma->vm_rebind_link);
> > > +	i915_vma_set_purged(vma);
> > > +	i915_vma_set_freed(vma);
> > > +	spin_unlock(&vma->vm->vm_rebind_lock);
> > > +
> > >  	if (!list_empty(&vma->vm_bind_link)) {
> > >  		list_del_init(&vma->vm_bind_link);
> > >  		list_del_init(&vma->non_priv_vm_bind_link);
> > > @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> > > i915_address_space *vm,
> > >
> > >  	vma->start = va->start;
> > >  	vma->last = va->start + va->length - 1;
> > > +	i915_vma_set_persistent(vma);
> > >
> > >  	return vma;
> > >  }
> > > @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
> > > i915_address_space *vm,
> > >
> > >  	i915_vm_bind_put_fence(vma);
> > >  put_vma:
> > > -	if (ret)
> > > +	if (ret) {
> > > +		i915_vma_set_freed(vma);
> > >  		i915_vma_destroy(vma);
> > > +	}
> > >
> > >  	i915_gem_ww_ctx_fini(&ww);
> > >  unlock_vm:
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index df0a8459c3c6..55d5389b2c6c 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -293,6 +293,8 @@ void i915_address_space_init(struct
> > > i915_address_space *vm, int subclass)
> > >  	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> > >  	vm->root_obj = i915_gem_object_create_internal(vm->i915,
> > > PAGE_SIZE);
> > >  	GEM_BUG_ON(IS_ERR(vm->root_obj));
> > > +	INIT_LIST_HEAD(&vm->vm_rebind_list);
> > > +	spin_lock_init(&vm->vm_rebind_lock);
> > >  }
> > >
> > >  void *__px_vaddr(struct drm_i915_gem_object *p) diff --git
> > > a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > index f538ce9115c9..fe5485c4a1cd 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > @@ -265,6 +265,8 @@ struct i915_address_space {
> > >  	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
> > >  	struct list_head vm_bind_list;
> > >  	struct list_head vm_bound_list;
> > > +	struct list_head vm_rebind_list;
> > > +	spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
> > >  	/* va tree of persistent vmas */
> > >  	struct rb_root_cached va;
> > >  	struct list_head non_priv_vm_bind_list; diff --git
> > > a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > > b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > > index 8c2f57eb5dda..09b89d1913fc 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > > @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
> > > *vm,
> > >
> > >  #define PIN_OFFSET_MASK		I915_GTT_PAGE_MASK
> > >
> > > +static inline int i915_vm_sync(struct i915_address_space *vm) {
> > > +	int ret;
> > > +
> > > +	/* Wait for all requests under this vm to finish */
> > > +	ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> > > +				    DMA_RESV_USAGE_BOOKKEEP, false,
> > > +				    MAX_SCHEDULE_TIMEOUT);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +	else if (ret > 0)
> > > +		return 0;
> > > +	else
> > > +		return -ETIMEDOUT;
> > > +}
> > > +
> > > +static inline bool i915_vm_is_active(const struct
> > > +i915_address_space
> > > +*vm) {
> > > +	return !dma_resv_test_signaled(vm->root_obj->base.resv,
> > > +				       DMA_RESV_USAGE_BOOKKEEP); }
> > > +
> > >  #endif
> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > b/drivers/gpu/drm/i915/i915_vma.c index 6737236b7884..6adb013579be
> > > 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
> > >
> > >  	INIT_LIST_HEAD(&vma->vm_bind_link);
> > >  	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> > > +	INIT_LIST_HEAD(&vma->vm_rebind_link);
> > >  	return vma;
> > >
> > >  err_unlock:
> > > @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
> > >  	if (atomic_dec_and_lock_irqsave(&vma->open_count,
> > >  					&gt->closed_lock,
> > >  					flags)) {
> > > -		__vma_close(vma, gt);
> > > +		if (!i915_vma_is_persistent(vma))
> > > +			__vma_close(vma, gt);
> > >  		spin_unlock_irqrestore(&gt->closed_lock, flags);
> > >  	}
> > >  }
> > > @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
> > >  	if (!drm_mm_node_allocated(&vma->node))
> > >  		return;
> > >
> > > +	/*
> > > +	 * Mark persistent vma as purged to avoid it waiting
> > > +	 * for VM to be released.
> > > +	 */
> > > +	if (i915_vma_is_persistent(vma))
> > > +		i915_vma_set_purged(vma);
> > > +
> > >  	atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> > >  	WARN_ON(__i915_vma_unbind(vma));
> > >  	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> > > @@ -1666,9 +1675,12 @@ static void release_references(struct
> > > i915_vma *vma, bool vm_ddestroy)
> > >
> > >  	spin_unlock(&obj->vma.lock);
> > >
> > > -	i915_gem_vm_bind_lock(vma->vm);
> > > -	i915_gem_vm_bind_remove(vma, true);
> > > -	i915_gem_vm_bind_unlock(vma->vm);
> > > +	if (i915_vma_is_persistent(vma) &&
> > > +	    !i915_vma_is_freed(vma)) {
> > > +		i915_gem_vm_bind_lock(vma->vm);
> > > +		i915_gem_vm_bind_remove(vma, true);
> > > +		i915_gem_vm_bind_unlock(vma->vm);
> > > +	}
> > >
> > >  	spin_lock_irq(&gt->closed_lock);
> > >  	__i915_vma_remove_closed(vma);
> > > @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct
> i915_vma
> > > *vma,
> > >  	int err;
> > >
> > >  	assert_object_held(obj);
> > > +	if (i915_vma_is_persistent(vma))
> > > +		return -EINVAL;
> > >
> > >  	GEM_BUG_ON(!vma->pages);
> > >
> > > @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
> > >  	__i915_vma_evict(vma, false);
> > >
> > >  	drm_mm_remove_node(&vma->node); /* pairs with
> > > i915_vma_release() */
> > > +
> > > +	if (i915_vma_is_persistent(vma)) {
> > > +		spin_lock(&vma->vm->vm_rebind_lock);
> > > +		if (list_empty(&vma->vm_rebind_link) &&
> > > +		    !i915_vma_is_purged(vma))
> > > +			list_add_tail(&vma->vm_rebind_link,
> > > +				      &vma->vm->vm_rebind_list);
> > > +		spin_unlock(&vma->vm->vm_rebind_lock);
> > > +	}
> > > +
> > >  	return 0;
> > >  }
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_vma.h
> > > b/drivers/gpu/drm/i915/i915_vma.h index dcb49f79ff7e..6c1369a40e03
> > > 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.h
> > > +++ b/drivers/gpu/drm/i915/i915_vma.h
> > > @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object
> > > *obj,
> > >
> > >  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned
> > > int flags);  #define I915_VMA_RELEASE_MAP BIT(0)
> > > -
> > > -static inline bool i915_vma_is_active(const struct i915_vma *vma) -{
> > > -	return !i915_active_is_idle(&vma->active);
> > > -}
> > > -
> > >  /* do not reserve memory to prevent deadlocks */  #define
> > > __EXEC_OBJECT_NO_RESERVE BIT(31)
> > >
> > > @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct
> > > i915_vma
> > > *vma)
> > >  	return i915_vm_to_ggtt(vma->vm)->pin_bias;
> > >  }
> > >
> > > +static inline bool i915_vma_is_persistent(const struct i915_vma *vma) {
> > > +	return test_bit(I915_VMA_PERSISTENT_BIT,
> > > __i915_vma_flags(vma)); }
> > > +
> > > +static inline void i915_vma_set_persistent(struct i915_vma *vma) {
> > > +	set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma)); }
> > > +
> > > +static inline bool i915_vma_is_purged(const struct i915_vma *vma) {
> > > +	return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> > > +
> > > +static inline void i915_vma_set_purged(struct i915_vma *vma) {
> > > +	set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
> > > +
> > > +static inline bool i915_vma_is_freed(const struct i915_vma *vma) {
> > > +	return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> > > +
> > > +static inline void i915_vma_set_freed(struct i915_vma *vma) {
> > > +	set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
> > > +
> > > +static inline bool i915_vma_is_active(const struct i915_vma *vma) {
> > > +	if (i915_vma_is_persistent(vma)) {
> > > +		if (i915_vma_is_purged(vma))
> > > +			return false;
> > > +
> > > +		return i915_vm_is_active(vma->vm);
> > > +	}
> > > +
> > > +	return !i915_active_is_idle(&vma->active);
> > > +}
> > > +
> > >  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)  {
> > >  	i915_gem_object_get(vma->obj);
> > > @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma
> > > *vma);
> > >
> > >  static inline int i915_vma_sync(struct i915_vma *vma)  {
> > > +	int ret;
> > > +
> > >  	/* Wait for the asynchronous bindings and pending GPU reads */
> > > -	return i915_active_wait(&vma->active);
> > > +	ret = i915_active_wait(&vma->active);
> > > +	if (ret || !i915_vma_is_persistent(vma) ||
> > > i915_vma_is_purged(vma))
> > > +		return ret;
> > > +
> > > +	return i915_vm_sync(vma->vm);
> > > +}
> > > +
> > > +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma) {
> > > +	/* Ensure vma bind is initiated */
> > > +	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
> > > +		return false;
> > > +
> > > +	/* Ensure any binding started is complete */
> > > +	if (rcu_access_pointer(vma->active.excl.fence)) {
> > > +		struct dma_fence *fence;
> > > +
> > > +		rcu_read_lock();
> > > +		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
> > > +		rcu_read_unlock();
> > > +		if (fence) {
> > > +			dma_fence_put(fence);
> > > +			return false;
> > > +		}
> > > +	}
> > > +
> > > +	return true;
> > >  }
> > >
> > >  /**
> > > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> > > b/drivers/gpu/drm/i915/i915_vma_types.h
> > > index 7d830a6a0b51..405c82e1bc30 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma_types.h
> > > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> > > @@ -264,6 +264,28 @@ struct i915_vma {
> > >  #define I915_VMA_SCANOUT_BIT	17
> > >  #define I915_VMA_SCANOUT	((int)BIT(I915_VMA_SCANOUT_BIT))
> > >
> > > +  /**
> > > +   * I915_VMA_PERSISTENT_BIT:
> > > +   * The vma is persistent (created with VM_BIND call).
> > > +   *
> > > +   * I915_VMA_PURGED_BIT:
> > > +   * The persistent vma is force unbound either due to VM_UNBIND call
> > > +   * from UMD or VM is released. Do not check/wait for VM activeness
> > > +   * in i915_vma_is_active() and i915_vma_sync() calls.
> > > +   *
> > > +   * I915_VMA_FREED_BIT:
> > > +   * The persistent vma is being released by UMD via VM_UNBIND call.
> > > +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND
> call
> > > +   * already holds the lock.
> > > +   */
> > > +#define I915_VMA_PERSISTENT_BIT	19
> > > +#define I915_VMA_PURGED_BIT	20
> > > +#define I915_VMA_FREED_BIT	21
> > > +
> > > +#define I915_VMA_PERSISTENT
> > > 	((int)BIT(I915_VMA_PERSISTENT_BIT))
> > > +#define I915_VMA_PURGED
> 	((int)BIT(I915_VMA_PURGED_BIT))
> > > +#define I915_VMA_FREED
> 	((int)BIT(I915_VMA_FREED_BIT))
> > > +
> > >  	struct i915_active active;
> > >
> > >  #define I915_VMA_PAGES_BIAS 24
> > > @@ -292,6 +314,7 @@ struct i915_vma {
> > >  	struct list_head vm_bind_link; /* Link in persistent VMA list */
> > >  	/* Link in non-private persistent VMA list */
> > >  	struct list_head non_priv_vm_bind_link;
> > > +	struct list_head vm_rebind_link; /* Link in vm_rebind_list */
> > >
> > >  	/** Timeline fence for vm_bind completion notification */
> > >  	struct {
> > > --
> > > 2.21.0.rc0.32.g243a4c7e27
> >

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-06 16:21     ` Thomas Hellström
  -1 siblings, 0 replies; 121+ messages in thread
From: Thomas Hellström @ 2022-07-06 16:21 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, matthew.auld, jason, daniel.vetter,
	christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |   1 +
>  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233
> ++++++++++++++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>  11 files changed, 318 insertions(+), 10 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>  create mode 100644
> drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile
> b/drivers/gpu/drm/i915/Makefile
> index 522ef9b4aff3..4e1627e96c6e 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -165,6 +165,7 @@ gem-y += \
>         gem/i915_gem_ttm_move.o \
>         gem/i915_gem_ttm_pm.o \
>         gem/i915_gem_userptr.o \
> +       gem/i915_gem_vm_bind_object.o \
>         gem/i915_gem_wait.o \
>         gem/i915_gemfs.o
>  i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 33673fe7ee0a..927a87e5ec59 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -15,10 +15,10 @@
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
>  
> -static u32 object_max_page_size(struct intel_memory_region
> **placements,
> -                               unsigned int n_placements)
> +u32 i915_gem_object_max_page_size(struct intel_memory_region
> **placements,

Kerneldoc.
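For example, something along these lines (wording is just a suggestion):

/**
 * i915_gem_object_max_page_size() - max GTT page size for a placement list
 * @placements: memory regions the object may be placed in
 * @n_placements: number of entries in @placements
 *
 * Return: the largest min_page_size of the given placements, or
 * I915_GTT_PAGE_SIZE_4K, whichever is larger.
 */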

> +                                 unsigned int n_placements)
>  {
> -       u32 max_page_size = 0;
> +       u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>         int i;
>  
>         for (i = 0; i < n_placements; i++) {
> @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
> intel_memory_region **placements,
>                 max_page_size = max_t(u32, max_page_size, mr-
> >min_page_size);
>         }
>  
> -       GEM_BUG_ON(!max_page_size);
>         return max_page_size;
>  }

Should this change be separated out? It's not immediately clear to a
reviewer why it is included.

>  
> @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct
> drm_i915_private *i915, u64 size,
>  
>         i915_gem_flush_free_objects(i915);
>  
> -       size = round_up(size, object_max_page_size(placements,
> n_placements));
> +       size = round_up(size,
> i915_gem_object_max_page_size(placements,
> +                                                          
> n_placements));
>         if (size == 0)
>                 return ERR_PTR(-EINVAL);
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 6f0a3ce35567..650de2224843 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64
> size)
>  }
>  
>  void i915_gem_init__objects(struct drm_i915_private *i915);
> +u32 i915_gem_object_max_page_size(struct intel_memory_region
> **placements,
> +                                 unsigned int n_placements);
>  
>  void i915_objects_module_exit(void);
>  int i915_objects_module_init(void);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 000000000000..642cdb559f17
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include "i915_drv.h"
> +
> +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> >vm_bind_lock)
> +
> +static inline void i915_gem_vm_bind_lock(struct i915_address_space
> *vm)
> +{
> +       mutex_lock(&vm->vm_bind_lock);
> +}
> +
> +static inline int
> +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
> +{
> +       return mutex_lock_interruptible(&vm->vm_bind_lock);
> +}
> +
> +static inline void i915_gem_vm_bind_unlock(struct i915_address_space
> *vm)
> +{
> +       mutex_unlock(&vm->vm_bind_lock);
> +}
> +

Kerneldoc for the inlines.
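For instance, in the expected format (sketch only):

/**
 * i915_gem_vm_bind_lock() - Acquire the vm_bind lock of a vm
 * @vm: the address space whose vm_bind state is to be protected
 */
static inline void i915_gem_vm_bind_lock(struct i915_address_space *vm)
{
	mutex_lock(&vm->vm_bind_lock);
}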

> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
> release_obj);
> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +                        struct drm_i915_gem_vm_bind *va,
> +                        struct drm_file *file);
> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
> +                          struct drm_i915_gem_vm_unbind *va);
> +
> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 000000000000..43ceb4dcca6c
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,233 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_vm_bind.h"
> +#include "gt/gen8_engine_cs.h"
> +
> +#include "i915_drv.h"
> +#include "i915_gem_gtt.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +                    START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM
> buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual
> addresses on a
> + * specified address space (VM). Multiple mappings can map to the
> same physical
> + * pages of an object (aliasing). These mappings (also referred to
> as persistent
> + * mappings) will be persistent across multiple GPU submissions
> (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all
> required
> + * mappings during each submission (as required by older execbuf
> mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out
> fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via
> I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address
> space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND
> extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND
> operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock
> is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while
> releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can
> potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take
> the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding do not need this lock.
> + *
> + * 2) Lock-B: The object's dma-resv lock will protect i915_vma state
> and needs
> + *    to be held while binding/unbinding a vma in the async worker
> and while
> + *    updating dma-resv fence list of an object. Note that private
> BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + *    The future system allocator support will use the HMM
> prescribed locking
> + *    instead.

I don't think the last sentence is relevant for this series. Also, are
there any other mentions of Locks A, B and C? If not, can we ditch
that naming?

> + *
> + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the
> list of
> + *    invalidated vmas (due to eviction and userptr invalidation)
> etc.
> + */
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)

Kerneldoc for the extern functions.


> +{
> +       struct i915_vma *vma, *temp;
> +
> +       assert_vm_bind_held(vm);
> +
> +       vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
> +       /* Working around compiler error, remove later */

Is this still relevant? What compiler error is seen here?

> +       if (vma)
> +               temp = i915_vm_bind_it_iter_next(vma, va + vma->size,
> -1);
> +       return vma;
> +}
> +
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +       assert_vm_bind_held(vma->vm);
> +
> +       if (!list_empty(&vma->vm_bind_link)) {
> +               list_del_init(&vma->vm_bind_link);
> +               i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +               /* Release object */
> +               if (release_obj)
> +                       i915_vma_put(vma);

i915_vma_put() here is confusing. Can we use i915_gem_object_put() to
further make it clear that the persistent vmas actually take a
reference on the object?
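I.e. something like (assuming vma->obj is still valid at this point):

		/* Release the object reference taken at vm_bind time */
		if (release_obj)
			i915_gem_object_put(vma->obj);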

> +       }
> +}
> +
> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
> +                          struct drm_i915_gem_vm_unbind *va)
> +{
> +       struct drm_i915_gem_object *obj;
> +       struct i915_vma *vma;
> +       int ret;
> +
> +       va->start = gen8_noncanonical_addr(va->start);
> +       ret = i915_gem_vm_bind_lock_interruptible(vm);
> +       if (ret)
> +               return ret;
> +
> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +       if (!vma) {
> +               ret = -ENOENT;
> +               goto out_unlock;
> +       }
> +
> +       if (vma->size != va->length)
> +               ret = -EINVAL;
> +       else
> +               i915_gem_vm_bind_remove(vma, false);
> +
> +out_unlock:
> +       i915_gem_vm_bind_unlock(vm);
> +       if (ret || !vma)
> +               return ret;
> +
> +       /* Destroy vma and then release object */
> +       obj = vma->obj;
> +       ret = i915_gem_object_lock(obj, NULL);
> +       if (ret)
> +               return ret;

This call never returns an error, so we could use GEM_WARN_ON(...), or a
(void) cast, to annotate that the return value is wilfully ignored.
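For example (sketch only):

	/* Destroy vma and then release object */
	obj = vma->obj;
	GEM_WARN_ON(i915_gem_object_lock(obj, NULL));
	i915_vma_destroy(vma);
	i915_gem_object_unlock(obj);
	i915_gem_object_put(obj);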

> +
> +       i915_vma_destroy(vma);
> +       i915_gem_object_unlock(obj);
> +       i915_gem_object_put(obj);
> +
> +       return 0;
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space
> *vm,
> +                                       struct drm_i915_gem_object
> *obj,
> +                                       struct drm_i915_gem_vm_bind
> *va)
> +{
> +       struct i915_ggtt_view view;
> +       struct i915_vma *vma;
> +
> +       va->start = gen8_noncanonical_addr(va->start);
> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +       if (vma)
> +               return ERR_PTR(-EEXIST);
> +
> +       view.type = I915_GGTT_VIEW_PARTIAL;
> +       view.partial.offset = va->offset >> PAGE_SHIFT;
> +       view.partial.size = va->length >> PAGE_SHIFT;

IIRC, this vma view is not handled correctly in the vma code, which only
understands views for ggtt bindings.


> +       vma = i915_vma_instance(obj, vm, &view);
> +       if (IS_ERR(vma))
> +               return vma;
> +
> +       vma->start = va->start;
> +       vma->last = va->start + va->length - 1;
> +
> +       return vma;
> +}
> +
> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +                        struct drm_i915_gem_vm_bind *va,
> +                        struct drm_file *file)
> +{
> +       struct drm_i915_gem_object *obj;
> +       struct i915_vma *vma = NULL;
> +       struct i915_gem_ww_ctx ww;
> +       u64 pin_flags;
> +       int ret = 0;
> +
> +       if (!vm->vm_bind_mode)
> +               return -EOPNOTSUPP;
> +
> +       obj = i915_gem_object_lookup(file, va->handle);
> +       if (!obj)
> +               return -ENOENT;
> +
> +       if (!va->length ||
> +           !IS_ALIGNED(va->offset | va->length,
> +                       i915_gem_object_max_page_size(obj-
> >mm.placements,
> +                                                     obj-
> >mm.n_placements)) ||
> +           range_overflows_t(u64, va->offset, va->length, obj-
> >base.size)) {
> +               ret = -EINVAL;
> +               goto put_obj;
> +       }
> +
> +       ret = i915_gem_vm_bind_lock_interruptible(vm);
> +       if (ret)
> +               goto put_obj;
> +
> +       vma = vm_bind_get_vma(vm, obj, va);
> +       if (IS_ERR(vma)) {
> +               ret = PTR_ERR(vma);
> +               goto unlock_vm;
> +       }
> +
> +       i915_gem_ww_ctx_init(&ww, true);
> +       pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
> +retry:
> +       ret = i915_gem_object_lock(vma->obj, &ww);
> +       if (ret)
> +               goto out_ww;
> +
> +       ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> +       if (ret)
> +               goto out_ww;
> +
> +       /* Make it evictable */
> +       __i915_vma_unpin(vma);

A considerable effort has been put into avoiding short-term vma pins in
i915. We should add an interface like i915_vma_bind_ww() that avoids
the pin altogether.
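Roughly (the name and signature here are placeholders only):

int i915_vma_bind_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
		     u64 size, u64 alignment, u64 flags);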

> +
> +       list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> +       i915_vm_bind_it_insert(vma, &vm->va);
> +
> +       /* Hold object reference until vm_unbind */
> +       i915_gem_object_get(vma->obj);
> +out_ww:
> +       if (ret == -EDEADLK) {
> +               ret = i915_gem_ww_ctx_backoff(&ww);
> +               if (!ret)
> +                       goto retry;
> +       }
> +
> +       if (ret)
> +               i915_vma_destroy(vma);
> +
> +       i915_gem_ww_ctx_fini(&ww);

Could use for_i915_gem_ww()?
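That would fold the init / backoff / fini boilerplate into the loop,
something like this untested sketch:

	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;

	for_i915_gem_ww(&ww, ret, true) {
		ret = i915_gem_object_lock(vma->obj, &ww);
		if (ret)
			continue;

		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
		if (ret)
			continue;

		/* Make it evictable */
		__i915_vma_unpin(vma);

		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
		i915_vm_bind_it_insert(vma, &vm->va);

		/* Hold object reference until vm_unbind */
		i915_gem_object_get(vma->obj);
	}
	if (ret)
		i915_vma_destroy(vma);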

> +unlock_vm:
> +       i915_gem_vm_bind_unlock(vm);
> +put_obj:
> +       i915_gem_object_put(obj);
> +       return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index b67831833c9a..135dc4a76724 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct
> i915_address_space *vm,
>  void i915_address_space_fini(struct i915_address_space *vm)
>  {
>         drm_mm_takedown(&vm->mm);
> +       GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> +       mutex_destroy(&vm->vm_bind_lock);
>  }
>  
>  /**
> @@ -282,6 +284,11 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>  
>         INIT_LIST_HEAD(&vm->bound_list);
>         INIT_LIST_HEAD(&vm->unbound_list);
> +
> +       vm->va = RB_ROOT_CACHED;
> +       INIT_LIST_HEAD(&vm->vm_bind_list);
> +       INIT_LIST_HEAD(&vm->vm_bound_list);
> +       mutex_init(&vm->vm_bind_lock);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index c812aa9708ae..d4a6ce65251d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -259,6 +259,15 @@ struct i915_address_space {
>          */
>         struct list_head unbound_list;
>  
> +       /**
> +        * List of VM_BIND objects.
> +        */

Proper kerneldoc + intel locking guidelines comments, please.
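E.g. something in this direction (exact wording up to you):

	/**
	 * @vm_bind_lock: Mutex to protect @vm_bind_list, @vm_bound_list
	 * and the @va tree.
	 */
	struct mutex vm_bind_lock;
	/** @vm_bind_list: vm_bind mappings pending bind. Protected by @vm_bind_lock. */
	struct list_head vm_bind_list;
	/** @vm_bound_list: bound vm_bind mappings. Protected by @vm_bind_lock. */
	struct list_head vm_bound_list;
	/** @va: Interval tree of persistent vmas. Protected by @vm_bind_lock. */
	struct rb_root_cached va;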

> +       struct mutex vm_bind_lock;  /* Protects vm_bind lists */
> +       struct list_head vm_bind_list;
> +       struct list_head vm_bound_list;
> +       /* va tree of persistent vmas */
> +       struct rb_root_cached va;
> +
>         /* Global GTT */
>         bool is_ggtt:1;
>  
> diff --git a/drivers/gpu/drm/i915/i915_driver.c
> b/drivers/gpu/drm/i915/i915_driver.c
> index ccf990dfd99b..776ab7844f60 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -68,6 +68,7 @@
>  #include "gem/i915_gem_ioctls.h"
>  #include "gem/i915_gem_mman.h"
>  #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_gt.h"
>  #include "gt/intel_gt_pm.h"
>  #include "gt/intel_rc6.h"
> @@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct
> drm_device *dev, void *data,
>  {
>         struct drm_i915_gem_vm_bind *args = data;
>         struct i915_address_space *vm;
> +       int ret;
>  
>         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>         if (unlikely(!vm))
>                 return -ENOENT;
>  
> +       ret = i915_gem_vm_bind_obj(vm, args, file);
> +
>         i915_vm_put(vm);
> -       return -EINVAL;
> +       return ret;
>  }
>  
>  static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
> *data,
> @@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct
> drm_device *dev, void *data,
>  {
>         struct drm_i915_gem_vm_unbind *args = data;
>         struct i915_address_space *vm;
> +       int ret;
>  
>         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>         if (unlikely(!vm))
>                 return -ENOENT;
>  
> +       ret = i915_gem_vm_unbind_obj(vm, args);
> +
>         i915_vm_put(vm);
> -       return -EINVAL;
> +       return ret;
>  }
>  
>  static const struct drm_ioctl_desc i915_ioctls[] = {
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c
> index 43339ecabd73..d324e29cef0a 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -29,6 +29,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_tiling.h"
> +#include "gem/i915_gem_vm_bind.h"
>  #include "gt/intel_engine.h"
>  #include "gt/intel_engine_heartbeat.h"
>  #include "gt/intel_gt.h"
> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>         spin_unlock(&obj->vma.lock);
>         mutex_unlock(&vm->mutex);
>  
> +       INIT_LIST_HEAD(&vma->vm_bind_link);
>         return vma;
>  
>  err_unlock:
> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object
> *obj,
>  {
>         struct i915_vma *vma;
>  
> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>         GEM_BUG_ON(!kref_read(&vm->ref));
>  
>         spin_lock(&obj->vma.lock);
> @@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma
> *vma, bool vm_ddestroy)
>  
>         spin_unlock(&obj->vma.lock);
>  
> +       i915_gem_vm_bind_lock(vma->vm);
> +       i915_gem_vm_bind_remove(vma, true);
> +       i915_gem_vm_bind_unlock(vma->vm);
> +

The vm might be destroyed at this point already.

From what I understand we can destroy the vma from three call sites:
1) VM_UNBIND -> The vma has already been removed from the vm_bind
address space,
2) object destruction -> since the vma has an object reference while in
the vm_bind address space, it must also have been removed from the
address space if called from object destruction.
3) vm destruction. Suggestion is to call VM_UNBIND from under the
vm_bind lock early in vm destruction. 

Then the above added code can be removed and replaced with an assert
that the vm_bind address space RB_NODE is indeed empty.
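E.g. just something like (exact check TBD):

	GEM_BUG_ON(!list_empty(&vma->vm_bind_link));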


>         spin_lock_irq(&gt->closed_lock);
>         __i915_vma_remove_closed(vma);
>         spin_unlock_irq(&gt->closed_lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h
> b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..dcb49f79ff7e 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>  {
>         ptrdiff_t cmp;
>  
> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> -
>         cmp = ptrdiff(vma->vm, vm);
>         if (cmp)
>                 return cmp;
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index be6e028c3b57..b6d179bdbfa0 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -289,6 +289,14 @@ struct i915_vma {
>         /** This object's place on the active/inactive lists */
>         struct list_head vm_link;
>  
> +       struct list_head vm_bind_link; /* Link in persistent VMA list
> */
> +
> +       /** Interval tree structures for persistent vma */

Proper kerneldoc.

> +       struct rb_node rb;
> +       u64 start;
> +       u64 last;
> +       u64 __subtree_last;
> +
>         struct list_head obj_link; /* Link in the object's VMA list
> */
>         struct rb_node obj_node;
>         struct hlist_node obj_hash;

Thanks,
Thomas


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
  2022-07-05  8:40     ` [Intel-gfx] " Thomas Hellström
@ 2022-07-06 16:33       ` Ramalingam C
  -1 siblings, 0 replies; 121+ messages in thread
From: Ramalingam C @ 2022-07-06 16:33 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, intel-gfx, dri-devel, matthew.auld, jason,
	daniel.vetter, Niranjana Vishwanathapura, christian.koenig

On 2022-07-05 at 10:40:56 +0200, Thomas Hellström wrote:
> On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> > VM_BIND functionality maintain that vm->vm_bind_mutex will never be
> > taken
> > while holding vm->mutex.
> > However, while closing 'vm', vma is destroyed while holding vm-
> > >mutex.
> > But vma releasing needs to take vm->vm_bind_mutex in order to delete
> > vma
> > from the vm_bind_list. To avoid this, destroy the vma outside vm-
> > >mutex
> > while closing the 'vm'.
> > 
> > Signed-off-by: Niranjana Vishwanathapura
> 
> First, when introducing a new feature like this, we should not need to
> end the series with "Fix.." patches like this; rather, whatever needs
> to be fixed should be fixed where the code was introduced.
Thanks Thomas for the review. I will fix it.
> 
> Second, as an analogy with the Linux kernel CPU mappings, could we
> instead think of the vm_bind_lock as being similar to the mmap_lock,
> and the vm_mutex as being similar to the i_mmap_lock, the former being
> used for VA manipulation and the latter when attaching / removing the
> backing store from the VA?
> 
> Then we would not need to take the vm_bind_lock from vma destruction
> since the VA would already have been reclaimed at that point. For vm
> destruction here we'd loop over all relevant vm bind VAs under the
> vm_bind lock and call vm_unbind? Would that work?

Sounds reasonable. I will try this locking approach.
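Something roughly like this in __i915_vm_close(), under the vm_bind lock
(untested sketch using the names from this series, declarations omitted):

	i915_gem_vm_bind_lock(vm);
	list_for_each_entry_safe(vma, vn, &vm->vm_bound_list, vm_bind_link)
		i915_gem_vm_bind_remove(vma, true);
	i915_gem_vm_bind_unlock(vm);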

Ram
> 
> /Thomas
> 
> 
> > <niranjana.vishwanathapura@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gtt.c | 23 ++++++++++++++++++-----
> >  1 file changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index 4ab3bda644ff..4f707d0eb3ef 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -109,7 +109,8 @@ int map_pt_dma_locked(struct i915_address_space
> > *vm, struct drm_i915_gem_object
> >         return 0;
> >  }
> >  
> > -static void clear_vm_list(struct list_head *list)
> > +static void clear_vm_list(struct list_head *list,
> > +                         struct list_head *destroy_list)
> >  {
> >         struct i915_vma *vma, *vn;
> >  
> > @@ -138,8 +139,7 @@ static void clear_vm_list(struct list_head *list)
> >                         i915_vm_resv_get(vma->vm);
> >                         vma->vm_ddestroy = true;
> >                 } else {
> > -                       i915_vma_destroy_locked(vma);
> > -                       i915_gem_object_put(obj);
> > +                       list_move_tail(&vma->vm_link, destroy_list);
> >                 }
> >  
> >         }
> > @@ -147,16 +147,29 @@ static void clear_vm_list(struct list_head
> > *list)
> >  
> >  static void __i915_vm_close(struct i915_address_space *vm)
> >  {
> > +       struct i915_vma *vma, *vn;
> > +       struct list_head list;
> > +
> > +       INIT_LIST_HEAD(&list);
> > +
> >         mutex_lock(&vm->mutex);
> >  
> > -       clear_vm_list(&vm->bound_list);
> > -       clear_vm_list(&vm->unbound_list);
> > +       clear_vm_list(&vm->bound_list, &list);
> > +       clear_vm_list(&vm->unbound_list, &list);
> >  
> >         /* Check for must-fix unanticipated side-effects */
> >         GEM_BUG_ON(!list_empty(&vm->bound_list));
> >         GEM_BUG_ON(!list_empty(&vm->unbound_list));
> >  
> >         mutex_unlock(&vm->mutex);
> > +
> > +       /* Destroy vmas outside vm->mutex */
> > +       list_for_each_entry_safe(vma, vn, &list, vm_link) {
> > +               struct drm_i915_gem_object *obj = vma->obj;
> > +
> > +               i915_vma_destroy(vma);
> > +               i915_gem_object_put(obj);
> > +       }
> >  }
> >  
> >  /* lock the vm into the current ww, if we lock one, we lock all */
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
  2022-07-05  9:59     ` Hellstrom, Thomas
@ 2022-07-07  1:18       ` Andi Shyti
  -1 siblings, 0 replies; 121+ messages in thread
From: Andi Shyti @ 2022-07-07  1:18 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Auld, Matthew, Vetter,
	Daniel, Vishwanathapura, Niranjana, christian.koenig

Hi,

[...]

> > +/*
> > + * VM_BIND feature version supported.
> > + *
> > + * The following versions of VM_BIND have been defined:
> > + *
> > + * 0: No VM_BIND support.
> > + *
> > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> > created
> > + *    previously with VM_BIND, the ioctl will not support unbinding
> > multiple
> > + *    mappings or splitting them. Similarly, VM_BIND calls will not
> > replace
> > + *    any existing mappings.
> > + *
> > + * 2: The restrictions on unbinding partial or multiple mappings is
> > + *    lifted, Similarly, binding will replace any mappings in the
> > given range.
> > + *
> > + * See struct drm_i915_gem_vm_bind and struct
> > drm_i915_gem_vm_unbind.
> > + */
> > +#define I915_PARAM_VM_BIND_VERSION     57
> 
> Perhaps clarify that new versions are always backwards compatible?

how is this 57 consistent with the description above?

Andi

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-06 16:21     ` [Intel-gfx] " Thomas Hellström
@ 2022-07-07  1:41       ` Andi Shyti
  -1 siblings, 0 replies; 121+ messages in thread
From: Andi Shyti @ 2022-07-07  1:41 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, matthew.auld,
	daniel.vetter, Niranjana Vishwanathapura, christian.koenig

Hi,

[...]

> > @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
> > intel_memory_region **placements,
> >                 max_page_size = max_t(u32, max_page_size, mr-
> > >min_page_size);
> >         }
> >  
> > -       GEM_BUG_ON(!max_page_size);
> >         return max_page_size;
> >  }
> 
> Should this change be separated out? It's not immediately clear to a
> reviewer why it is included.

no, it's not, indeed... and is it correct to assume that the
default size is I915_GTT_PAGE_SIZE_4K?

[...]

> > +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> > >vm_bind_lock)
> > +
> > +static inline void i915_gem_vm_bind_lock(struct i915_address_space
> > *vm)
> > +{
> > +       mutex_lock(&vm->vm_bind_lock);
> > +}
> > +
> > +static inline int
> > +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
> > +{
> > +       return mutex_lock_interruptible(&vm->vm_bind_lock);
> > +}
> > +
> > +static inline void i915_gem_vm_bind_unlock(struct i915_address_space
> > *vm)
> > +{
> > +       mutex_unlock(&vm->vm_bind_lock);
> > +}
> > +
> 
> Kerneldoc for the inlines.

do we really need these one-line wrappers?

Andi

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
  2022-07-05  9:59     ` Hellstrom, Thomas
@ 2022-07-07  5:01       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:01 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Tue, Jul 05, 2022 at 02:59:24AM -0700, Hellstrom, Thomas wrote:
>Hi,
>
>
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Add VM_BIND and VM_UNBIND ioctls to bind/unbind a section of an
>> object at the specified GPU virtual addresses.
>>
>> Add I915_PARAM_VM_BIND_VERSION to indicate version of VM_BIND feature
>> supported and I915_VM_CREATE_FLAGS_USE_VM_BIND for UMDs to select the
>> vm_bind mode of binding.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>
>Some comments on patch ordering. In order to ease reviews and to not
>introduce unwanted surprises, could we
>
>1) Add patches that introduce needed internal functionality /
>refactoring / helpers.
>2) Add patches that enable the intended user-space functionality, with
>any yet-unsupported functionality disabled.
>3) Add patches that introduce additional internal functionality /
>refactoring / helpers.
>4) Add patches that enable that additional functionality.
>
>Fixes that are known at series submission time should be squashed
>before the feature is enabled.
>

Thanks Thomas for the feedback.

Yah, makes sense.

>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_context.c |  20 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_context.h |  15 ++
>>  drivers/gpu/drm/i915/gt/intel_gtt.h         |   6 +
>>  drivers/gpu/drm/i915/i915_driver.c          |  30 +++
>>  drivers/gpu/drm/i915/i915_getparam.c        |   3 +
>>  include/uapi/drm/i915_drm.h                 | 192
>> +++++++++++++++++++-
>>  6 files changed, 248 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> index dabdfe09f5e5..e3f5fbf2ac05 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> @@ -81,7 +81,6 @@
>>
>>  #include "pxp/intel_pxp.h"
>>
>> -#include "i915_file_private.h"
>>  #include "i915_gem_context.h"
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>> @@ -346,20 +345,6 @@ static int proto_context_register(struct
>> drm_i915_file_private *fpriv,
>>         return ret;
>>  }
>>
>> -static struct i915_address_space *
>> -i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
>> -{
>> -       struct i915_address_space *vm;
>> -
>> -       xa_lock(&file_priv->vm_xa);
>> -       vm = xa_load(&file_priv->vm_xa, id);
>> -       if (vm)
>> -               kref_get(&vm->ref);
>> -       xa_unlock(&file_priv->vm_xa);
>> -
>> -       return vm;
>> -}
>> -
>>  static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
>>                             struct i915_gem_proto_context *pc,
>>                             const struct drm_i915_gem_context_param
>> *args)
>> @@ -1799,7 +1784,7 @@ int i915_gem_vm_create_ioctl(struct drm_device
>> *dev, void *data,
>>         if (!HAS_FULL_PPGTT(i915))
>>                 return -ENODEV;
>>
>> -       if (args->flags)
>> +       if (args->flags & I915_VM_CREATE_FLAGS_UNKNOWN)
>>                 return -EINVAL;
>>
>>         ppgtt = i915_ppgtt_create(to_gt(i915), 0);
>> @@ -1819,6 +1804,9 @@ int i915_gem_vm_create_ioctl(struct drm_device
>> *dev, void *data,
>>         if (err)
>>                 goto err_put;
>>
>> +       if (args->flags & I915_VM_CREATE_FLAGS_USE_VM_BIND)
>> +               ppgtt->vm.vm_bind_mode = true;
>> +
>>         GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt
>> */
>>         args->vm_id = id;
>>         return 0;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>> index e5b0f66ea1fe..723bf446c934 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>> @@ -12,6 +12,7 @@
>>  #include "gt/intel_context.h"
>>
>>  #include "i915_drv.h"
>> +#include "i915_file_private.h"
>>  #include "i915_gem.h"
>>  #include "i915_scheduler.h"
>>  #include "intel_device_info.h"
>> @@ -139,6 +140,20 @@ int i915_gem_context_setparam_ioctl(struct
>> drm_device *dev, void *data,
>>  int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void
>> *data,
>>                                        struct drm_file *file);
>>
>> +static inline struct i915_address_space *
>> +i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
>> +{
>> +       struct i915_address_space *vm;
>> +
>> +       xa_lock(&file_priv->vm_xa);
>> +       vm = xa_load(&file_priv->vm_xa, id);
>> +       if (vm)
>> +               kref_get(&vm->ref);
>> +       xa_unlock(&file_priv->vm_xa);
>> +
>> +       return vm;
>> +}
>
>Does this really need to be inlined?
>
>> +
>>  struct i915_gem_context *
>>  i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32
>> id);
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index e639434e97fd..c812aa9708ae 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -271,6 +271,12 @@ struct i915_address_space {
>>         /* Skip pte rewrite on unbind for suspend. Protected by
>> @mutex */
>>         bool skip_pte_rewrite:1;
>>
>> +       /**
>> +        * true: allow only vm_bind method of binding.
>> +        * false: allow only legacy execbuff method of binding.
>> +        */
>
>Use proper kerneldoc. (Same holds for structure documentation across
>the series).
>Also please follow internal locking guidelines on documentation of
>members that need protection with locks.
>

I just followed the documentation convention that was already there ;)
I think we need a prep patch in this series that adds kernel-doc for
these structures, and then the new vm_bind fields can be added with
proper kernel-doc.
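
For the new vm_bind_mode member, I am thinking of something along these
lines (just a sketch; the exact wording and locking annotation would be
settled in that prep patch):

	/**
	 * @vm_bind_mode: flag to opt-in to the VM_BIND mode of binding.
	 *
	 * true: allow only the vm_bind method of binding.
	 * false: allow only the legacy execbuf method of binding.
	 *
	 * Assuming it is only set at vm creation time and is immutable
	 * afterwards, readers would not need any locking.
	 */
	bool vm_bind_mode:1;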

>> +       bool vm_bind_mode:1;
>> +
>>         u8 top;
>>         u8 pd_shift;
>>         u8 scratch_order;
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c
>> b/drivers/gpu/drm/i915/i915_driver.c
>> index deb8a8b76965..ccf990dfd99b 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -1778,6 +1778,34 @@ i915_gem_reject_pin_ioctl(struct drm_device
>> *dev, void *data,
>>         return -ENODEV;
>>  }
>>
>> +static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void
>> *data,
>> +                                 struct drm_file *file)
>> +{
>> +       struct drm_i915_gem_vm_bind *args = data;
>> +       struct i915_address_space *vm;
>> +
>> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>> +       if (unlikely(!vm))
>> +               return -ENOENT;
>> +
>> +       i915_vm_put(vm);
>> +       return -EINVAL;
>> +}
>> +
>> +static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
>> *data,
>> +                                   struct drm_file *file)
>> +{
>> +       struct drm_i915_gem_vm_unbind *args = data;
>> +       struct i915_address_space *vm;
>> +
>> +       vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>> +       if (unlikely(!vm))
>> +               return -ENOENT;
>> +
>> +       i915_vm_put(vm);
>> +       return -EINVAL;
>> +}
>> +
>
>Move these functions to the file of the actual implementation?
>

Yah, makes sense.

>>  static const struct drm_ioctl_desc i915_ioctls[] = {
>>         DRM_IOCTL_DEF_DRV(I915_INIT, drm_noop,
>> DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
>>         DRM_IOCTL_DEF_DRV(I915_FLUSH, drm_noop, DRM_AUTH),
>> @@ -1838,6 +1866,8 @@ static const struct drm_ioctl_desc
>> i915_ioctls[] = {
>>         DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl,
>> DRM_RENDER_ALLOW),
>>         DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE,
>> i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
>>         DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY,
>> i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
>> +       DRM_IOCTL_DEF_DRV(I915_GEM_VM_BIND, i915_gem_vm_bind_ioctl,
>> DRM_RENDER_ALLOW),
>> +       DRM_IOCTL_DEF_DRV(I915_GEM_VM_UNBIND,
>> i915_gem_vm_unbind_ioctl, DRM_RENDER_ALLOW),
>>  };
>>
>>  /*
>> diff --git a/drivers/gpu/drm/i915/i915_getparam.c
>> b/drivers/gpu/drm/i915/i915_getparam.c
>> index 6fd15b39570c..c1d53febc5de 100644
>> --- a/drivers/gpu/drm/i915/i915_getparam.c
>> +++ b/drivers/gpu/drm/i915/i915_getparam.c
>> @@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev,
>> void *data,
>>         case I915_PARAM_PERF_REVISION:
>>                 value = i915_perf_ioctl_version();
>>                 break;
>> +       case I915_PARAM_VM_BIND_VERSION:
>> +               value = GRAPHICS_VER(i915) >= 12 ? 1 : 0;
>> +               break;
>>         default:
>>                 DRM_DEBUG("Unknown parameter %d\n", param->param);
>>                 return -EINVAL;
>> diff --git a/include/uapi/drm/i915_drm.h
>> b/include/uapi/drm/i915_drm.h
>> index 3e78a00220ea..26cca49717f8 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -470,6 +470,8 @@ typedef struct _drm_i915_sarea {
>>  #define DRM_I915_GEM_VM_CREATE         0x3a
>>  #define DRM_I915_GEM_VM_DESTROY                0x3b
>>  #define DRM_I915_GEM_CREATE_EXT                0x3c
>> +#define DRM_I915_GEM_VM_BIND           0x3d
>> +#define DRM_I915_GEM_VM_UNBIND         0x3e
>>  /* Must be kept compact -- no holes */
>>
>>  #define DRM_IOCTL_I915_INIT            DRM_IOW( DRM_COMMAND_BASE +
>> DRM_I915_INIT, drm_i915_init_t)
>> @@ -534,6 +536,8 @@ typedef struct _drm_i915_sarea {
>>  #define
>> DRM_IOCTL_I915_QUERY                   DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_QUERY, struct drm_i915_query)
>>  #define DRM_IOCTL_I915_GEM_VM_CREATE   DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
>>  #define DRM_IOCTL_I915_GEM_VM_DESTROY  DRM_IOW (DRM_COMMAND_BASE +
>> DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
>> +#define DRM_IOCTL_I915_GEM_VM_BIND     DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
>>
>>  /* Allow drivers to submit batchbuffers directly to hardware,
>> relying
>>   * on the security mechanisms provided by hardware.
>> @@ -749,6 +753,25 @@ typedef struct drm_i915_irq_wait {
>>  /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
>>  #define I915_PARAM_HAS_USERPTR_PROBE 56
>>
>> +/*
>> + * VM_BIND feature version supported.
>> + *
>> + * The following versions of VM_BIND have been defined:
>> + *
>> + * 0: No VM_BIND support.
>> + *
>> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>> created
>> + *    previously with VM_BIND, the ioctl will not support unbinding
>> multiple
>> + *    mappings or splitting them. Similarly, VM_BIND calls will not
>> replace
>> + *    any existing mappings.
>> + *
>> + * 2: The restrictions on unbinding partial or multiple mappings is
>> + *    lifted, Similarly, binding will replace any mappings in the
>> given range.
>> + *
>> + * See struct drm_i915_gem_vm_bind and struct
>> drm_i915_gem_vm_unbind.
>> + */
>> +#define I915_PARAM_VM_BIND_VERSION     57
>
>Perhaps clarify that new versions are always backwards compatible?
>

I thought that was implicit in the version 2 definition, but yah, making it
explicit will be better.
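
For reference, the UMD flow I have in mind would look roughly like this
(untested sketch; error handling and drm fd setup omitted):

	int vm_bind_version = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_VM_BIND_VERSION,
		.value = &vm_bind_version,
	};
	struct drm_i915_gem_vm_control ctl = { .flags = 0 };

	ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);

	/* Any version >= 1 keeps the version 1 semantics working. */
	if (vm_bind_version >= 1)
		ctl.flags |= I915_VM_CREATE_FLAGS_USE_VM_BIND;

	ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl);
	/* ctl.vm_id can then be used with the VM_BIND/UNBIND ioctls. */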

Niranjana

>> +
>>  /* Must be kept compact -- no holes and well documented */
>>
>>  typedef struct drm_i915_getparam {
>> @@ -1441,6 +1464,41 @@ struct drm_i915_gem_execbuffer2 {
>>  #define i915_execbuffer2_get_context_id(eb2) \
>>         ((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
>>
>> +/**
>> + * struct drm_i915_gem_timeline_fence - An input or output timeline
>> fence.
>> + *
>> + * The operation will wait for input fence to signal.
>> + *
>> + * The returned output fence will be signaled after the completion
>> of the
>> + * operation.
>> + */
>> +struct drm_i915_gem_timeline_fence {
>> +       /** @handle: User's handle for a drm_syncobj to wait on or
>> signal. */
>> +       __u32 handle;
>> +
>> +       /**
>> +        * @flags: Supported flags are:
>> +        *
>> +        * I915_TIMELINE_FENCE_WAIT:
>> +        * Wait for the input fence before the operation.
>> +        *
>> +        * I915_TIMELINE_FENCE_SIGNAL:
>> +        * Return operation completion fence as output.
>> +        */
>> +       __u32 flags;
>> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-
>> (I915_TIMELINE_FENCE_SIGNAL << 1))
>> +
>> +       /**
>> +        * @value: A point in the timeline.
>> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for
>> a
>> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>> into a
>> +        * binary one.
>> +        */
>> +       __u64 value;
>> +};
>> +
>>  struct drm_i915_gem_pin {
>>         /** Handle of the buffer to be pinned. */
>>         __u32 handle;
>> @@ -2397,8 +2455,6 @@ struct drm_i915_gem_context_destroy {
>>   * The id of new VM (bound to the fd) for use with
>> I915_CONTEXT_PARAM_VM is
>>   * returned in the outparam @id.
>>   *
>> - * No flags are defined, with all bits reserved and must be zero.
>> - *
>>   * An extension chain maybe provided, starting with @extensions, and
>> terminated
>>   * by the @next_extension being 0. Currently, no extensions are
>> defined.
>>   *
>> @@ -2410,6 +2466,10 @@ struct drm_i915_gem_context_destroy {
>>   */
>>  struct drm_i915_gem_vm_control {
>>         __u64 extensions;
>> +
>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1u << 0)
>> +#define I915_VM_CREATE_FLAGS_UNKNOWN \
>> +       (-(I915_VM_CREATE_FLAGS_USE_VM_BIND << 1))
>>         __u32 flags;
>>         __u32 vm_id;
>>  };
>> @@ -3602,6 +3662,134 @@ struct
>> drm_i915_gem_create_ext_protected_content {
>>  /* ID of the protected content session managed by i915 when PXP is
>> active */
>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>
>> +/**
>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> + *
>> + * This structure is passed to VM_BIND ioctl and specifies the
>> mapping of GPU
>> + * virtual address (VA) range to the section of an object that
>> should be bound
>> + * in the device page table of the specified address space (VM).
>> + * The VA range specified must be unique (ie., not currently bound)
>> and can
>> + * be mapped to whole object or a section of the object (partial
>> binding).
>> + * Multiple VA mappings can be created to the same section of the
>> object
>> + * (aliasing).
>> + *
>> + * The @start, @offset and @length must be 4K page aligned. However
>> the DG2
>> + * and XEHPSDV has 64K page size for device local memory and has
>> compact page
>> + * table. On those platforms, for binding device local-memory
>> objects, the
>> + * @start, @offset and @length must be 64K aligned. Also, UMDs
>> should not mix
>> + * the local memory 64K page and the system memory 4K page bindings
>> in the same
>> + * 2M range.
>> + *
>> + * Error code -EINVAL will be returned if @start, @offset and
>> @length are not
>> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>> error code
>> + * -ENOSPC will be returned if the VA range specified can't be
>> reserved.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>> concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND operation can
>> be done
>> + * asynchronously, if valid @fence is specified.
>> + */
>> +struct drm_i915_gem_vm_bind {
>> +       /** @vm_id: VM (address space) id to bind */
>> +       __u32 vm_id;
>> +
>> +       /** @handle: Object handle */
>> +       __u32 handle;
>> +
>> +       /** @start: Virtual Address start to bind */
>> +       __u64 start;
>> +
>> +       /** @offset: Offset in object to bind */
>> +       __u64 offset;
>> +
>> +       /** @length: Length of mapping to bind */
>> +       __u64 length;
>> +
>> +       /**
>> +        * @flags: Currently reserved, MBZ.
>> +        *
>> +        * Note that @fence carries its own flags.
>> +        */
>> +       __u64 flags;
>> +
>> +       /**
>> +        * @fence: Timeline fence for bind completion signaling.
>> +        *
>> +        * Timeline fence is of format struct
>> drm_i915_gem_timeline_fence.
>> +        *
>> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
>> flag
>> +        * is invalid, and an error will be returned.
>> +        *
>> +        * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out
>> fence
>> +        * is not requested and binding is completed synchronously.
>> +        */
>> +       struct drm_i915_gem_timeline_fence fence;
>> +
>> +       /**
>> +        * @extensions: Zero-terminated chain of extensions.
>> +        *
>> +        * For future extensions. See struct i915_user_extension.
>> +        */
>> +       __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> + *
>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
>> virtual
>> + * address (VA) range that should be unbound from the device page
>> table of the
>> + * specified address space (VM). VM_UNBIND will force unbind the
>> specified
>> + * range from device page table without waiting for any GPU job to
>> complete.
>> + * It is UMDs responsibility to ensure the mapping is no longer in
>> use before
>> + * calling VM_UNBIND.
>> + *
>> + * If the specified mapping is not found, the ioctl will simply
>> return without
>> + * any error.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>> concurrently
>> + * are not ordered. Furthermore, parts of the VM_UNBIND operation
>> can be done
>> + * asynchronously, if valid @fence is specified.
>> + */
>> +struct drm_i915_gem_vm_unbind {
>> +       /** @vm_id: VM (address space) id to bind */
>> +       __u32 vm_id;
>> +
>> +       /** @rsvd: Reserved, MBZ */
>> +       __u32 rsvd;
>> +
>> +       /** @start: Virtual Address start to unbind */
>> +       __u64 start;
>> +
>> +       /** @length: Length of mapping to unbind */
>> +       __u64 length;
>> +
>> +       /**
>> +        * @flags: Currently reserved, MBZ.
>> +        *
>> +        * Note that @fence carries its own flags.
>> +        */
>> +       __u64 flags;
>> +
>> +       /**
>> +        * @fence: Timeline fence for unbind completion signaling.
>> +        *
>> +        * Timeline fence is of format struct
>> drm_i915_gem_timeline_fence.
>> +        *
>> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
>> flag
>> +        * is invalid, and an error will be returned.
>> +        *
>> +        * If I915_TIMELINE_FENCE_SIGNAL flag is not set, then out
>> fence
>> +        * is not requested and unbinding is completed synchronously.
>> +        */
>> +       struct drm_i915_gem_timeline_fence fence;
>> +
>> +       /**
>> +        * @extensions: Zero-terminated chain of extensions.
>> +        *
>> +        * For future extensions. See struct i915_user_extension.
>> +        */
>> +       __u64 extensions;
>> +};
>> +
>>  #if defined(__cplusplus)
>>  }
>>  #endif
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
  2022-07-07  1:18       ` Andi Shyti
@ 2022-07-07  5:06       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:06 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, christian.koenig

On Thu, Jul 07, 2022 at 03:18:15AM +0200, Andi Shyti wrote:
>Hi,
>
>[...]
>
>> > +/*
>> > + * VM_BIND feature version supported.
>> > + *
>> > + * The following versions of VM_BIND have been defined:
>> > + *
>> > + * 0: No VM_BIND support.
>> > + *
>> > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>> > created
>> > + *    previously with VM_BIND, the ioctl will not support unbinding
>> > multiple
>> > + *    mappings or splitting them. Similarly, VM_BIND calls will not
>> > replace
>> > + *    any existing mappings.
>> > + *
>> > + * 2: The restrictions on unbinding partial or multiple mappings is
>> > + *    lifted, Similarly, binding will replace any mappings in the
>> > given range.
>> > + *
>> > + * See struct drm_i915_gem_vm_bind and struct
>> > drm_i915_gem_vm_unbind.
>> > + */
>> > +#define I915_PARAM_VM_BIND_VERSION     57
>>
>> Perhaps clarify that new versions are always backwards compatible?
>
>how is this 57 coherent with the description above?
>

57 is the next available I915_PARAM_* number (from i915_drm.h). The
description above refers to the 'value' it returns.

Niranjana

>Andi

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-06 16:21     ` [Intel-gfx] " Thomas Hellström
@ 2022-07-07  5:43       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:43 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, intel-gfx, dri-devel, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Wed, Jul 06, 2022 at 06:21:03PM +0200, Thomas Hellström wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>> ---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233
>> ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>>  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>>  11 files changed, 318 insertions(+), 10 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644
>> drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile
>> b/drivers/gpu/drm/i915/Makefile
>> index 522ef9b4aff3..4e1627e96c6e 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -165,6 +165,7 @@ gem-y += \
>>         gem/i915_gem_ttm_move.o \
>>         gem/i915_gem_ttm_pm.o \
>>         gem/i915_gem_userptr.o \
>> +       gem/i915_gem_vm_bind_object.o \
>>         gem/i915_gem_wait.o \
>>         gem/i915_gemfs.o
>>  i915-y += \
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> index 33673fe7ee0a..927a87e5ec59 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> @@ -15,10 +15,10 @@
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>>  
>> -static u32 object_max_page_size(struct intel_memory_region
>> **placements,
>> -                               unsigned int n_placements)
>> +u32 i915_gem_object_max_page_size(struct intel_memory_region
>> **placements,
>
>Kerneldoc.

This is an existing function that is being modified. As I
mentioned in the other thread, we probably need a prep patch early
in this series that adds the missing kernel-doc in i915, which this
patch series would later update.
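
For example, something like the below for the function being exported here
(sketch only, to illustrate what that prep patch would add):

/**
 * i915_gem_object_max_page_size() - max of the placements' min page sizes
 * @placements: list of memory regions the object can be placed in
 * @n_placements: number of placement regions
 *
 * Return: the largest minimum page size required by @placements, but never
 * smaller than I915_GTT_PAGE_SIZE_4K (which is also the value returned when
 * @n_placements is zero).
 */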

>
>> +                                 unsigned int n_placements)
>>  {
>> -       u32 max_page_size = 0;
>> +       u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>         int i;
>>  
>>         for (i = 0; i < n_placements; i++) {
>> @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
>> intel_memory_region **placements,
>>                 max_page_size = max_t(u32, max_page_size, mr-
>> >min_page_size);
>>         }
>>  
>> -       GEM_BUG_ON(!max_page_size);
>>         return max_page_size;
>>  }
>
>Should this change be separated out? It's not immediately clear to a
>reviewer why it is included.

It is being removed because max_page_size now has a non-zero default
value, so this check is no longer valid.

>
>>  
>> @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct
>> drm_i915_private *i915, u64 size,
>>  
>>         i915_gem_flush_free_objects(i915);
>>  
>> -       size = round_up(size, object_max_page_size(placements,
>> n_placements));
>> +       size = round_up(size,
>> i915_gem_object_max_page_size(placements,
>> +                                                          
>> n_placements));
>>         if (size == 0)
>>                 return ERR_PTR(-EINVAL);
>>  
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 6f0a3ce35567..650de2224843 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64
>> size)
>>  }
>>  
>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>> +u32 i915_gem_object_max_page_size(struct intel_memory_region
>> **placements,
>> +                                 unsigned int n_placements);
>>  
>>  void i915_objects_module_exit(void);
>>  int i915_objects_module_init(void);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> new file mode 100644
>> index 000000000000..642cdb559f17
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#ifndef __I915_GEM_VM_BIND_H
>> +#define __I915_GEM_VM_BIND_H
>> +
>> +#include "i915_drv.h"
>> +
>> +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
>> >vm_bind_lock)
>> +
>> +static inline void i915_gem_vm_bind_lock(struct i915_address_space
>> *vm)
>> +{
>> +       mutex_lock(&vm->vm_bind_lock);
>> +}
>> +
>> +static inline int
>> +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
>> +{
>> +       return mutex_lock_interruptible(&vm->vm_bind_lock);
>> +}
>> +
>> +static inline void i915_gem_vm_bind_unlock(struct i915_address_space
>> *vm)
>> +{
>> +       mutex_unlock(&vm->vm_bind_lock);
>> +}
>> +
>
>Kerneldoc for the inlines.
>
>> +struct i915_vma *
>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
>> release_obj);
>> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> +                        struct drm_i915_gem_vm_bind *va,
>> +                        struct drm_file *file);
>> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>> +                          struct drm_i915_gem_vm_unbind *va);
>> +
>> +#endif /* __I915_GEM_VM_BIND_H */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> new file mode 100644
>> index 000000000000..43ceb4dcca6c
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -0,0 +1,233 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#include <linux/interval_tree_generic.h>
>> +
>> +#include "gem/i915_gem_vm_bind.h"
>> +#include "gt/gen8_engine_cs.h"
>> +
>> +#include "i915_drv.h"
>> +#include "i915_gem_gtt.h"
>> +
>> +#define START(node) ((node)->start)
>> +#define LAST(node) ((node)->last)
>> +
>> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>> +                    START, LAST, static inline, i915_vm_bind_it)
>> +
>> +#undef START
>> +#undef LAST
>> +
>> +/**
>> + * DOC: VM_BIND/UNBIND ioctls
>> + *
>> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM
>> buffer
>> + * objects (BOs) or sections of a BOs at specified GPU virtual
>> addresses on a
>> + * specified address space (VM). Multiple mappings can map to the
>> same physical
>> + * pages of an object (aliasing). These mappings (also referred to
>> as persistent
>> + * mappings) will be persistent across multiple GPU submissions
>> (execbuf calls)
>> + * issued by the UMD, without user having to provide a list of all
>> required
>> + * mappings during each submission (as required by older execbuf
>> mode).
>> + *
>> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out
>> fence for
>> + * signaling the completion of bind/unbind operation.
>> + *
>> + * VM_BIND feature is advertised to user via
>> I915_PARAM_VM_BIND_VERSION.
>> + * User has to opt-in for VM_BIND mode of binding for an address
>> space (VM)
>> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND
>> extension.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>> concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND
>> operations can be
>> + * done asynchronously, when valid out fence is specified.
>> + *
>> + * VM_BIND locking order is as below.
>> + *
>> + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock
>> is taken in
>> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while
>> releasing the
>> + *    mapping.
>> + *
>> + *    In future, when GPU page faults are supported, we can
>> potentially use a
>> + *    rwsem instead, so that multiple page fault handlers can take
>> the read
>> + *    side lock to lookup the mapping and hence can run in parallel.
>> + *    The older execbuf mode of binding do not need this lock.
>> + *
>> + * 2) Lock-B: The object's dma-resv lock will protect i915_vma state
>> and needs
>> + *    to be held while binding/unbinding a vma in the async worker
>> and while
>> + *    updating dma-resv fence list of an object. Note that private
>> BOs of a VM
>> + *    will all share a dma-resv object.
>> + *
>> + *    The future system allocator support will use the HMM
>> prescribed locking
>> + *    instead.
>
>I don't think the last sentence is relevant for this series. Also, are
>there any other mentions for Locks A, B and C? If not, can we ditch
>that naming?

It is taken from the design RFC :). Yah, I think it is better to remove it
(and probably the lock names as well) and make the documentation more
specific to the implementation in this patch series.

>
>> + *
>> + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the
>> list of
>> + *    invalidated vmas (due to eviction and userptr invalidation)
>> etc.
>> + */
>> +
>> +struct i915_vma *
>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>
>Kerneldoc for the extern functions.
>
>
>> +{
>> +       struct i915_vma *vma, *temp;
>> +
>> +       assert_vm_bind_held(vm);
>> +
>> +       vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
>> +       /* Working around compiler error, remove later */
>
>Is this still relevant? What compiler error is seen here?
>
>> +       if (vma)
>> +               temp = i915_vm_bind_it_iter_next(vma, va + vma->size,
>> -1);
>> +       return vma;
>> +}
>> +
>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>> +{
>> +       assert_vm_bind_held(vma->vm);
>> +
>> +       if (!list_empty(&vma->vm_bind_link)) {
>> +               list_del_init(&vma->vm_bind_link);
>> +               i915_vm_bind_it_remove(vma, &vma->vm->va);
>> +
>> +               /* Release object */
>> +               if (release_obj)
>> +                       i915_vma_put(vma);
>
>i915_vma_put() here is confusing. Can we use i915_gem_object_put() to
>further make it clear that the persistent vmas actually take a
>reference on the object?
>

makes sense.
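
i.e. something like this in i915_gem_vm_bind_remove() (sketch):

		/* Release the reference taken on the object at vm_bind time */
		if (release_obj)
			i915_gem_object_put(vma->obj);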

>> +       }
>> +}
>> +
>> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>> +                          struct drm_i915_gem_vm_unbind *va)
>> +{
>> +       struct drm_i915_gem_object *obj;
>> +       struct i915_vma *vma;
>> +       int ret;
>> +
>> +       va->start = gen8_noncanonical_addr(va->start);
>> +       ret = i915_gem_vm_bind_lock_interruptible(vm);
>> +       if (ret)
>> +               return ret;
>> +
>> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> +       if (!vma) {
>> +               ret = -ENOENT;
>> +               goto out_unlock;
>> +       }
>> +
>> +       if (vma->size != va->length)
>> +               ret = -EINVAL;
>> +       else
>> +               i915_gem_vm_bind_remove(vma, false);
>> +
>> +out_unlock:
>> +       i915_gem_vm_bind_unlock(vm);
>> +       if (ret || !vma)
>> +               return ret;
>> +
>> +       /* Destroy vma and then release object */
>> +       obj = vma->obj;
>> +       ret = i915_gem_object_lock(obj, NULL);
>> +       if (ret)
>> +               return ret;
>
>This call never returns an error and we could GEM_WARN_ON(...), or
>(void) to annotate that the return value is wilfully ignored.
>

makes sense.
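
i.e. something like (sketch):

	obj = vma->obj;
	/* No ww context is passed, so the lock cannot fail here */
	GEM_WARN_ON(i915_gem_object_lock(obj, NULL));

	i915_vma_destroy(vma);
	i915_gem_object_unlock(obj);
	i915_gem_object_put(obj);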

>> +
>> +       i915_vma_destroy(vma);
>> +       i915_gem_object_unlock(obj);
>> +       i915_gem_object_put(obj);
>> +
>> +       return 0;
>> +}
>> +
>> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space
>> *vm,
>> +                                       struct drm_i915_gem_object
>> *obj,
>> +                                       struct drm_i915_gem_vm_bind
>> *va)
>> +{
>> +       struct i915_ggtt_view view;
>> +       struct i915_vma *vma;
>> +
>> +       va->start = gen8_noncanonical_addr(va->start);
>> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> +       if (vma)
>> +               return ERR_PTR(-EEXIST);
>> +
>> +       view.type = I915_GGTT_VIEW_PARTIAL;
>> +       view.partial.offset = va->offset >> PAGE_SHIFT;
>> +       view.partial.size = va->length >> PAGE_SHIFT;
>
>IIRC, this vma view is not handled correctly in the vma code, that only
>understands views for ggtt bindings.
>

This patch series extends the partial view to ppgtt as well.
Yah, the naming is still i915_ggtt_view, but I am hoping we can fix the
name in a follow-up patch later.

>
>> +       vma = i915_vma_instance(obj, vm, &view);
>> +       if (IS_ERR(vma))
>> +               return vma;
>> +
>> +       vma->start = va->start;
>> +       vma->last = va->start + va->length - 1;
>> +
>> +       return vma;
>> +}
>> +
>> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> +                        struct drm_i915_gem_vm_bind *va,
>> +                        struct drm_file *file)
>> +{
>> +       struct drm_i915_gem_object *obj;
>> +       struct i915_vma *vma = NULL;
>> +       struct i915_gem_ww_ctx ww;
>> +       u64 pin_flags;
>> +       int ret = 0;
>> +
>> +       if (!vm->vm_bind_mode)
>> +               return -EOPNOTSUPP;
>> +
>> +       obj = i915_gem_object_lookup(file, va->handle);
>> +       if (!obj)
>> +               return -ENOENT;
>> +
>> +       if (!va->length ||
>> +           !IS_ALIGNED(va->offset | va->length,
>> +                       i915_gem_object_max_page_size(obj-
>> >mm.placements,
>> +                                                     obj-
>> >mm.n_placements)) ||
>> +           range_overflows_t(u64, va->offset, va->length, obj-
>> >base.size)) {
>> +               ret = -EINVAL;
>> +               goto put_obj;
>> +       }
>> +
>> +       ret = i915_gem_vm_bind_lock_interruptible(vm);
>> +       if (ret)
>> +               goto put_obj;
>> +
>> +       vma = vm_bind_get_vma(vm, obj, va);
>> +       if (IS_ERR(vma)) {
>> +               ret = PTR_ERR(vma);
>> +               goto unlock_vm;
>> +       }
>> +
>> +       i915_gem_ww_ctx_init(&ww, true);
>> +       pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
>> +retry:
>> +       ret = i915_gem_object_lock(vma->obj, &ww);
>> +       if (ret)
>> +               goto out_ww;
>> +
>> +       ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>> +       if (ret)
>> +               goto out_ww;
>> +
>> +       /* Make it evictable */
>> +       __i915_vma_unpin(vma);
>
>A considerable effort has been put into avoiding short term vma pins in
>i915. We should add an interface like i915_vma_bind_ww() that avoids
>the pin altoghether.

Currently in the i915 driver, VA management and device page table bindings
are tightly coupled. i915_vma_pin_ww() does both the VA allocation and the
binding, and we also interpret the VA being allocated (drm_mm node
allocated) as the vma being bound.

Decoupling them would be ideal, but I think it needs to be done carefully
in a separate patch series so as not to cause any regression.

>
>> +
>> +       list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>> +       i915_vm_bind_it_insert(vma, &vm->va);
>> +
>> +       /* Hold object reference until vm_unbind */
>> +       i915_gem_object_get(vma->obj);
>> +out_ww:
>> +       if (ret == -EDEADLK) {
>> +               ret = i915_gem_ww_ctx_backoff(&ww);
>> +               if (!ret)
>> +                       goto retry;
>> +       }
>> +
>> +       if (ret)
>> +               i915_vma_destroy(vma);
>> +
>> +       i915_gem_ww_ctx_fini(&ww);
>
>Could use for_i915_gem_ww()?
>

Yah, I think it is a better idea to use it.
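
Something like the below, I guess (sketch on top of the code above):

	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;

	for_i915_gem_ww(&ww, ret, true) {
		ret = i915_gem_object_lock(vma->obj, &ww);
		if (ret)
			continue;	/* -EDEADLK backoff handled by the macro */

		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
		if (ret)
			continue;

		/* Make it evictable */
		__i915_vma_unpin(vma);

		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
		i915_vm_bind_it_insert(vma, &vm->va);

		/* Hold object reference until vm_unbind */
		i915_gem_object_get(vma->obj);
	}

	if (ret)
		i915_vma_destroy(vma);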

>> +unlock_vm:
>> +       i915_gem_vm_bind_unlock(vm);
>> +put_obj:
>> +       i915_gem_object_put(obj);
>> +       return ret;
>> +}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index b67831833c9a..135dc4a76724 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct
>> i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>         drm_mm_takedown(&vm->mm);
>> +       GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>> +       mutex_destroy(&vm->vm_bind_lock);
>>  }
>>  
>>  /**
>> @@ -282,6 +284,11 @@ void i915_address_space_init(struct
>> i915_address_space *vm, int subclass)
>>  
>>         INIT_LIST_HEAD(&vm->bound_list);
>>         INIT_LIST_HEAD(&vm->unbound_list);
>> +
>> +       vm->va = RB_ROOT_CACHED;
>> +       INIT_LIST_HEAD(&vm->vm_bind_list);
>> +       INIT_LIST_HEAD(&vm->vm_bound_list);
>> +       mutex_init(&vm->vm_bind_lock);
>>  }
>>  
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index c812aa9708ae..d4a6ce65251d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -259,6 +259,15 @@ struct i915_address_space {
>>          */
>>         struct list_head unbound_list;
>>  
>> +       /**
>> +        * List of VM_BIND objects.
>> +        */
>
>Proper kerneldoc + intel locking guidelines comments, please.
>
>> +       struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>> +       struct list_head vm_bind_list;
>> +       struct list_head vm_bound_list;
>> +       /* va tree of persistent vmas */
>> +       struct rb_root_cached va;
>> +
>>         /* Global GTT */
>>         bool is_ggtt:1;
>>  
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c
>> b/drivers/gpu/drm/i915/i915_driver.c
>> index ccf990dfd99b..776ab7844f60 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -68,6 +68,7 @@
>>  #include "gem/i915_gem_ioctls.h"
>>  #include "gem/i915_gem_mman.h"
>>  #include "gem/i915_gem_pm.h"
>> +#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_gt.h"
>>  #include "gt/intel_gt_pm.h"
>>  #include "gt/intel_rc6.h"
>> @@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct
>> drm_device *dev, void *data,
>>  {
>>         struct drm_i915_gem_vm_bind *args = data;
>>         struct i915_address_space *vm;
>> +       int ret;
>>  
>>         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>         if (unlikely(!vm))
>>                 return -ENOENT;
>>  
>> +       ret = i915_gem_vm_bind_obj(vm, args, file);
>> +
>>         i915_vm_put(vm);
>> -       return -EINVAL;
>> +       return ret;
>>  }
>>  
>>  static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
>> *data,
>> @@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct
>> drm_device *dev, void *data,
>>  {
>>         struct drm_i915_gem_vm_unbind *args = data;
>>         struct i915_address_space *vm;
>> +       int ret;
>>  
>>         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>         if (unlikely(!vm))
>>                 return -ENOENT;
>>  
>> +       ret = i915_gem_vm_unbind_obj(vm, args);
>> +
>>         i915_vm_put(vm);
>> -       return -EINVAL;
>> +       return ret;
>>  }
>>  
>>  static const struct drm_ioctl_desc i915_ioctls[] = {
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 43339ecabd73..d324e29cef0a 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -29,6 +29,7 @@
>>  #include "display/intel_frontbuffer.h"
>>  #include "gem/i915_gem_lmem.h"
>>  #include "gem/i915_gem_tiling.h"
>> +#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_engine.h"
>>  #include "gt/intel_engine_heartbeat.h"
>>  #include "gt/intel_gt.h"
>> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>         spin_unlock(&obj->vma.lock);
>>         mutex_unlock(&vm->mutex);
>>  
>> +       INIT_LIST_HEAD(&vma->vm_bind_link);
>>         return vma;
>>  
>>  err_unlock:
>> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object
>> *obj,
>>  {
>>         struct i915_vma *vma;
>>  
>> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>         GEM_BUG_ON(!kref_read(&vm->ref));
>>  
>>         spin_lock(&obj->vma.lock);
>> @@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma
>> *vma, bool vm_ddestroy)
>>  
>>         spin_unlock(&obj->vma.lock);
>>  
>> +       i915_gem_vm_bind_lock(vma->vm);
>> +       i915_gem_vm_bind_remove(vma, true);
>> +       i915_gem_vm_bind_unlock(vma->vm);
>> +
>
>The vm might be destroyed at this point already.
>

Ah, due to async vma resource release...

>From what I understand we can destroy the vma from three call sites:
>1) VM_UNBIND -> The vma has already been removed from the vm_bind
>address space,
>2) object destruction -> since the vma has an object reference while in
>the vm_bind address space, it must also have been removed from the
>address space if called from object destruction.
>3) vm destruction. Suggestion is to call VM_UNBIND from under the
>vm_bind lock early in vm destruction.
>
>Then the above added code can be removed and replaced with an assert
>that the vm_bind address space RB_NODE is indeed empty.
>

...yah, makes sense to move this code to early in VM destruction rather
than keeping it here.

Niranjana

>
>>         spin_lock_irq(&gt->closed_lock);
>>         __i915_vma_remove_closed(vma);
>>         spin_unlock_irq(&gt->closed_lock);
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h
>> b/drivers/gpu/drm/i915/i915_vma.h
>> index 88ca0bd9c900..dcb49f79ff7e 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>>  {
>>         ptrdiff_t cmp;
>>  
>> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>> -
>>         cmp = ptrdiff(vma->vm, vm);
>>         if (cmp)
>>                 return cmp;
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> b/drivers/gpu/drm/i915/i915_vma_types.h
>> index be6e028c3b57..b6d179bdbfa0 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -289,6 +289,14 @@ struct i915_vma {
>>         /** This object's place on the active/inactive lists */
>>         struct list_head vm_link;
>>  
>> +       struct list_head vm_bind_link; /* Link in persistent VMA list
>> */
>> +
>> +       /** Interval tree structures for persistent vma */
>
>Proper kerneldoc.
>
>> +       struct rb_node rb;
>> +       u64 start;
>> +       u64 last;
>> +       u64 __subtree_last;
>> +
>>         struct list_head obj_link; /* Link in the object's VMA list
>> */
>>         struct rb_node obj_node;
>>         struct hlist_node obj_hash;
>
>Thanks,
>Thomas
>


* Re: [Intel-gfx] [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
@ 2022-07-07  5:43       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:43 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, matthew.auld,
	daniel.vetter, christian.koenig

On Wed, Jul 06, 2022 at 06:21:03PM +0200, Thomas Hellström wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>> ---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233
>> ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>>  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>>  11 files changed, 318 insertions(+), 10 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644
>> drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile
>> b/drivers/gpu/drm/i915/Makefile
>> index 522ef9b4aff3..4e1627e96c6e 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -165,6 +165,7 @@ gem-y += \
>>         gem/i915_gem_ttm_move.o \
>>         gem/i915_gem_ttm_pm.o \
>>         gem/i915_gem_userptr.o \
>> +       gem/i915_gem_vm_bind_object.o \
>>         gem/i915_gem_wait.o \
>>         gem/i915_gemfs.o
>>  i915-y += \
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> index 33673fe7ee0a..927a87e5ec59 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> @@ -15,10 +15,10 @@
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>>  
>> -static u32 object_max_page_size(struct intel_memory_region
>> **placements,
>> -                               unsigned int n_placements)
>> +u32 i915_gem_object_max_page_size(struct intel_memory_region
>> **placements,
>
>Kerneldoc.

This is an existing function that is being modified. As I
mentioned in the other thread, we probably need a prep patch early
in this series to add the missing kernel-docs in i915, which this
patch series would then update.

>
>> +                                 unsigned int n_placements)
>>  {
>> -       u32 max_page_size = 0;
>> +       u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>         int i;
>>  
>>         for (i = 0; i < n_placements; i++) {
>> @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
>> intel_memory_region **placements,
>>                 max_page_size = max_t(u32, max_page_size, mr-
>> >min_page_size);
>>         }
>>  
>> -       GEM_BUG_ON(!max_page_size);
>>         return max_page_size;
>>  }
>
>Should this change be separated out? It's not immediately clear to a
>reviewer why it is included.

It is being removed as max_page_size now has a non-zero default
value and hence this check is not valid anymore.

>
>>  
>> @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct
>> drm_i915_private *i915, u64 size,
>>  
>>         i915_gem_flush_free_objects(i915);
>>  
>> -       size = round_up(size, object_max_page_size(placements,
>> n_placements));
>> +       size = round_up(size,
>> i915_gem_object_max_page_size(placements,
>> +                                                          
>> n_placements));
>>         if (size == 0)
>>                 return ERR_PTR(-EINVAL);
>>  
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 6f0a3ce35567..650de2224843 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64
>> size)
>>  }
>>  
>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>> +u32 i915_gem_object_max_page_size(struct intel_memory_region
>> **placements,
>> +                                 unsigned int n_placements);
>>  
>>  void i915_objects_module_exit(void);
>>  int i915_objects_module_init(void);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> new file mode 100644
>> index 000000000000..642cdb559f17
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#ifndef __I915_GEM_VM_BIND_H
>> +#define __I915_GEM_VM_BIND_H
>> +
>> +#include "i915_drv.h"
>> +
>> +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
>> >vm_bind_lock)
>> +
>> +static inline void i915_gem_vm_bind_lock(struct i915_address_space
>> *vm)
>> +{
>> +       mutex_lock(&vm->vm_bind_lock);
>> +}
>> +
>> +static inline int
>> +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
>> +{
>> +       return mutex_lock_interruptible(&vm->vm_bind_lock);
>> +}
>> +
>> +static inline void i915_gem_vm_bind_unlock(struct i915_address_space
>> *vm)
>> +{
>> +       mutex_unlock(&vm->vm_bind_lock);
>> +}
>> +
>
>Kerneldoc for the inlines.
>
>> +struct i915_vma *
>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
>> release_obj);
>> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> +                        struct drm_i915_gem_vm_bind *va,
>> +                        struct drm_file *file);
>> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>> +                          struct drm_i915_gem_vm_unbind *va);
>> +
>> +#endif /* __I915_GEM_VM_BIND_H */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> new file mode 100644
>> index 000000000000..43ceb4dcca6c
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -0,0 +1,233 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#include <linux/interval_tree_generic.h>
>> +
>> +#include "gem/i915_gem_vm_bind.h"
>> +#include "gt/gen8_engine_cs.h"
>> +
>> +#include "i915_drv.h"
>> +#include "i915_gem_gtt.h"
>> +
>> +#define START(node) ((node)->start)
>> +#define LAST(node) ((node)->last)
>> +
>> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>> +                    START, LAST, static inline, i915_vm_bind_it)
>> +
>> +#undef START
>> +#undef LAST
>> +
>> +/**
>> + * DOC: VM_BIND/UNBIND ioctls
>> + *
>> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM
>> buffer
>> + * objects (BOs) or sections of a BOs at specified GPU virtual
>> addresses on a
>> + * specified address space (VM). Multiple mappings can map to the
>> same physical
>> + * pages of an object (aliasing). These mappings (also referred to
>> as persistent
>> + * mappings) will be persistent across multiple GPU submissions
>> (execbuf calls)
>> + * issued by the UMD, without user having to provide a list of all
>> required
>> + * mappings during each submission (as required by older execbuf
>> mode).
>> + *
>> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out
>> fence for
>> + * signaling the completion of bind/unbind operation.
>> + *
>> + * VM_BIND feature is advertised to user via
>> I915_PARAM_VM_BIND_VERSION.
>> + * User has to opt-in for VM_BIND mode of binding for an address
>> space (VM)
>> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND
>> extension.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>> concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND
>> operations can be
>> + * done asynchronously, when valid out fence is specified.
>> + *
>> + * VM_BIND locking order is as below.
>> + *
>> + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock
>> is taken in
>> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while
>> releasing the
>> + *    mapping.
>> + *
>> + *    In future, when GPU page faults are supported, we can
>> potentially use a
>> + *    rwsem instead, so that multiple page fault handlers can take
>> the read
>> + *    side lock to lookup the mapping and hence can run in parallel.
>> + *    The older execbuf mode of binding do not need this lock.
>> + *
>> + * 2) Lock-B: The object's dma-resv lock will protect i915_vma state
>> and needs
>> + *    to be held while binding/unbinding a vma in the async worker
>> and while
>> + *    updating dma-resv fence list of an object. Note that private
>> BOs of a VM
>> + *    will all share a dma-resv object.
>> + *
>> + *    The future system allocator support will use the HMM
>> prescribed locking
>> + *    instead.
>
>I don't think the last sentence is relevant for this series. Also, are
>there any other mentions for Locks A, B and C? If not, can we ditch
>that naming?

It is taken from the design RFC :). Yah, I think it is better to remove it
(and probably the lock names too) and make it more specific to the
implementation in this patch series.

>
>> + *
>> + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the
>> list of
>> + *    invalidated vmas (due to eviction and userptr invalidation)
>> etc.
>> + */
>> +
>> +struct i915_vma *
>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>
>Kerneldoc for the extern functions.
>
>
>> +{
>> +       struct i915_vma *vma, *temp;
>> +
>> +       assert_vm_bind_held(vm);
>> +
>> +       vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
>> +       /* Working around compiler error, remove later */
>
>Is this still relevant? What compiler error is seen here?
>
>> +       if (vma)
>> +               temp = i915_vm_bind_it_iter_next(vma, va + vma->size,
>> -1);
>> +       return vma;
>> +}
>> +
>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>> +{
>> +       assert_vm_bind_held(vma->vm);
>> +
>> +       if (!list_empty(&vma->vm_bind_link)) {
>> +               list_del_init(&vma->vm_bind_link);
>> +               i915_vm_bind_it_remove(vma, &vma->vm->va);
>> +
>> +               /* Release object */
>> +               if (release_obj)
>> +                       i915_vma_put(vma);
>
>i915_vma_put() here is confusing. Can we use i915_gem_object_put() to
>further make it clear that the persistent vmas actually take a
>reference on the object?
>

makes sense.

>> +       }
>> +}
>> +
>> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>> +                          struct drm_i915_gem_vm_unbind *va)
>> +{
>> +       struct drm_i915_gem_object *obj;
>> +       struct i915_vma *vma;
>> +       int ret;
>> +
>> +       va->start = gen8_noncanonical_addr(va->start);
>> +       ret = i915_gem_vm_bind_lock_interruptible(vm);
>> +       if (ret)
>> +               return ret;
>> +
>> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> +       if (!vma) {
>> +               ret = -ENOENT;
>> +               goto out_unlock;
>> +       }
>> +
>> +       if (vma->size != va->length)
>> +               ret = -EINVAL;
>> +       else
>> +               i915_gem_vm_bind_remove(vma, false);
>> +
>> +out_unlock:
>> +       i915_gem_vm_bind_unlock(vm);
>> +       if (ret || !vma)
>> +               return ret;
>> +
>> +       /* Destroy vma and then release object */
>> +       obj = vma->obj;
>> +       ret = i915_gem_object_lock(obj, NULL);
>> +       if (ret)
>> +               return ret;
>
>This call never returns an error and we could GEM_WARN_ON(...), or
>(void) to annotate that the return value is wilfully ignored.
>

makes sense.
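
i.e. something like this (sketch):

	ret = i915_gem_object_lock(obj, NULL);
	/* Locking without a ww context cannot fail */
	GEM_WARN_ON(ret);

	i915_vma_destroy(vma);
	i915_gem_object_unlock(obj);
	i915_gem_object_put(obj);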

>> +
>> +       i915_vma_destroy(vma);
>> +       i915_gem_object_unlock(obj);
>> +       i915_gem_object_put(obj);
>> +
>> +       return 0;
>> +}
>> +
>> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space
>> *vm,
>> +                                       struct drm_i915_gem_object
>> *obj,
>> +                                       struct drm_i915_gem_vm_bind
>> *va)
>> +{
>> +       struct i915_ggtt_view view;
>> +       struct i915_vma *vma;
>> +
>> +       va->start = gen8_noncanonical_addr(va->start);
>> +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> +       if (vma)
>> +               return ERR_PTR(-EEXIST);
>> +
>> +       view.type = I915_GGTT_VIEW_PARTIAL;
>> +       view.partial.offset = va->offset >> PAGE_SHIFT;
>> +       view.partial.size = va->length >> PAGE_SHIFT;
>
>IIRC, this vma view is not handled correctly in the vma code, that only
>understands views for ggtt bindings.
>

This patch series extends the partial view to ppgtt as well.
Yah, the naming is still i915_ggtt_view, but I am hoping we can fix the
name in a follow-up patch later.

>
>> +       vma = i915_vma_instance(obj, vm, &view);
>> +       if (IS_ERR(vma))
>> +               return vma;
>> +
>> +       vma->start = va->start;
>> +       vma->last = va->start + va->length - 1;
>> +
>> +       return vma;
>> +}
>> +
>> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> +                        struct drm_i915_gem_vm_bind *va,
>> +                        struct drm_file *file)
>> +{
>> +       struct drm_i915_gem_object *obj;
>> +       struct i915_vma *vma = NULL;
>> +       struct i915_gem_ww_ctx ww;
>> +       u64 pin_flags;
>> +       int ret = 0;
>> +
>> +       if (!vm->vm_bind_mode)
>> +               return -EOPNOTSUPP;
>> +
>> +       obj = i915_gem_object_lookup(file, va->handle);
>> +       if (!obj)
>> +               return -ENOENT;
>> +
>> +       if (!va->length ||
>> +           !IS_ALIGNED(va->offset | va->length,
>> +                       i915_gem_object_max_page_size(obj-
>> >mm.placements,
>> +                                                     obj-
>> >mm.n_placements)) ||
>> +           range_overflows_t(u64, va->offset, va->length, obj-
>> >base.size)) {
>> +               ret = -EINVAL;
>> +               goto put_obj;
>> +       }
>> +
>> +       ret = i915_gem_vm_bind_lock_interruptible(vm);
>> +       if (ret)
>> +               goto put_obj;
>> +
>> +       vma = vm_bind_get_vma(vm, obj, va);
>> +       if (IS_ERR(vma)) {
>> +               ret = PTR_ERR(vma);
>> +               goto unlock_vm;
>> +       }
>> +
>> +       i915_gem_ww_ctx_init(&ww, true);
>> +       pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
>> +retry:
>> +       ret = i915_gem_object_lock(vma->obj, &ww);
>> +       if (ret)
>> +               goto out_ww;
>> +
>> +       ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>> +       if (ret)
>> +               goto out_ww;
>> +
>> +       /* Make it evictable */
>> +       __i915_vma_unpin(vma);
>
>A considerable effort has been put into avoiding short term vma pins in
>i915. We should add an interface like i915_vma_bind_ww() that avoids
>the pin altoghether.

Currently in the i915 driver, VA management and device page table bindings
are tightly coupled. i915_vma_pin_ww() does both the VA allocation and the
binding, and we also interpret the VA being allocated (drm_mm node
allocated) as the vma being bound.

Decoupling it would be ideal but I think it needs to be carefully done
in a separate patch series to not cause any regression.

>
>> +
>> +       list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>> +       i915_vm_bind_it_insert(vma, &vm->va);
>> +
>> +       /* Hold object reference until vm_unbind */
>> +       i915_gem_object_get(vma->obj);
>> +out_ww:
>> +       if (ret == -EDEADLK) {
>> +               ret = i915_gem_ww_ctx_backoff(&ww);
>> +               if (!ret)
>> +                       goto retry;
>> +       }
>> +
>> +       if (ret)
>> +               i915_vma_destroy(vma);
>> +
>> +       i915_gem_ww_ctx_fini(&ww);
>
>Could use for_i915_gem_ww()?
>

Yah, I think it is a better idea to use it.
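
Something along these lines (untested sketch of the bind step reworked
around the existing for_i915_gem_ww() helper; names as in the patch
above):

	u64 pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
	struct i915_gem_ww_ctx ww;
	int ret;

	for_i915_gem_ww(&ww, ret, true) {
		ret = i915_gem_object_lock(vma->obj, &ww);
		if (ret)
			continue;

		ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
		if (ret)
			continue;

		/* Make it evictable */
		__i915_vma_unpin(vma);

		list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
		i915_vm_bind_it_insert(vma, &vm->va);

		/* Hold object reference until vm_unbind */
		i915_gem_object_get(vma->obj);
	}
	/* The loop handles the -EDEADLK backoff and ww ctx fini internally */
	if (ret)
		i915_vma_destroy(vma);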

>> +unlock_vm:
>> +       i915_gem_vm_bind_unlock(vm);
>> +put_obj:
>> +       i915_gem_object_put(obj);
>> +       return ret;
>> +}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index b67831833c9a..135dc4a76724 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct
>> i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>         drm_mm_takedown(&vm->mm);
>> +       GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>> +       mutex_destroy(&vm->vm_bind_lock);
>>  }
>>  
>>  /**
>> @@ -282,6 +284,11 @@ void i915_address_space_init(struct
>> i915_address_space *vm, int subclass)
>>  
>>         INIT_LIST_HEAD(&vm->bound_list);
>>         INIT_LIST_HEAD(&vm->unbound_list);
>> +
>> +       vm->va = RB_ROOT_CACHED;
>> +       INIT_LIST_HEAD(&vm->vm_bind_list);
>> +       INIT_LIST_HEAD(&vm->vm_bound_list);
>> +       mutex_init(&vm->vm_bind_lock);
>>  }
>>  
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index c812aa9708ae..d4a6ce65251d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -259,6 +259,15 @@ struct i915_address_space {
>>          */
>>         struct list_head unbound_list;
>>  
>> +       /**
>> +        * List of VM_BIND objects.
>> +        */
>
>Proper kerneldoc + intel locking guidelines comments, please.
>
>> +       struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>> +       struct list_head vm_bind_list;
>> +       struct list_head vm_bound_list;
>> +       /* va tree of persistent vmas */
>> +       struct rb_root_cached va;
>> +
>>         /* Global GTT */
>>         bool is_ggtt:1;
>>  
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c
>> b/drivers/gpu/drm/i915/i915_driver.c
>> index ccf990dfd99b..776ab7844f60 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -68,6 +68,7 @@
>>  #include "gem/i915_gem_ioctls.h"
>>  #include "gem/i915_gem_mman.h"
>>  #include "gem/i915_gem_pm.h"
>> +#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_gt.h"
>>  #include "gt/intel_gt_pm.h"
>>  #include "gt/intel_rc6.h"
>> @@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct
>> drm_device *dev, void *data,
>>  {
>>         struct drm_i915_gem_vm_bind *args = data;
>>         struct i915_address_space *vm;
>> +       int ret;
>>  
>>         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>         if (unlikely(!vm))
>>                 return -ENOENT;
>>  
>> +       ret = i915_gem_vm_bind_obj(vm, args, file);
>> +
>>         i915_vm_put(vm);
>> -       return -EINVAL;
>> +       return ret;
>>  }
>>  
>>  static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
>> *data,
>> @@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct
>> drm_device *dev, void *data,
>>  {
>>         struct drm_i915_gem_vm_unbind *args = data;
>>         struct i915_address_space *vm;
>> +       int ret;
>>  
>>         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>         if (unlikely(!vm))
>>                 return -ENOENT;
>>  
>> +       ret = i915_gem_vm_unbind_obj(vm, args);
>> +
>>         i915_vm_put(vm);
>> -       return -EINVAL;
>> +       return ret;
>>  }
>>  
>>  static const struct drm_ioctl_desc i915_ioctls[] = {
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 43339ecabd73..d324e29cef0a 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -29,6 +29,7 @@
>>  #include "display/intel_frontbuffer.h"
>>  #include "gem/i915_gem_lmem.h"
>>  #include "gem/i915_gem_tiling.h"
>> +#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_engine.h"
>>  #include "gt/intel_engine_heartbeat.h"
>>  #include "gt/intel_gt.h"
>> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>         spin_unlock(&obj->vma.lock);
>>         mutex_unlock(&vm->mutex);
>>  
>> +       INIT_LIST_HEAD(&vma->vm_bind_link);
>>         return vma;
>>  
>>  err_unlock:
>> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object
>> *obj,
>>  {
>>         struct i915_vma *vma;
>>  
>> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>         GEM_BUG_ON(!kref_read(&vm->ref));
>>  
>>         spin_lock(&obj->vma.lock);
>> @@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma
>> *vma, bool vm_ddestroy)
>>  
>>         spin_unlock(&obj->vma.lock);
>>  
>> +       i915_gem_vm_bind_lock(vma->vm);
>> +       i915_gem_vm_bind_remove(vma, true);
>> +       i915_gem_vm_bind_unlock(vma->vm);
>> +
>
>The vm might be destroyed at this point already.
>

Ah, due to async vma resource release...

>From what I understand we can destroy the vma from three call sites:
>1) VM_UNBIND -> The vma has already been removed from the vm_bind
>address space,
>2) object destruction -> since the vma has an object reference while in
>the vm_bind address space, it must also have been removed from the
>address space if called from object destruction.
>3) vm destruction. Suggestion is to call VM_UNBIND from under the
>vm_bind lock early in vm destruction.
>
>Then the above added code can be removed and replaced with an assert
>that the vm_bind address space RB_NODE is indeed empty.
>

...yah, makes sense to move this code to early in VM destruction rather
than keeping it here.

Niranjana

>
>>         spin_lock_irq(&gt->closed_lock);
>>         __i915_vma_remove_closed(vma);
>>         spin_unlock_irq(&gt->closed_lock);
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h
>> b/drivers/gpu/drm/i915/i915_vma.h
>> index 88ca0bd9c900..dcb49f79ff7e 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>>  {
>>         ptrdiff_t cmp;
>>  
>> -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>> -
>>         cmp = ptrdiff(vma->vm, vm);
>>         if (cmp)
>>                 return cmp;
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> b/drivers/gpu/drm/i915/i915_vma_types.h
>> index be6e028c3b57..b6d179bdbfa0 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -289,6 +289,14 @@ struct i915_vma {
>>         /** This object's place on the active/inactive lists */
>>         struct list_head vm_link;
>>  
>> +       struct list_head vm_bind_link; /* Link in persistent VMA list
>> */
>> +
>> +       /** Interval tree structures for persistent vma */
>
>Proper kerneldoc.
>
>> +       struct rb_node rb;
>> +       u64 start;
>> +       u64 last;
>> +       u64 __subtree_last;
>> +
>>         struct list_head obj_link; /* Link in the object's VMA list
>> */
>>         struct rb_node obj_node;
>>         struct hlist_node obj_hash;
>
>Thanks,
>Thomas
>


* Re: [Intel-gfx] [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-07  1:41       ` Andi Shyti
@ 2022-07-07  5:48       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:48 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Thomas Hellström, paulo.r.zanoni, intel-gfx, dri-devel,
	matthew.auld, daniel.vetter, christian.koenig

On Thu, Jul 07, 2022 at 03:41:26AM +0200, Andi Shyti wrote:
>Hi,
>
>[...]
>
>> > @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
>> > intel_memory_region **placements,
>> >                 max_page_size = max_t(u32, max_page_size, mr-
>> > >min_page_size);
>> >         }
>> >  
>> > -       GEM_BUG_ON(!max_page_size);
>> >         return max_page_size;
>> >  }
>>
>> Should this change be separated out? It's not immediately clear to a
>> reviewer why it is included.
>
>no, it's not, indeed... and is it correct to assume that the
>default size is I915_GTT_PAGE_SIZE_4K?
>

Currently, the supported minimum page sizes are either 4K or 64K.
So, we start with 4K as the default and check whether any of the
placements has a bigger min_page_size.

Niranjana

>[...]
>
>> > +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
>> > >vm_bind_lock)
>> > +
>> > +static inline void i915_gem_vm_bind_lock(struct i915_address_space
>> > *vm)
>> > +{
>> > +       mutex_lock(&vm->vm_bind_lock);
>> > +}
>> > +
>> > +static inline int
>> > +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
>> > +{
>> > +       return mutex_lock_interruptible(&vm->vm_bind_lock);
>> > +}
>> > +
>> > +static inline void i915_gem_vm_bind_unlock(struct i915_address_space
>> > *vm)
>> > +{
>> > +       mutex_unlock(&vm->vm_bind_lock);
>> > +}
>> > +
>>
>> Kerneldoc for the inlines.
>
>do we really need these oneline wrappers?
>
>Andi


* Re: [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
  2022-07-05  8:40     ` [Intel-gfx] " Thomas Hellström
@ 2022-07-07  5:56       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:56 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, intel-gfx, dri-devel, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Tue, Jul 05, 2022 at 10:40:56AM +0200, Thomas Hellström wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND functionality maintain that vm->vm_bind_mutex will never be
>> taken
>> while holding vm->mutex.
>> However, while closing 'vm', vma is destroyed while holding vm-
>> >mutex.
>> But vma releasing needs to take vm->vm_bind_mutex in order to delete
>> vma
>> from the vm_bind_list. To avoid this, destroy the vma outside vm-
>> >mutex
>> while closing the 'vm'.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>
>First, when introducing a new feature like this, we should not need to
>end the series with "Fix.." patches like this, rather whatever needs to
>be fixed should be fixed where the code was introduced.
>

Yah, makes sense.

>Second, an analogy whith linux kernel CPU mapping, could we instead
>think of the vm_bind_lock being similar to the mmap_lock, and the
>vm_mutex being similar to the i_mmap_lock, the former being used for VA
>manipulation and the latter when attaching / removing the backing store
>from the VA?
>
>Then we would not need to take the vm_bind_lock from vma destruction
>since the VA would already have been reclaimed at that point. For vm
>destruction here we'd loop over all relevant vm bind VAs under the
>vm_bind lock and call vm_unbind? Would that work?
>

Yah. In fact, in the vm_unbind call, we first do the VA reclaim
(i915_gem_vm_bind_remove()) under the vm_bind_lock and destroy the
vma (i915_vma_destroy()) outside the vm_bind_lock (under the object
lock). The vma destruction in the vm_bind error path is a bit different,
but I think it can be handled as well.
Yah, as mentioned in the other thread, doing the VA reclaim
(i915_gem_vm_bind_remove()) early during VM destruction under the
vm_bind_lock, as you suggested, would fit in there nicely.
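
Roughly like this (untested sketch; done early in VM destruction, before
the existing bound/unbound list teardown, with the vma destruction itself
left to the current path):

	struct i915_vma *vma, *vn;

	i915_gem_vm_bind_lock(vm);
	list_for_each_entry_safe(vma, vn, &vm->vm_bound_list, vm_bind_link) {
		struct drm_i915_gem_object *obj = vma->obj;

		/* Drops the VA from the interval tree and the bind lists */
		i915_gem_vm_bind_remove(vma, false);
		/* Drop the object reference taken at vm_bind time */
		i915_gem_object_put(obj);
	}
	i915_gem_vm_bind_unlock(vm);

	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));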

Niranjana

>/Thomas
>
>
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_gtt.c | 23 ++++++++++++++++++-----
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 4ab3bda644ff..4f707d0eb3ef 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -109,7 +109,8 @@ int map_pt_dma_locked(struct i915_address_space
>> *vm, struct drm_i915_gem_object
>>         return 0;
>>  }
>>  
>> -static void clear_vm_list(struct list_head *list)
>> +static void clear_vm_list(struct list_head *list,
>> +                         struct list_head *destroy_list)
>>  {
>>         struct i915_vma *vma, *vn;
>>  
>> @@ -138,8 +139,7 @@ static void clear_vm_list(struct list_head *list)
>>                         i915_vm_resv_get(vma->vm);
>>                         vma->vm_ddestroy = true;
>>                 } else {
>> -                       i915_vma_destroy_locked(vma);
>> -                       i915_gem_object_put(obj);
>> +                       list_move_tail(&vma->vm_link, destroy_list);
>>                 }
>>  
>>         }
>> @@ -147,16 +147,29 @@ static void clear_vm_list(struct list_head
>> *list)
>>  
>>  static void __i915_vm_close(struct i915_address_space *vm)
>>  {
>> +       struct i915_vma *vma, *vn;
>> +       struct list_head list;
>> +
>> +       INIT_LIST_HEAD(&list);
>> +
>>         mutex_lock(&vm->mutex);
>>  
>> -       clear_vm_list(&vm->bound_list);
>> -       clear_vm_list(&vm->unbound_list);
>> +       clear_vm_list(&vm->bound_list, &list);
>> +       clear_vm_list(&vm->unbound_list, &list);
>>  
>>         /* Check for must-fix unanticipated side-effects */
>>         GEM_BUG_ON(!list_empty(&vm->bound_list));
>>         GEM_BUG_ON(!list_empty(&vm->unbound_list));
>>  
>>         mutex_unlock(&vm->mutex);
>> +
>> +       /* Destroy vmas outside vm->mutex */
>> +       list_for_each_entry_safe(vma, vn, &list, vm_link) {
>> +               struct drm_i915_gem_object *obj = vma->obj;
>> +
>> +               i915_vma_destroy(vma);
>> +               i915_gem_object_put(obj);
>> +       }
>>  }
>>  
>>  /* lock the vm into the current ww, if we lock one, we lock all */
>


* Re: [Intel-gfx] [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting
@ 2022-07-07  5:56       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  5:56 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, matthew.auld,
	daniel.vetter, christian.koenig

On Tue, Jul 05, 2022 at 10:40:56AM +0200, Thomas Hellström wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND functionality maintain that vm->vm_bind_mutex will never be
>> taken
>> while holding vm->mutex.
>> However, while closing 'vm', vma is destroyed while holding vm-
>> >mutex.
>> But vma releasing needs to take vm->vm_bind_mutex in order to delete
>> vma
>> from the vm_bind_list. To avoid this, destroy the vma outside vm-
>> >mutex
>> while closing the 'vm'.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>
>First, when introducing a new feature like this, we should not need to
>end the series with "Fix.." patches like this, rather whatever needs to
>be fixed should be fixed where the code was introduced.
>

Yah, makes sense.

>Second, an analogy whith linux kernel CPU mapping, could we instead
>think of the vm_bind_lock being similar to the mmap_lock, and the
>vm_mutex being similar to the i_mmap_lock, the former being used for VA
>manipulation and the latter when attaching / removing the backing store
>from the VA?
>
>Then we would not need to take the vm_bind_lock from vma destruction
>since the VA would already have been reclaimed at that point. For vm
>destruction here we'd loop over all relevant vm bind VAs under the
>vm_bind lock and call vm_unbind? Would that work?
>

Yah. In fact, in the vm_unbind call, we first do the VA reclaim
(i915_gem_vm_bind_remove()) under the vm_bind_lock and destroy the
vma (i915_vma_destroy()) outside the vm_bind_lock (under the object
lock). The vma destruction in the vm_bind error path is a bit different,
but I think it can be handled as well.
Yah, as mentioned in the other thread, doing the VA reclaim
(i915_gem_vm_bind_remove()) early during VM destruction under the
vm_bind_lock, as you suggested, would fit in there nicely.

Niranjana

>/Thomas
>
>
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_gtt.c | 23 ++++++++++++++++++-----
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 4ab3bda644ff..4f707d0eb3ef 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -109,7 +109,8 @@ int map_pt_dma_locked(struct i915_address_space
>> *vm, struct drm_i915_gem_object
>>         return 0;
>>  }
>>  
>> -static void clear_vm_list(struct list_head *list)
>> +static void clear_vm_list(struct list_head *list,
>> +                         struct list_head *destroy_list)
>>  {
>>         struct i915_vma *vma, *vn;
>>  
>> @@ -138,8 +139,7 @@ static void clear_vm_list(struct list_head *list)
>>                         i915_vm_resv_get(vma->vm);
>>                         vma->vm_ddestroy = true;
>>                 } else {
>> -                       i915_vma_destroy_locked(vma);
>> -                       i915_gem_object_put(obj);
>> +                       list_move_tail(&vma->vm_link, destroy_list);
>>                 }
>>  
>>         }
>> @@ -147,16 +147,29 @@ static void clear_vm_list(struct list_head
>> *list)
>>  
>>  static void __i915_vm_close(struct i915_address_space *vm)
>>  {
>> +       struct i915_vma *vma, *vn;
>> +       struct list_head list;
>> +
>> +       INIT_LIST_HEAD(&list);
>> +
>>         mutex_lock(&vm->mutex);
>>  
>> -       clear_vm_list(&vm->bound_list);
>> -       clear_vm_list(&vm->unbound_list);
>> +       clear_vm_list(&vm->bound_list, &list);
>> +       clear_vm_list(&vm->unbound_list, &list);
>>  
>>         /* Check for must-fix unanticipated side-effects */
>>         GEM_BUG_ON(!list_empty(&vm->bound_list));
>>         GEM_BUG_ON(!list_empty(&vm->unbound_list));
>>  
>>         mutex_unlock(&vm->mutex);
>> +
>> +       /* Destroy vmas outside vm->mutex */
>> +       list_for_each_entry_safe(vma, vn, &list, vm_link) {
>> +               struct drm_i915_gem_object *obj = vma->obj;
>> +
>> +               i915_vma_destroy(vma);
>> +               i915_gem_object_put(obj);
>> +       }
>>  }
>>  
>>  /* lock the vm into the current ww, if we lock one, we lock all */
>


* Re: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-05 13:50         ` Zeng, Oak
@ 2022-07-07  6:00         ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-07  6:00 UTC (permalink / raw)
  To: Zeng, Oak
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, christian.koenig

On Tue, Jul 05, 2022 at 06:50:19AM -0700, Zeng, Oak wrote:
>
>
>Thanks,
>Oak
>
>> -----Original Message-----
>> From: C, Ramalingam <ramalingam.c@intel.com>
>> Sent: July 5, 2022 5:20 AM
>> To: Zeng, Oak <oak.zeng@intel.com>
>> Cc: Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>;
>> intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Vetter,
>> Daniel <daniel.vetter@intel.com>; christian.koenig@amd.com; Hellstrom,
>> Thomas <thomas.hellstrom@intel.com>; Zanoni, Paulo R
>> <paulo.r.zanoni@intel.com>; Auld, Matthew <matthew.auld@intel.com>
>> Subject: Re: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent
>> vmas
>>
>> On 2022-07-04 at 17:05:38 +0000, Zeng, Oak wrote:
>> >
>> >
>> > Thanks,
>> > Oak
>> >
>> > > -----Original Message-----
>> > > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf
>> > > Of Niranjana Vishwanathapura
>> > > Sent: July 1, 2022 6:51 PM
>> > > To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
>> > > Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>; Hellstrom, Thomas
>> > > <thomas.hellstrom@intel.com>; Auld, Matthew
>> > > <matthew.auld@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
>> > > christian.koenig@amd.com
>> > > Subject: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent
>> > > vmas
>> > >
>> > > Treat VM_BIND vmas as persistent and handle them during the request
>> > > submission in the execbuff path.
>> >
>> > Hi Niranjana,
>> >
>> > Is the meaning of "persistent" above persistent across all the subsequent
>> execbuf ioctls?
>>
>> Yes oak. Thats correct. persistent across multiple execbuf ioctls.
>
>Thank you Ram. Maybe we can add that to the commit message: " Treat VM_BIND vmas as persistent across multiple execbuf ioctls"?  I think this is versus the old execbuf mode where we bind in the execbuf ioctl and bindings are only valid for that execbuf. For those who don't have such background knowledge, it is hard to guess the meaning of persistent.
>

Yah, this is already documented in the DOC section of the i915_vm_bind_object.c
file. I think we can include it in the commit message here as well so that it is
more useful.

Niranjana

>Thanks,
>Oak
>>
>> Regards,
>> Ram.
>> >
>> > Thanks,
>> > Oak
>> >
>> > >
>> > > Support eviction by maintaining a list of evicted persistent vmas
>> > > for rebinding during next submission.
>> > >
>> > > Signed-off-by: Niranjana Vishwanathapura
>> > > <niranjana.vishwanathapura@intel.com>
>> > > ---
>> > >  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>> > >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
>> > >  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
>> > >  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>> > >  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>> > >  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
>> > >  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
>> > >  drivers/gpu/drm/i915/i915_vma.h               | 78 +++++++++++++++++--
>> > >  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
>> > >  9 files changed, 163 insertions(+), 12 deletions(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> > > index ccec4055fde3..5121f02ba95c 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> > > @@ -38,6 +38,7 @@
>> > >  #include "i915_gem_mman.h"
>> > >  #include "i915_gem_object.h"
>> > >  #include "i915_gem_ttm.h"
>> > > +#include "i915_gem_vm_bind.h"
>> > >  #include "i915_memcpy.h"
>> > >  #include "i915_trace.h"
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > index 849bf3c1061e..eaadf5a6ab09 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > @@ -6,6 +6,7 @@
>> > >  #ifndef __I915_GEM_VM_BIND_H
>> > >  #define __I915_GEM_VM_BIND_H
>> > >
>> > > +#include <linux/dma-resv.h>
>> > >  #include "i915_drv.h"
>> > >
>> > >  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
>> > > >vm_bind_lock)
>> > > @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
>> > > i915_address_space *vm)
>> > >   mutex_unlock(&vm->vm_bind_lock);
>> > >  }
>> > >
>> > > +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
>> > > +
>> > >  static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>> > >                                   struct i915_gem_ww_ctx *ww)
>> > >  {
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > index 96f139cc8060..1a8efa83547f 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma
>> > > *vma, bool release_obj)  {
>> > >   assert_vm_bind_held(vma->vm);
>> > >
>> > > + spin_lock(&vma->vm->vm_rebind_lock);
>> > > + if (!list_empty(&vma->vm_rebind_link))
>> > > +         list_del_init(&vma->vm_rebind_link);
>> > > + i915_vma_set_purged(vma);
>> > > + i915_vma_set_freed(vma);
>> > > + spin_unlock(&vma->vm->vm_rebind_lock);
>> > > +
>> > >   if (!list_empty(&vma->vm_bind_link)) {
>> > >           list_del_init(&vma->vm_bind_link);
>> > >           list_del_init(&vma->non_priv_vm_bind_link);
>> > > @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
>> > > i915_address_space *vm,
>> > >
>> > >   vma->start = va->start;
>> > >   vma->last = va->start + va->length - 1;
>> > > + i915_vma_set_persistent(vma);
>> > >
>> > >   return vma;
>> > >  }
>> > > @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
>> > > i915_address_space *vm,
>> > >
>> > >   i915_vm_bind_put_fence(vma);
>> > >  put_vma:
>> > > - if (ret)
>> > > + if (ret) {
>> > > +         i915_vma_set_freed(vma);
>> > >           i915_vma_destroy(vma);
>> > > + }
>> > >
>> > >   i915_gem_ww_ctx_fini(&ww);
>> > >  unlock_vm:
>> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > index df0a8459c3c6..55d5389b2c6c 100644
>> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > @@ -293,6 +293,8 @@ void i915_address_space_init(struct
>> > > i915_address_space *vm, int subclass)
>> > >   INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>> > >   vm->root_obj = i915_gem_object_create_internal(vm->i915,
>> > > PAGE_SIZE);
>> > >   GEM_BUG_ON(IS_ERR(vm->root_obj));
>> > > + INIT_LIST_HEAD(&vm->vm_rebind_list);
>> > > + spin_lock_init(&vm->vm_rebind_lock);
>> > >  }
>> > >
>> > >  void *__px_vaddr(struct drm_i915_gem_object *p) diff --git
>> > > a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > index f538ce9115c9..fe5485c4a1cd 100644
>> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > @@ -265,6 +265,8 @@ struct i915_address_space {
>> > >   struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>> > >   struct list_head vm_bind_list;
>> > >   struct list_head vm_bound_list;
>> > > + struct list_head vm_rebind_list;
>> > > + spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>> > >   /* va tree of persistent vmas */
>> > >   struct rb_root_cached va;
>> > >   struct list_head non_priv_vm_bind_list; diff --git
>> > > a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> > > b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> > > index 8c2f57eb5dda..09b89d1913fc 100644
>> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> > > @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
>> > > *vm,
>> > >
>> > >  #define PIN_OFFSET_MASK          I915_GTT_PAGE_MASK
>> > >
>> > > +static inline int i915_vm_sync(struct i915_address_space *vm) {
>> > > + int ret;
>> > > +
>> > > + /* Wait for all requests under this vm to finish */
>> > > + ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
>> > > +                             DMA_RESV_USAGE_BOOKKEEP, false,
>> > > +                             MAX_SCHEDULE_TIMEOUT);
>> > > + if (ret < 0)
>> > > +         return ret;
>> > > + else if (ret > 0)
>> > > +         return 0;
>> > > + else
>> > > +         return -ETIMEDOUT;
>> > > +}
>> > > +
>> > > +static inline bool i915_vm_is_active(const struct
>> > > +i915_address_space
>> > > +*vm) {
>> > > + return !dma_resv_test_signaled(vm->root_obj->base.resv,
>> > > +                                DMA_RESV_USAGE_BOOKKEEP); }
>> > > +
>> > >  #endif
>> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> > > b/drivers/gpu/drm/i915/i915_vma.c index 6737236b7884..6adb013579be
>> > > 100644
>> > > --- a/drivers/gpu/drm/i915/i915_vma.c
>> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
>> > > @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>> > >
>> > >   INIT_LIST_HEAD(&vma->vm_bind_link);
>> > >   INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>> > > + INIT_LIST_HEAD(&vma->vm_rebind_link);
>> > >   return vma;
>> > >
>> > >  err_unlock:
>> > > @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
>> > >   if (atomic_dec_and_lock_irqsave(&vma->open_count,
>> > >                                   &gt->closed_lock,
>> > >                                   flags)) {
>> > > -         __vma_close(vma, gt);
>> > > +         if (!i915_vma_is_persistent(vma))
>> > > +                 __vma_close(vma, gt);
>> > >           spin_unlock_irqrestore(&gt->closed_lock, flags);
>> > >   }
>> > >  }
>> > > @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
>> > >   if (!drm_mm_node_allocated(&vma->node))
>> > >           return;
>> > >
>> > > + /*
>> > > +  * Mark persistent vma as purged to avoid it waiting
>> > > +  * for VM to be released.
>> > > +  */
>> > > + if (i915_vma_is_persistent(vma))
>> > > +         i915_vma_set_purged(vma);
>> > > +
>> > >   atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>> > >   WARN_ON(__i915_vma_unbind(vma));
>> > >   GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
>> > > @@ -1666,9 +1675,12 @@ static void release_references(struct
>> > > i915_vma *vma, bool vm_ddestroy)
>> > >
>> > >   spin_unlock(&obj->vma.lock);
>> > >
>> > > - i915_gem_vm_bind_lock(vma->vm);
>> > > - i915_gem_vm_bind_remove(vma, true);
>> > > - i915_gem_vm_bind_unlock(vma->vm);
>> > > + if (i915_vma_is_persistent(vma) &&
>> > > +     !i915_vma_is_freed(vma)) {
>> > > +         i915_gem_vm_bind_lock(vma->vm);
>> > > +         i915_gem_vm_bind_remove(vma, true);
>> > > +         i915_gem_vm_bind_unlock(vma->vm);
>> > > + }
>> > >
>> > >   spin_lock_irq(&gt->closed_lock);
>> > >   __i915_vma_remove_closed(vma);
>> > > @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct
>> i915_vma
>> > > *vma,
>> > >   int err;
>> > >
>> > >   assert_object_held(obj);
>> > > + if (i915_vma_is_persistent(vma))
>> > > +         return -EINVAL;
>> > >
>> > >   GEM_BUG_ON(!vma->pages);
>> > >
>> > > @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>> > >   __i915_vma_evict(vma, false);
>> > >
>> > >   drm_mm_remove_node(&vma->node); /* pairs with
>> > > i915_vma_release() */
>> > > +
>> > > + if (i915_vma_is_persistent(vma)) {
>> > > +         spin_lock(&vma->vm->vm_rebind_lock);
>> > > +         if (list_empty(&vma->vm_rebind_link) &&
>> > > +             !i915_vma_is_purged(vma))
>> > > +                 list_add_tail(&vma->vm_rebind_link,
>> > > +                               &vma->vm->vm_rebind_list);
>> > > +         spin_unlock(&vma->vm->vm_rebind_lock);
>> > > + }
>> > > +
>> > >   return 0;
>> > >  }
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/i915_vma.h
>> > > b/drivers/gpu/drm/i915/i915_vma.h index dcb49f79ff7e..6c1369a40e03
>> > > 100644
>> > > --- a/drivers/gpu/drm/i915/i915_vma.h
>> > > +++ b/drivers/gpu/drm/i915/i915_vma.h
>> > > @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object
>> > > *obj,
>> > >
>> > >  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned
>> > > int flags);  #define I915_VMA_RELEASE_MAP BIT(0)
>> > > -
>> > > -static inline bool i915_vma_is_active(const struct i915_vma *vma) -{
>> > > - return !i915_active_is_idle(&vma->active);
>> > > -}
>> > > -
>> > >  /* do not reserve memory to prevent deadlocks */  #define
>> > > __EXEC_OBJECT_NO_RESERVE BIT(31)
>> > >
>> > > @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct
>> > > i915_vma
>> > > *vma)
>> > >   return i915_vm_to_ggtt(vma->vm)->pin_bias;
>> > >  }
>> > >
>> > > +static inline bool i915_vma_is_persistent(const struct i915_vma *vma) {
>> > > + return test_bit(I915_VMA_PERSISTENT_BIT,
>> > > __i915_vma_flags(vma)); }
>> > > +
>> > > +static inline void i915_vma_set_persistent(struct i915_vma *vma) {
>> > > + set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma)); }
>> > > +
>> > > +static inline bool i915_vma_is_purged(const struct i915_vma *vma) {
>> > > + return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
>> > > +
>> > > +static inline void i915_vma_set_purged(struct i915_vma *vma) {
>> > > + set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma)); }
>> > > +
>> > > +static inline bool i915_vma_is_freed(const struct i915_vma *vma) {
>> > > + return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
>> > > +
>> > > +static inline void i915_vma_set_freed(struct i915_vma *vma) {
>> > > + set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma)); }
>> > > +
>> > > +static inline bool i915_vma_is_active(const struct i915_vma *vma) {
>> > > + if (i915_vma_is_persistent(vma)) {
>> > > +         if (i915_vma_is_purged(vma))
>> > > +                 return false;
>> > > +
>> > > +         return i915_vm_is_active(vma->vm);
>> > > + }
>> > > +
>> > > + return !i915_active_is_idle(&vma->active);
>> > > +}
>> > > +
>> > >  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)  {
>> > >   i915_gem_object_get(vma->obj);
>> > > @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma
>> > > *vma);
>> > >
>> > >  static inline int i915_vma_sync(struct i915_vma *vma)  {
>> > > + int ret;
>> > > +
>> > >   /* Wait for the asynchronous bindings and pending GPU reads */
>> > > - return i915_active_wait(&vma->active);
>> > > + ret = i915_active_wait(&vma->active);
>> > > + if (ret || !i915_vma_is_persistent(vma) ||
>> > > i915_vma_is_purged(vma))
>> > > +         return ret;
>> > > +
>> > > + return i915_vm_sync(vma->vm);
>> > > +}
>> > > +
>> > > +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma) {
>> > > + /* Ensure vma bind is initiated */
>> > > + if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
>> > > +         return false;
>> > > +
>> > > + /* Ensure any binding started is complete */
>> > > + if (rcu_access_pointer(vma->active.excl.fence)) {
>> > > +         struct dma_fence *fence;
>> > > +
>> > > +         rcu_read_lock();
>> > > +         fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
>> > > +         rcu_read_unlock();
>> > > +         if (fence) {
>> > > +                 dma_fence_put(fence);
>> > > +                 return false;
>> > > +         }
>> > > + }
>> > > +
>> > > + return true;
>> > >  }
>> > >
>> > >  /**
>> > > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> > > b/drivers/gpu/drm/i915/i915_vma_types.h
>> > > index 7d830a6a0b51..405c82e1bc30 100644
>> > > --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> > > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> > > @@ -264,6 +264,28 @@ struct i915_vma {
>> > >  #define I915_VMA_SCANOUT_BIT     17
>> > >  #define I915_VMA_SCANOUT ((int)BIT(I915_VMA_SCANOUT_BIT))
>> > >
>> > > +  /**
>> > > +   * I915_VMA_PERSISTENT_BIT:
>> > > +   * The vma is persistent (created with VM_BIND call).
>> > > +   *
>> > > +   * I915_VMA_PURGED_BIT:
>> > > +   * The persistent vma is force unbound either due to VM_UNBIND call
>> > > +   * from UMD or VM is released. Do not check/wait for VM activeness
>> > > +   * in i915_vma_is_active() and i915_vma_sync() calls.
>> > > +   *
>> > > +   * I915_VMA_FREED_BIT:
>> > > +   * The persistent vma is being released by UMD via VM_UNBIND call.
>> > > +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND
>> call
>> > > +   * already holds the lock.
>> > > +   */
>> > > +#define I915_VMA_PERSISTENT_BIT  19
>> > > +#define I915_VMA_PURGED_BIT      20
>> > > +#define I915_VMA_FREED_BIT       21
>> > > +
>> > > +#define I915_VMA_PERSISTENT      ((int)BIT(I915_VMA_PERSISTENT_BIT))
>> > > +#define I915_VMA_PURGED          ((int)BIT(I915_VMA_PURGED_BIT))
>> > > +#define I915_VMA_FREED           ((int)BIT(I915_VMA_FREED_BIT))
>> > > +
>> > >   struct i915_active active;
>> > >
>> > >  #define I915_VMA_PAGES_BIAS 24
>> > > @@ -292,6 +314,7 @@ struct i915_vma {
>> > >   struct list_head vm_bind_link; /* Link in persistent VMA list */
>> > >   /* Link in non-private persistent VMA list */
>> > >   struct list_head non_priv_vm_bind_link;
>> > > + struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>> > >
>> > >   /** Timeline fence for vm_bind completion notification */
>> > >   struct {
>> > > --
>> > > 2.21.0.rc0.32.g243a4c7e27
>> >

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
  2022-07-07  5:01       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07  7:32         ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07  7:32 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	 Daniel, christian.koenig

On Wed, 2022-07-06 at 22:01 -0700, Niranjana Vishwanathapura wrote:
> > > +       /**
> > > +        * true: allow only vm_bind method of binding.
> > > +        * false: allow only legacy execbuff method of binding.
> > > +        */
> > 
> > Use proper kerneldoc. (Same holds for structure documentation
> > across
> > the series).
> > Also please follow internal locking guidelines on documentation of
> > members that need protection with locks.
> 
> I just followed the documentation convention that was already there
> ;)
> I think we need a prep patch in this series that adds kernel-doc for
> these structures and then add new fields for vm_bind with proper
> kernel-docs.

That would be awesome if we could do that, but as a minimum, I think
that new in-line struct / union comments should follow

https://www.kernel.org/doc/html/v5.3/doc-guide/kernel-doc.html#in-line-member-documentation-comments

and completely new struct / unions should follow

https://www.kernel.org/doc/html/v5.3/doc-guide/kernel-doc.html#in-line-member-documentation-comments,

and in particular the internal locking guidelines on what members are
protected by which locks and, if applicable, how. (For example, a member
may be protected by two locks when writing to it and by only one of them
when reading.)
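
For illustration only, a minimal sketch of such in-line member
documentation with a locking note (the struct and member names below are
just examples, not the actual fields added by this series):

    #include <linux/list.h>
    #include <linux/mutex.h>

    struct example_vm_state {
            /** @vm_bind_lock: Protects @vm_bind_list. */
            struct mutex vm_bind_lock;

            /**
             * @vm_bind_list: List of persistent vmas waiting to be bound.
             *
             * Protected by @vm_bind_lock for insertion and removal; entries
             * may additionally be read under the object's dma-resv lock in
             * the bind worker.
             */
            struct list_head vm_bind_list;
    };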

/Thomas



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-07  5:43       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07  8:14         ` Thomas Hellström
  -1 siblings, 0 replies; 121+ messages in thread
From: Thomas Hellström @ 2022-07-07  8:14 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, intel-gfx, dri-devel, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Wed, 2022-07-06 at 22:43 -0700, Niranjana Vishwanathapura wrote:
> On Wed, Jul 06, 2022 at 06:21:03PM +0200, Thomas Hellström wrote:
> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> > > Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > Signed-off-by: Prathap Kumar Valsan
> > > <prathap.kumar.valsan@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/Makefile                 |   1 +
> > >  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
> > >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
> > >  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233
> > > ++++++++++++++++++
> > >  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
> > >  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
> > >  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
> > >  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
> > >  drivers/gpu/drm/i915/i915_vma.h               |   2 -
> > >  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
> > >  11 files changed, 318 insertions(+), 10 deletions(-)
> > >  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > >  create mode 100644
> > > drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Makefile
> > > b/drivers/gpu/drm/i915/Makefile
> > > index 522ef9b4aff3..4e1627e96c6e 100644
> > > --- a/drivers/gpu/drm/i915/Makefile
> > > +++ b/drivers/gpu/drm/i915/Makefile
> > > @@ -165,6 +165,7 @@ gem-y += \
> > >         gem/i915_gem_ttm_move.o \
> > >         gem/i915_gem_ttm_pm.o \
> > >         gem/i915_gem_userptr.o \
> > > +       gem/i915_gem_vm_bind_object.o \
> > >         gem/i915_gem_wait.o \
> > >         gem/i915_gemfs.o
> > >  i915-y += \
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > index 33673fe7ee0a..927a87e5ec59 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > @@ -15,10 +15,10 @@
> > >  #include "i915_trace.h"
> > >  #include "i915_user_extensions.h"
> > >  
> > > -static u32 object_max_page_size(struct intel_memory_region
> > > **placements,
> > > -                               unsigned int n_placements)
> > > +u32 i915_gem_object_max_page_size(struct intel_memory_region
> > > **placements,
> > 
> > Kerneldoc.
> 
> This is an existing function that is being modified. As I
> mentioned in other thread, we probably need a prep patch early
> in this series to add missing kernel-docs in i915 which this
> patch series would later update.

Here we make a static function extern, which according to the patch
submission guidelines, mandates a kerneldoc comment, so it's not so much
that the function is modified. We should be fine adding kerneldoc in
the patch that makes the function extern.
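
As an illustration (a sketch of the wording only, matching the behaviour
of the function as modified by this patch), the kerneldoc added when
exporting the helper could look roughly like:

    /**
     * i915_gem_object_max_page_size() - largest minimum page size of the placements
     * @placements: array of candidate memory regions for the object
     * @n_placements: number of entries in @placements
     *
     * Return: the largest min_page_size required by any of the given placement
     * regions, or I915_GTT_PAGE_SIZE_4K when @n_placements is zero.
     */
    u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
                                      unsigned int n_placements);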


> 
> > 
> > > +                                 unsigned int n_placements)
> > >  {
> > > -       u32 max_page_size = 0;
> > > +       u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
> > >         int i;
> > >  
> > >         for (i = 0; i < n_placements; i++) {
> > > @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
> > > intel_memory_region **placements,
> > >                 max_page_size = max_t(u32, max_page_size, mr-
> > > > min_page_size);
> > >         }
> > >  
> > > -       GEM_BUG_ON(!max_page_size);
> > >         return max_page_size;
> > >  }
> > 
> > Should this change be separated out? It's not immediately clear to
> > a
> > reviewer why it is included.
> 
> It is being removed as max_page_size now has a non-zero default
> value and hence this check is not valid anymore.

But that in itself deserves an explanation in the patch commit message.
So that's why I wondered whether it wouldn't be better to separate it
out?

> 
> > 
> > >  
> > > @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct
> > > drm_i915_private *i915, u64 size,
> > >  
> > >         i915_gem_flush_free_objects(i915);
> > >  
> > > -       size = round_up(size, object_max_page_size(placements,
> > > n_placements));
> > > +       size = round_up(size,
> > > i915_gem_object_max_page_size(placements,
> > > +                                                          
> > > n_placements));
> > >         if (size == 0)
> > >                 return ERR_PTR(-EINVAL);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > index 6f0a3ce35567..650de2224843 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > @@ -47,6 +47,8 @@ static inline bool
> > > i915_gem_object_size_2big(u64
> > > size)
> > >  }
> > >  
> > >  void i915_gem_init__objects(struct drm_i915_private *i915);
> > > +u32 i915_gem_object_max_page_size(struct intel_memory_region
> > > **placements,
> > > +                                 unsigned int n_placements);
> > >  
> > >  void i915_objects_module_exit(void);
> > >  int i915_objects_module_init(void);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > new file mode 100644
> > > index 000000000000..642cdb559f17
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > @@ -0,0 +1,38 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2022 Intel Corporation
> > > + */
> > > +
> > > +#ifndef __I915_GEM_VM_BIND_H
> > > +#define __I915_GEM_VM_BIND_H
> > > +
> > > +#include "i915_drv.h"
> > > +
> > > +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
> > > > vm_bind_lock)
> > > +
> > > +static inline void i915_gem_vm_bind_lock(struct
> > > i915_address_space
> > > *vm)
> > > +{
> > > +       mutex_lock(&vm->vm_bind_lock);
> > > +}
> > > +
> > > +static inline int
> > > +i915_gem_vm_bind_lock_interruptible(struct i915_address_space
> > > *vm)
> > > +{
> > > +       return mutex_lock_interruptible(&vm->vm_bind_lock);
> > > +}
> > > +
> > > +static inline void i915_gem_vm_bind_unlock(struct
> > > i915_address_space
> > > *vm)
> > > +{
> > > +       mutex_unlock(&vm->vm_bind_lock);
> > > +}
> > > +
> > 
> > Kerneldoc for the inlines.
> > 
> > > +struct i915_vma *
> > > +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64
> > > va);
> > > +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
> > > release_obj);
> > > +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> > > +                        struct drm_i915_gem_vm_bind *va,
> > > +                        struct drm_file *file);
> > > +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
> > > +                          struct drm_i915_gem_vm_unbind *va);
> > > +
> > > +#endif /* __I915_GEM_VM_BIND_H */
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > new file mode 100644
> > > index 000000000000..43ceb4dcca6c
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > @@ -0,0 +1,233 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2022 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/interval_tree_generic.h>
> > > +
> > > +#include "gem/i915_gem_vm_bind.h"
> > > +#include "gt/gen8_engine_cs.h"
> > > +
> > > +#include "i915_drv.h"
> > > +#include "i915_gem_gtt.h"
> > > +
> > > +#define START(node) ((node)->start)
> > > +#define LAST(node) ((node)->last)
> > > +
> > > +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> > > +                    START, LAST, static inline, i915_vm_bind_it)
> > > +
> > > +#undef START
> > > +#undef LAST
> > > +
> > > +/**
> > > + * DOC: VM_BIND/UNBIND ioctls
> > > + *
> > > + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind
> > > GEM
> > > buffer
> > > + * objects (BOs) or sections of a BOs at specified GPU virtual
> > > addresses on a
> > > + * specified address space (VM). Multiple mappings can map to
> > > the
> > > same physical
> > > + * pages of an object (aliasing). These mappings (also referred
> > > to
> > > as persistent
> > > + * mappings) will be persistent across multiple GPU submissions
> > > (execbuf calls)
> > > + * issued by the UMD, without user having to provide a list of
> > > all
> > > required
> > > + * mappings during each submission (as required by older execbuf
> > > mode).
> > > + *
> > > + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out
> > > fence for
> > > + * signaling the completion of bind/unbind operation.
> > > + *
> > > + * VM_BIND feature is advertised to user via
> > > I915_PARAM_VM_BIND_VERSION.
> > > + * User has to opt-in for VM_BIND mode of binding for an address
> > > space (VM)
> > > + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > extension.
> > > + *
> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> > > concurrently
> > > + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND
> > > operations can be
> > > + * done asynchronously, when valid out fence is specified.
> > > + *
> > > + * VM_BIND locking order is as below.
> > > + *
> > > + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This
> > > lock
> > > is taken in
> > > + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and
> > > while
> > > releasing the
> > > + *    mapping.
> > > + *
> > > + *    In future, when GPU page faults are supported, we can
> > > potentially use a
> > > + *    rwsem instead, so that multiple page fault handlers can
> > > take
> > > the read
> > > + *    side lock to lookup the mapping and hence can run in
> > > parallel.
> > > + *    The older execbuf mode of binding do not need this lock.
> > > + *
> > > + * 2) Lock-B: The object's dma-resv lock will protect i915_vma
> > > state
> > > and needs
> > > + *    to be held while binding/unbinding a vma in the async
> > > worker
> > > and while
> > > + *    updating dma-resv fence list of an object. Note that
> > > private
> > > BOs of a VM
> > > + *    will all share a dma-resv object.
> > > + *
> > > + *    The future system allocator support will use the HMM
> > > prescribed locking
> > > + *    instead.
> > 
> > I don't think the last sentence is relevant for this series. Also,
> > are
> > there any other mentions for Locks A, B and C? If not, can we ditch
> > that naming?
> 
> It is taken from the design RFC :). Yah, I think it is better to remove
> it, and probably the lock names too, and make it more specific to the
> implementation in this patch series.

Ah, OK, if it's taken from the RFC and is an established naming in
documentation that will remain, then it's fine with me. Perhaps with a
pointer added to that doc that will help the reader.

> 
> > 
> > > + *
> > > + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like
> > > the
> > > list of
> > > + *    invalidated vmas (due to eviction and userptr
> > > invalidation)
> > > etc.
> > > + */
> > > +
> > > +struct i915_vma *
> > > +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64
> > > va)
> > 
> > Kerneldoc for the extern functions.
> > 
> > 
> > > +{
> > > +       struct i915_vma *vma, *temp;
> > > +
> > > +       assert_vm_bind_held(vm);
> > > +
> > > +       vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
> > > +       /* Working around compiler error, remove later */
> > 
> > Is this still relevant? What compiler error is seen here?
> > 
> > > +       if (vma)
> > > +               temp = i915_vm_bind_it_iter_next(vma, va + vma-
> > > >size,
> > > -1);
> > > +       return vma;
> > > +}
> > > +
> > > +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
> > > release_obj)
> > > +{
> > > +       assert_vm_bind_held(vma->vm);
> > > +
> > > +       if (!list_empty(&vma->vm_bind_link)) {
> > > +               list_del_init(&vma->vm_bind_link);
> > > +               i915_vm_bind_it_remove(vma, &vma->vm->va);
> > > +
> > > +               /* Release object */
> > > +               if (release_obj)
> > > +                       i915_vma_put(vma);
> > 
> > i915_vma_put() here is confusing. Can we use i915_gem_object_put()
> > to
> > further make it clear that the persistent vmas actually take a
> > reference on the object?
> > 
> 
> makes sense.
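
A one-line sketch of that change, for illustration:

                /* Release the object reference taken at vm_bind time */
                if (release_obj)
                        i915_gem_object_put(vma->obj);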
> 
> > > +       }
> > > +}
> > > +
> > > +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
> > > +                          struct drm_i915_gem_vm_unbind *va)
> > > +{
> > > +       struct drm_i915_gem_object *obj;
> > > +       struct i915_vma *vma;
> > > +       int ret;
> > > +
> > > +       va->start = gen8_noncanonical_addr(va->start);
> > > +       ret = i915_gem_vm_bind_lock_interruptible(vm);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> > > +       if (!vma) {
> > > +               ret = -ENOENT;
> > > +               goto out_unlock;
> > > +       }
> > > +
> > > +       if (vma->size != va->length)
> > > +               ret = -EINVAL;
> > > +       else
> > > +               i915_gem_vm_bind_remove(vma, false);
> > > +
> > > +out_unlock:
> > > +       i915_gem_vm_bind_unlock(vm);
> > > +       if (ret || !vma)
> > > +               return ret;
> > > +
> > > +       /* Destroy vma and then release object */
> > > +       obj = vma->obj;
> > > +       ret = i915_gem_object_lock(obj, NULL);
> > > +       if (ret)
> > > +               return ret;
> > 
> > This call never returns an error and we could GEM_WARN_ON(...), or
> > (void) to annotate that the return value is wilfully ignored.
> > 
> 
> makes sense.
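
For illustration, the annotation could look like (a sketch; either the
GEM_WARN_ON() or the (void) variant matches the suggestion above):

        /* Locking without a ww context cannot fail here */
        GEM_WARN_ON(i915_gem_object_lock(obj, NULL));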
> 
> > > +
> > > +       i915_vma_destroy(vma);
> > > +       i915_gem_object_unlock(obj);
> > > +       i915_gem_object_put(obj);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static struct i915_vma *vm_bind_get_vma(struct
> > > i915_address_space
> > > *vm,
> > > +                                       struct
> > > drm_i915_gem_object
> > > *obj,
> > > +                                       struct
> > > drm_i915_gem_vm_bind
> > > *va)
> > > +{
> > > +       struct i915_ggtt_view view;
> > > +       struct i915_vma *vma;
> > > +
> > > +       va->start = gen8_noncanonical_addr(va->start);
> > > +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> > > +       if (vma)
> > > +               return ERR_PTR(-EEXIST);
> > > +
> > > +       view.type = I915_GGTT_VIEW_PARTIAL;
> > > +       view.partial.offset = va->offset >> PAGE_SHIFT;
> > > +       view.partial.size = va->length >> PAGE_SHIFT;
> > 
> > IIRC, this vma view is not handled correctly in the vma code, that
> > only
> > understands views for ggtt bindings.
> > 
> 
> This patch series extends the partial view to ppgtt also.
> Yah, the naming is still i915_ggtt_view, but I am hoping we can fix the
> name in a follow up patch later.

Hmm, I somehow thought that the vma page adjustment was a NOP on ppgtt
and only done on ggtt. But that's indeed not the case. Yes, then this
is ok. We need to remember, though, that if we're going to use the
existing vma async unbinding functionality, we'd need to attach the vma
pages to the vma resource.


> 
> > 
> > > +       vma = i915_vma_instance(obj, vm, &view);
> > > +       if (IS_ERR(vma))
> > > +               return vma;
> > > +
> > > +       vma->start = va->start;
> > > +       vma->last = va->start + va->length - 1;
> > > +
> > > +       return vma;
> > > +}
> > > +
> > > +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> > > +                        struct drm_i915_gem_vm_bind *va,
> > > +                        struct drm_file *file)
> > > +{
> > > +       struct drm_i915_gem_object *obj;
> > > +       struct i915_vma *vma = NULL;
> > > +       struct i915_gem_ww_ctx ww;
> > > +       u64 pin_flags;
> > > +       int ret = 0;
> > > +
> > > +       if (!vm->vm_bind_mode)
> > > +               return -EOPNOTSUPP;
> > > +
> > > +       obj = i915_gem_object_lookup(file, va->handle);
> > > +       if (!obj)
> > > +               return -ENOENT;
> > > +
> > > +       if (!va->length ||
> > > +           !IS_ALIGNED(va->offset | va->length,
> > > +                       i915_gem_object_max_page_size(obj-
> > > > mm.placements,
> > > +                                                     obj-
> > > > mm.n_placements)) ||
> > > +           range_overflows_t(u64, va->offset, va->length, obj-
> > > > base.size)) {
> > > +               ret = -EINVAL;
> > > +               goto put_obj;
> > > +       }
> > > +
> > > +       ret = i915_gem_vm_bind_lock_interruptible(vm);
> > > +       if (ret)
> > > +               goto put_obj;
> > > +
> > > +       vma = vm_bind_get_vma(vm, obj, va);
> > > +       if (IS_ERR(vma)) {
> > > +               ret = PTR_ERR(vma);
> > > +               goto unlock_vm;
> > > +       }
> > > +
> > > +       i915_gem_ww_ctx_init(&ww, true);
> > > +       pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
> > > +retry:
> > > +       ret = i915_gem_object_lock(vma->obj, &ww);
> > > +       if (ret)
> > > +               goto out_ww;
> > > +
> > > +       ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> > > +       if (ret)
> > > +               goto out_ww;
> > > +
> > > +       /* Make it evictable */
> > > +       __i915_vma_unpin(vma);
> > 
> > A considerable effort has been put into avoiding short term vma
> > pins in
> > i915. We should add an interface like i915_vma_bind_ww() that
> > avoids
> > the pin altogether.
> 
> Currently in the i915 driver, VA management and device page table bindings
> are tightly coupled. i915_vma_pin_ww() does both the VA allocation and the
> binding. And we also interpret the VA being allocated (drm_mm node
> allocated) as the vma being bound.
> 
> Decoupling it would be ideal but I think it needs to be carefully done
> in a separate patch series to not cause any regression.

So the idea would be not to decouple these, but to just avoid pinning
the vma in the process.


> 
> > 
> > > +
> > > +       list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> > > +       i915_vm_bind_it_insert(vma, &vm->va);
> > > +
> > > +       /* Hold object reference until vm_unbind */
> > > +       i915_gem_object_get(vma->obj);
> > > +out_ww:
> > > +       if (ret == -EDEADLK) {
> > > +               ret = i915_gem_ww_ctx_backoff(&ww);
> > > +               if (!ret)
> > > +                       goto retry;
> > > +       }
> > > +
> > > +       if (ret)
> > > +               i915_vma_destroy(vma);
> > > +
> > > +       i915_gem_ww_ctx_fini(&ww);
> > 
> > Could use for_i915_gem_ww()?
> > 
> 
> Yah, I think it is a better idea to use it.
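
For reference, a rough sketch of the bind path using that helper
(pin_flags as computed earlier in i915_gem_vm_bind_obj(); error handling
and the list/interval-tree insertion on success are elided, this only
illustrates the retry pattern of the existing for_i915_gem_ww() macro):

        struct i915_gem_ww_ctx ww;
        int err;

        for_i915_gem_ww(&ww, err, true) {
                err = i915_gem_object_lock(vma->obj, &ww);
                if (err)
                        continue;       /* the macro handles -EDEADLK backoff */

                err = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
                if (err)
                        continue;

                /* Make it evictable */
                __i915_vma_unpin(vma);
        }
        if (err)
                i915_vma_destroy(vma);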
> 
> > > +unlock_vm:
> > > +       i915_gem_vm_bind_unlock(vm);
> > > +put_obj:
> > > +       i915_gem_object_put(obj);
> > > +       return ret;
> > > +}
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index b67831833c9a..135dc4a76724 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct
> > > i915_address_space *vm,
> > >  void i915_address_space_fini(struct i915_address_space *vm)
> > >  {
> > >         drm_mm_takedown(&vm->mm);
> > > +       GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> > > +       mutex_destroy(&vm->vm_bind_lock);
> > >  }
> > >  
> > >  /**
> > > @@ -282,6 +284,11 @@ void i915_address_space_init(struct
> > > i915_address_space *vm, int subclass)
> > >  
> > >         INIT_LIST_HEAD(&vm->bound_list);
> > >         INIT_LIST_HEAD(&vm->unbound_list);
> > > +
> > > +       vm->va = RB_ROOT_CACHED;
> > > +       INIT_LIST_HEAD(&vm->vm_bind_list);
> > > +       INIT_LIST_HEAD(&vm->vm_bound_list);
> > > +       mutex_init(&vm->vm_bind_lock);
> > >  }
> > >  
> > >  void *__px_vaddr(struct drm_i915_gem_object *p)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > index c812aa9708ae..d4a6ce65251d 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > @@ -259,6 +259,15 @@ struct i915_address_space {
> > >          */
> > >         struct list_head unbound_list;
> > >  
> > > +       /**
> > > +        * List of VM_BIND objects.
> > > +        */
> > 
> > Proper kerneldoc + intel locking guidelines comments, please.
> > 
> > > +       struct mutex vm_bind_lock;  /* Protects vm_bind lists */
> > > +       struct list_head vm_bind_list;
> > > +       struct list_head vm_bound_list;
> > > +       /* va tree of persistent vmas */
> > > +       struct rb_root_cached va;
> > > +
> > >         /* Global GTT */
> > >         bool is_ggtt:1;
> > >  
> > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > > b/drivers/gpu/drm/i915/i915_driver.c
> > > index ccf990dfd99b..776ab7844f60 100644
> > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > @@ -68,6 +68,7 @@
> > >  #include "gem/i915_gem_ioctls.h"
> > >  #include "gem/i915_gem_mman.h"
> > >  #include "gem/i915_gem_pm.h"
> > > +#include "gem/i915_gem_vm_bind.h"
> > >  #include "gt/intel_gt.h"
> > >  #include "gt/intel_gt_pm.h"
> > >  #include "gt/intel_rc6.h"
> > > @@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct
> > > drm_device *dev, void *data,
> > >  {
> > >         struct drm_i915_gem_vm_bind *args = data;
> > >         struct i915_address_space *vm;
> > > +       int ret;
> > >  
> > >         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> > >         if (unlikely(!vm))
> > >                 return -ENOENT;
> > >  
> > > +       ret = i915_gem_vm_bind_obj(vm, args, file);
> > > +
> > >         i915_vm_put(vm);
> > > -       return -EINVAL;
> > > +       return ret;
> > >  }
> > >  
> > >  static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
> > > *data,
> > > @@ -1797,13 +1801,16 @@ static int
> > > i915_gem_vm_unbind_ioctl(struct
> > > drm_device *dev, void *data,
> > >  {
> > >         struct drm_i915_gem_vm_unbind *args = data;
> > >         struct i915_address_space *vm;
> > > +       int ret;
> > >  
> > >         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
> > >         if (unlikely(!vm))
> > >                 return -ENOENT;
> > >  
> > > +       ret = i915_gem_vm_unbind_obj(vm, args);
> > > +
> > >         i915_vm_put(vm);
> > > -       return -EINVAL;
> > > +       return ret;
> > >  }
> > >  
> > >  static const struct drm_ioctl_desc i915_ioctls[] = {
> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > b/drivers/gpu/drm/i915/i915_vma.c
> > > index 43339ecabd73..d324e29cef0a 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > @@ -29,6 +29,7 @@
> > >  #include "display/intel_frontbuffer.h"
> > >  #include "gem/i915_gem_lmem.h"
> > >  #include "gem/i915_gem_tiling.h"
> > > +#include "gem/i915_gem_vm_bind.h"
> > >  #include "gt/intel_engine.h"
> > >  #include "gt/intel_engine_heartbeat.h"
> > >  #include "gt/intel_gt.h"
> > > @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
> > >         spin_unlock(&obj->vma.lock);
> > >         mutex_unlock(&vm->mutex);
> > >  
> > > +       INIT_LIST_HEAD(&vma->vm_bind_link);
> > >         return vma;
> > >  
> > >  err_unlock:
> > > @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object
> > > *obj,
> > >  {
> > >         struct i915_vma *vma;
> > >  
> > > -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> > >         GEM_BUG_ON(!kref_read(&vm->ref));
> > >  
> > >         spin_lock(&obj->vma.lock);
> > > @@ -1660,6 +1661,10 @@ static void release_references(struct
> > > i915_vma
> > > *vma, bool vm_ddestroy)
> > >  
> > >         spin_unlock(&obj->vma.lock);
> > >  
> > > +       i915_gem_vm_bind_lock(vma->vm);
> > > +       i915_gem_vm_bind_remove(vma, true);
> > > +       i915_gem_vm_bind_unlock(vma->vm);
> > > +
> > 
> > The vm might be destroyed at this point already.
> > 
> 
> Ah, due to async vma resource release...
> 
> > From what I understand we can destroy the vma from three call
> > sites:
> > 1) VM_UNBIND -> The vma has already been removed from the vm_bind
> > address space,
> > 2) object destruction -> since the vma has an object reference
> > while in
> > the vm_bind address space, it must also have been removed from the
> > address space if called from object destruction.
> > 3) vm destruction. Suggestion is to call VM_UNBIND from under the
> > vm_bind lock early in vm destruction.
> > 
> > Then the above added code can be removed and replaced with an
> > assert
> > that the vm_bind address space RB_NODE is indeed empty.
> > 
> 
> ...yah, makes sense to move this code early into VM destruction rather
> than doing it here.
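
A rough sketch of that direction, using the names from this patch (the
helper name below is made up, purely illustrative):

        /* Called early in VM destruction, before the vm can go away */
        static void i915_vm_unbind_all(struct i915_address_space *vm)
        {
                struct i915_vma *vma, *vn;

                i915_gem_vm_bind_lock(vm);
                list_for_each_entry_safe(vma, vn, &vm->vm_bound_list, vm_bind_link)
                        i915_gem_vm_bind_remove(vma, true);
                i915_gem_vm_bind_unlock(vm);
        }

        /*
         * release_references() can then drop the vm_bind_lock/remove calls
         * and simply assert: GEM_BUG_ON(!list_empty(&vma->vm_bind_link));
         */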
> 
> Niranjana
> 
> > 
> > >         spin_lock_irq(&gt->closed_lock);
> > >         __i915_vma_remove_closed(vma);
> > >         spin_unlock_irq(&gt->closed_lock);
> > > diff --git a/drivers/gpu/drm/i915/i915_vma.h
> > > b/drivers/gpu/drm/i915/i915_vma.h
> > > index 88ca0bd9c900..dcb49f79ff7e 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.h
> > > +++ b/drivers/gpu/drm/i915/i915_vma.h
> > > @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
> > >  {
> > >         ptrdiff_t cmp;
> > >  
> > > -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> > > -
> > >         cmp = ptrdiff(vma->vm, vm);
> > >         if (cmp)
> > >                 return cmp;
> > > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> > > b/drivers/gpu/drm/i915/i915_vma_types.h
> > > index be6e028c3b57..b6d179bdbfa0 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma_types.h
> > > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> > > @@ -289,6 +289,14 @@ struct i915_vma {
> > >         /** This object's place on the active/inactive lists */
> > >         struct list_head vm_link;
> > >  
> > > +       struct list_head vm_bind_link; /* Link in persistent VMA
> > > list
> > > */
> > > +
> > > +       /** Interval tree structures for persistent vma */
> > 
> > Proper kerneldoc.
> > 
> > > +       struct rb_node rb;
> > > +       u64 start;
> > > +       u64 last;
> > > +       u64 __subtree_last;
> > > +
> > >         struct list_head obj_link; /* Link in the object's VMA
> > > list
> > > */
> > >         struct rb_node obj_node;
> > >         struct hlist_node obj_hash;
> > 
> > Thanks,
> > Thomas
> > 


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07 10:31     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 10:31 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Add uapi allowing the user to specify a BO as private to a specified VM
> during BO creation.
> VM private BOs can only be mapped on the specified VM and can't be
> dma_buf exported. VM private BOs share a single common dma_resv object,
> hence they have a performance advantage: only a single dma_resv object
> update is needed in the execbuf path, compared to non-private (shared)
> BOs.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41
> ++++++++++++++++++-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>  drivers/gpu/drm/i915/i915_vma.c               |  1 +
>  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>  include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>  11 files changed, 110 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 927a87e5ec59..7e264566b51f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -11,6 +11,7 @@
>  #include "pxp/intel_pxp.h"
>  
>  #include "i915_drv.h"
> +#include "i915_gem_context.h"
>  #include "i915_gem_create.h"
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
> @@ -243,6 +244,7 @@ struct create_ext {
>         unsigned int n_placements;
>         unsigned int placement_mask;
>         unsigned long flags;
> +       u32 vm_id;
>  };
>  
>  static void repr_placements(char *buf, size_t size,
> @@ -392,9 +394,24 @@ static int ext_set_protected(struct
> i915_user_extension __user *base, void *data
>         return 0;
>  }
>  
> +static int ext_set_vm_private(struct i915_user_extension __user
> *base,
> +                             void *data)
> +{
> +       struct drm_i915_gem_create_ext_vm_private ext;
> +       struct create_ext *ext_data = data;
> +
> +       if (copy_from_user(&ext, base, sizeof(ext)))
> +               return -EFAULT;
> +
> +       ext_data->vm_id = ext.vm_id;
> +
> +       return 0;
> +}
> +
>  static const i915_user_extension_fn create_extensions[] = {
>         [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>         [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
> +       [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>  };
>  
>  /**
> @@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev,
> void *data,
>         struct drm_i915_private *i915 = to_i915(dev);
>         struct drm_i915_gem_create_ext *args = data;
>         struct create_ext ext_data = { .i915 = i915 };
> +       struct i915_address_space *vm = NULL;
>         struct drm_i915_gem_object *obj;
>         int ret;
>  
> @@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device
> *dev, void *data,
>         if (ret)
>                 return ret;
>  
> +       if (ext_data.vm_id) {
> +               vm = i915_gem_vm_lookup(file->driver_priv,
> ext_data.vm_id);
> +               if (unlikely(!vm))
> +                       return -ENOENT;
> +       }
> +
>         if (!ext_data.n_placements) {
>                 ext_data.placements[0] =
>                         intel_memory_region_by_type(i915,
> INTEL_MEMORY_SYSTEM);
> @@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device
> *dev, void *data,
>                                                 ext_data.placements,
>                                                 ext_data.n_placements
> ,
>                                                 ext_data.flags);
> -       if (IS_ERR(obj))
> -               return PTR_ERR(obj);
> +       if (IS_ERR(obj)) {
> +               ret = PTR_ERR(obj);
> +               goto vm_put;
> +       }
> +
> +       if (vm) {
> +               obj->base.resv = vm->root_obj->base.resv;
> +               obj->priv_root = i915_gem_object_get(vm->root_obj);
> +               i915_vm_put(vm);
> +       }
>  
>         return i915_gem_publish(obj, file, &args->size, &args->handle);
> +vm_put:
> +       if (vm)
> +               i915_vm_put(vm);
> +
> +       return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index f5062d0c6333..6433173c3e84 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct
> drm_gem_object *gem_obj, int flags)
>         struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>         DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>  
> +       if (obj->priv_root) {
> +               drm_dbg(obj->base.dev,
> +                       "Exporting VM private objects is not allowed\n");
> +               return ERR_PTR(-EINVAL);
> +       }
> +
>         exp_info.ops = &i915_dmabuf_ops;
>         exp_info.size = gem_obj->size;
>         exp_info.flags = flags;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9fe3395ad4d9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>  
>         const struct drm_i915_gem_object_ops *ops;
>  
> +       /* Shared root if object is private to a VM; NULL otherwise */
> +       struct drm_i915_gem_object *priv_root;
> +
>         struct {
>                 /**
>                  * @vma.lock: protect the list/tree of vmas
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 7e1f8b83077f..f1912b12db00 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct
> ttm_buffer_object *bo)
>         i915_gem_object_release_memory_region(obj);
>         mutex_destroy(&obj->ttm.get_io_page.lock);
>  
> +       if (obj->priv_root)
> +               i915_gem_object_put(obj->priv_root);

This only works for ttm-based objects. For non-TTM objects on
integrated, we'll need to mimic the dma-resv individualization from
TTM.
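
Roughly, the non-TTM free path would need something like the below before
the object is torn down (hypothetical sketch; whether fences should be
waited on, or copied over to the object's own resv the way
ttm_bo_individualize_resv() does, would need checking):

    if (obj->priv_root) {
            /* nothing may remain tracked on the VM's shared resv */
            dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP,
                                  false, MAX_SCHEDULE_TIMEOUT);
            /* detach from the shared resv, fall back to the embedded one */
            obj->base.resv = &obj->base._resv;
            i915_gem_object_put(obj->priv_root);
            obj->priv_root = NULL;
    }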

> +
>         if (obj->ttm.created) {
>                 /*
>                  * We freely manage the shrinker LRU outide of the
> mm.pages life
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 642cdb559f17..ee6e4c52e80e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct
> i915_address_space *vm)
>         mutex_unlock(&vm->vm_bind_lock);
>  }
>  
> +static inline int i915_gem_vm_priv_lock(struct i915_address_space
> *vm,
> +                                       struct i915_gem_ww_ctx *ww)
> +{
> +       return i915_gem_object_lock(vm->root_obj, ww);
> +}

Please make a pass on this patch making sure we provide kerneldoc where
supposed to.
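
For instance, something along these lines for the helper above (wording is
only a sketch):

    /**
     * i915_gem_vm_priv_lock - lock the dma-resv shared by all VM private BOs
     * @vm: the vm whose private objects are to be locked
     * @ww: ww acquire context for the dma-resv locking transaction
     *
     * All VM private BOs share the dma-resv of @vm->root_obj, so locking that
     * single object locks every private BO mapped in @vm.
     *
     * Return: 0 on success, negative error code on failure.
     */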

> +
> +static inline void i915_gem_vm_priv_unlock(struct i915_address_space
> *vm)
> +{
> +       i915_gem_object_unlock(vm->root_obj);
> +}
> +
>  struct i915_vma *
>  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
> release_obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 43ceb4dcca6c..3201204c8e74 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma,
> bool release_obj)
>  
>         if (!list_empty(&vma->vm_bind_link)) {
>                 list_del_init(&vma->vm_bind_link);
> +               list_del_init(&vma->non_priv_vm_bind_link);
>                 i915_vm_bind_it_remove(vma, &vma->vm->va);
>  
>                 /* Release object */
> @@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
>                 goto put_obj;
>         }
>  
> +       if (obj->priv_root && obj->priv_root != vm->root_obj) {
> +               ret = -EINVAL;
> +               goto put_obj;
> +       }
> +
>         ret = i915_gem_vm_bind_lock_interruptible(vm);
>         if (ret)
>                 goto put_obj;
> @@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
>  
>         list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>         i915_vm_bind_it_insert(vma, &vm->va);
> +       if (!obj->priv_root)
> +               list_add_tail(&vma->non_priv_vm_bind_link,
> +                             &vm->non_priv_vm_bind_list);

I guess I'll find more details in the execbuf patches, but would it
work to keep the non-private objects on the vm_bind_list, and just
never move them to the vm_bound_list, rather than having a separate
list for them?


>  
>         /* Hold object reference until vm_unbind */
>         i915_gem_object_get(vma->obj);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 135dc4a76724..df0a8459c3c6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct
> i915_address_space *vm,
>  void i915_address_space_fini(struct i915_address_space *vm)
>  {
>         drm_mm_takedown(&vm->mm);
> +       i915_gem_object_put(vm->root_obj);
>         GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>         mutex_destroy(&vm->vm_bind_lock);
>  }
> @@ -289,6 +290,9 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>         INIT_LIST_HEAD(&vm->vm_bind_list);
>         INIT_LIST_HEAD(&vm->vm_bound_list);
>         mutex_init(&vm->vm_bind_lock);
> +       INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> +       vm->root_obj = i915_gem_object_create_internal(vm->i915,
> PAGE_SIZE);
> +       GEM_BUG_ON(IS_ERR(vm->root_obj));
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index d4a6ce65251d..f538ce9115c9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -267,6 +267,8 @@ struct i915_address_space {
>         struct list_head vm_bound_list;
>         /* va tree of persistent vmas */
>         struct rb_root_cached va;
> +       struct list_head non_priv_vm_bind_list;
> +       struct drm_i915_gem_object *root_obj;
>  
>         /* Global GTT */
>         bool is_ggtt:1;
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c
> index d324e29cef0a..f0226581d342 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>         mutex_unlock(&vm->mutex);
>  
>         INIT_LIST_HEAD(&vma->vm_bind_link);
> +       INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>         return vma;
>  
>  err_unlock:
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index b6d179bdbfa0..2298b3d6b7c4 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -290,6 +290,8 @@ struct i915_vma {
>         struct list_head vm_link;
>  
>         struct list_head vm_bind_link; /* Link in persistent VMA list
> */
> +       /* Link in non-private persistent VMA list */
> +       struct list_head non_priv_vm_bind_link;
>  
>         /** Interval tree structures for persistent vma */
>         struct rb_node rb;
> diff --git a/include/uapi/drm/i915_drm.h
> b/include/uapi/drm/i915_drm.h
> index 26cca49717f8..ce1c6592b0d7 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>          *
>          * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>          * struct drm_i915_gem_create_ext_protected_content.
> +        *
> +        * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
> +        * struct drm_i915_gem_create_ext_vm_private.
>          */
>  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>         __u64 extensions;
>  };
>  
> @@ -3662,6 +3666,32 @@ struct
> drm_i915_gem_create_ext_protected_content {
>  /* ID of the protected content session managed by i915 when PXP is
> active */
>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>  
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + *
> + * By default, BOs can be mapped on multiple VMs and can also be
> dma-buf
> + * exported. Hence these BOs are referred to as Shared BOs.
> + * During each execbuf3 submission, the request fence must be added
> to the
> + * dma-resv fence list of all shared BOs mapped on the VM.
> + *
> + * Unlike Shared BOs, these VM private BOs can only be mapped on the
> VM they
> + * are private to and can't be dma-buf exported. All private BOs of
> a VM share
> + * the dma-resv object. Hence during each execbuf3 submission, they
> need only
> + * one dma-resv fence list updated. Thus, the fast path (where
> required
> + * mappings are already bound) submission latency is O(1) w.r.t the
> number of
> + * VM private BOs.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @vm_id: Id of the VM to which the object is private */
> +       __u32 vm_id;
> +};
> +
>  /**
>   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>   *

Thanks,
Thomas
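
For reference, expected userspace usage of the new extension would look
roughly like this (sketch only; setup of fd and vm_id via
DRM_IOCTL_I915_GEM_VM_CREATE, and error handling, are omitted):

    struct drm_i915_gem_create_ext_vm_private vm_priv = {
            .base = { .name = I915_GEM_CREATE_EXT_VM_PRIVATE },
            .vm_id = vm_id,
    };
    struct drm_i915_gem_create_ext create = {
            .size = bo_size,
            .extensions = (uintptr_t)&vm_priv,
    };

    ret = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
    /* create.handle is now a BO that can only be vm_bound on vm_id */

Making this a create-time extension (rather than a later ioctl) presumably
follows from the object having to point at the VM's shared dma-resv before
it is first used.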


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07 11:27     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 11:27 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Treat VM_BIND vmas as persistent and handle them during the
> request submission in the execbuf path.
> 
> Support eviction by maintaining a list of evicted persistent vmas
> for rebinding during the next submission.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
>  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
>  drivers/gpu/drm/i915/i915_vma.h               | 78
> +++++++++++++++++--
>  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
>  9 files changed, 163 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index ccec4055fde3..5121f02ba95c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -38,6 +38,7 @@
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
>  #include "i915_gem_ttm.h"
> +#include "i915_gem_vm_bind.h"
>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 849bf3c1061e..eaadf5a6ab09 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -6,6 +6,7 @@
>  #ifndef __I915_GEM_VM_BIND_H
>  #define __I915_GEM_VM_BIND_H
>  
> +#include <linux/dma-resv.h>
>  #include "i915_drv.h"
>  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
> >vm_bind_lock)
> @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
> i915_address_space *vm)
>         mutex_unlock(&vm->vm_bind_lock);
>  }
>  
> +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
> +
>  static inline int i915_gem_vm_priv_lock(struct i915_address_space
> *vm,
>                                         struct i915_gem_ww_ctx *ww)
>  {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 96f139cc8060..1a8efa83547f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma,
> bool release_obj)
>  {
>         assert_vm_bind_held(vma->vm);
>  
> +       spin_lock(&vma->vm->vm_rebind_lock);
> +       if (!list_empty(&vma->vm_rebind_link))
> +               list_del_init(&vma->vm_rebind_link);
> +       i915_vma_set_purged(vma);
> +       i915_vma_set_freed(vma);

Following the vma destruction discussion, can we ditch the freed flag
now?

> +       spin_unlock(&vma->vm->vm_rebind_lock);
> +
>         if (!list_empty(&vma->vm_bind_link)) {
>                 list_del_init(&vma->vm_bind_link);
>                 list_del_init(&vma->non_priv_vm_bind_link);
> @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> i915_address_space *vm,
>  
>         vma->start = va->start;
>         vma->last = va->start + va->length - 1;
> +       i915_vma_set_persistent(vma);
>  
>         return vma;
>  }
> @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
>  
>         i915_vm_bind_put_fence(vma);
>  put_vma:
> -       if (ret)
> +       if (ret) {
> +               i915_vma_set_freed(vma);
>                 i915_vma_destroy(vma);
> +       }
>  
>         i915_gem_ww_ctx_fini(&ww);
>  unlock_vm:
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index df0a8459c3c6..55d5389b2c6c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -293,6 +293,8 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>         INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>         vm->root_obj = i915_gem_object_create_internal(vm->i915,
> PAGE_SIZE);
>         GEM_BUG_ON(IS_ERR(vm->root_obj));
> +       INIT_LIST_HEAD(&vm->vm_rebind_list);
> +       spin_lock_init(&vm->vm_rebind_lock);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index f538ce9115c9..fe5485c4a1cd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -265,6 +265,8 @@ struct i915_address_space {
>         struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>         struct list_head vm_bind_list;
>         struct list_head vm_bound_list;
> +       struct list_head vm_rebind_list;
> +       spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>         /* va tree of persistent vmas */
>         struct rb_root_cached va;
>         struct list_head non_priv_vm_bind_list;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8c2f57eb5dda..09b89d1913fc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
> *vm,
>  
>  #define PIN_OFFSET_MASK                I915_GTT_PAGE_MASK
>  
> +static inline int i915_vm_sync(struct i915_address_space *vm)

No need to make this an inline function. Only for performance reasons.
Kerneldoc.
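
I.e. something like this in the header, with the body moved out of line
(sketch only):

    /**
     * i915_vm_sync - wait for all requests under this vm to finish
     * @vm: the vm to wait on
     *
     * Return: 0 on success, -ETIMEDOUT if the wait timed out, or another
     * negative error code returned by dma_resv_wait_timeout().
     */
    int i915_vm_sync(struct i915_address_space *vm);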

> +{
> +       int ret;
> +
> +       /* Wait for all requests under this vm to finish */
> +       ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> +                                   DMA_RESV_USAGE_BOOKKEEP, false,
> +                                   MAX_SCHEDULE_TIMEOUT);
> +       if (ret < 0)
> +               return ret;
> +       else if (ret > 0)
> +               return 0;
> +       else
> +               return -ETIMEDOUT;
> +}
> +
> +static inline bool i915_vm_is_active(const struct i915_address_space
> *vm)
> +{
> +       return !dma_resv_test_signaled(vm->root_obj->base.resv,
> +                                      DMA_RESV_USAGE_BOOKKEEP);
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c
> index 6737236b7884..6adb013579be 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  
>         INIT_LIST_HEAD(&vma->vm_bind_link);
>         INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> +       INIT_LIST_HEAD(&vma->vm_rebind_link);
>         return vma;
>  
>  err_unlock:
> @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
>         if (atomic_dec_and_lock_irqsave(&vma->open_count,
>                                         &gt->closed_lock,
>                                         flags)) {
> -               __vma_close(vma, gt);
> +               if (!i915_vma_is_persistent(vma))
> +                       __vma_close(vma, gt);

This change is not needed since persistent vmas shouldn't take part in
the other vma open-close life-time management.

>                 spin_unlock_irqrestore(&gt->closed_lock, flags);
>         }
>  }
> @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
>         if (!drm_mm_node_allocated(&vma->node))
>                 return;
>  
> +       /*
> +        * Mark persistent vma as purged to avoid it waiting
> +        * for VM to be released.
> +        */
> +       if (i915_vma_is_persistent(vma))
> +               i915_vma_set_purged(vma);
> +
>         atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>         WARN_ON(__i915_vma_unbind(vma));
>         GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> @@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma
> *vma, bool vm_ddestroy)
>  
>         spin_unlock(&obj->vma.lock);
>  
> -       i915_gem_vm_bind_lock(vma->vm);
> -       i915_gem_vm_bind_remove(vma, true);
> -       i915_gem_vm_bind_unlock(vma->vm);
> +       if (i915_vma_is_persistent(vma) &&
> +           !i915_vma_is_freed(vma)) {
> +               i915_gem_vm_bind_lock(vma->vm);
> +               i915_gem_vm_bind_remove(vma, true);
> +               i915_gem_vm_bind_unlock(vma->vm);
> +       }
>  
>         spin_lock_irq(&gt->closed_lock);
>         __i915_vma_remove_closed(vma);
> @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma
> *vma,
>         int err;
>  
>         assert_object_held(obj);
> +       if (i915_vma_is_persistent(vma))
> +               return -EINVAL;
>  
>         GEM_BUG_ON(!vma->pages);
>  
> @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>         __i915_vma_evict(vma, false);
>  
>         drm_mm_remove_node(&vma->node); /* pairs with
> i915_vma_release() */
> +
> +       if (i915_vma_is_persistent(vma)) {
> +               spin_lock(&vma->vm->vm_rebind_lock);
> +               if (list_empty(&vma->vm_rebind_link) &&
> +                   !i915_vma_is_purged(vma))
> +                       list_add_tail(&vma->vm_rebind_link,
> +                                     &vma->vm->vm_rebind_list);
> +               spin_unlock(&vma->vm->vm_rebind_lock);
> +       }
> +
>         return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h
> b/drivers/gpu/drm/i915/i915_vma.h
> index dcb49f79ff7e..6c1369a40e03 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>  
>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned
> int flags);
>  #define I915_VMA_RELEASE_MAP BIT(0)
> -
> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
> -{
> -       return !i915_active_is_idle(&vma->active);
> -}
> -
>  /* do not reserve memory to prevent deadlocks */
>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>  
> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct
> i915_vma *vma)
>         return i915_vm_to_ggtt(vma->vm)->pin_bias;
>  }
>  
> +static inline bool i915_vma_is_persistent(const struct i915_vma
> *vma)
> +{
> +       return test_bit(I915_VMA_PERSISTENT_BIT,
> __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
> +{
> +       set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
> +{
> +       return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_purged(struct i915_vma *vma)
> +{
> +       set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_freed(const struct i915_vma *vma)
> +{
> +       return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_freed(struct i915_vma *vma)
> +{
> +       set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
> +{
> +       if (i915_vma_is_persistent(vma)) {
> +               if (i915_vma_is_purged(vma))
> +                       return false;
> +
> +               return i915_vm_is_active(vma->vm);
> +       }
> +
> +       return !i915_active_is_idle(&vma->active);
> +}
> +
>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>  {
>         i915_gem_object_get(vma->obj);
> @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma
> *vma);
>  
>  static inline int i915_vma_sync(struct i915_vma *vma)
>  {
> +       int ret;
> +
>         /* Wait for the asynchronous bindings and pending GPU reads
> */
> -       return i915_active_wait(&vma->active);
> +       ret = i915_active_wait(&vma->active);
> +       if (ret || !i915_vma_is_persistent(vma) ||
> i915_vma_is_purged(vma))
> +               return ret;
> +
> +       return i915_vm_sync(vma->vm);
> +}
> +
> +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma)
> +{
> +       /* Ensure vma bind is initiated */
> +       if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
> +               return false;
> +
> +       /* Ensure any binding started is complete */
> +       if (rcu_access_pointer(vma->active.excl.fence)) {
> +               struct dma_fence *fence;
> +
> +               rcu_read_lock();
> +               fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
> +               rcu_read_unlock();
> +               if (fence) {
> +                       dma_fence_put(fence);
> +                       return false;
> +               }
> +       }

Could we use i915_active_fence_get() instead of first testing the
pointer and then open-coding i915_active_fence_get()? Also no need to
inline this.
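
Something like the below, perhaps (untested sketch reusing the existing
i915_active_fence_get() helper):

    static bool i915_vma_is_bind_complete(struct i915_vma *vma)
    {
            struct dma_fence *fence;

            /* Ensure vma bind is initiated */
            if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
                    return false;

            /* Ensure any binding started is complete */
            fence = i915_active_fence_get(&vma->active.excl);
            if (fence) {
                    dma_fence_put(fence);
                    return false;
            }

            return true;
    }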

> +
> +       return true;
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index 7d830a6a0b51..405c82e1bc30 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -264,6 +264,28 @@ struct i915_vma {
>  #define I915_VMA_SCANOUT_BIT   17
>  #define I915_VMA_SCANOUT       ((int)BIT(I915_VMA_SCANOUT_BIT))
>  
> +  /**
> +   * I915_VMA_PERSISTENT_BIT:
> +   * The vma is persistent (created with VM_BIND call).
> +   *
> +   * I915_VMA_PURGED_BIT:
> +   * The persistent vma is force unbound either due to VM_UNBIND
> call
> +   * from UMD or VM is released. Do not check/wait for VM activeness
> +   * in i915_vma_is_active() and i915_vma_sync() calls.
> +   *
> +   * I915_VMA_FREED_BIT:
> +   * The persistent vma is being released by UMD via VM_UNBIND call.
> +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND
> call
> +   * already holds the lock.
> +   */
> +#define I915_VMA_PERSISTENT_BIT        19
> +#define I915_VMA_PURGED_BIT    20
> +#define I915_VMA_FREED_BIT     21
> +
> +#define I915_VMA_PERSISTENT    ((int)BIT(I915_VMA_PERSISTENT_BIT))
> +#define I915_VMA_PURGED                ((int)BIT(I915_VMA_PURGED_BIT))
> +#define I915_VMA_FREED         ((int)BIT(I915_VMA_FREED_BIT))
> +
>         struct i915_active active;
>  
>  #define I915_VMA_PAGES_BIAS 24
> @@ -292,6 +314,7 @@ struct i915_vma {
>         struct list_head vm_bind_link; /* Link in persistent VMA list
> */
Protected by..?

>         /* Link in non-private persistent VMA list */
>         struct list_head non_priv_vm_bind_link;
> +       struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>  
>         /** Timeline fence for vm_bind completion notification */
>         struct {


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
@ 2022-07-07 11:27     ` Hellstrom, Thomas
  0 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 11:27 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Zanoni, Paulo R, Auld, Matthew, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Treat VM_BIND vmas as persistent and handle them during the
> request submission in the execbuf path.
> 
> Support eviction by maintaining a list of evicted persistent vmas
> for rebinding during the next submission.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
>  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
>  drivers/gpu/drm/i915/i915_vma.h               | 78
> +++++++++++++++++--
>  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
>  9 files changed, 163 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index ccec4055fde3..5121f02ba95c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -38,6 +38,7 @@
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
>  #include "i915_gem_ttm.h"
> +#include "i915_gem_vm_bind.h"
>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 849bf3c1061e..eaadf5a6ab09 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -6,6 +6,7 @@
>  #ifndef __I915_GEM_VM_BIND_H
>  #define __I915_GEM_VM_BIND_H
>  
> +#include <linux/dma-resv.h>
>  #include "i915_drv.h"
>  
>  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
> @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
> i915_address_space *vm)
>         mutex_unlock(&vm->vm_bind_lock);
>  }
>  
> +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
> +
>  static inline int i915_gem_vm_priv_lock(struct i915_address_space
> *vm,
>                                         struct i915_gem_ww_ctx *ww)
>  {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 96f139cc8060..1a8efa83547f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma,
> bool release_obj)
>  {
>         assert_vm_bind_held(vma->vm);
>  
> +       spin_lock(&vma->vm->vm_rebind_lock);
> +       if (!list_empty(&vma->vm_rebind_link))
> +               list_del_init(&vma->vm_rebind_link);
> +       i915_vma_set_purged(vma);
> +       i915_vma_set_freed(vma);

Following the vma destruction discussion, can we ditch the freed flag
now?

> +       spin_unlock(&vma->vm->vm_rebind_lock);
> +
>         if (!list_empty(&vma->vm_bind_link)) {
>                 list_del_init(&vma->vm_bind_link);
>                 list_del_init(&vma->non_priv_vm_bind_link);
> @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
> i915_address_space *vm,
>  
>         vma->start = va->start;
>         vma->last = va->start + va->length - 1;
> +       i915_vma_set_persistent(vma);
>  
>         return vma;
>  }
> @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
>  
>         i915_vm_bind_put_fence(vma);
>  put_vma:
> -       if (ret)
> +       if (ret) {
> +               i915_vma_set_freed(vma);
>                 i915_vma_destroy(vma);
> +       }
>  
>         i915_gem_ww_ctx_fini(&ww);
>  unlock_vm:
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index df0a8459c3c6..55d5389b2c6c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -293,6 +293,8 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>         INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>         vm->root_obj = i915_gem_object_create_internal(vm->i915,
> PAGE_SIZE);
>         GEM_BUG_ON(IS_ERR(vm->root_obj));
> +       INIT_LIST_HEAD(&vm->vm_rebind_list);
> +       spin_lock_init(&vm->vm_rebind_lock);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index f538ce9115c9..fe5485c4a1cd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -265,6 +265,8 @@ struct i915_address_space {
>         struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>         struct list_head vm_bind_list;
>         struct list_head vm_bound_list;
> +       struct list_head vm_rebind_list;
> +       spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>         /* va tree of persistent vmas */
>         struct rb_root_cached va;
>         struct list_head non_priv_vm_bind_list;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8c2f57eb5dda..09b89d1913fc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
> *vm,
>  
>  #define PIN_OFFSET_MASK                I915_GTT_PAGE_MASK
>  
> +static inline int i915_vm_sync(struct i915_address_space *vm)

No need to make this an inline function. Only for performance reasons.
Kerneldoc.

> +{
> +       int ret;
> +
> +       /* Wait for all requests under this vm to finish */
> +       ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
> +                                   DMA_RESV_USAGE_BOOKKEEP, false,
> +                                   MAX_SCHEDULE_TIMEOUT);
> +       if (ret < 0)
> +               return ret;
> +       else if (ret > 0)
> +               return 0;
> +       else
> +               return -ETIMEDOUT;
> +}
> +
> +static inline bool i915_vm_is_active(const struct i915_address_space
> *vm)
> +{
> +       return !dma_resv_test_signaled(vm->root_obj->base.resv,
> +                                      DMA_RESV_USAGE_BOOKKEEP);
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c
> b/drivers/gpu/drm/i915/i915_vma.c
> index 6737236b7884..6adb013579be 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  
>         INIT_LIST_HEAD(&vma->vm_bind_link);
>         INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> +       INIT_LIST_HEAD(&vma->vm_rebind_link);
>         return vma;
>  
>  err_unlock:
> @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
>         if (atomic_dec_and_lock_irqsave(&vma->open_count,
>                                         &gt->closed_lock,
>                                         flags)) {
> -               __vma_close(vma, gt);
> +               if (!i915_vma_is_persistent(vma))
> +                       __vma_close(vma, gt);

This change is not needed since persistent vmas shouldn't take part in
the other vma open-close life-time management.

>                 spin_unlock_irqrestore(&gt->closed_lock, flags);
>         }
>  }
> @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
>         if (!drm_mm_node_allocated(&vma->node))
>                 return;
>  
> +       /*
> +        * Mark persistent vma as purged to avoid it waiting
> +        * for VM to be released.
> +        */
> +       if (i915_vma_is_persistent(vma))
> +               i915_vma_set_purged(vma);
> +
>         atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>         WARN_ON(__i915_vma_unbind(vma));
>         GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> @@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma
> *vma, bool vm_ddestroy)
>  
>         spin_unlock(&obj->vma.lock);
>  
> -       i915_gem_vm_bind_lock(vma->vm);
> -       i915_gem_vm_bind_remove(vma, true);
> -       i915_gem_vm_bind_unlock(vma->vm);
> +       if (i915_vma_is_persistent(vma) &&
> +           !i915_vma_is_freed(vma)) {
> +               i915_gem_vm_bind_lock(vma->vm);
> +               i915_gem_vm_bind_remove(vma, true);
> +               i915_gem_vm_bind_unlock(vma->vm);
> +       }
>  
>         spin_lock_irq(&gt->closed_lock);
>         __i915_vma_remove_closed(vma);
> @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma
> *vma,
>         int err;
>  
>         assert_object_held(obj);
> +       if (i915_vma_is_persistent(vma))
> +               return -EINVAL;
>  
>         GEM_BUG_ON(!vma->pages);
>  
> @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>         __i915_vma_evict(vma, false);
>  
>         drm_mm_remove_node(&vma->node); /* pairs with
> i915_vma_release() */
> +
> +       if (i915_vma_is_persistent(vma)) {
> +               spin_lock(&vma->vm->vm_rebind_lock);
> +               if (list_empty(&vma->vm_rebind_link) &&
> +                   !i915_vma_is_purged(vma))
> +                       list_add_tail(&vma->vm_rebind_link,
> +                                     &vma->vm->vm_rebind_list);
> +               spin_unlock(&vma->vm->vm_rebind_lock);
> +       }
> +
>         return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h
> b/drivers/gpu/drm/i915/i915_vma.h
> index dcb49f79ff7e..6c1369a40e03 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>  
>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned
> int flags);
>  #define I915_VMA_RELEASE_MAP BIT(0)
> -
> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
> -{
> -       return !i915_active_is_idle(&vma->active);
> -}
> -
>  /* do not reserve memory to prevent deadlocks */
>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>  
> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct
> i915_vma *vma)
>         return i915_vm_to_ggtt(vma->vm)->pin_bias;
>  }
>  
> +static inline bool i915_vma_is_persistent(const struct i915_vma
> *vma)
> +{
> +       return test_bit(I915_VMA_PERSISTENT_BIT,
> __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
> +{
> +       set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
> +{
> +       return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_purged(struct i915_vma *vma)
> +{
> +       set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_freed(const struct i915_vma *vma)
> +{
> +       return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline void i915_vma_set_freed(struct i915_vma *vma)
> +{
> +       set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
> +}
> +
> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
> +{
> +       if (i915_vma_is_persistent(vma)) {
> +               if (i915_vma_is_purged(vma))
> +                       return false;
> +
> +               return i915_vm_is_active(vma->vm);
> +       }
> +
> +       return !i915_active_is_idle(&vma->active);
> +}
> +
>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>  {
>         i915_gem_object_get(vma->obj);
> @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma
> *vma);
>  
>  static inline int i915_vma_sync(struct i915_vma *vma)
>  {
> +       int ret;
> +
>         /* Wait for the asynchronous bindings and pending GPU reads
> */
> -       return i915_active_wait(&vma->active);
> +       ret = i915_active_wait(&vma->active);
> +       if (ret || !i915_vma_is_persistent(vma) ||
> i915_vma_is_purged(vma))
> +               return ret;
> +
> +       return i915_vm_sync(vma->vm);
> +}
> +
> +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma)
> +{
> +       /* Ensure vma bind is initiated */
> +       if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
> +               return false;
> +
> +       /* Ensure any binding started is complete */
> +       if (rcu_access_pointer(vma->active.excl.fence)) {
> +               struct dma_fence *fence;
> +
> +               rcu_read_lock();
> +               fence = dma_fence_get_rcu_safe(&vma-
> >active.excl.fence);
> +               rcu_read_unlock();
> +               if (fence) {
> +                       dma_fence_put(fence);
> +                       return false;
> +               }
> +       }

Could we use i915_active_fence_get() here instead of first testing the
pointer and then open-coding what that helper already does? Also, there
is no need to inline this.
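
For illustration, a minimal sketch of that simplification (assuming the
existing i915_active_fence_get() helper and the vma->active.excl field,
and with the function moved out of the header; not a final
implementation):

bool i915_vma_is_bind_complete(struct i915_vma *vma)
{
	struct dma_fence *fence;

	/* Ensure vma bind is initiated */
	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
		return false;

	/* Any exclusive fence still installed means binding is in flight */
	fence = i915_active_fence_get(&vma->active.excl);
	if (fence) {
		dma_fence_put(fence);
		return false;
	}

	return true;
}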

> +
> +       return true;
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> b/drivers/gpu/drm/i915/i915_vma_types.h
> index 7d830a6a0b51..405c82e1bc30 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -264,6 +264,28 @@ struct i915_vma {
>  #define I915_VMA_SCANOUT_BIT   17
>  #define I915_VMA_SCANOUT       ((int)BIT(I915_VMA_SCANOUT_BIT))
>  
> +  /**
> +   * I915_VMA_PERSISTENT_BIT:
> +   * The vma is persistent (created with VM_BIND call).
> +   *
> +   * I915_VMA_PURGED_BIT:
> +   * The persistent vma is force unbound either due to VM_UNBIND
> call
> +   * from UMD or VM is released. Do not check/wait for VM activeness
> +   * in i915_vma_is_active() and i915_vma_sync() calls.
> +   *
> +   * I915_VMA_FREED_BIT:
> +   * The persistent vma is being released by UMD via VM_UNBIND call.
> +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND
> call
> +   * already holds the lock.
> +   */
> +#define I915_VMA_PERSISTENT_BIT        19
> +#define I915_VMA_PURGED_BIT    20
> +#define I915_VMA_FREED_BIT     21
> +
> +#define I915_VMA_PERSISTENT    ((int)BIT(I915_VMA_PERSISTENT_BIT))
> +#define
> I915_VMA_PURGED                ((int)BIT(I915_VMA_PURGED_BIT))
> +#define I915_VMA_FREED         ((int)BIT(I915_VMA_FREED_BIT))
> +
>         struct i915_active active;
>  
>  #define I915_VMA_PAGES_BIAS 24
> @@ -292,6 +314,7 @@ struct i915_vma {
>         struct list_head vm_bind_link; /* Link in persistent VMA list
> */
Protected by..? Please document which lock protects this link.

>         /* Link in non-private persistent VMA list */
>         struct list_head non_priv_vm_bind_link;
> +       struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>  
>         /** Timeline fence for vm_bind completion notification */
>         struct {


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07 13:11     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 13:11 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> For persistent (vm_bind) vmas of userptr BOs, handle the user
> page pinning by using the i915_gem_object_userptr_submit_init()
> /done() functions
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 67
> +++++++++++++++++++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  1 +
>  4 files changed, 85 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> index 2079f5ca9010..bf13dd6d642e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> @@ -22,6 +22,7 @@
>  #include "i915_gem_vm_bind.h"
>  #include "i915_trace.h"
>  
> +#define __EXEC3_USERPTR_USED           BIT_ULL(34)
>  #define __EXEC3_HAS_PIN                        BIT_ULL(33)
>  #define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>  #define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
> @@ -147,10 +148,36 @@ static void eb_scoop_unbound_vmas(struct
> i915_address_space *vm)
>         spin_unlock(&vm->vm_rebind_lock);
>  }
>  
> +static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer
> *eb)
> +{
> +       struct i915_address_space *vm = eb->context->vm;
> +       struct i915_vma *last_vma = NULL;
> +       struct i915_vma *vma;
> +       int err;
> +
> +       assert_vm_bind_held(vm);
> +
> +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
> +               if (i915_gem_object_is_userptr(vma->obj)) {
> +                       err =
> i915_gem_object_userptr_submit_init(vma->obj);
> +                       if (err)
> +                               return err;
> +
> +                       last_vma = vma;
> +               }
> +       }
> +

Don't we need to loop also over non-private userptr objects?


> +       if (last_vma)
> +               eb->args->flags |= __EXEC3_USERPTR_USED;
> +
> +       return 0;
> +}
> +
>  static int eb_lookup_vmas(struct i915_execbuffer *eb)
>  {
>         unsigned int i, current_batch = 0;
>         struct i915_vma *vma;
> +       int err = 0;
>  
>         for (i = 0; i < eb->num_batches; i++) {
>                 vma = eb_find_vma(eb->context->vm, eb-
> >batch_addresses[i]);
> @@ -163,6 +190,10 @@ static int eb_lookup_vmas(struct i915_execbuffer
> *eb)
>  
>         eb_scoop_unbound_vmas(eb->context->vm);
>  
> +       err = eb_lookup_persistent_userptr_vmas(eb);
> +       if (err)
> +               return err;
> +
>         return 0;
>  }
>  
> @@ -358,15 +389,51 @@ static void
> eb_persistent_vmas_move_to_active(struct i915_execbuffer *eb)
>  
>  static int eb_move_to_gpu(struct i915_execbuffer *eb)
>  {
> +       int err = 0, j;
> +
>         assert_vm_bind_held(eb->context->vm);
>         assert_vm_priv_held(eb->context->vm);
>  
>         eb_persistent_vmas_move_to_active(eb);
>  
> +#ifdef CONFIG_MMU_NOTIFIER
> +       if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
> +               struct i915_vma *vma;
> +
> +               assert_vm_bind_held(eb->context->vm);
> +               assert_vm_priv_held(eb->context->vm);
> +
> +               read_lock(&eb->i915->mm.notifier_lock);
> +               list_for_each_entry(vma, &eb->context->vm-
> >vm_bind_list,
> +                                   vm_bind_link) {
> +                       if (!i915_gem_object_is_userptr(vma->obj))
> +                               continue;
> +
> +                       err =
> i915_gem_object_userptr_submit_done(vma->obj);
> +                       if (err)
> +                               break;
> +               }
> +
> +               read_unlock(&eb->i915->mm.notifier_lock);
> +       }

Since we don't loop over the vm_bound_list, there is a need to check
here, under the notifier_lock in read mode, whether the rebind_list is
empty, and if it is not, restart from eb_lookup_vmas(). That might also
eliminate the need for the __EXEC3_USERPTR_USED flag?

That will also catch any objects that were evicted between
eb_lookup_vmas(), where the rebind_list was last checked, and
i915_gem_vm_priv_lock(), which prohibits further eviction. But if we
want to catch these earlier (which I think is a good idea), we could
check that the rebind_list is indeed empty just after taking the
vm_priv_lock(), and if not, restart from eb_lookup_vmas().
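
For illustration, a rough fragment of that check (lock and list names
follow this series; the plumbing to actually unwind and restart from
eb_lookup_vmas() is hypothetical and left out):

	/* still under eb->i915->mm.notifier_lock held in read mode */
	spin_lock(&eb->context->vm->vm_rebind_lock);
	if (!list_empty(&eb->context->vm->vm_rebind_list))
		err = -EAGAIN; /* unwind and restart from eb_lookup_vmas() */
	spin_unlock(&eb->context->vm->vm_rebind_lock);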


> +#endif
> +
> +       if (unlikely(err))
> +               goto err_skip;
> +
>         /* Unconditionally flush any chipset caches (for streaming
> writes). */
>         intel_gt_chipset_flush(eb->gt);
>  
>         return 0;
> +
> +err_skip:
> +       for_each_batch_create_order(eb, j) {
> +               if (!eb->requests[j])
> +                       break;
> +
> +               i915_request_set_error_once(eb->requests[j], err);
> +       }
> +       return err;
>  }
>  
>  static int eb_request_submit(struct i915_execbuffer *eb,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 1a8efa83547f..cae282b91618 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -263,6 +263,12 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
>                 goto put_obj;
>         }
>  
> +       if (i915_gem_object_is_userptr(obj)) {
> +               ret = i915_gem_object_userptr_submit_init(obj);
> +               if (ret)
> +                       goto put_obj;
> +       }
> +
>         ret = i915_gem_vm_bind_lock_interruptible(vm);
>         if (ret)
>                 goto put_obj;
> @@ -295,6 +301,16 @@ int i915_gem_vm_bind_obj(struct
> i915_address_space *vm,
>         /* Make it evictable */
>         __i915_vma_unpin(vma);
>  
> +#ifdef CONFIG_MMU_NOTIFIER
> +       if (i915_gem_object_is_userptr(obj)) {
> +               write_lock(&vm->i915->mm.notifier_lock);

Why do we need the lock in write mode here? 
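
Presumably, if read mode is indeed sufficient for calling
i915_gem_object_userptr_submit_done() here, something like the following
would do (just a sketch of the suggestion):

	read_lock(&vm->i915->mm.notifier_lock);
	ret = i915_gem_object_userptr_submit_done(obj);
	read_unlock(&vm->i915->mm.notifier_lock);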

> +               ret = i915_gem_object_userptr_submit_done(obj);
> +               write_unlock(&vm->i915->mm.notifier_lock);
> +               if (ret)
> +                       goto out_ww;
> +       }
> +#endif
> +
>         list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>         i915_vm_bind_it_insert(vma, &vm->va);
>         if (!obj->priv_root)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 55d5389b2c6c..4ab3bda644ff 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -295,6 +295,7 @@ void i915_address_space_init(struct
> i915_address_space *vm, int subclass)
>         GEM_BUG_ON(IS_ERR(vm->root_obj));
>         INIT_LIST_HEAD(&vm->vm_rebind_list);
>         spin_lock_init(&vm->vm_rebind_lock);
> +       INIT_LIST_HEAD(&vm->invalidate_link);
>  }
>  
>  void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index fe5485c4a1cd..f9edf11c144f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -267,6 +267,7 @@ struct i915_address_space {
>         struct list_head vm_bound_list;
>         struct list_head vm_rebind_list;
>         spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
> +       struct list_head invalidate_link;
>         /* va tree of persistent vmas */
>         struct rb_root_cached va;
>         struct list_head non_priv_vm_bind_list;


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07 13:27     ` Christian König
  -1 siblings, 0 replies; 121+ messages in thread
From: Christian König @ 2022-07-07 13:27 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin,
	lionel.g.landwerlin, thomas.hellstrom, matthew.auld, jason,
	daniel.vetter

Am 02.07.22 um 00:50 schrieb Niranjana Vishwanathapura:
> Add uapi allowing user to specify a BO as private to a specified VM
> during the BO creation.
> VM private BOs can only be mapped on the specified VM and can't be
> dma_buf exported. VM private BOs share a single common dma_resv object,
> hence has a performance advantage requiring a single dma_resv object
> update in the execbuf path compared to non-private (shared) BOs.

Sounds like you picked up the per VM BO idea from amdgpu here :)

Offhand this looks like a good idea, but shouldn't we add a few comments
in the common documentation about that?

E.g. something like "Multiple buffer objects sometimes share the same 
dma_resv object....." to the dma_resv documentation.

Probably best as a separate patch after this here has landed.

Regards,
Christian.

>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>   drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>   .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>   drivers/gpu/drm/i915/i915_vma.c               |  1 +
>   drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>   include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>   11 files changed, 110 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 927a87e5ec59..7e264566b51f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -11,6 +11,7 @@
>   #include "pxp/intel_pxp.h"
>   
>   #include "i915_drv.h"
> +#include "i915_gem_context.h"
>   #include "i915_gem_create.h"
>   #include "i915_trace.h"
>   #include "i915_user_extensions.h"
> @@ -243,6 +244,7 @@ struct create_ext {
>   	unsigned int n_placements;
>   	unsigned int placement_mask;
>   	unsigned long flags;
> +	u32 vm_id;
>   };
>   
>   static void repr_placements(char *buf, size_t size,
> @@ -392,9 +394,24 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
>   	return 0;
>   }
>   
> +static int ext_set_vm_private(struct i915_user_extension __user *base,
> +			      void *data)
> +{
> +	struct drm_i915_gem_create_ext_vm_private ext;
> +	struct create_ext *ext_data = data;
> +
> +	if (copy_from_user(&ext, base, sizeof(ext)))
> +		return -EFAULT;
> +
> +	ext_data->vm_id = ext.vm_id;
> +
> +	return 0;
> +}
> +
>   static const i915_user_extension_fn create_extensions[] = {
>   	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>   	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
> +	[I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>   };
>   
>   /**
> @@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>   	struct drm_i915_private *i915 = to_i915(dev);
>   	struct drm_i915_gem_create_ext *args = data;
>   	struct create_ext ext_data = { .i915 = i915 };
> +	struct i915_address_space *vm = NULL;
>   	struct drm_i915_gem_object *obj;
>   	int ret;
>   
> @@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>   	if (ret)
>   		return ret;
>   
> +	if (ext_data.vm_id) {
> +		vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
> +		if (unlikely(!vm))
> +			return -ENOENT;
> +	}
> +
>   	if (!ext_data.n_placements) {
>   		ext_data.placements[0] =
>   			intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
> @@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>   						ext_data.placements,
>   						ext_data.n_placements,
>   						ext_data.flags);
> -	if (IS_ERR(obj))
> -		return PTR_ERR(obj);
> +	if (IS_ERR(obj)) {
> +		ret = PTR_ERR(obj);
> +		goto vm_put;
> +	}
> +
> +	if (vm) {
> +		obj->base.resv = vm->root_obj->base.resv;
> +		obj->priv_root = i915_gem_object_get(vm->root_obj);
> +		i915_vm_put(vm);
> +	}
>   
>   	return i915_gem_publish(obj, file, &args->size, &args->handle);
> +vm_put:
> +	if (vm)
> +		i915_vm_put(vm);
> +
> +	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index f5062d0c6333..6433173c3e84 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags)
>   	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>   	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>   
> +	if (obj->priv_root) {
> +		drm_dbg(obj->base.dev,
> +			"Exporting VM private objects is not allowed\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
>   	exp_info.ops = &i915_dmabuf_ops;
>   	exp_info.size = gem_obj->size;
>   	exp_info.flags = flags;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9fe3395ad4d9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>   
>   	const struct drm_i915_gem_object_ops *ops;
>   
> +	/* Shared root is object private to a VM; NULL otherwise */
> +	struct drm_i915_gem_object *priv_root;
> +
>   	struct {
>   		/**
>   		 * @vma.lock: protect the list/tree of vmas
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 7e1f8b83077f..f1912b12db00 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>   	i915_gem_object_release_memory_region(obj);
>   	mutex_destroy(&obj->ttm.get_io_page.lock);
>   
> +	if (obj->priv_root)
> +		i915_gem_object_put(obj->priv_root);
> +
>   	if (obj->ttm.created) {
>   		/*
>   		 * We freely manage the shrinker LRU outide of the mm.pages life
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> index 642cdb559f17..ee6e4c52e80e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
>   	mutex_unlock(&vm->vm_bind_lock);
>   }
>   
> +static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
> +					struct i915_gem_ww_ctx *ww)
> +{
> +	return i915_gem_object_lock(vm->root_obj, ww);
> +}
> +
> +static inline void i915_gem_vm_priv_unlock(struct i915_address_space *vm)
> +{
> +	i915_gem_object_unlock(vm->root_obj);
> +}
> +
>   struct i915_vma *
>   i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>   void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> index 43ceb4dcca6c..3201204c8e74 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>   
>   	if (!list_empty(&vma->vm_bind_link)) {
>   		list_del_init(&vma->vm_bind_link);
> +		list_del_init(&vma->non_priv_vm_bind_link);
>   		i915_vm_bind_it_remove(vma, &vma->vm->va);
>   
>   		/* Release object */
> @@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>   		goto put_obj;
>   	}
>   
> +	if (obj->priv_root && obj->priv_root != vm->root_obj) {
> +		ret = -EINVAL;
> +		goto put_obj;
> +	}
> +
>   	ret = i915_gem_vm_bind_lock_interruptible(vm);
>   	if (ret)
>   		goto put_obj;
> @@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>   
>   	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>   	i915_vm_bind_it_insert(vma, &vm->va);
> +	if (!obj->priv_root)
> +		list_add_tail(&vma->non_priv_vm_bind_link,
> +			      &vm->non_priv_vm_bind_list);
>   
>   	/* Hold object reference until vm_unbind */
>   	i915_gem_object_get(vma->obj);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 135dc4a76724..df0a8459c3c6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>   void i915_address_space_fini(struct i915_address_space *vm)
>   {
>   	drm_mm_takedown(&vm->mm);
> +	i915_gem_object_put(vm->root_obj);
>   	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>   	mutex_destroy(&vm->vm_bind_lock);
>   }
> @@ -289,6 +290,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   	INIT_LIST_HEAD(&vm->vm_bind_list);
>   	INIT_LIST_HEAD(&vm->vm_bound_list);
>   	mutex_init(&vm->vm_bind_lock);
> +	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> +	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
> +	GEM_BUG_ON(IS_ERR(vm->root_obj));
>   }
>   
>   void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index d4a6ce65251d..f538ce9115c9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -267,6 +267,8 @@ struct i915_address_space {
>   	struct list_head vm_bound_list;
>   	/* va tree of persistent vmas */
>   	struct rb_root_cached va;
> +	struct list_head non_priv_vm_bind_list;
> +	struct drm_i915_gem_object *root_obj;
>   
>   	/* Global GTT */
>   	bool is_ggtt:1;
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index d324e29cef0a..f0226581d342 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   	mutex_unlock(&vm->mutex);
>   
>   	INIT_LIST_HEAD(&vma->vm_bind_link);
> +	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>   	return vma;
>   
>   err_unlock:
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index b6d179bdbfa0..2298b3d6b7c4 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -290,6 +290,8 @@ struct i915_vma {
>   	struct list_head vm_link;
>   
>   	struct list_head vm_bind_link; /* Link in persistent VMA list */
> +	/* Link in non-private persistent VMA list */
> +	struct list_head non_priv_vm_bind_link;
>   
>   	/** Interval tree structures for persistent vma */
>   	struct rb_node rb;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 26cca49717f8..ce1c6592b0d7 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>   	 *
>   	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>   	 * struct drm_i915_gem_create_ext_protected_content.
> +	 *
> +	 * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
> +	 * struct drm_i915_gem_create_ext_vm_private.
>   	 */
>   #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>   #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>   	__u64 extensions;
>   };
>   
> @@ -3662,6 +3666,32 @@ struct drm_i915_gem_create_ext_protected_content {
>   /* ID of the protected content session managed by i915 when PXP is active */
>   #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>   
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + *
> + * By default, BOs can be mapped on multiple VMs and can also be dma-buf
> + * exported. Hence these BOs are referred to as Shared BOs.
> + * During each execbuf3 submission, the request fence must be added to the
> + * dma-resv fence list of all shared BOs mapped on the VM.
> + *
> + * Unlike Shared BOs, these VM private BOs can only be mapped on the VM they
> + * are private to and can't be dma-buf exported. All private BOs of a VM share
> + * the dma-resv object. Hence during each execbuf3 submission, they need only
> + * one dma-resv fence list updated. Thus, the fast path (where required
> + * mappings are already bound) submission latency is O(1) w.r.t the number of
> + * VM private BOs.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +	/** @base: Extension link. See struct i915_user_extension. */
> +	struct i915_user_extension base;
> +
> +	/** @vm_id: Id of the VM to which the object is private */
> +	__u32 vm_id;
> +};
> +
>   /**
>    * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>    *


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07 14:41     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 14:41 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
> works in vm_bind mode. The vm_bind mode only works with
> this new execbuf3 ioctl.
> 
> The new execbuf3 ioctl will not have any execlist

I understand this to mean that there is no list of objects to validate
attached to the drm_i915_gem_execbuffer3 structure, rather than that the
execlists submission backend is never used. Could we clarify this to
avoid confusion?


>  support
> and all the legacy support like relocations etc are removed.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |    1 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029
> +++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>  drivers/gpu/drm/i915/i915_driver.c            |    1 +
>  include/uapi/drm/i915_drm.h                   |   67 +-
>  6 files changed, 1104 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile
> b/drivers/gpu/drm/i915/Makefile
> index 4e1627e96c6e..38cd1c5bc1a5 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -148,6 +148,7 @@ gem-y += \
>         gem/i915_gem_dmabuf.o \
>         gem/i915_gem_domain.o \
>         gem/i915_gem_execbuffer.o \
> +       gem/i915_gem_execbuffer3.o \
>         gem/i915_gem_internal.o \
>         gem/i915_gem_object.o \
>         gem/i915_gem_lmem.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b7b2c14fd9e1..37bb1383ab8f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -782,6 +782,11 @@ static int eb_select_context(struct
> i915_execbuffer *eb)
>         if (unlikely(IS_ERR(ctx)))
>                 return PTR_ERR(ctx);
>  
> +       if (ctx->vm->vm_bind_mode) {
> +               i915_gem_context_put(ctx);
> +               return -EOPNOTSUPP;
> +       }
> +
>         eb->gem_context = ctx;
>         if (i915_gem_context_has_full_ppgtt(ctx))
>                 eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> new file mode 100644
> index 000000000000..13121df72e3d
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> @@ -0,0 +1,1029 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/dma-resv.h>
> +#include <linux/sync_file.h>
> +#include <linux/uaccess.h>
> +
> +#include <drm/drm_syncobj.h>
> +
> +#include "gt/intel_context.h"
> +#include "gt/intel_gpu_commands.h"
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +#include "gt/intel_ring.h"
> +
> +#include "i915_drv.h"
> +#include "i915_file_private.h"
> +#include "i915_gem_context.h"
> +#include "i915_gem_ioctls.h"
> +#include "i915_gem_vm_bind.h"
> +#include "i915_trace.h"
> +
> +#define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
> +#define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
> +
> +/* Catch emission of unexpected errors for CI! */
> +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> +#undef EINVAL
> +#define EINVAL ({ \
> +       DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
> +       22; \
> +})
> +#endif
> +
> +/**
> + * DOC: User command execution with execbuf3 ioctl
> + *
> + * A VM in VM_BIND mode will not support older execbuf mode of
> binding.
> + * The execbuf ioctl handling in VM_BIND mode differs significantly
> from the
> + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> + * Hence, a new execbuf3 ioctl has been added to support VM_BIND
> mode. (See
> + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not
> accept any
> + * execlist. Hence, no support for implicit sync.
> + *
> + * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND
> mode only
> + * works with execbuf3 ioctl for submission.
> + *
> + * The execbuf3 ioctl directly specifies the batch addresses instead
> of as
> + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also
> not
> + * support many of the older features like in/out/submit fences,
> fence array,
> + * default gem context etc. (See struct drm_i915_gem_execbuffer3).
> + *
> + * In VM_BIND mode, VA allocation is completely managed by the user
> instead of
> + * the i915 driver. Hence all VA assignment, eviction are not
> applicable in
> + * VM_BIND mode. Also, for determining object activeness, VM_BIND
> mode will not
> + * be using the i915_vma active reference tracking. It will instead
> check the
> + * dma-resv object's fence list for that.
> + *
> + * So, a lot of code supporting execbuf2 ioctl, like relocations, VA
> evictions,
> + * vma lookup table, implicit sync, vma active reference tracking
> etc., are not
> + * applicable for execbuf3 ioctl.
> + */
> +
> +struct eb_fence {
> +       struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
> +       struct dma_fence *dma_fence;
> +       u64 value;
> +       struct dma_fence_chain *chain_fence;
> +};
> +
> +struct i915_execbuffer {
> +       struct drm_i915_private *i915; /** i915 backpointer */
> +       struct drm_file *file; /** per-file lookup tables and limits
> */
> +       struct drm_i915_gem_execbuffer3 *args; /** ioctl parameters
> */
> +
> +       struct intel_gt *gt; /* gt for the execbuf */
> +       struct intel_context *context; /* logical state for the
> request */
> +       struct i915_gem_context *gem_context; /** caller's context */
> +
> +       /** our requests to build */
> +       struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
> +
> +       /** used for excl fence in dma_resv objects when > 1 BB
> submitted */
> +       struct dma_fence *composite_fence;
> +
> +       struct i915_gem_ww_ctx ww;
> +
> +       /* number of batches in execbuf IOCTL */
> +       unsigned int num_batches;
> +
> +       u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
> +       /** identity of the batch obj/vma */
> +       struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
> +
> +       struct eb_fence *fences;
> +       unsigned long num_fences;
> +};

Please add kerneldoc for these structures.

It seems we are duplicating a lot of code from i915_execbuffer.c. Did
you consider 

struct i915_execbuffer3 {
...
};

struct i915_execbuffer2 {
	struct i915_execbuffer3 eb3;
	...
	[members that are not common]
};

Allowing execbuffer2 to share the execbuffer3 code to some extent.
Not sure about the gain at this point though. My worry would be that,
for example, fixes might be applied to one file and not the other.

> +
> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
> +static void eb_unpin_engine(struct i915_execbuffer *eb);
> +
> +static int eb_select_context(struct i915_execbuffer *eb)
> +{
> +       struct i915_gem_context *ctx;
> +
> +       ctx = i915_gem_context_lookup(eb->file->driver_priv, eb-
> >args->ctx_id);
> +       if (IS_ERR(ctx))
> +               return PTR_ERR(ctx);
> +
> +       eb->gem_context = ctx;
> +       return 0;
> +}
> +
> +static struct i915_vma *
> +eb_find_vma(struct i915_address_space *vm, u64 addr)
> +{
> +       u64 va;
> +
> +       assert_vm_bind_held(vm);
> +
> +       va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
> +       return i915_gem_vm_bind_lookup_vma(vm, va);
> +}
> +
> +static int eb_lookup_vmas(struct i915_execbuffer *eb)
> +{
> +       unsigned int i, current_batch = 0;
> +       struct i915_vma *vma;
> +
> +       for (i = 0; i < eb->num_batches; i++) {
> +               vma = eb_find_vma(eb->context->vm, eb-
> >batch_addresses[i]);
> +               if (!vma)
> +                       return -EINVAL;
> +
> +               eb->batches[current_batch] = vma;
> +               ++current_batch;
> +       }
> +
> +       return 0;
> +}
> +
> +static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
> +{
> +}
> +
> +static int eb_validate_vmas(struct i915_execbuffer *eb)
> +{
> +       int err;
> +       bool throttle = true;
> +
> +retry:
> +       err = eb_pin_engine(eb, throttle);
> +       if (err) {
> +               if (err != -EDEADLK)
> +                       return err;
> +
> +               goto err;
> +       }
> +
> +       /* only throttle once, even if we didn't need to throttle */
> +       throttle = false;
> +
> +err:
> +       if (err == -EDEADLK) {
> +               err = i915_gem_ww_ctx_backoff(&eb->ww);
> +               if (!err)
> +                       goto retry;
> +       }
> +
> +       return err;
> +}
> +
> +/*
> + * Using two helper loops for the order of which requests / batches
> are created
> + * and added the to backend. Requests are created in order from the
> parent to
> + * the last child. Requests are added in the reverse order, from the
> last child
> + * to parent. This is done for locking reasons as the timeline lock
> is acquired
> + * during request creation and released when the request is added to
> the
> + * backend. To make lockdep happy (see intel_context_timeline_lock)
> this must be
> + * the ordering.
> + */
> +#define for_each_batch_create_order(_eb, _i) \
> +       for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
> +#define for_each_batch_add_order(_eb, _i) \
> +       BUILD_BUG_ON(!typecheck(int, _i)); \
> +       for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
> +
> +static int eb_move_to_gpu(struct i915_execbuffer *eb)
> +{
> +       /* Unconditionally flush any chipset caches (for streaming
> writes). */
> +       intel_gt_chipset_flush(eb->gt);
> +
> +       return 0;
> +}
> +
> +static int eb_request_submit(struct i915_execbuffer *eb,
> +                            struct i915_request *rq,
> +                            struct i915_vma *batch,
> +                            u64 batch_len)
> +{
> +       int err;
> +
> +       if (intel_context_nopreempt(rq->context))
> +               __set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq-
> >fence.flags);
> +
> +       /*
> +        * After we completed waiting for other engines (using HW
> semaphores)
> +        * then we can signal that this request/batch is ready to
> run. This
> +        * allows us to determine if the batch is still waiting on
> the GPU
> +        * or actually running by checking the breadcrumb.
> +        */
> +       if (rq->context->engine->emit_init_breadcrumb) {
> +               err = rq->context->engine->emit_init_breadcrumb(rq);
> +               if (err)
> +                       return err;
> +       }
> +
> +       err = rq->context->engine->emit_bb_start(rq,
> +                                                batch->node.start,
> +                                                batch_len, 0);
> +       if (err)
> +               return err;
> +
> +       return 0;
> +}
> +
> +static int eb_submit(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +       int err;
> +
> +       err = eb_move_to_gpu(eb);
> +
> +       for_each_batch_create_order(eb, i) {
> +               if (!eb->requests[i])
> +                       break;
> +
> +               trace_i915_request_queue(eb->requests[i], 0);
> +               if (!err)
> +                       err = eb_request_submit(eb, eb->requests[i],
> +                                               eb->batches[i],
> +                                               eb->batches[i]-
> >size);
> +       }
> +
> +       return err;
> +}
> +
> +static struct i915_request *eb_throttle(struct i915_execbuffer *eb,
> struct intel_context *ce)
> +{
> +       struct intel_ring *ring = ce->ring;
> +       struct intel_timeline *tl = ce->timeline;
> +       struct i915_request *rq;
> +
> +       /*
> +        * Completely unscientific finger-in-the-air estimates for
> suitable
> +        * maximum user request size (to avoid blocking) and then
> backoff.
> +        */
> +       if (intel_ring_update_space(ring) >= PAGE_SIZE)
> +               return NULL;
> +
> +       /*
> +        * Find a request that after waiting upon, there will be at
> least half
> +        * the ring available. The hysteresis allows us to compete
> for the
> +        * shared ring and should mean that we sleep less often prior
> to
> +        * claiming our resources, but not so long that the ring
> completely
> +        * drains before we can submit our next request.
> +        */
> +       list_for_each_entry(rq, &tl->requests, link) {
> +               if (rq->ring != ring)
> +                       continue;
> +
> +               if (__intel_ring_space(rq->postfix,
> +                                      ring->emit, ring->size) >
> ring->size / 2)
> +                       break;
> +       }
> +       if (&rq->link == &tl->requests)
> +               return NULL; /* weird, we will check again later for
> real */
> +
> +       return i915_request_get(rq);
> +}
> +
> +static int eb_pin_timeline(struct i915_execbuffer *eb, struct
> intel_context *ce,
> +                          bool throttle)
> +{
> +       struct intel_timeline *tl;
> +       struct i915_request *rq = NULL;
> +
> +       /*
> +        * Take a local wakeref for preparing to dispatch the execbuf
> as
> +        * we expect to access the hardware fairly frequently in the
> +        * process, and require the engine to be kept awake between
> accesses.
> +        * Upon dispatch, we acquire another prolonged wakeref that
> we hold
> +        * until the timeline is idle, which in turn releases the
> wakeref
> +        * taken on the engine, and the parent device.
> +        */
> +       tl = intel_context_timeline_lock(ce);
> +       if (IS_ERR(tl))
> +               return PTR_ERR(tl);
> +
> +       intel_context_enter(ce);
> +       if (throttle)
> +               rq = eb_throttle(eb, ce);
> +       intel_context_timeline_unlock(tl);
> +
> +       if (rq) {
> +               bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
> +               long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
> +
> +               if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
> +                                     timeout) < 0) {
> +                       i915_request_put(rq);
> +
> +                       /*
> +                        * Error path, cannot use
> intel_context_timeline_lock as
> +                        * that is user interruptable and this clean
> up step
> +                        * must be done.
> +                        */
> +                       mutex_lock(&ce->timeline->mutex);
> +                       intel_context_exit(ce);
> +                       mutex_unlock(&ce->timeline->mutex);
> +
> +                       if (nonblock)
> +                               return -EWOULDBLOCK;
> +                       else
> +                               return -EINTR;
> +               }
> +               i915_request_put(rq);
> +       }
> +
> +       return 0;
> +}
> +
> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
> +{
> +       struct intel_context *ce = eb->context, *child;
> +       int err;
> +       int i = 0, j = 0;
> +
> +       GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
> +
> +       if (unlikely(intel_context_is_banned(ce)))
> +               return -EIO;
> +
> +       /*
> +        * Pinning the contexts may generate requests in order to
> acquire
> +        * GGTT space, so do this first before we reserve a seqno for
> +        * ourselves.
> +        */
> +       err = intel_context_pin_ww(ce, &eb->ww);
> +       if (err)
> +               return err;
> +       for_each_child(ce, child) {
> +               err = intel_context_pin_ww(child, &eb->ww);
> +               GEM_BUG_ON(err);        /* perma-pinned should incr a
> counter */
> +       }
> +
> +       for_each_child(ce, child) {
> +               err = eb_pin_timeline(eb, child, throttle);
> +               if (err)
> +                       goto unwind;
> +               ++i;
> +       }
> +       err = eb_pin_timeline(eb, ce, throttle);
> +       if (err)
> +               goto unwind;
> +
> +       eb->args->flags |= __EXEC3_ENGINE_PINNED;
> +       return 0;
> +
> +unwind:
> +       for_each_child(ce, child) {
> +               if (j++ < i) {
> +                       mutex_lock(&child->timeline->mutex);
> +                       intel_context_exit(child);
> +                       mutex_unlock(&child->timeline->mutex);
> +               }
> +       }
> +       for_each_child(ce, child)
> +               intel_context_unpin(child);
> +       intel_context_unpin(ce);
> +       return err;
> +}
> +
> +static void eb_unpin_engine(struct i915_execbuffer *eb)
> +{
> +       struct intel_context *ce = eb->context, *child;
> +
> +       if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
> +               return;
> +
> +       eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
> +
> +       for_each_child(ce, child) {
> +               mutex_lock(&child->timeline->mutex);
> +               intel_context_exit(child);
> +               mutex_unlock(&child->timeline->mutex);
> +
> +               intel_context_unpin(child);
> +       }
> +
> +       mutex_lock(&ce->timeline->mutex);
> +       intel_context_exit(ce);
> +       mutex_unlock(&ce->timeline->mutex);
> +
> +       intel_context_unpin(ce);
> +}
> +
> +static int
> +eb_select_engine(struct i915_execbuffer *eb)
> +{
> +       struct intel_context *ce, *child;
> +       unsigned int idx;
> +       int err;
> +
> +       if (!i915_gem_context_user_engines(eb->gem_context))
> +               return -EINVAL;
> +
> +       idx = eb->args->engine_idx;
> +       ce = i915_gem_context_get_engine(eb->gem_context, idx);
> +       if (IS_ERR(ce))
> +               return PTR_ERR(ce);
> +
> +       eb->num_batches = ce->parallel.number_children + 1;
> +
> +       for_each_child(ce, child)
> +               intel_context_get(child);
> +       intel_gt_pm_get(ce->engine->gt);
> +
> +       if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
> +               err = intel_context_alloc_state(ce);
> +               if (err)
> +                       goto err;
> +       }
> +       for_each_child(ce, child) {
> +               if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
> +                       err = intel_context_alloc_state(child);
> +                       if (err)
> +                               goto err;
> +               }
> +       }
> +
> +       /*
> +        * ABI: Before userspace accesses the GPU (e.g. execbuffer),
> report
> +        * EIO if the GPU is already wedged.
> +        */
> +       err = intel_gt_terminally_wedged(ce->engine->gt);
> +       if (err)
> +               goto err;
> +
> +       if (!i915_vm_tryget(ce->vm)) {
> +               err = -ENOENT;
> +               goto err;
> +       }
> +
> +       eb->context = ce;
> +       eb->gt = ce->engine->gt;
> +
> +       /*
> +        * Make sure engine pool stays alive even if we call
> intel_context_put
> +        * during ww handling. The pool is destroyed when last pm
> reference
> +        * is dropped, which breaks our -EDEADLK handling.
> +        */
> +       return err;
> +
> +err:
> +       intel_gt_pm_put(ce->engine->gt);
> +       for_each_child(ce, child)
> +               intel_context_put(child);
> +       intel_context_put(ce);
> +       return err;
> +}
> +
> +static void
> +eb_put_engine(struct i915_execbuffer *eb)
> +{
> +       struct intel_context *child;
> +
> +       i915_vm_put(eb->context->vm);
> +       intel_gt_pm_put(eb->gt);
> +       for_each_child(eb->context, child)
> +               intel_context_put(child);
> +       intel_context_put(eb->context);
> +}
> +
> +static void
> +__free_fence_array(struct eb_fence *fences, unsigned int n)
> +{
> +       while (n--) {
> +               drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
> +               dma_fence_put(fences[n].dma_fence);
> +               dma_fence_chain_free(fences[n].chain_fence);
> +       }
> +       kvfree(fences);
> +}
> +
> +static int add_timeline_fence_array(struct i915_execbuffer *eb)
> +{
> +       struct drm_i915_gem_timeline_fence __user *user_fences;
> +       struct eb_fence *f;
> +       u64 nfences;
> +       int err = 0;
> +
> +       nfences = eb->args->fence_count;
> +       if (!nfences)
> +               return 0;
> +
> +       /* Check multiplication overflow for access_ok() and
> kvmalloc_array() */
> +       BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
> +       if (nfences > min_t(unsigned long,
> +                           ULONG_MAX / sizeof(*user_fences),
> +                           SIZE_MAX / sizeof(*f)) - eb->num_fences)
> +               return -EINVAL;
> +
> +       user_fences = u64_to_user_ptr(eb->args->timeline_fences);
> +       if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
> +               return -EFAULT;
> +
> +       f = krealloc(eb->fences,
> +                    (eb->num_fences + nfences) * sizeof(*f),
> +                    __GFP_NOWARN | GFP_KERNEL);
> +       if (!f)
> +               return -ENOMEM;
> +
> +       eb->fences = f;
> +       f += eb->num_fences;
> +
> +       BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
> +                    ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
> +
> +       while (nfences--) {
> +               struct drm_i915_gem_timeline_fence user_fence;
> +               struct drm_syncobj *syncobj;
> +               struct dma_fence *fence = NULL;
> +               u64 point;
> +
> +               if (__copy_from_user(&user_fence,
> +                                    user_fences++,
> +                                    sizeof(user_fence)))
> +                       return -EFAULT;
> +
> +               if (user_fence.flags &
> __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
> +                       return -EINVAL;
> +
> +               syncobj = drm_syncobj_find(eb->file,
> user_fence.handle);
> +               if (!syncobj) {
> +                       DRM_DEBUG("Invalid syncobj handle
> provided\n");
> +                       return -ENOENT;
> +               }
> +
> +               fence = drm_syncobj_fence_get(syncobj);
> +
> +               if (!fence && user_fence.flags &&
> +                   !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL))
> {
> +                       DRM_DEBUG("Syncobj handle has no fence\n");
> +                       drm_syncobj_put(syncobj);
> +                       return -EINVAL;
> +               }
> +
> +               point = user_fence.value;
> +               if (fence)
> +                       err = dma_fence_chain_find_seqno(&fence,
> point);
> +
> +               if (err && !(user_fence.flags &
> I915_TIMELINE_FENCE_SIGNAL)) {
> +                       DRM_DEBUG("Syncobj handle missing requested
> point %llu\n", point);
> +                       dma_fence_put(fence);
> +                       drm_syncobj_put(syncobj);
> +                       return err;
> +               }
> +
> +               /*
> +                * A point might have been signaled already and
> +                * garbage collected from the timeline. In this case
> +                * just ignore the point and carry on.
> +                */
> +               if (!fence && !(user_fence.flags &
> I915_TIMELINE_FENCE_SIGNAL)) {
> +                       drm_syncobj_put(syncobj);
> +                       continue;
> +               }
> +
> +               /*
> +                * For timeline syncobjs we need to preallocate
> chains for
> +                * later signaling.
> +                */
> +               if (point != 0 && user_fence.flags &
> I915_TIMELINE_FENCE_SIGNAL) {
> +                       /*
> +                        * Waiting and signaling the same point (when
> point !=
> +                        * 0) would break the timeline.
> +                        */
> +                       if (user_fence.flags &
> I915_TIMELINE_FENCE_WAIT) {
> +                               DRM_DEBUG("Trying to wait & signal
> the same timeline point.\n");
> +                               dma_fence_put(fence);
> +                               drm_syncobj_put(syncobj);
> +                               return -EINVAL;
> +                       }
> +
> +                       f->chain_fence = dma_fence_chain_alloc();
> +                       if (!f->chain_fence) {
> +                               drm_syncobj_put(syncobj);
> +                               dma_fence_put(fence);
> +                               return -ENOMEM;
> +                       }
> +               } else {
> +                       f->chain_fence = NULL;
> +               }
> +
> +               f->syncobj = ptr_pack_bits(syncobj, user_fence.flags,
> 2);
> +               f->dma_fence = fence;
> +               f->value = point;
> +               f++;
> +               eb->num_fences++;
> +       }
> +
> +       return 0;
> +}
> +
> +static void put_fence_array(struct eb_fence *fences, int num_fences)
> +{
> +       if (fences)
> +               __free_fence_array(fences, num_fences);
> +}
> +
> +static int
> +await_fence_array(struct i915_execbuffer *eb,
> +                 struct i915_request *rq)
> +{
> +       unsigned int n;
> +       int err;
> +
> +       for (n = 0; n < eb->num_fences; n++) {
> +               struct drm_syncobj *syncobj;
> +               unsigned int flags;
> +
> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
> &flags, 2);
> +
> +               if (!eb->fences[n].dma_fence)
> +                       continue;
> +
> +               err = i915_request_await_dma_fence(rq, eb-
> >fences[n].dma_fence);
> +               if (err < 0)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static void signal_fence_array(const struct i915_execbuffer *eb,
> +                              struct dma_fence * const fence)
> +{
> +       unsigned int n;
> +
> +       for (n = 0; n < eb->num_fences; n++) {
> +               struct drm_syncobj *syncobj;
> +               unsigned int flags;
> +
> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
> &flags, 2);
> +               if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
> +                       continue;
> +
> +               if (eb->fences[n].chain_fence) {
> +                       drm_syncobj_add_point(syncobj,
> +                                             eb-
> >fences[n].chain_fence,
> +                                             fence,
> +                                             eb->fences[n].value);
> +                       /*
> +                        * The chain's ownership is transferred to
> the
> +                        * timeline.
> +                        */
> +                       eb->fences[n].chain_fence = NULL;
> +               } else {
> +                       drm_syncobj_replace_fence(syncobj, fence);
> +               }
> +       }
> +}
> +
> +static int parse_timeline_fences(struct i915_execbuffer *eb)
> +{
> +       return add_timeline_fence_array(eb);
> +}
> +
> +static int parse_batch_addresses(struct i915_execbuffer *eb)
> +{
> +       struct drm_i915_gem_execbuffer3 *args = eb->args;
> +       u64 __user *batch_addr = u64_to_user_ptr(args-
> >batch_address);
> +
> +       if (copy_from_user(eb->batch_addresses, batch_addr,
> +                          sizeof(batch_addr[0]) * eb->num_batches))
> +               return -EFAULT;
> +
> +       return 0;
> +}
> +
> +static void retire_requests(struct intel_timeline *tl, struct
> i915_request *end)
> +{
> +       struct i915_request *rq, *rn;
> +
> +       list_for_each_entry_safe(rq, rn, &tl->requests, link)
> +               if (rq == end || !i915_request_retire(rq))
> +                       break;
> +}
> +
> +static int eb_request_add(struct i915_execbuffer *eb, struct
> i915_request *rq,
> +                         int err, bool last_parallel)
> +{
> +       struct intel_timeline * const tl = i915_request_timeline(rq);
> +       struct i915_sched_attr attr = {};
> +       struct i915_request *prev;
> +
> +       lockdep_assert_held(&tl->mutex);
> +       lockdep_unpin_lock(&tl->mutex, rq->cookie);
> +
> +       trace_i915_request_add(rq);
> +
> +       prev = __i915_request_commit(rq);
> +
> +       /* Check that the context wasn't destroyed before submission
> */
> +       if (likely(!intel_context_is_closed(eb->context))) {
> +               attr = eb->gem_context->sched;
> +       } else {
> +               /* Serialise with context_close via the
> add_to_timeline */
> +               i915_request_set_error_once(rq, -ENOENT);
> +               __i915_request_skip(rq);
> +               err = -ENOENT; /* override any transient errors */
> +       }
> +
> +       if (intel_context_is_parallel(eb->context)) {
> +               if (err) {
> +                       __i915_request_skip(rq);
> +                       set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
> +                               &rq->fence.flags);
> +               }
> +               if (last_parallel)
> +                       set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
> +                               &rq->fence.flags);
> +       }
> +
> +       __i915_request_queue(rq, &attr);
> +
> +       /* Try to clean up the client's timeline after submitting the
> request */
> +       if (prev)
> +               retire_requests(tl, prev);
> +
> +       mutex_unlock(&tl->mutex);
> +
> +       return err;
> +}
> +
> +static int eb_requests_add(struct i915_execbuffer *eb, int err)
> +{
> +       int i;
> +
> +       /*
> +        * We iterate in reverse order of creation to release
> timeline mutexes in
> +        * same order.
> +        */
> +       for_each_batch_add_order(eb, i) {
> +               struct i915_request *rq = eb->requests[i];
> +
> +               if (!rq)
> +                       continue;
> +               err |= eb_request_add(eb, rq, err, i == 0);
> +       }
> +
> +       return err;
> +}
> +
> +static void eb_requests_get(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +
> +       for_each_batch_create_order(eb, i) {
> +               if (!eb->requests[i])
> +                       break;
> +
> +               i915_request_get(eb->requests[i]);
> +       }
> +}
> +
> +static void eb_requests_put(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +
> +       for_each_batch_create_order(eb, i) {
> +               if (!eb->requests[i])
> +                       break;
> +
> +               i915_request_put(eb->requests[i]);
> +       }
> +}
> +
> +static int
> +eb_composite_fence_create(struct i915_execbuffer *eb)
> +{
> +       struct dma_fence_array *fence_array;
> +       struct dma_fence **fences;
> +       unsigned int i;
> +
> +       GEM_BUG_ON(!intel_context_is_parent(eb->context));
> +
> +       fences = kmalloc_array(eb->num_batches, sizeof(*fences),
> GFP_KERNEL);
> +       if (!fences)
> +               return -ENOMEM;
> +
> +       for_each_batch_create_order(eb, i) {
> +               fences[i] = &eb->requests[i]->fence;
> +               __set_bit(I915_FENCE_FLAG_COMPOSITE,
> +                         &eb->requests[i]->fence.flags);
> +       }
> +
> +       fence_array = dma_fence_array_create(eb->num_batches,
> +                                            fences,
> +                                            eb->context-
> >parallel.fence_context,
> +                                            eb->context-
> >parallel.seqno++,
> +                                            false);
> +       if (!fence_array) {
> +               kfree(fences);
> +               return -ENOMEM;
> +       }
> +
> +       /* Move ownership to the dma_fence_array created above */
> +       for_each_batch_create_order(eb, i)
> +               dma_fence_get(fences[i]);
> +
> +       eb->composite_fence = &fence_array->base;
> +
> +       return 0;
> +}
> +
> +static int
> +eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
> +{
> +       int err;
> +
> +       if (unlikely(eb->gem_context->syncobj)) {
> +               struct dma_fence *fence;
> +
> +               fence = drm_syncobj_fence_get(eb->gem_context-
> >syncobj);
> +               err = i915_request_await_dma_fence(rq, fence);
> +               dma_fence_put(fence);
> +               if (err)
> +                       return err;
> +       }
> +
> +       if (eb->fences) {
> +               err = await_fence_array(eb, rq);
> +               if (err)
> +                       return err;
> +       }
> +
> +       if (intel_context_is_parallel(eb->context)) {
> +               err = eb_composite_fence_create(eb);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static struct intel_context *
> +eb_find_context(struct i915_execbuffer *eb, unsigned int
> context_number)
> +{
> +       struct intel_context *child;
> +
> +       if (likely(context_number == 0))
> +               return eb->context;
> +
> +       for_each_child(eb->context, child)
> +               if (!--context_number)
> +                       return child;
> +
> +       GEM_BUG_ON("Context not found");
> +
> +       return NULL;
> +}
> +
> +static int eb_requests_create(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +       int err;
> +
> +       for_each_batch_create_order(eb, i) {
> +               /* Allocate a request for this batch buffer nice and
> early. */
> +               eb->requests[i] =
> i915_request_create(eb_find_context(eb, i));
> +               if (IS_ERR(eb->requests[i])) {
> +                       err = PTR_ERR(eb->requests[i]);
> +                       eb->requests[i] = NULL;
> +                       return err;
> +               }
> +
> +               /*
> +                * Only the first request added (committed to
> backend) has to
> +                * take the in fences into account as all subsequent
> requests
> +                * will have fences inserted inbetween them.
> +                */
> +               if (i + 1 == eb->num_batches) {
> +                       err = eb_fences_add(eb, eb->requests[i]);
> +                       if (err)
> +                               return err;
> +               }

One thing I was hoping for with the brand new execbuf3 IOCTL would be
that we could actually make it dma_fence_signalling critical path
compliant.

That would mean annotating the dma_fence_signalling critical path just
after the first request is created and ending the annotation just before
that same request is added.

The main violators are the memory allocations done when adding
dependencies in eb_fences_add(), but since those are now fairly limited
in number, we might be able to pre-allocate that memory before the first
request is created.

The other main violator would be the multiple batch-buffers. Is this
mode of operation strictly needed for version 1, or can we ditch it?
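
Roughly what I have in mind, as a sketch only (assuming the dependency
allocations can indeed be pre-allocated; fence_cookie is just a local I
made up for illustration):

	bool fence_cookie;

	/* pre-allocate anything the dependency tracking may need here */

	fence_cookie = dma_fence_begin_signalling();

	/*
	 * Fence signalling critical section: everything from creating
	 * the first request until it is added/committed must not
	 * allocate memory or otherwise block indefinitely.
	 */

	/* ... create requests, add fences, emit batches ... */

	dma_fence_end_signalling(fence_cookie);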



> +
> +               /*
> +                * Not really on stack, but we don't want to call
> +                * kfree on the batch_snapshot when we put it, so use
> the
> +                * _onstack interface.

This comment is stale and can be removed.


/Thomas


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
@ 2022-07-07 14:41     ` Hellstrom, Thomas
  0 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 14:41 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Zanoni, Paulo R, Auld, Matthew, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
> works in vm_bind mode. The vm_bind mode only works with
> this new execbuf3 ioctl.
> 
> The new execbuf3 ioctl will not have any execlist

I understand this to mean that there is no list of objects to validate
attached to the drm_i915_gem_execbuffer3 structure, rather than that the
execlists submission backend is never used. Could we clarify this to
avoid confusion?


>  support
> and all the legacy support like relocations etc are removed.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |    1 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029
> +++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>  drivers/gpu/drm/i915/i915_driver.c            |    1 +
>  include/uapi/drm/i915_drm.h                   |   67 +-
>  6 files changed, 1104 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile
> b/drivers/gpu/drm/i915/Makefile
> index 4e1627e96c6e..38cd1c5bc1a5 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -148,6 +148,7 @@ gem-y += \
>         gem/i915_gem_dmabuf.o \
>         gem/i915_gem_domain.o \
>         gem/i915_gem_execbuffer.o \
> +       gem/i915_gem_execbuffer3.o \
>         gem/i915_gem_internal.o \
>         gem/i915_gem_object.o \
>         gem/i915_gem_lmem.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b7b2c14fd9e1..37bb1383ab8f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -782,6 +782,11 @@ static int eb_select_context(struct
> i915_execbuffer *eb)
>         if (unlikely(IS_ERR(ctx)))
>                 return PTR_ERR(ctx);
>  
> +       if (ctx->vm->vm_bind_mode) {
> +               i915_gem_context_put(ctx);
> +               return -EOPNOTSUPP;
> +       }
> +
>         eb->gem_context = ctx;
>         if (i915_gem_context_has_full_ppgtt(ctx))
>                 eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> new file mode 100644
> index 000000000000..13121df72e3d
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> @@ -0,0 +1,1029 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/dma-resv.h>
> +#include <linux/sync_file.h>
> +#include <linux/uaccess.h>
> +
> +#include <drm/drm_syncobj.h>
> +
> +#include "gt/intel_context.h"
> +#include "gt/intel_gpu_commands.h"
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +#include "gt/intel_ring.h"
> +
> +#include "i915_drv.h"
> +#include "i915_file_private.h"
> +#include "i915_gem_context.h"
> +#include "i915_gem_ioctls.h"
> +#include "i915_gem_vm_bind.h"
> +#include "i915_trace.h"
> +
> +#define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
> +#define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
> +
> +/* Catch emission of unexpected errors for CI! */
> +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> +#undef EINVAL
> +#define EINVAL ({ \
> +       DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
> +       22; \
> +})
> +#endif
> +
> +/**
> + * DOC: User command execution with execbuf3 ioctl
> + *
> + * A VM in VM_BIND mode will not support older execbuf mode of
> binding.
> + * The execbuf ioctl handling in VM_BIND mode differs significantly
> from the
> + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> + * Hence, a new execbuf3 ioctl has been added to support VM_BIND
> mode. (See
> + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not
> accept any
> + * execlist. Hence, no support for implicit sync.
> + *
> + * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND
> mode only
> + * works with execbuf3 ioctl for submission.
> + *
> + * The execbuf3 ioctl directly specifies the batch addresses instead
> of as
> + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also
> not
> + * support many of the older features like in/out/submit fences,
> fence array,
> + * default gem context etc. (See struct drm_i915_gem_execbuffer3).
> + *
> + * In VM_BIND mode, VA allocation is completely managed by the user
> instead of
> + * the i915 driver. Hence all VA assignment, eviction are not
> applicable in
> + * VM_BIND mode. Also, for determining object activeness, VM_BIND
> mode will not
> + * be using the i915_vma active reference tracking. It will instead
> check the
> + * dma-resv object's fence list for that.
> + *
> + * So, a lot of code supporting execbuf2 ioctl, like relocations, VA
> evictions,
> + * vma lookup table, implicit sync, vma active reference tracking
> etc., are not
> + * applicable for execbuf3 ioctl.
> + */
> +
> +struct eb_fence {
> +       struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
> +       struct dma_fence *dma_fence;
> +       u64 value;
> +       struct dma_fence_chain *chain_fence;
> +};
> +
> +struct i915_execbuffer {
> +       struct drm_i915_private *i915; /** i915 backpointer */
> +       struct drm_file *file; /** per-file lookup tables and limits
> */
> +       struct drm_i915_gem_execbuffer3 *args; /** ioctl parameters
> */
> +
> +       struct intel_gt *gt; /* gt for the execbuf */
> +       struct intel_context *context; /* logical state for the
> request */
> +       struct i915_gem_context *gem_context; /** caller's context */
> +
> +       /** our requests to build */
> +       struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
> +
> +       /** used for excl fence in dma_resv objects when > 1 BB
> submitted */
> +       struct dma_fence *composite_fence;
> +
> +       struct i915_gem_ww_ctx ww;
> +
> +       /* number of batches in execbuf IOCTL */
> +       unsigned int num_batches;
> +
> +       u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
> +       /** identity of the batch obj/vma */
> +       struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
> +
> +       struct eb_fence *fences;
> +       unsigned long num_fences;
> +};

Kerneldoc structures please.

It seems we are duplicating a lot of code from i915_execbuffer.c. Did
you consider 

struct i915_execbuffer3 {
...
};

struct i915_execbuffer2 {
	struct i915_execbuffer3 eb3;
	...
	[members that are not common]
};

Allowing execbuffer2 to share the execbuffer3 code to some extent.
Not sure about the gain at this point though. My worry would be that,
for example, fixes might be applied to one file and not the other.

> +
> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
> +static void eb_unpin_engine(struct i915_execbuffer *eb);
> +
> +static int eb_select_context(struct i915_execbuffer *eb)
> +{
> +       struct i915_gem_context *ctx;
> +
> +       ctx = i915_gem_context_lookup(eb->file->driver_priv, eb-
> >args->ctx_id);
> +       if (IS_ERR(ctx))
> +               return PTR_ERR(ctx);
> +
> +       eb->gem_context = ctx;
> +       return 0;
> +}
> +
> +static struct i915_vma *
> +eb_find_vma(struct i915_address_space *vm, u64 addr)
> +{
> +       u64 va;
> +
> +       assert_vm_bind_held(vm);
> +
> +       va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
> +       return i915_gem_vm_bind_lookup_vma(vm, va);
> +}
> +
> +static int eb_lookup_vmas(struct i915_execbuffer *eb)
> +{
> +       unsigned int i, current_batch = 0;
> +       struct i915_vma *vma;
> +
> +       for (i = 0; i < eb->num_batches; i++) {
> +               vma = eb_find_vma(eb->context->vm, eb-
> >batch_addresses[i]);
> +               if (!vma)
> +                       return -EINVAL;
> +
> +               eb->batches[current_batch] = vma;
> +               ++current_batch;
> +       }
> +
> +       return 0;
> +}
> +
> +static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
> +{
> +}
> +
> +static int eb_validate_vmas(struct i915_execbuffer *eb)
> +{
> +       int err;
> +       bool throttle = true;
> +
> +retry:
> +       err = eb_pin_engine(eb, throttle);
> +       if (err) {
> +               if (err != -EDEADLK)
> +                       return err;
> +
> +               goto err;
> +       }
> +
> +       /* only throttle once, even if we didn't need to throttle */
> +       throttle = false;
> +
> +err:
> +       if (err == -EDEADLK) {
> +               err = i915_gem_ww_ctx_backoff(&eb->ww);
> +               if (!err)
> +                       goto retry;
> +       }
> +
> +       return err;
> +}
> +
> +/*
> + * Using two helper loops for the order of which requests / batches
> are created
> + * and added the to backend. Requests are created in order from the
> parent to
> + * the last child. Requests are added in the reverse order, from the
> last child
> + * to parent. This is done for locking reasons as the timeline lock
> is acquired
> + * during request creation and released when the request is added to
> the
> + * backend. To make lockdep happy (see intel_context_timeline_lock)
> this must be
> + * the ordering.
> + */
> +#define for_each_batch_create_order(_eb, _i) \
> +       for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
> +#define for_each_batch_add_order(_eb, _i) \
> +       BUILD_BUG_ON(!typecheck(int, _i)); \
> +       for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
> +
> +static int eb_move_to_gpu(struct i915_execbuffer *eb)
> +{
> +       /* Unconditionally flush any chipset caches (for streaming
> writes). */
> +       intel_gt_chipset_flush(eb->gt);
> +
> +       return 0;
> +}
> +
> +static int eb_request_submit(struct i915_execbuffer *eb,
> +                            struct i915_request *rq,
> +                            struct i915_vma *batch,
> +                            u64 batch_len)
> +{
> +       int err;
> +
> +       if (intel_context_nopreempt(rq->context))
> +               __set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq-
> >fence.flags);
> +
> +       /*
> +        * After we completed waiting for other engines (using HW
> semaphores)
> +        * then we can signal that this request/batch is ready to
> run. This
> +        * allows us to determine if the batch is still waiting on
> the GPU
> +        * or actually running by checking the breadcrumb.
> +        */
> +       if (rq->context->engine->emit_init_breadcrumb) {
> +               err = rq->context->engine->emit_init_breadcrumb(rq);
> +               if (err)
> +                       return err;
> +       }
> +
> +       err = rq->context->engine->emit_bb_start(rq,
> +                                                batch->node.start,
> +                                                batch_len, 0);
> +       if (err)
> +               return err;
> +
> +       return 0;
> +}
> +
> +static int eb_submit(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +       int err;
> +
> +       err = eb_move_to_gpu(eb);
> +
> +       for_each_batch_create_order(eb, i) {
> +               if (!eb->requests[i])
> +                       break;
> +
> +               trace_i915_request_queue(eb->requests[i], 0);
> +               if (!err)
> +                       err = eb_request_submit(eb, eb->requests[i],
> +                                               eb->batches[i],
> +                                               eb->batches[i]-
> >size);
> +       }
> +
> +       return err;
> +}
> +
> +static struct i915_request *eb_throttle(struct i915_execbuffer *eb,
> struct intel_context *ce)
> +{
> +       struct intel_ring *ring = ce->ring;
> +       struct intel_timeline *tl = ce->timeline;
> +       struct i915_request *rq;
> +
> +       /*
> +        * Completely unscientific finger-in-the-air estimates for
> suitable
> +        * maximum user request size (to avoid blocking) and then
> backoff.
> +        */
> +       if (intel_ring_update_space(ring) >= PAGE_SIZE)
> +               return NULL;
> +
> +       /*
> +        * Find a request that after waiting upon, there will be at
> least half
> +        * the ring available. The hysteresis allows us to compete
> for the
> +        * shared ring and should mean that we sleep less often prior
> to
> +        * claiming our resources, but not so long that the ring
> completely
> +        * drains before we can submit our next request.
> +        */
> +       list_for_each_entry(rq, &tl->requests, link) {
> +               if (rq->ring != ring)
> +                       continue;
> +
> +               if (__intel_ring_space(rq->postfix,
> +                                      ring->emit, ring->size) >
> ring->size / 2)
> +                       break;
> +       }
> +       if (&rq->link == &tl->requests)
> +               return NULL; /* weird, we will check again later for
> real */
> +
> +       return i915_request_get(rq);
> +}
> +
> +static int eb_pin_timeline(struct i915_execbuffer *eb, struct
> intel_context *ce,
> +                          bool throttle)
> +{
> +       struct intel_timeline *tl;
> +       struct i915_request *rq = NULL;
> +
> +       /*
> +        * Take a local wakeref for preparing to dispatch the execbuf
> as
> +        * we expect to access the hardware fairly frequently in the
> +        * process, and require the engine to be kept awake between
> accesses.
> +        * Upon dispatch, we acquire another prolonged wakeref that
> we hold
> +        * until the timeline is idle, which in turn releases the
> wakeref
> +        * taken on the engine, and the parent device.
> +        */
> +       tl = intel_context_timeline_lock(ce);
> +       if (IS_ERR(tl))
> +               return PTR_ERR(tl);
> +
> +       intel_context_enter(ce);
> +       if (throttle)
> +               rq = eb_throttle(eb, ce);
> +       intel_context_timeline_unlock(tl);
> +
> +       if (rq) {
> +               bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
> +               long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
> +
> +               if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
> +                                     timeout) < 0) {
> +                       i915_request_put(rq);
> +
> +                       /*
> +                        * Error path, cannot use
> intel_context_timeline_lock as
> +                        * that is user interruptable and this clean
> up step
> +                        * must be done.
> +                        */
> +                       mutex_lock(&ce->timeline->mutex);
> +                       intel_context_exit(ce);
> +                       mutex_unlock(&ce->timeline->mutex);
> +
> +                       if (nonblock)
> +                               return -EWOULDBLOCK;
> +                       else
> +                               return -EINTR;
> +               }
> +               i915_request_put(rq);
> +       }
> +
> +       return 0;
> +}
> +
> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
> +{
> +       struct intel_context *ce = eb->context, *child;
> +       int err;
> +       int i = 0, j = 0;
> +
> +       GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
> +
> +       if (unlikely(intel_context_is_banned(ce)))
> +               return -EIO;
> +
> +       /*
> +        * Pinning the contexts may generate requests in order to
> acquire
> +        * GGTT space, so do this first before we reserve a seqno for
> +        * ourselves.
> +        */
> +       err = intel_context_pin_ww(ce, &eb->ww);
> +       if (err)
> +               return err;
> +       for_each_child(ce, child) {
> +               err = intel_context_pin_ww(child, &eb->ww);
> +               GEM_BUG_ON(err);        /* perma-pinned should incr a
> counter */
> +       }
> +
> +       for_each_child(ce, child) {
> +               err = eb_pin_timeline(eb, child, throttle);
> +               if (err)
> +                       goto unwind;
> +               ++i;
> +       }
> +       err = eb_pin_timeline(eb, ce, throttle);
> +       if (err)
> +               goto unwind;
> +
> +       eb->args->flags |= __EXEC3_ENGINE_PINNED;
> +       return 0;
> +
> +unwind:
> +       for_each_child(ce, child) {
> +               if (j++ < i) {
> +                       mutex_lock(&child->timeline->mutex);
> +                       intel_context_exit(child);
> +                       mutex_unlock(&child->timeline->mutex);
> +               }
> +       }
> +       for_each_child(ce, child)
> +               intel_context_unpin(child);
> +       intel_context_unpin(ce);
> +       return err;
> +}
> +
> +static void eb_unpin_engine(struct i915_execbuffer *eb)
> +{
> +       struct intel_context *ce = eb->context, *child;
> +
> +       if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
> +               return;
> +
> +       eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
> +
> +       for_each_child(ce, child) {
> +               mutex_lock(&child->timeline->mutex);
> +               intel_context_exit(child);
> +               mutex_unlock(&child->timeline->mutex);
> +
> +               intel_context_unpin(child);
> +       }
> +
> +       mutex_lock(&ce->timeline->mutex);
> +       intel_context_exit(ce);
> +       mutex_unlock(&ce->timeline->mutex);
> +
> +       intel_context_unpin(ce);
> +}
> +
> +static int
> +eb_select_engine(struct i915_execbuffer *eb)
> +{
> +       struct intel_context *ce, *child;
> +       unsigned int idx;
> +       int err;
> +
> +       if (!i915_gem_context_user_engines(eb->gem_context))
> +               return -EINVAL;
> +
> +       idx = eb->args->engine_idx;
> +       ce = i915_gem_context_get_engine(eb->gem_context, idx);
> +       if (IS_ERR(ce))
> +               return PTR_ERR(ce);
> +
> +       eb->num_batches = ce->parallel.number_children + 1;
> +
> +       for_each_child(ce, child)
> +               intel_context_get(child);
> +       intel_gt_pm_get(ce->engine->gt);
> +
> +       if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
> +               err = intel_context_alloc_state(ce);
> +               if (err)
> +                       goto err;
> +       }
> +       for_each_child(ce, child) {
> +               if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
> +                       err = intel_context_alloc_state(child);
> +                       if (err)
> +                               goto err;
> +               }
> +       }
> +
> +       /*
> +        * ABI: Before userspace accesses the GPU (e.g. execbuffer),
> report
> +        * EIO if the GPU is already wedged.
> +        */
> +       err = intel_gt_terminally_wedged(ce->engine->gt);
> +       if (err)
> +               goto err;
> +
> +       if (!i915_vm_tryget(ce->vm)) {
> +               err = -ENOENT;
> +               goto err;
> +       }
> +
> +       eb->context = ce;
> +       eb->gt = ce->engine->gt;
> +
> +       /*
> +        * Make sure engine pool stays alive even if we call
> intel_context_put
> +        * during ww handling. The pool is destroyed when last pm
> reference
> +        * is dropped, which breaks our -EDEADLK handling.
> +        */
> +       return err;
> +
> +err:
> +       intel_gt_pm_put(ce->engine->gt);
> +       for_each_child(ce, child)
> +               intel_context_put(child);
> +       intel_context_put(ce);
> +       return err;
> +}
> +
> +static void
> +eb_put_engine(struct i915_execbuffer *eb)
> +{
> +       struct intel_context *child;
> +
> +       i915_vm_put(eb->context->vm);
> +       intel_gt_pm_put(eb->gt);
> +       for_each_child(eb->context, child)
> +               intel_context_put(child);
> +       intel_context_put(eb->context);
> +}
> +
> +static void
> +__free_fence_array(struct eb_fence *fences, unsigned int n)
> +{
> +       while (n--) {
> +               drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
> +               dma_fence_put(fences[n].dma_fence);
> +               dma_fence_chain_free(fences[n].chain_fence);
> +       }
> +       kvfree(fences);
> +}
> +
> +static int add_timeline_fence_array(struct i915_execbuffer *eb)
> +{
> +       struct drm_i915_gem_timeline_fence __user *user_fences;
> +       struct eb_fence *f;
> +       u64 nfences;
> +       int err = 0;
> +
> +       nfences = eb->args->fence_count;
> +       if (!nfences)
> +               return 0;
> +
> +       /* Check multiplication overflow for access_ok() and
> kvmalloc_array() */
> +       BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
> +       if (nfences > min_t(unsigned long,
> +                           ULONG_MAX / sizeof(*user_fences),
> +                           SIZE_MAX / sizeof(*f)) - eb->num_fences)
> +               return -EINVAL;
> +
> +       user_fences = u64_to_user_ptr(eb->args->timeline_fences);
> +       if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
> +               return -EFAULT;
> +
> +       f = krealloc(eb->fences,
> +                    (eb->num_fences + nfences) * sizeof(*f),
> +                    __GFP_NOWARN | GFP_KERNEL);
> +       if (!f)
> +               return -ENOMEM;
> +
> +       eb->fences = f;
> +       f += eb->num_fences;
> +
> +       BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
> +                    ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
> +
> +       while (nfences--) {
> +               struct drm_i915_gem_timeline_fence user_fence;
> +               struct drm_syncobj *syncobj;
> +               struct dma_fence *fence = NULL;
> +               u64 point;
> +
> +               if (__copy_from_user(&user_fence,
> +                                    user_fences++,
> +                                    sizeof(user_fence)))
> +                       return -EFAULT;
> +
> +               if (user_fence.flags &
> __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
> +                       return -EINVAL;
> +
> +               syncobj = drm_syncobj_find(eb->file,
> user_fence.handle);
> +               if (!syncobj) {
> +                       DRM_DEBUG("Invalid syncobj handle
> provided\n");
> +                       return -ENOENT;
> +               }
> +
> +               fence = drm_syncobj_fence_get(syncobj);
> +
> +               if (!fence && user_fence.flags &&
> +                   !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL))
> {
> +                       DRM_DEBUG("Syncobj handle has no fence\n");
> +                       drm_syncobj_put(syncobj);
> +                       return -EINVAL;
> +               }
> +
> +               point = user_fence.value;
> +               if (fence)
> +                       err = dma_fence_chain_find_seqno(&fence,
> point);
> +
> +               if (err && !(user_fence.flags &
> I915_TIMELINE_FENCE_SIGNAL)) {
> +                       DRM_DEBUG("Syncobj handle missing requested
> point %llu\n", point);
> +                       dma_fence_put(fence);
> +                       drm_syncobj_put(syncobj);
> +                       return err;
> +               }
> +
> +               /*
> +                * A point might have been signaled already and
> +                * garbage collected from the timeline. In this case
> +                * just ignore the point and carry on.
> +                */
> +               if (!fence && !(user_fence.flags &
> I915_TIMELINE_FENCE_SIGNAL)) {
> +                       drm_syncobj_put(syncobj);
> +                       continue;
> +               }
> +
> +               /*
> +                * For timeline syncobjs we need to preallocate
> chains for
> +                * later signaling.
> +                */
> +               if (point != 0 && user_fence.flags &
> I915_TIMELINE_FENCE_SIGNAL) {
> +                       /*
> +                        * Waiting and signaling the same point (when
> point !=
> +                        * 0) would break the timeline.
> +                        */
> +                       if (user_fence.flags &
> I915_TIMELINE_FENCE_WAIT) {
> +                               DRM_DEBUG("Trying to wait & signal
> the same timeline point.\n");
> +                               dma_fence_put(fence);
> +                               drm_syncobj_put(syncobj);
> +                               return -EINVAL;
> +                       }
> +
> +                       f->chain_fence = dma_fence_chain_alloc();
> +                       if (!f->chain_fence) {
> +                               drm_syncobj_put(syncobj);
> +                               dma_fence_put(fence);
> +                               return -ENOMEM;
> +                       }
> +               } else {
> +                       f->chain_fence = NULL;
> +               }
> +
> +               f->syncobj = ptr_pack_bits(syncobj, user_fence.flags,
> 2);
> +               f->dma_fence = fence;
> +               f->value = point;
> +               f++;
> +               eb->num_fences++;
> +       }
> +
> +       return 0;
> +}
> +
> +static void put_fence_array(struct eb_fence *fences, int num_fences)
> +{
> +       if (fences)
> +               __free_fence_array(fences, num_fences);
> +}
> +
> +static int
> +await_fence_array(struct i915_execbuffer *eb,
> +                 struct i915_request *rq)
> +{
> +       unsigned int n;
> +       int err;
> +
> +       for (n = 0; n < eb->num_fences; n++) {
> +               struct drm_syncobj *syncobj;
> +               unsigned int flags;
> +
> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
> &flags, 2);
> +
> +               if (!eb->fences[n].dma_fence)
> +                       continue;
> +
> +               err = i915_request_await_dma_fence(rq, eb-
> >fences[n].dma_fence);
> +               if (err < 0)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static void signal_fence_array(const struct i915_execbuffer *eb,
> +                              struct dma_fence * const fence)
> +{
> +       unsigned int n;
> +
> +       for (n = 0; n < eb->num_fences; n++) {
> +               struct drm_syncobj *syncobj;
> +               unsigned int flags;
> +
> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
> &flags, 2);
> +               if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
> +                       continue;
> +
> +               if (eb->fences[n].chain_fence) {
> +                       drm_syncobj_add_point(syncobj,
> +                                             eb-
> >fences[n].chain_fence,
> +                                             fence,
> +                                             eb->fences[n].value);
> +                       /*
> +                        * The chain's ownership is transferred to
> the
> +                        * timeline.
> +                        */
> +                       eb->fences[n].chain_fence = NULL;
> +               } else {
> +                       drm_syncobj_replace_fence(syncobj, fence);
> +               }
> +       }
> +}
> +
> +static int parse_timeline_fences(struct i915_execbuffer *eb)
> +{
> +       return add_timeline_fence_array(eb);
> +}
> +
> +static int parse_batch_addresses(struct i915_execbuffer *eb)
> +{
> +       struct drm_i915_gem_execbuffer3 *args = eb->args;
> +       u64 __user *batch_addr = u64_to_user_ptr(args-
> >batch_address);
> +
> +       if (copy_from_user(eb->batch_addresses, batch_addr,
> +                          sizeof(batch_addr[0]) * eb->num_batches))
> +               return -EFAULT;
> +
> +       return 0;
> +}
> +
> +static void retire_requests(struct intel_timeline *tl, struct
> i915_request *end)
> +{
> +       struct i915_request *rq, *rn;
> +
> +       list_for_each_entry_safe(rq, rn, &tl->requests, link)
> +               if (rq == end || !i915_request_retire(rq))
> +                       break;
> +}
> +
> +static int eb_request_add(struct i915_execbuffer *eb, struct
> i915_request *rq,
> +                         int err, bool last_parallel)
> +{
> +       struct intel_timeline * const tl = i915_request_timeline(rq);
> +       struct i915_sched_attr attr = {};
> +       struct i915_request *prev;
> +
> +       lockdep_assert_held(&tl->mutex);
> +       lockdep_unpin_lock(&tl->mutex, rq->cookie);
> +
> +       trace_i915_request_add(rq);
> +
> +       prev = __i915_request_commit(rq);
> +
> +       /* Check that the context wasn't destroyed before submission
> */
> +       if (likely(!intel_context_is_closed(eb->context))) {
> +               attr = eb->gem_context->sched;
> +       } else {
> +               /* Serialise with context_close via the
> add_to_timeline */
> +               i915_request_set_error_once(rq, -ENOENT);
> +               __i915_request_skip(rq);
> +               err = -ENOENT; /* override any transient errors */
> +       }
> +
> +       if (intel_context_is_parallel(eb->context)) {
> +               if (err) {
> +                       __i915_request_skip(rq);
> +                       set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
> +                               &rq->fence.flags);
> +               }
> +               if (last_parallel)
> +                       set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
> +                               &rq->fence.flags);
> +       }
> +
> +       __i915_request_queue(rq, &attr);
> +
> +       /* Try to clean up the client's timeline after submitting the
> request */
> +       if (prev)
> +               retire_requests(tl, prev);
> +
> +       mutex_unlock(&tl->mutex);
> +
> +       return err;
> +}
> +
> +static int eb_requests_add(struct i915_execbuffer *eb, int err)
> +{
> +       int i;
> +
> +       /*
> +        * We iterate in reverse order of creation to release
> timeline mutexes in
> +        * same order.
> +        */
> +       for_each_batch_add_order(eb, i) {
> +               struct i915_request *rq = eb->requests[i];
> +
> +               if (!rq)
> +                       continue;
> +               err |= eb_request_add(eb, rq, err, i == 0);
> +       }
> +
> +       return err;
> +}
> +
> +static void eb_requests_get(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +
> +       for_each_batch_create_order(eb, i) {
> +               if (!eb->requests[i])
> +                       break;
> +
> +               i915_request_get(eb->requests[i]);
> +       }
> +}
> +
> +static void eb_requests_put(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +
> +       for_each_batch_create_order(eb, i) {
> +               if (!eb->requests[i])
> +                       break;
> +
> +               i915_request_put(eb->requests[i]);
> +       }
> +}
> +
> +static int
> +eb_composite_fence_create(struct i915_execbuffer *eb)
> +{
> +       struct dma_fence_array *fence_array;
> +       struct dma_fence **fences;
> +       unsigned int i;
> +
> +       GEM_BUG_ON(!intel_context_is_parent(eb->context));
> +
> +       fences = kmalloc_array(eb->num_batches, sizeof(*fences),
> GFP_KERNEL);
> +       if (!fences)
> +               return -ENOMEM;
> +
> +       for_each_batch_create_order(eb, i) {
> +               fences[i] = &eb->requests[i]->fence;
> +               __set_bit(I915_FENCE_FLAG_COMPOSITE,
> +                         &eb->requests[i]->fence.flags);
> +       }
> +
> +       fence_array = dma_fence_array_create(eb->num_batches,
> +                                            fences,
> +                                            eb->context-
> >parallel.fence_context,
> +                                            eb->context-
> >parallel.seqno++,
> +                                            false);
> +       if (!fence_array) {
> +               kfree(fences);
> +               return -ENOMEM;
> +       }
> +
> +       /* Move ownership to the dma_fence_array created above */
> +       for_each_batch_create_order(eb, i)
> +               dma_fence_get(fences[i]);
> +
> +       eb->composite_fence = &fence_array->base;
> +
> +       return 0;
> +}
> +
> +static int
> +eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
> +{
> +       int err;
> +
> +       if (unlikely(eb->gem_context->syncobj)) {
> +               struct dma_fence *fence;
> +
> +               fence = drm_syncobj_fence_get(eb->gem_context-
> >syncobj);
> +               err = i915_request_await_dma_fence(rq, fence);
> +               dma_fence_put(fence);
> +               if (err)
> +                       return err;
> +       }
> +
> +       if (eb->fences) {
> +               err = await_fence_array(eb, rq);
> +               if (err)
> +                       return err;
> +       }
> +
> +       if (intel_context_is_parallel(eb->context)) {
> +               err = eb_composite_fence_create(eb);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static struct intel_context *
> +eb_find_context(struct i915_execbuffer *eb, unsigned int
> context_number)
> +{
> +       struct intel_context *child;
> +
> +       if (likely(context_number == 0))
> +               return eb->context;
> +
> +       for_each_child(eb->context, child)
> +               if (!--context_number)
> +                       return child;
> +
> +       GEM_BUG_ON("Context not found");
> +
> +       return NULL;
> +}
> +
> +static int eb_requests_create(struct i915_execbuffer *eb)
> +{
> +       unsigned int i;
> +       int err;
> +
> +       for_each_batch_create_order(eb, i) {
> +               /* Allocate a request for this batch buffer nice and
> early. */
> +               eb->requests[i] =
> i915_request_create(eb_find_context(eb, i));
> +               if (IS_ERR(eb->requests[i])) {
> +                       err = PTR_ERR(eb->requests[i]);
> +                       eb->requests[i] = NULL;
> +                       return err;
> +               }
> +
> +               /*
> +                * Only the first request added (committed to
> backend) has to
> +                * take the in fences into account as all subsequent
> requests
> +                * will have fences inserted inbetween them.
> +                */
> +               if (i + 1 == eb->num_batches) {
> +                       err = eb_fences_add(eb, eb->requests[i]);
> +                       if (err)
> +                               return err;
> +               }

One thing I was hoping for with the brand new execbuf3 IOCTL is that we
could actually make it dma_fence_signalling critical path compliant.

That would mean annotating the dma_fence_signalling critical path just
after the first request is created and ending it just before that same
request is added.

The main violators are the memory allocations made when adding
dependencies in eb_fences_add(), but since those are now fairly limited
in number, we might be able to pre-allocate that memory before the
first request is created.

The other main violator would be the multiple batch buffers. Is this
mode of operation strictly needed for version 1, or can we ditch it?
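
For reference, a minimal sketch of the annotation being discussed, on
the assumption that everything eb_fences_add() needs can be
pre-allocated before the first request is created (the placement below
is illustrative, not part of the posted series):

        bool cookie;

        /* first request created in eb_requests_create() */
        cookie = dma_fence_begin_signalling();

        /*
         * No memory allocation allowed in here; the dependency tracking
         * done in eb_fences_add() would have to use pre-allocated storage.
         */

        dma_fence_end_signalling(cookie);
        /* requests committed/queued in eb_requests_add() */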



> +
> +               /*
> +                * Not really on stack, but we don't want to call
> +                * kfree on the batch_snapshot when we put it, so use
> the
> +                * _onstack interface.

This comment is stale and can be removed.


/Thomas


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 07/10] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-07 14:54     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-07 14:54 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> Handle persistent (VM_BIND) mappings during the request submission
> in the execbuf3 path.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 176
> +++++++++++++++++-
>  1 file changed, 175 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> index 13121df72e3d..2079f5ca9010 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> @@ -22,6 +22,7 @@
>  #include "i915_gem_vm_bind.h"
>  #include "i915_trace.h"
>  
> +#define __EXEC3_HAS_PIN                        BIT_ULL(33)
>  #define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>  #define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>  
> @@ -45,7 +46,9 @@
>   * execlist. Hence, no support for implicit sync.
>   *
>   * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND
> mode only
> - * works with execbuf3 ioctl for submission.
> + * works with execbuf3 ioctl for submission. All BOs mapped on that
> VM (through
> + * VM_BIND call) at the time of execbuf3 call are deemed required
> for that
> + * submission.
>   *
>   * The execbuf3 ioctl directly specifies the batch addresses instead
> of as
>   * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also
> not
> @@ -61,6 +64,13 @@
>   * So, a lot of code supporting execbuf2 ioctl, like relocations, VA
> evictions,
>   * vma lookup table, implicit sync, vma active reference tracking
> etc., are not
>   * applicable for execbuf3 ioctl.
> + *
> + * During each execbuf submission, request fence is added to all
> VM_BIND mapped
> + * objects with DMA_RESV_USAGE_BOOKKEEP. The DMA_RESV_USAGE_BOOKKEEP
> usage will
> + * prevent over sync (See enum dma_resv_usage). Note that
> DRM_I915_GEM_WAIT and
> + * DRM_I915_GEM_BUSY ioctls do not check for DMA_RESV_USAGE_BOOKKEEP
> usage and
> + * hence should not be used for end of batch check. Instead, the
> execbuf3
> + * timeline out fence should be used for end of batch check.
>   */
>  
>  struct eb_fence {
> @@ -124,6 +134,19 @@ eb_find_vma(struct i915_address_space *vm, u64
> addr)
>         return i915_gem_vm_bind_lookup_vma(vm, va);
>  }
>  
> +static void eb_scoop_unbound_vmas(struct i915_address_space *vm)
> +{
> +       struct i915_vma *vma, *vn;
> +
> +       spin_lock(&vm->vm_rebind_lock);
> +       list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list,
> vm_rebind_link) {
> +               list_del_init(&vma->vm_rebind_link);
> +               if (!list_empty(&vma->vm_bind_link))
> +                       list_move_tail(&vma->vm_bind_link, &vm-
> >vm_bind_list);
> +       }
> +       spin_unlock(&vm->vm_rebind_lock);
> +}
> +
>  static int eb_lookup_vmas(struct i915_execbuffer *eb)
>  {
>         unsigned int i, current_batch = 0;
> @@ -138,11 +161,118 @@ static int eb_lookup_vmas(struct
> i915_execbuffer *eb)
>                 ++current_batch;
>         }
>  
> +       eb_scoop_unbound_vmas(eb->context->vm);
> +
> +       return 0;
> +}
> +
> +static int eb_lock_vmas(struct i915_execbuffer *eb)
> +{
> +       struct i915_address_space *vm = eb->context->vm;
> +       struct i915_vma *vma;
> +       int err;
> +
> +       err = i915_gem_vm_priv_lock(eb->context->vm, &eb->ww);
> +       if (err)
> +               return err;
> +

See comment in review for 08/10 about re-checking the rebind list here.



> +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
> +                           non_priv_vm_bind_link) {
> +               err = i915_gem_object_lock(vma->obj, &eb->ww);
> +               if (err)
> +                       return err;
> +       }
> +
>         return 0;
>  }
>  
> +static void eb_release_persistent_vmas(struct i915_execbuffer *eb,
> bool final)
> +{
> +       struct i915_address_space *vm = eb->context->vm;
> +       struct i915_vma *vma, *vn;
> +
> +       assert_vm_bind_held(vm);
> +
> +       if (!(eb->args->flags & __EXEC3_HAS_PIN))
> +               return;
> +
> +       assert_vm_priv_held(vm);
> +
> +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
> +               __i915_vma_unpin(vma);
> +
> +       eb->args->flags &= ~__EXEC3_HAS_PIN;
> +       if (!final)
> +               return;
> +
> +       list_for_each_entry_safe(vma, vn, &vm->vm_bind_list,
> vm_bind_link)
> +               if (i915_vma_is_bind_complete(vma))
> +                       list_move_tail(&vma->vm_bind_link, &vm-
> >vm_bound_list);
> +}
> +
>  static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
>  {
> +       eb_release_persistent_vmas(eb, final);
> +       eb_unpin_engine(eb);
> +}
> +
> +static int eb_reserve_fence_for_persistent_vmas(struct
> i915_execbuffer *eb)
> +{
> +       struct i915_address_space *vm = eb->context->vm;
> +       struct i915_vma *vma;
> +       int ret;
> +
> +       ret = dma_resv_reserve_fences(vm->root_obj->base.resv, 1);
> +       if (ret)
> +               return ret;
> +
> +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
> +                           non_priv_vm_bind_link) {
> +               ret = dma_resv_reserve_fences(vma->obj->base.resv,
> 1);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +static int eb_validate_persistent_vmas(struct i915_execbuffer *eb)
> +{
> +       struct i915_address_space *vm = eb->context->vm;
> +       struct i915_vma *vma, *last_pinned_vma = NULL;
> +       int ret = 0;
> +
> +       assert_vm_bind_held(vm);
> +       assert_vm_priv_held(vm);
> +
> +       ret = eb_reserve_fence_for_persistent_vmas(eb);
> +       if (ret)
> +               return ret;
> +
> +       if (list_empty(&vm->vm_bind_list))
> +               return 0;
> +
> +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
> +               u64 pin_flags = vma->start | PIN_OFFSET_FIXED |
> PIN_USER;
> +
> +               ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
> +               if (ret)
> +                       break;
> +
> +               last_pinned_vma = vma;
> +       }
> +
> +       if (ret && last_pinned_vma) {
> +               list_for_each_entry(vma, &vm->vm_bind_list,
> vm_bind_link) {
> +                       __i915_vma_unpin(vma);
> +                       if (vma == last_pinned_vma)
> +                               break;
> +               }
> +       } else if (last_pinned_vma) {
> +               eb->args->flags |= __EXEC3_HAS_PIN;
> +       }
> +
> +       return ret;
>  }
>  
>  static int eb_validate_vmas(struct i915_execbuffer *eb)
> @@ -162,8 +292,17 @@ static int eb_validate_vmas(struct
> i915_execbuffer *eb)
>         /* only throttle once, even if we didn't need to throttle */
>         throttle = false;
>  
> +       err = eb_lock_vmas(eb);
> +       if (err)
> +               goto err;
> +
> +       err = eb_validate_persistent_vmas(eb);
> +       if (err)
> +               goto err;
> +
>  err:
>         if (err == -EDEADLK) {
> +               eb_release_vmas(eb, false);
>                 err = i915_gem_ww_ctx_backoff(&eb->ww);
>                 if (!err)
>                         goto retry;
> @@ -187,8 +326,43 @@ static int eb_validate_vmas(struct
> i915_execbuffer *eb)
>         BUILD_BUG_ON(!typecheck(int, _i)); \
>         for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
>  
> +static void __eb_persistent_add_shared_fence(struct
> drm_i915_gem_object *obj,
> +                                            struct dma_fence *fence)
> +{
> +       dma_resv_add_fence(obj->base.resv, fence,
> DMA_RESV_USAGE_BOOKKEEP);
> +       obj->write_domain = 0;
> +       obj->read_domains |= I915_GEM_GPU_DOMAINS;
> +       obj->mm.dirty = true;
> +}
> +
> +static void eb_persistent_add_shared_fence(struct i915_execbuffer
> *eb)
> +{
> +       struct i915_address_space *vm = eb->context->vm;
> +       struct dma_fence *fence;
> +       struct i915_vma *vma;
> +
> +       fence = eb->composite_fence ? eb->composite_fence :
> +               &eb->requests[0]->fence;
> +
> +       __eb_persistent_add_shared_fence(vm->root_obj, fence);
> +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
> +                           non_priv_vm_bind_link)
> +               __eb_persistent_add_shared_fence(vma->obj, fence);
> +}
> +
> +static void eb_persistent_vmas_move_to_active(struct i915_execbuffer
> *eb)
> +{
> +       /* Add fence to BOs dma-resv fence list */
> +       eb_persistent_add_shared_fence(eb);

This means we don't add any fences to the vma active trackers. While
that works fine for TTM's delayed buffer destruction, wouldn't eviction
and shrinking then unbind without waiting for GPU activity to idle?


/Thomas


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-07 14:41     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-07 19:38       ` Andi Shyti
  -1 siblings, 0 replies; 121+ messages in thread
From: Andi Shyti @ 2022-07-07 19:38 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Auld, Matthew, Vetter,
	Daniel, Vishwanathapura, Niranjana, christian.koenig

Hi,

> It seems we are duplicating a lot of code from i915_execbuffer.c. Did
> you consider 

yeah... while reading the code I was thinking the same, and then I saw
that you had made the same comment. Perhaps we need to group the
commonalities and make a common library for execbuf2 and execbuf3.
Andi

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-08 12:17     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 12:17 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Landwerlin,
	Lionel G, Auld, Matthew, jason, Vetter, Daniel, christian.koenig

On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> For persistent (vm_bind) vmas of userptr BOs, handle the user
> page pinning by using the i915_gem_object_userptr_submit_init()
> /done() functions
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 67
> +++++++++++++++++++
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  1 +
>  4 files changed, 85 insertions(+)
> 

Hmm. I'm also missing the code in the userptr invalidate path that puts
invalidated vm-private userptr vmas on the rebind list?
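
A minimal sketch of what that could look like, using the list and lock
names introduced earlier in this series (the helper name and where
exactly it would be called from in the userptr MMU-notifier invalidate
path are assumptions):

        /* Hypothetical helper, called for each vm-private persistent vma
         * of the invalidated userptr object.
         */
        static void vm_bind_userptr_invalidate_vma(struct i915_vma *vma)
        {
                struct i915_address_space *vm = vma->vm;

                spin_lock(&vm->vm_rebind_lock);
                if (list_empty(&vma->vm_rebind_link))
                        list_add_tail(&vma->vm_rebind_link,
                                      &vm->vm_rebind_list);
                spin_unlock(&vm->vm_rebind_lock);
        }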

/Thomas


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-07 19:38       ` Andi Shyti
@ 2022-07-08 12:22         ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 12:22 UTC (permalink / raw)
  To: andi.shyti
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Auld, Matthew, Vetter,
	Daniel, Vishwanathapura, Niranjana, christian.koenig

On Thu, 2022-07-07 at 21:38 +0200, Andi Shyti wrote:
> Hi,
> 
> > It seems we are duplicating a lot of code from i915_execbuffer.c.
> > Did
> > you consider 
> 
> yeah... while reading the code I was thinking the same then I see
> that you made the same comment. Perhaps we need to group
> commonalities and make common library for execbuf 2 and 3.
> 

Indeed, we should at least attempt this, and judge whether the
assumption that this will allow us to remove a bunch of duplicated code
will hold.

/Thomas


> Andi


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 09/10] drm/i915/vm_bind: Skip vma_lookup for persistent vmas
  2022-07-05  8:57     ` Thomas Hellström
@ 2022-07-08 12:40       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 12:40 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, intel-gfx, dri-devel, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Tue, Jul 05, 2022 at 10:57:17AM +0200, Thomas Hellström wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> vma_lookup is tied to a segment of the object instead of a section
>> of the VA space. Hence, it does not support aliasing (i.e., multiple
>> bindings to the same section of the object).
>> Skip vma_lookup for persistent vmas, as they support aliasing.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_vma.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 6adb013579be..9aa38b772b5b 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -197,6 +197,10 @@ vma_create(struct drm_i915_gem_object *obj,
>>                 __set_bit(I915_VMA_GGTT_BIT, __i915_vma_flags(vma));
>>         }
>>  
>> +       if (!i915_vma_is_ggtt(vma) &&
>> +           (view && view->type == I915_GGTT_VIEW_PARTIAL))
>> +               goto skip_rb_insert;
>> +
>
>Rather than guessing that a vma with this signature is a persistent
>vma, which is confusing to the reader, could we have an argument saying
>we want to create a persistent vma?

Yeah, sounds good. We can probably even check vm->vm_bind_mode here
instead of passing a new argument. I think i915 won't create any
internal vmas for this VM, so checking vm->vm_bind_mode should be fine.
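
Roughly, the quoted hunk would then become something like the below
(vm_bind_mode as a field on the VM is taken from this discussion; the
exact accessor is an assumption):

        if (!i915_vma_is_ggtt(vma) && vma->vm->vm_bind_mode)
                goto skip_rb_insert;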

>
>>         rb = NULL;
>>         p = &obj->vma.tree.rb_node;
>>         while (*p) {
>> @@ -221,6 +225,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>         rb_link_node(&vma->obj_node, rb, p);
>>         rb_insert_color(&vma->obj_node, &obj->vma.tree);
>>  
>> +skip_rb_insert:
>>         if (i915_vma_is_ggtt(vma))
>>                 /*
>>                  * We put the GGTT vma at the start of the vma-list,
>> followed
>> @@ -292,13 +297,16 @@ i915_vma_instance(struct drm_i915_gem_object
>> *obj,
>>                   struct i915_address_space *vm,
>>                   const struct i915_ggtt_view *view)
>>  {
>> -       struct i915_vma *vma;
>> +       struct i915_vma *vma = NULL;
>>  
>>         GEM_BUG_ON(!kref_read(&vm->ref));
>>  
>> -       spin_lock(&obj->vma.lock);
>> -       vma = i915_vma_lookup(obj, vm, view);
>> -       spin_unlock(&obj->vma.lock);
>> +       if (i915_is_ggtt(vm) || !view ||
>> +           view->type != I915_GGTT_VIEW_PARTIAL) {
>
>Same here?

We can probably remove this code and have the vm_bind ioctl call
vma_create() directly.

Niranjana

>
>/Thomas
>
>
>> +               spin_lock(&obj->vma.lock);
>> +               vma = i915_vma_lookup(obj, vm, view);
>> +               spin_unlock(&obj->vma.lock);
>> +       }
>>  
>>         /* vma_create() will resolve the race if another creates the
>> vma */
>>         if (unlikely(!vma))
>> @@ -1670,7 +1678,8 @@ static void release_references(struct i915_vma
>> *vma, bool vm_ddestroy)
>>  
>>         spin_lock(&obj->vma.lock);
>>         list_del(&vma->obj_link);
>> -       if (!RB_EMPTY_NODE(&vma->obj_node))
>> +       if (!i915_vma_is_persistent(vma) &&
>> +           !RB_EMPTY_NODE(&vma->obj_node))
>>                 rb_erase(&vma->obj_node, &obj->vma.tree);
>>  
>>         spin_unlock(&obj->vma.lock);
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 07/10] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-07-07 14:54     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 12:44       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 12:44 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 07:54:16AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Handle persistent (VM_BIND) mappings during the request submission
>> in the execbuf3 path.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 176
>> +++++++++++++++++-
>>  1 file changed, 175 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> index 13121df72e3d..2079f5ca9010 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> @@ -22,6 +22,7 @@
>>  #include "i915_gem_vm_bind.h"
>>  #include "i915_trace.h"
>>
>> +#define __EXEC3_HAS_PIN                        BIT_ULL(33)
>>  #define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>>  #define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>>
>> @@ -45,7 +46,9 @@
>>   * execlist. Hence, no support for implicit sync.
>>   *
>>   * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND
>> mode only
>> - * works with execbuf3 ioctl for submission.
>> + * works with execbuf3 ioctl for submission. All BOs mapped on that
>> VM (through
>> + * VM_BIND call) at the time of execbuf3 call are deemed required
>> for that
>> + * submission.
>>   *
>>   * The execbuf3 ioctl directly specifies the batch addresses instead
>> of as
>>   * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also
>> not
>> @@ -61,6 +64,13 @@
>>   * So, a lot of code supporting execbuf2 ioctl, like relocations, VA
>> evictions,
>>   * vma lookup table, implicit sync, vma active reference tracking
>> etc., are not
>>   * applicable for execbuf3 ioctl.
>> + *
>> + * During each execbuf submission, request fence is added to all
>> VM_BIND mapped
>> + * objects with DMA_RESV_USAGE_BOOKKEEP. The DMA_RESV_USAGE_BOOKKEEP
>> usage will
>> + * prevent over sync (See enum dma_resv_usage). Note that
>> DRM_I915_GEM_WAIT and
>> + * DRM_I915_GEM_BUSY ioctls do not check for DMA_RESV_USAGE_BOOKKEEP
>> usage and
>> + * hence should not be used for end of batch check. Instead, the
>> execbuf3
>> + * timeline out fence should be used for end of batch check.
>>   */
>>
>>  struct eb_fence {
>> @@ -124,6 +134,19 @@ eb_find_vma(struct i915_address_space *vm, u64
>> addr)
>>         return i915_gem_vm_bind_lookup_vma(vm, va);
>>  }
>>
>> +static void eb_scoop_unbound_vmas(struct i915_address_space *vm)
>> +{
>> +       struct i915_vma *vma, *vn;
>> +
>> +       spin_lock(&vm->vm_rebind_lock);
>> +       list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list,
>> vm_rebind_link) {
>> +               list_del_init(&vma->vm_rebind_link);
>> +               if (!list_empty(&vma->vm_bind_link))
>> +                       list_move_tail(&vma->vm_bind_link, &vm-
>> >vm_bind_list);
>> +       }
>> +       spin_unlock(&vm->vm_rebind_lock);
>> +}
>> +
>>  static int eb_lookup_vmas(struct i915_execbuffer *eb)
>>  {
>>         unsigned int i, current_batch = 0;
>> @@ -138,11 +161,118 @@ static int eb_lookup_vmas(struct
>> i915_execbuffer *eb)
>>                 ++current_batch;
>>         }
>>
>> +       eb_scoop_unbound_vmas(eb->context->vm);
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_lock_vmas(struct i915_execbuffer *eb)
>> +{
>> +       struct i915_address_space *vm = eb->context->vm;
>> +       struct i915_vma *vma;
>> +       int err;
>> +
>> +       err = i915_gem_vm_priv_lock(eb->context->vm, &eb->ww);
>> +       if (err)
>> +               return err;
>> +
>
>See comment in review for 08/10 about re-checking the rebind list here.
>
>
>
>> +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
>> +                           non_priv_vm_bind_link) {
>> +               err = i915_gem_object_lock(vma->obj, &eb->ww);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>>         return 0;
>>  }
>>
>> +static void eb_release_persistent_vmas(struct i915_execbuffer *eb,
>> bool final)
>> +{
>> +       struct i915_address_space *vm = eb->context->vm;
>> +       struct i915_vma *vma, *vn;
>> +
>> +       assert_vm_bind_held(vm);
>> +
>> +       if (!(eb->args->flags & __EXEC3_HAS_PIN))
>> +               return;
>> +
>> +       assert_vm_priv_held(vm);
>> +
>> +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
>> +               __i915_vma_unpin(vma);
>> +
>> +       eb->args->flags &= ~__EXEC3_HAS_PIN;
>> +       if (!final)
>> +               return;
>> +
>> +       list_for_each_entry_safe(vma, vn, &vm->vm_bind_list,
>> vm_bind_link)
>> +               if (i915_vma_is_bind_complete(vma))
>> +                       list_move_tail(&vma->vm_bind_link, &vm-
>> >vm_bound_list);
>> +}
>> +
>>  static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
>>  {
>> +       eb_release_persistent_vmas(eb, final);
>> +       eb_unpin_engine(eb);
>> +}
>> +
>> +static int eb_reserve_fence_for_persistent_vmas(struct
>> i915_execbuffer *eb)
>> +{
>> +       struct i915_address_space *vm = eb->context->vm;
>> +       struct i915_vma *vma;
>> +       int ret;
>> +
>> +       ret = dma_resv_reserve_fences(vm->root_obj->base.resv, 1);
>> +       if (ret)
>> +               return ret;
>> +
>> +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
>> +                           non_priv_vm_bind_link) {
>> +               ret = dma_resv_reserve_fences(vma->obj->base.resv,
>> 1);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_validate_persistent_vmas(struct i915_execbuffer *eb)
>> +{
>> +       struct i915_address_space *vm = eb->context->vm;
>> +       struct i915_vma *vma, *last_pinned_vma = NULL;
>> +       int ret = 0;
>> +
>> +       assert_vm_bind_held(vm);
>> +       assert_vm_priv_held(vm);
>> +
>> +       ret = eb_reserve_fence_for_persistent_vmas(eb);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (list_empty(&vm->vm_bind_list))
>> +               return 0;
>> +
>> +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
>> +               u64 pin_flags = vma->start | PIN_OFFSET_FIXED |
>> PIN_USER;
>> +
>> +               ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
>> +               if (ret)
>> +                       break;
>> +
>> +               last_pinned_vma = vma;
>> +       }
>> +
>> +       if (ret && last_pinned_vma) {
>> +               list_for_each_entry(vma, &vm->vm_bind_list,
>> vm_bind_link) {
>> +                       __i915_vma_unpin(vma);
>> +                       if (vma == last_pinned_vma)
>> +                               break;
>> +               }
>> +       } else if (last_pinned_vma) {
>> +               eb->args->flags |= __EXEC3_HAS_PIN;
>> +       }
>> +
>> +       return ret;
>>  }
>>
>>  static int eb_validate_vmas(struct i915_execbuffer *eb)
>> @@ -162,8 +292,17 @@ static int eb_validate_vmas(struct
>> i915_execbuffer *eb)
>>         /* only throttle once, even if we didn't need to throttle */
>>         throttle = false;
>>
>> +       err = eb_lock_vmas(eb);
>> +       if (err)
>> +               goto err;
>> +
>> +       err = eb_validate_persistent_vmas(eb);
>> +       if (err)
>> +               goto err;
>> +
>>  err:
>>         if (err == -EDEADLK) {
>> +               eb_release_vmas(eb, false);
>>                 err = i915_gem_ww_ctx_backoff(&eb->ww);
>>                 if (!err)
>>                         goto retry;
>> @@ -187,8 +326,43 @@ static int eb_validate_vmas(struct
>> i915_execbuffer *eb)
>>         BUILD_BUG_ON(!typecheck(int, _i)); \
>>         for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
>>
>> +static void __eb_persistent_add_shared_fence(struct
>> drm_i915_gem_object *obj,
>> +                                            struct dma_fence *fence)
>> +{
>> +       dma_resv_add_fence(obj->base.resv, fence,
>> DMA_RESV_USAGE_BOOKKEEP);
>> +       obj->write_domain = 0;
>> +       obj->read_domains |= I915_GEM_GPU_DOMAINS;
>> +       obj->mm.dirty = true;
>> +}
>> +
>> +static void eb_persistent_add_shared_fence(struct i915_execbuffer
>> *eb)
>> +{
>> +       struct i915_address_space *vm = eb->context->vm;
>> +       struct dma_fence *fence;
>> +       struct i915_vma *vma;
>> +
>> +       fence = eb->composite_fence ? eb->composite_fence :
>> +               &eb->requests[0]->fence;
>> +
>> +       __eb_persistent_add_shared_fence(vm->root_obj, fence);
>> +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
>> +                           non_priv_vm_bind_link)
>> +               __eb_persistent_add_shared_fence(vma->obj, fence);
>> +}
>> +
>> +static void eb_persistent_vmas_move_to_active(struct i915_execbuffer
>> *eb)
>> +{
>> +       /* Add fence to BOs dma-resv fence list */
>> +       eb_persistent_add_shared_fence(eb);
>
>This means we don't add any fences to the vma active trackers. While
>this works fine for TTM delayed buffer destruction, unbinding at
>eviction and shrinking wouldn't wait for gpu activity to idle before
>unbinding?

Eviction and the shrinker will wait for GPU activity to idle before
unbinding. i915_vma_is_active() and i915_vma_sync() have been updated to
handle persistent vmas differently (by checking/waiting on the dma-resv
fence list).
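
For illustration, the dma-resv based check/wait in those helpers could
look roughly like this (a sketch only, not the exact code in the
series; the helper names are placeholders):

        static bool persistent_vma_is_active(struct i915_vma *vma)
        {
                return !dma_resv_test_signaled(vma->obj->base.resv,
                                               DMA_RESV_USAGE_BOOKKEEP);
        }

        static int persistent_vma_sync(struct i915_vma *vma)
        {
                long ret;

                ret = dma_resv_wait_timeout(vma->obj->base.resv,
                                            DMA_RESV_USAGE_BOOKKEEP,
                                            true, MAX_SCHEDULE_TIMEOUT);
                return ret < 0 ? ret : 0;
        }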

Niranjana

>
>
>/Thomas
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-07  8:14         ` [Intel-gfx] " Thomas Hellström
@ 2022-07-08 12:57           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 12:57 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, intel-gfx, dri-devel, matthew.auld, jason,
	daniel.vetter, christian.koenig

On Thu, Jul 07, 2022 at 10:14:38AM +0200, Thomas Hellström wrote:
>On Wed, 2022-07-06 at 22:43 -0700, Niranjana Vishwanathapura wrote:
>> On Wed, Jul 06, 2022 at 06:21:03PM +0200, Thomas Hellström wrote:
>> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> > > Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
>> > >
>> > > Signed-off-by: Niranjana Vishwanathapura
>> > > <niranjana.vishwanathapura@intel.com>
>> > > Signed-off-by: Prathap Kumar Valsan
>> > > <prathap.kumar.valsan@intel.com>
>> > > ---
>> > >  drivers/gpu/drm/i915/Makefile                 |   1 +
>> > >  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>> > >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>> > >  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233
>> > > ++++++++++++++++++
>> > >  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>> > >  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>> > >  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>> > >  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>> > >  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>> > >  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>> > >  11 files changed, 318 insertions(+), 10 deletions(-)
>> > >  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > >  create mode 100644
>> > > drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/Makefile
>> > > b/drivers/gpu/drm/i915/Makefile
>> > > index 522ef9b4aff3..4e1627e96c6e 100644
>> > > --- a/drivers/gpu/drm/i915/Makefile
>> > > +++ b/drivers/gpu/drm/i915/Makefile
>> > > @@ -165,6 +165,7 @@ gem-y += \
>> > >         gem/i915_gem_ttm_move.o \
>> > >         gem/i915_gem_ttm_pm.o \
>> > >         gem/i915_gem_userptr.o \
>> > > +       gem/i915_gem_vm_bind_object.o \
>> > >         gem/i915_gem_wait.o \
>> > >         gem/i915_gemfs.o
>> > >  i915-y += \
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> > > index 33673fe7ee0a..927a87e5ec59 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> > > @@ -15,10 +15,10 @@
>> > >  #include "i915_trace.h"
>> > >  #include "i915_user_extensions.h"
>> > >  
>> > > -static u32 object_max_page_size(struct intel_memory_region
>> > > **placements,
>> > > -                               unsigned int n_placements)
>> > > +u32 i915_gem_object_max_page_size(struct intel_memory_region
>> > > **placements,
>> >
>> > Kerneldoc.
>>
>> This is an existing function that is being modified. As I
>> mentioned in other thread, we probably need a prep patch early
>> in this series to add missing kernel-docs in i915 which this
>> patch series would later update.
>
>Here we make a static function extern, which according to the patch
>submission guidelines, mandates a kerneloc comment, so it's not so much
>that the function is modified. We should be fine adding kerneldoc in
>the patch that makes the function extern.
>

Ok, sounds good.

>
>>
>> >
>> > > +                                 unsigned int n_placements)
>> > >  {
>> > > -       u32 max_page_size = 0;
>> > > +       u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>> > >         int i;
>> > >  
>> > >         for (i = 0; i < n_placements; i++) {
>> > > @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct
>> > > intel_memory_region **placements,
>> > >                 max_page_size = max_t(u32, max_page_size, mr-
>> > > > min_page_size);
>> > >         }
>> > >  
>> > > -       GEM_BUG_ON(!max_page_size);
>> > >         return max_page_size;
>> > >  }
>> >
>> > Should this change be separated out? It's not immediately clear to
>> > a
>> > reviewer why it is included.
>>
>> It is being removed as max_page_size now has a non-zero default
>> value and hence this check is not valid anymore.
>
>But that in itself deserves an explanation in the patch commit message.
>So that's why I wondered whether it wouldn't be better to separate it
>out?

Yah, we can have this change in a separate patch before we introduce
the VM_BIND feature.
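
As a sketch, that prep patch would essentially carry just this change
(loop body abbreviated from the hunk quoted above):

    static u32 object_max_page_size(struct intel_memory_region **placements,
                                    unsigned int n_placements)
    {
            /* Non-zero default, so GEM_BUG_ON(!max_page_size) can go away */
            u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
            int i;

            for (i = 0; i < n_placements; i++) {
                    struct intel_memory_region *mr = placements[i];

                    max_page_size = max_t(u32, max_page_size, mr->min_page_size);
            }

            return max_page_size;
    }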

>
>>
>> >
>> > >  
>> > > @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct
>> > > drm_i915_private *i915, u64 size,
>> > >  
>> > >         i915_gem_flush_free_objects(i915);
>> > >  
>> > > -       size = round_up(size, object_max_page_size(placements,
>> > > n_placements));
>> > > +       size = round_up(size,
>> > > i915_gem_object_max_page_size(placements,
>> > > +                                                          
>> > > n_placements));
>> > >         if (size == 0)
>> > >                 return ERR_PTR(-EINVAL);
>> > >  
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> > > index 6f0a3ce35567..650de2224843 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> > > @@ -47,6 +47,8 @@ static inline bool
>> > > i915_gem_object_size_2big(u64
>> > > size)
>> > >  }
>> > >  
>> > >  void i915_gem_init__objects(struct drm_i915_private *i915);
>> > > +u32 i915_gem_object_max_page_size(struct intel_memory_region
>> > > **placements,
>> > > +                                 unsigned int n_placements);
>> > >  
>> > >  void i915_objects_module_exit(void);
>> > >  int i915_objects_module_init(void);
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > new file mode 100644
>> > > index 000000000000..642cdb559f17
>> > > --- /dev/null
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> > > @@ -0,0 +1,38 @@
>> > > +/* SPDX-License-Identifier: MIT */
>> > > +/*
>> > > + * Copyright © 2022 Intel Corporation
>> > > + */
>> > > +
>> > > +#ifndef __I915_GEM_VM_BIND_H
>> > > +#define __I915_GEM_VM_BIND_H
>> > > +
>> > > +#include "i915_drv.h"
>> > > +
>> > > +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
>> > > > vm_bind_lock)
>> > > +
>> > > +static inline void i915_gem_vm_bind_lock(struct
>> > > i915_address_space
>> > > *vm)
>> > > +{
>> > > +       mutex_lock(&vm->vm_bind_lock);
>> > > +}
>> > > +
>> > > +static inline int
>> > > +i915_gem_vm_bind_lock_interruptible(struct i915_address_space
>> > > *vm)
>> > > +{
>> > > +       return mutex_lock_interruptible(&vm->vm_bind_lock);
>> > > +}
>> > > +
>> > > +static inline void i915_gem_vm_bind_unlock(struct
>> > > i915_address_space
>> > > *vm)
>> > > +{
>> > > +       mutex_unlock(&vm->vm_bind_lock);
>> > > +}
>> > > +
>> >
>> > Kerneldoc for the inlines.
>> >
>> > > +struct i915_vma *
>> > > +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64
>> > > va);
>> > > +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
>> > > release_obj);
>> > > +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> > > +                        struct drm_i915_gem_vm_bind *va,
>> > > +                        struct drm_file *file);
>> > > +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>> > > +                          struct drm_i915_gem_vm_unbind *va);
>> > > +
>> > > +#endif /* __I915_GEM_VM_BIND_H */
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > new file mode 100644
>> > > index 000000000000..43ceb4dcca6c
>> > > --- /dev/null
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> > > @@ -0,0 +1,233 @@
>> > > +// SPDX-License-Identifier: MIT
>> > > +/*
>> > > + * Copyright © 2022 Intel Corporation
>> > > + */
>> > > +
>> > > +#include <linux/interval_tree_generic.h>
>> > > +
>> > > +#include "gem/i915_gem_vm_bind.h"
>> > > +#include "gt/gen8_engine_cs.h"
>> > > +
>> > > +#include "i915_drv.h"
>> > > +#include "i915_gem_gtt.h"
>> > > +
>> > > +#define START(node) ((node)->start)
>> > > +#define LAST(node) ((node)->last)
>> > > +
>> > > +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>> > > +                    START, LAST, static inline, i915_vm_bind_it)
>> > > +
>> > > +#undef START
>> > > +#undef LAST
>> > > +
>> > > +/**
>> > > + * DOC: VM_BIND/UNBIND ioctls
>> > > + *
>> > > + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind
>> > > GEM
>> > > buffer
>> > > + * objects (BOs) or sections of a BOs at specified GPU virtual
>> > > addresses on a
>> > > + * specified address space (VM). Multiple mappings can map to
>> > > the
>> > > same physical
>> > > + * pages of an object (aliasing). These mappings (also referred
>> > > to
>> > > as persistent
>> > > + * mappings) will be persistent across multiple GPU submissions
>> > > (execbuf calls)
>> > > + * issued by the UMD, without user having to provide a list of
>> > > all
>> > > required
>> > > + * mappings during each submission (as required by older execbuf
>> > > mode).
>> > > + *
>> > > + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out
>> > > fence for
>> > > + * signaling the completion of bind/unbind operation.
>> > > + *
>> > > + * VM_BIND feature is advertised to user via
>> > > I915_PARAM_VM_BIND_VERSION.
>> > > + * User has to opt-in for VM_BIND mode of binding for an address
>> > > space (VM)
>> > > + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND
>> > > extension.
>> > > + *
>> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>> > > concurrently
>> > > + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND
>> > > operations can be
>> > > + * done asynchronously, when valid out fence is specified.
>> > > + *
>> > > + * VM_BIND locking order is as below.
>> > > + *
>> > > + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This
>> > > lock
>> > > is taken in
>> > > + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and
>> > > while
>> > > releasing the
>> > > + *    mapping.
>> > > + *
>> > > + *    In future, when GPU page faults are supported, we can
>> > > potentially use a
>> > > + *    rwsem instead, so that multiple page fault handlers can
>> > > take
>> > > the read
>> > > + *    side lock to lookup the mapping and hence can run in
>> > > parallel.
>> > > + *    The older execbuf mode of binding do not need this lock.
>> > > + *
>> > > + * 2) Lock-B: The object's dma-resv lock will protect i915_vma
>> > > state
>> > > and needs
>> > > + *    to be held while binding/unbinding a vma in the async
>> > > worker
>> > > and while
>> > > + *    updating dma-resv fence list of an object. Note that
>> > > private
>> > > BOs of a VM
>> > > + *    will all share a dma-resv object.
>> > > + *
>> > > + *    The future system allocator support will use the HMM
>> > > prescribed locking
>> > > + *    instead.
>> >
>> > I don't think the last sentence is relevant for this series. Also,
>> > are
>> > there any other mentions for Locks A, B and C? If not, can we ditch
>> > that naming?
>>
>> It is taken from design rfc :). Yah, I think better to remove it and
>> probably the lock names and make it more specific to the
>> implementation
>> in this patch series.
>
>Ah, OK, if it's taken from the RFC and is an established naming in
>documentation that will remain, then it's fine with me. Perhaps with a
>pointer added to that doc that will help the reader.

sounds good.

>
>>
>> >
>> > > + *
>> > > + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like
>> > > the
>> > > list of
>> > > + *    invalidated vmas (due to eviction and userptr
>> > > invalidation)
>> > > etc.
>> > > + */
>> > > +
>> > > +struct i915_vma *
>> > > +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64
>> > > va)
>> >
>> > Kerneldoc for the extern functions.
>> >
>> >
>> > > +{
>> > > +       struct i915_vma *vma, *temp;
>> > > +
>> > > +       assert_vm_bind_held(vm);
>> > > +
>> > > +       vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
>> > > +       /* Working around compiler error, remove later */
>> >
>> > Is this still relevant? What compiler error is seen here?

I don't remember what error it was and I am no longer seeing it.
Maybe it got fixed in a later kernel. We can remove this workaround.
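
With the workaround dropped, the lookup could then reduce to something
like this sketch:

    struct i915_vma *
    i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
    {
            assert_vm_bind_held(vm);

            /* Interval tree lookup of the persistent mapping covering va */
            return i915_vm_bind_it_iter_first(&vm->va, va, va);
    }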

>> >
>> > > +       if (vma)
>> > > +               temp = i915_vm_bind_it_iter_next(vma, va + vma-
>> > > >size,
>> > > -1);
>> > > +       return vma;
>> > > +}
>> > > +
>> > > +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
>> > > release_obj)
>> > > +{
>> > > +       assert_vm_bind_held(vma->vm);
>> > > +
>> > > +       if (!list_empty(&vma->vm_bind_link)) {
>> > > +               list_del_init(&vma->vm_bind_link);
>> > > +               i915_vm_bind_it_remove(vma, &vma->vm->va);
>> > > +
>> > > +               /* Release object */
>> > > +               if (release_obj)
>> > > +                       i915_vma_put(vma);
>> >
>> > i915_vma_put() here is confusing. Can we use i915_gem_object_put()
>> > to
>> > further make it clear that the persistent vmas actually take a
>> > reference on the object?
>> >
>>
>> makes sense.
>>
>> > > +       }
>> > > +}
>> > > +
>> > > +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>> > > +                          struct drm_i915_gem_vm_unbind *va)
>> > > +{
>> > > +       struct drm_i915_gem_object *obj;
>> > > +       struct i915_vma *vma;
>> > > +       int ret;
>> > > +
>> > > +       va->start = gen8_noncanonical_addr(va->start);
>> > > +       ret = i915_gem_vm_bind_lock_interruptible(vm);
>> > > +       if (ret)
>> > > +               return ret;
>> > > +
>> > > +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> > > +       if (!vma) {
>> > > +               ret = -ENOENT;
>> > > +               goto out_unlock;
>> > > +       }
>> > > +
>> > > +       if (vma->size != va->length)
>> > > +               ret = -EINVAL;
>> > > +       else
>> > > +               i915_gem_vm_bind_remove(vma, false);
>> > > +
>> > > +out_unlock:
>> > > +       i915_gem_vm_bind_unlock(vm);
>> > > +       if (ret || !vma)
>> > > +               return ret;
>> > > +
>> > > +       /* Destroy vma and then release object */
>> > > +       obj = vma->obj;
>> > > +       ret = i915_gem_object_lock(obj, NULL);
>> > > +       if (ret)
>> > > +               return ret;
>> >
>> > This call never returns an error and we could GEM_WARN_ON(...), or
>> > (void) to annotate that the return value is wilfully ignored.
>> >
>>
>> makes sense.
>>
>> > > +
>> > > +       i915_vma_destroy(vma);
>> > > +       i915_gem_object_unlock(obj);
>> > > +       i915_gem_object_put(obj);
>> > > +
>> > > +       return 0;
>> > > +}
>> > > +
>> > > +static struct i915_vma *vm_bind_get_vma(struct
>> > > i915_address_space
>> > > *vm,
>> > > +                                       struct
>> > > drm_i915_gem_object
>> > > *obj,
>> > > +                                       struct
>> > > drm_i915_gem_vm_bind
>> > > *va)
>> > > +{
>> > > +       struct i915_ggtt_view view;
>> > > +       struct i915_vma *vma;
>> > > +
>> > > +       va->start = gen8_noncanonical_addr(va->start);
>> > > +       vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>> > > +       if (vma)
>> > > +               return ERR_PTR(-EEXIST);
>> > > +
>> > > +       view.type = I915_GGTT_VIEW_PARTIAL;
>> > > +       view.partial.offset = va->offset >> PAGE_SHIFT;
>> > > +       view.partial.size = va->length >> PAGE_SHIFT;
>> >
>> > IIRC, this vma view is not handled correctly in the vma code, that
>> > only
>> > understands views for ggtt bindings.
>> >
>>
>> This patch series extends the partial view to ppgtt also.
>> Yah, the naming is still i915_ggtt_view, but I am hoping we can fix
>> the
>> name in a follow up patch later.
>
>Hmm, I somehow thought that the vma page adjustment was a NOP on ppgtt
>and only done on ggtt. But that's indeed not the case. Yes, then this
>is ok. We need to remember, though, that if we're going to use the
>existing vma async unbinding functionality, we'd need to attach the vma
>pages to the vma resource.

Yah. Given that vmas (and hence the vma_resource) make their own sg_table
from that of vma->obj (in this case through intel_partial_pages()),
the 'view' is not relevant after that. So, I think we should be good.

>
>
>>
>> >
>> > > +       vma = i915_vma_instance(obj, vm, &view);
>> > > +       if (IS_ERR(vma))
>> > > +               return vma;
>> > > +
>> > > +       vma->start = va->start;
>> > > +       vma->last = va->start + va->length - 1;
>> > > +
>> > > +       return vma;
>> > > +}
>> > > +
>> > > +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>> > > +                        struct drm_i915_gem_vm_bind *va,
>> > > +                        struct drm_file *file)
>> > > +{
>> > > +       struct drm_i915_gem_object *obj;
>> > > +       struct i915_vma *vma = NULL;
>> > > +       struct i915_gem_ww_ctx ww;
>> > > +       u64 pin_flags;
>> > > +       int ret = 0;
>> > > +
>> > > +       if (!vm->vm_bind_mode)
>> > > +               return -EOPNOTSUPP;
>> > > +
>> > > +       obj = i915_gem_object_lookup(file, va->handle);
>> > > +       if (!obj)
>> > > +               return -ENOENT;
>> > > +
>> > > +       if (!va->length ||
>> > > +           !IS_ALIGNED(va->offset | va->length,
>> > > +                       i915_gem_object_max_page_size(obj-
>> > > > mm.placements,
>> > > +                                                     obj-
>> > > > mm.n_placements)) ||
>> > > +           range_overflows_t(u64, va->offset, va->length, obj-
>> > > > base.size)) {
>> > > +               ret = -EINVAL;
>> > > +               goto put_obj;
>> > > +       }
>> > > +
>> > > +       ret = i915_gem_vm_bind_lock_interruptible(vm);
>> > > +       if (ret)
>> > > +               goto put_obj;
>> > > +
>> > > +       vma = vm_bind_get_vma(vm, obj, va);
>> > > +       if (IS_ERR(vma)) {
>> > > +               ret = PTR_ERR(vma);
>> > > +               goto unlock_vm;
>> > > +       }
>> > > +
>> > > +       i915_gem_ww_ctx_init(&ww, true);
>> > > +       pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
>> > > +retry:
>> > > +       ret = i915_gem_object_lock(vma->obj, &ww);
>> > > +       if (ret)
>> > > +               goto out_ww;
>> > > +
>> > > +       ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>> > > +       if (ret)
>> > > +               goto out_ww;
>> > > +
>> > > +       /* Make it evictable */
>> > > +       __i915_vma_unpin(vma);
>> >
>> > A considerable effort has been put into avoiding short term vma
>> > pins in
>> > i915. We should add an interface like i915_vma_bind_ww() that
>> > avoids
>> > the pin altoghether.
>>
>> Currently in i915 driver VA managment and device page table bindings
>> are tightly coupled. i915_vma_pin_ww() does the both VA allocation
>> and
>> biding. And we also interpret VA being allocated (drm_mm node
>> allocated)
>> also as vma is bound.
>>
>> Decoupling it would be ideal but I think it needs to be carefully
>> done
>> in a separate patch series to not cause any regression.
>
>So the idea would be not to decouple these, but to just avoid pinning
>the vma in the process.

Well, we need i915_vma_insert() as well (not just the bind).
I think the best and only option we have today is i915_vma_pin_ww().
Slicing it up falls into the decoupling bucket I mentioned above.
Maybe we can take that up later?
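
In the meantime, the bind step in i915_gem_vm_bind_obj() can at least be
restructured around for_i915_gem_ww() (see also the comment further down),
while keeping i915_vma_pin_ww() as the workhorse. A rough sketch, not the
final patch:

    struct i915_gem_ww_ctx ww;
    u64 pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
    int ret;

    for_i915_gem_ww(&ww, ret, true) {
            ret = i915_gem_object_lock(vma->obj, &ww);
            if (ret)
                    continue;

            ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
            if (ret)
                    continue;

            /* Make it evictable again; only the VA allocation/bind is wanted */
            __i915_vma_unpin(vma);

            list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
            i915_vm_bind_it_insert(vma, &vm->va);

            /* Hold object reference until vm_unbind */
            i915_gem_object_get(vma->obj);
    }
    if (ret)
            i915_vma_destroy(vma);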

Niranjana

>
>
>>
>> >
>> > > +
>> > > +       list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>> > > +       i915_vm_bind_it_insert(vma, &vm->va);
>> > > +
>> > > +       /* Hold object reference until vm_unbind */
>> > > +       i915_gem_object_get(vma->obj);
>> > > +out_ww:
>> > > +       if (ret == -EDEADLK) {
>> > > +               ret = i915_gem_ww_ctx_backoff(&ww);
>> > > +               if (!ret)
>> > > +                       goto retry;
>> > > +       }
>> > > +
>> > > +       if (ret)
>> > > +               i915_vma_destroy(vma);
>> > > +
>> > > +       i915_gem_ww_ctx_fini(&ww);
>> >
>> > Could use for_i915_gem_ww()?
>> >
>>
>> Yah, I think it is a better idea to use it.
>>
>> > > +unlock_vm:
>> > > +       i915_gem_vm_bind_unlock(vm);
>> > > +put_obj:
>> > > +       i915_gem_object_put(obj);
>> > > +       return ret;
>> > > +}
>> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > index b67831833c9a..135dc4a76724 100644
>> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> > > @@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct
>> > > i915_address_space *vm,
>> > >  void i915_address_space_fini(struct i915_address_space *vm)
>> > >  {
>> > >         drm_mm_takedown(&vm->mm);
>> > > +       GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>> > > +       mutex_destroy(&vm->vm_bind_lock);
>> > >  }
>> > >  
>> > >  /**
>> > > @@ -282,6 +284,11 @@ void i915_address_space_init(struct
>> > > i915_address_space *vm, int subclass)
>> > >  
>> > >         INIT_LIST_HEAD(&vm->bound_list);
>> > >         INIT_LIST_HEAD(&vm->unbound_list);
>> > > +
>> > > +       vm->va = RB_ROOT_CACHED;
>> > > +       INIT_LIST_HEAD(&vm->vm_bind_list);
>> > > +       INIT_LIST_HEAD(&vm->vm_bound_list);
>> > > +       mutex_init(&vm->vm_bind_lock);
>> > >  }
>> > >  
>> > >  void *__px_vaddr(struct drm_i915_gem_object *p)
>> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > index c812aa9708ae..d4a6ce65251d 100644
>> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> > > @@ -259,6 +259,15 @@ struct i915_address_space {
>> > >          */
>> > >         struct list_head unbound_list;
>> > >  
>> > > +       /**
>> > > +        * List of VM_BIND objects.
>> > > +        */
>> >
>> > Proper kerneldoc + intel locking guidelines comments, please.
>> >
>> > > +       struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>> > > +       struct list_head vm_bind_list;
>> > > +       struct list_head vm_bound_list;
>> > > +       /* va tree of persistent vmas */
>> > > +       struct rb_root_cached va;
>> > > +
>> > >         /* Global GTT */
>> > >         bool is_ggtt:1;
>> > >  
>> > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
>> > > b/drivers/gpu/drm/i915/i915_driver.c
>> > > index ccf990dfd99b..776ab7844f60 100644
>> > > --- a/drivers/gpu/drm/i915/i915_driver.c
>> > > +++ b/drivers/gpu/drm/i915/i915_driver.c
>> > > @@ -68,6 +68,7 @@
>> > >  #include "gem/i915_gem_ioctls.h"
>> > >  #include "gem/i915_gem_mman.h"
>> > >  #include "gem/i915_gem_pm.h"
>> > > +#include "gem/i915_gem_vm_bind.h"
>> > >  #include "gt/intel_gt.h"
>> > >  #include "gt/intel_gt_pm.h"
>> > >  #include "gt/intel_rc6.h"
>> > > @@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct
>> > > drm_device *dev, void *data,
>> > >  {
>> > >         struct drm_i915_gem_vm_bind *args = data;
>> > >         struct i915_address_space *vm;
>> > > +       int ret;
>> > >  
>> > >         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>> > >         if (unlikely(!vm))
>> > >                 return -ENOENT;
>> > >  
>> > > +       ret = i915_gem_vm_bind_obj(vm, args, file);
>> > > +
>> > >         i915_vm_put(vm);
>> > > -       return -EINVAL;
>> > > +       return ret;
>> > >  }
>> > >  
>> > >  static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void
>> > > *data,
>> > > @@ -1797,13 +1801,16 @@ static int
>> > > i915_gem_vm_unbind_ioctl(struct
>> > > drm_device *dev, void *data,
>> > >  {
>> > >         struct drm_i915_gem_vm_unbind *args = data;
>> > >         struct i915_address_space *vm;
>> > > +       int ret;
>> > >  
>> > >         vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>> > >         if (unlikely(!vm))
>> > >                 return -ENOENT;
>> > >  
>> > > +       ret = i915_gem_vm_unbind_obj(vm, args);
>> > > +
>> > >         i915_vm_put(vm);
>> > > -       return -EINVAL;
>> > > +       return ret;
>> > >  }
>> > >  
>> > >  static const struct drm_ioctl_desc i915_ioctls[] = {
>> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> > > b/drivers/gpu/drm/i915/i915_vma.c
>> > > index 43339ecabd73..d324e29cef0a 100644
>> > > --- a/drivers/gpu/drm/i915/i915_vma.c
>> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
>> > > @@ -29,6 +29,7 @@
>> > >  #include "display/intel_frontbuffer.h"
>> > >  #include "gem/i915_gem_lmem.h"
>> > >  #include "gem/i915_gem_tiling.h"
>> > > +#include "gem/i915_gem_vm_bind.h"
>> > >  #include "gt/intel_engine.h"
>> > >  #include "gt/intel_engine_heartbeat.h"
>> > >  #include "gt/intel_gt.h"
>> > > @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>> > >         spin_unlock(&obj->vma.lock);
>> > >         mutex_unlock(&vm->mutex);
>> > >  
>> > > +       INIT_LIST_HEAD(&vma->vm_bind_link);
>> > >         return vma;
>> > >  
>> > >  err_unlock:
>> > > @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object
>> > > *obj,
>> > >  {
>> > >         struct i915_vma *vma;
>> > >  
>> > > -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>> > >         GEM_BUG_ON(!kref_read(&vm->ref));
>> > >  
>> > >         spin_lock(&obj->vma.lock);
>> > > @@ -1660,6 +1661,10 @@ static void release_references(struct
>> > > i915_vma
>> > > *vma, bool vm_ddestroy)
>> > >  
>> > >         spin_unlock(&obj->vma.lock);
>> > >  
>> > > +       i915_gem_vm_bind_lock(vma->vm);
>> > > +       i915_gem_vm_bind_remove(vma, true);
>> > > +       i915_gem_vm_bind_unlock(vma->vm);
>> > > +
>> >
>> > The vm might be destroyed at this point already.
>> >
>>
>> Ah, due to async vma resource release...
>>
>> > From what I understand we can destroy the vma from three call
>> > sites:
>> > 1) VM_UNBIND -> The vma has already been removed from the vm_bind
>> > address space,
>> > 2) object destruction -> since the vma has an object reference
>> > while in
>> > the vm_bind address space, it must also have been removed from the
>> > address space if called from object destruction.
>> > 3) vm destruction. Suggestion is to call VM_UNBIND from under the
>> > vm_bind lock early in vm destruction.
>> >
>> > Then the above added code can be removed and replaced with an
>> > assert
>> > that the vm_bind address space RB_NODE is indeed empty.
>> >
>>
>> ...yah, makes sense to move this code to early in VM destruction than
>> here.
>>
>> Niranjana
>>
>> >
>> > >         spin_lock_irq(&gt->closed_lock);
>> > >         __i915_vma_remove_closed(vma);
>> > >         spin_unlock_irq(&gt->closed_lock);
>> > > diff --git a/drivers/gpu/drm/i915/i915_vma.h
>> > > b/drivers/gpu/drm/i915/i915_vma.h
>> > > index 88ca0bd9c900..dcb49f79ff7e 100644
>> > > --- a/drivers/gpu/drm/i915/i915_vma.h
>> > > +++ b/drivers/gpu/drm/i915/i915_vma.h
>> > > @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>> > >  {
>> > >         ptrdiff_t cmp;
>> > >  
>> > > -       GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>> > > -
>> > >         cmp = ptrdiff(vma->vm, vm);
>> > >         if (cmp)
>> > >                 return cmp;
>> > > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> > > b/drivers/gpu/drm/i915/i915_vma_types.h
>> > > index be6e028c3b57..b6d179bdbfa0 100644
>> > > --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> > > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> > > @@ -289,6 +289,14 @@ struct i915_vma {
>> > >         /** This object's place on the active/inactive lists */
>> > >         struct list_head vm_link;
>> > >  
>> > > +       struct list_head vm_bind_link; /* Link in persistent VMA
>> > > list
>> > > */
>> > > +
>> > > +       /** Interval tree structures for persistent vma */
>> >
>> > Proper kerneldoc.
>> >
>> > > +       struct rb_node rb;
>> > > +       u64 start;
>> > > +       u64 last;
>> > > +       u64 __subtree_last;
>> > > +
>> > >         struct list_head obj_link; /* Link in the object's VMA
>> > > list
>> > > */
>> > >         struct rb_node obj_node;
>> > >         struct hlist_node obj_hash;
>> >
>> > Thanks,
>> > Thomas
>> >
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl
  2022-07-07  7:32         ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 12:58           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 12:58 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 12:32:14AM -0700, Hellstrom, Thomas wrote:
>On Wed, 2022-07-06 at 22:01 -0700, Niranjana Vishwanathapura wrote:
>> > > +       /**
>> > > +        * true: allow only vm_bind method of binding.
>> > > +        * false: allow only legacy execbuff method of binding.
>> > > +        */
>> >
>> > Use proper kerneldoc. (Same holds for structure documentation
>> > across
>> > the series).
>> > Also please follow internal locking guidelines on documentation of
>> > members that need protection with locks.
>>
>> I just followed the documentation convention that was already there
>> ;)
>> I think we need a prep patch in this series that adds kernel-doc for
>> these structures and then add new fields for vm_bind with proper
>> kernel-docs.
>
>That would be awesome if we could do that, but as a minimum, I think
>that new in-line struct / union comments should follow
>
>https://www.kernel.org/doc/html/v5.3/doc-guide/kernel-doc.html#in-line-member-documentation-comments
>
>and completely new struct / unions should follow
>
>https://www.kernel.org/doc/html/v5.3/doc-guide/kernel-doc.html#in-line-member-documentation-comments,
>
>and in particular the internal locking guidelines on what members are
>protected by which locks and, if applicable, how. (For example, a
>member may be protected by two locks when writing to it and only one of
>the locks when reading.)

Sounds good.

Niranjana
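
For reference, a short hypothetical example of the in-line kerneldoc style
being asked for, including the locking notes — the member names and the
locking rules below are purely illustrative and not taken from the series:

struct example_state {
        /** @lock: Protects @bind_list and @bound_list. */
        struct mutex lock;

        /**
         * @bind_list: Mappings that still need binding.
         *
         * Written under both @lock and the object dma-resv lock; may be
         * read under either one (the "two locks for write, one for read"
         * case mentioned above).
         */
        struct list_head bind_list;

        /** @bound_list: Fully bound mappings. Protected by @lock. */
        struct list_head bound_list;
};
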

>
>/Thomas
>
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 07/10] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-07-08 12:44       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-08 13:03         ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 13:03 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	 Daniel, christian.koenig

On Fri, 2022-07-08 at 05:44 -0700, Niranjana Vishwanathapura wrote:
> On Thu, Jul 07, 2022 at 07:54:16AM -0700, Hellstrom, Thomas wrote:
> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> > > Handle persistent (VM_BIND) mappings during the request
> > > submission
> > > in the execbuf3 path.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > ---
> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 176
> > > +++++++++++++++++-
> > >  1 file changed, 175 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > index 13121df72e3d..2079f5ca9010 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > @@ -22,6 +22,7 @@
> > >  #include "i915_gem_vm_bind.h"
> > >  #include "i915_trace.h"
> > > 
> > > +#define __EXEC3_HAS_PIN                        BIT_ULL(33)
> > >  #define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
> > >  #define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
> > > 
> > > @@ -45,7 +46,9 @@
> > >   * execlist. Hence, no support for implicit sync.
> > >   *
> > >   * The new execbuf3 ioctl only works in VM_BIND mode and the
> > > VM_BIND
> > > mode only
> > > - * works with execbuf3 ioctl for submission.
> > > + * works with execbuf3 ioctl for submission. All BOs mapped on
> > > that
> > > VM (through
> > > + * VM_BIND call) at the time of execbuf3 call are deemed
> > > required
> > > for that
> > > + * submission.
> > >   *
> > >   * The execbuf3 ioctl directly specifies the batch addresses
> > > instead
> > > of as
> > >   * object handles as in execbuf2 ioctl. The execbuf3 ioctl will
> > > also
> > > not
> > > @@ -61,6 +64,13 @@
> > >   * So, a lot of code supporting execbuf2 ioctl, like
> > > relocations, VA
> > > evictions,
> > >   * vma lookup table, implicit sync, vma active reference
> > > tracking
> > > etc., are not
> > >   * applicable for execbuf3 ioctl.
> > > + *
> > > + * During each execbuf submission, request fence is added to all
> > > VM_BIND mapped
> > > + * objects with DMA_RESV_USAGE_BOOKKEEP. The
> > > DMA_RESV_USAGE_BOOKKEEP
> > > usage will
> > > + * prevent over sync (See enum dma_resv_usage). Note that
> > > DRM_I915_GEM_WAIT and
> > > + * DRM_I915_GEM_BUSY ioctls do not check for
> > > DMA_RESV_USAGE_BOOKKEEP
> > > usage and
> > > + * hence should not be used for end of batch check. Instead, the
> > > execbuf3
> > > + * timeline out fence should be used for end of batch check.
> > >   */
> > > 
> > >  struct eb_fence {
> > > @@ -124,6 +134,19 @@ eb_find_vma(struct i915_address_space *vm,
> > > u64
> > > addr)
> > >         return i915_gem_vm_bind_lookup_vma(vm, va);
> > >  }
> > > 
> > > +static void eb_scoop_unbound_vmas(struct i915_address_space *vm)
> > > +{
> > > +       struct i915_vma *vma, *vn;
> > > +
> > > +       spin_lock(&vm->vm_rebind_lock);
> > > +       list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list,
> > > vm_rebind_link) {
> > > +               list_del_init(&vma->vm_rebind_link);
> > > +               if (!list_empty(&vma->vm_bind_link))
> > > +                       list_move_tail(&vma->vm_bind_link, &vm-
> > > > vm_bind_list);
> > > +       }
> > > +       spin_unlock(&vm->vm_rebind_lock);
> > > +}
> > > +
> > >  static int eb_lookup_vmas(struct i915_execbuffer *eb)
> > >  {
> > >         unsigned int i, current_batch = 0;
> > > @@ -138,11 +161,118 @@ static int eb_lookup_vmas(struct
> > > i915_execbuffer *eb)
> > >                 ++current_batch;
> > >         }
> > > 
> > > +       eb_scoop_unbound_vmas(eb->context->vm);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int eb_lock_vmas(struct i915_execbuffer *eb)
> > > +{
> > > +       struct i915_address_space *vm = eb->context->vm;
> > > +       struct i915_vma *vma;
> > > +       int err;
> > > +
> > > +       err = i915_gem_vm_priv_lock(eb->context->vm, &eb->ww);
> > > +       if (err)
> > > +               return err;
> > > +
> > 
> > See comment in review for 08/10 about re-checking the rebind list
> > here.
> > 
> > 
> > 
> > > +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
> > > +                           non_priv_vm_bind_link) {
> > > +               err = i915_gem_object_lock(vma->obj, &eb->ww);
> > > +               if (err)
> > > +                       return err;
> > > +       }
> > > +
> > >         return 0;
> > >  }
> > > 
> > > +static void eb_release_persistent_vmas(struct i915_execbuffer
> > > *eb,
> > > bool final)
> > > +{
> > > +       struct i915_address_space *vm = eb->context->vm;
> > > +       struct i915_vma *vma, *vn;
> > > +
> > > +       assert_vm_bind_held(vm);
> > > +
> > > +       if (!(eb->args->flags & __EXEC3_HAS_PIN))
> > > +               return;
> > > +
> > > +       assert_vm_priv_held(vm);
> > > +
> > > +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
> > > +               __i915_vma_unpin(vma);
> > > +
> > > +       eb->args->flags &= ~__EXEC3_HAS_PIN;
> > > +       if (!final)
> > > +               return;
> > > +
> > > +       list_for_each_entry_safe(vma, vn, &vm->vm_bind_list,
> > > vm_bind_link)
> > > +               if (i915_vma_is_bind_complete(vma))
> > > +                       list_move_tail(&vma->vm_bind_link, &vm-
> > > > vm_bound_list);
> > > +}
> > > +
> > >  static void eb_release_vmas(struct i915_execbuffer *eb, bool
> > > final)
> > >  {
> > > +       eb_release_persistent_vmas(eb, final);
> > > +       eb_unpin_engine(eb);
> > > +}
> > > +
> > > +static int eb_reserve_fence_for_persistent_vmas(struct
> > > i915_execbuffer *eb)
> > > +{
> > > +       struct i915_address_space *vm = eb->context->vm;
> > > +       struct i915_vma *vma;
> > > +       int ret;
> > > +
> > > +       ret = dma_resv_reserve_fences(vm->root_obj->base.resv,
> > > 1);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
> > > +                           non_priv_vm_bind_link) {
> > > +               ret = dma_resv_reserve_fences(vma->obj-
> > > >base.resv,
> > > 1);
> > > +               if (ret)
> > > +                       return ret;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int eb_validate_persistent_vmas(struct i915_execbuffer
> > > *eb)
> > > +{
> > > +       struct i915_address_space *vm = eb->context->vm;
> > > +       struct i915_vma *vma, *last_pinned_vma = NULL;
> > > +       int ret = 0;
> > > +
> > > +       assert_vm_bind_held(vm);
> > > +       assert_vm_priv_held(vm);
> > > +
> > > +       ret = eb_reserve_fence_for_persistent_vmas(eb);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       if (list_empty(&vm->vm_bind_list))
> > > +               return 0;
> > > +
> > > +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
> > > {
> > > +               u64 pin_flags = vma->start | PIN_OFFSET_FIXED |
> > > PIN_USER;
> > > +
> > > +               ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0,
> > > pin_flags);
> > > +               if (ret)
> > > +                       break;
> > > +
> > > +               last_pinned_vma = vma;
> > > +       }
> > > +
> > > +       if (ret && last_pinned_vma) {
> > > +               list_for_each_entry(vma, &vm->vm_bind_list,
> > > vm_bind_link) {
> > > +                       __i915_vma_unpin(vma);
> > > +                       if (vma == last_pinned_vma)
> > > +                               break;
> > > +               }
> > > +       } else if (last_pinned_vma) {
> > > +               eb->args->flags |= __EXEC3_HAS_PIN;
> > > +       }
> > > +
> > > +       return ret;
> > >  }
> > > 
> > >  static int eb_validate_vmas(struct i915_execbuffer *eb)
> > > @@ -162,8 +292,17 @@ static int eb_validate_vmas(struct
> > > i915_execbuffer *eb)
> > >         /* only throttle once, even if we didn't need to throttle
> > > */
> > >         throttle = false;
> > > 
> > > +       err = eb_lock_vmas(eb);
> > > +       if (err)
> > > +               goto err;
> > > +
> > > +       err = eb_validate_persistent_vmas(eb);
> > > +       if (err)
> > > +               goto err;
> > > +
> > >  err:
> > >         if (err == -EDEADLK) {
> > > +               eb_release_vmas(eb, false);
> > >                 err = i915_gem_ww_ctx_backoff(&eb->ww);
> > >                 if (!err)
> > >                         goto retry;
> > > @@ -187,8 +326,43 @@ static int eb_validate_vmas(struct
> > > i915_execbuffer *eb)
> > >         BUILD_BUG_ON(!typecheck(int, _i)); \
> > >         for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
> > > 
> > > +static void __eb_persistent_add_shared_fence(struct
> > > drm_i915_gem_object *obj,
> > > +                                            struct dma_fence
> > > *fence)
> > > +{
> > > +       dma_resv_add_fence(obj->base.resv, fence,
> > > DMA_RESV_USAGE_BOOKKEEP);
> > > +       obj->write_domain = 0;
> > > +       obj->read_domains |= I915_GEM_GPU_DOMAINS;
> > > +       obj->mm.dirty = true;
> > > +}
> > > +
> > > +static void eb_persistent_add_shared_fence(struct
> > > i915_execbuffer
> > > *eb)
> > > +{
> > > +       struct i915_address_space *vm = eb->context->vm;
> > > +       struct dma_fence *fence;
> > > +       struct i915_vma *vma;
> > > +
> > > +       fence = eb->composite_fence ? eb->composite_fence :
> > > +               &eb->requests[0]->fence;
> > > +
> > > +       __eb_persistent_add_shared_fence(vm->root_obj, fence);
> > > +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
> > > +                           non_priv_vm_bind_link)
> > > +               __eb_persistent_add_shared_fence(vma->obj,
> > > fence);
> > > +}
> > > +
> > > +static void eb_persistent_vmas_move_to_active(struct
> > > i915_execbuffer
> > > *eb)
> > > +{
> > > +       /* Add fence to BOs dma-resv fence list */
> > > +       eb_persistent_add_shared_fence(eb);
> > 
> > This means we don't add any fences to the vma active trackers.
> > While
> > this works fine for TTM delayed buffer destruction, unbinding at
> > eviction and shrinking wouldn't wait for gpu activity to idle
> > before
> > unbinding?
> 
> Eviction and shrinker will wait for gpu activity to idle before
> unbinding.
> The i915_vma_is_active() and i915_vma_sync() have been updated to handle
> persistent vmas differently (by checking/waiting on the dma-resv fence
> list).

Ah, yes. Now I see. Still, the async unbinding in __i915_vma_unbind_async()
needs an update? We should probably add an i915_sw_fence_await_reservation()
(modified if needed for bookkeeping fences) on the vm's reservation
object there?

/Thomas
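
A rough sketch of that suggestion (illustrative only; the helper below is
an assumption, the real change would sit in __i915_vma_unbind_async() and
could equally be a variant of i915_sw_fence_await_reservation() taught
about DMA_RESV_USAGE_BOOKKEEP; it also assumes the vm's dma-resv is held
around the locked iterator):

static int await_vm_bookkeep_fences(struct i915_sw_fence *wait,
                                    struct i915_address_space *vm)
{
        struct dma_resv_iter cursor;
        struct dma_fence *fence;
        int ret;

        /* Locked iteration; caller holds vm->root_obj->base.resv. */
        dma_resv_for_each_fence(&cursor, vm->root_obj->base.resv,
                                DMA_RESV_USAGE_BOOKKEEP, fence) {
                ret = i915_sw_fence_await_dma_fence(wait, fence, 0,
                                                    GFP_KERNEL);
                if (ret < 0)
                        return ret;
        }

        return 0;
}
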

> 
> Niranjana
> 
> > 
> > 
> > /Thomas
> > 


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-07 10:31     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 13:14       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 13:14 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 03:31:42AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Add uapi allowing user to specify a BO as private to a specified VM
>> during the BO creation.
>> VM private BOs can only be mapped on the specified VM and can't be
>> dma_buf exported. VM private BOs share a single common dma_resv
>> object,
>> hence have a performance advantage, requiring only a single dma_resv object
>> update in the execbuf path compared to non-private (shared) BOs.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41
>> ++++++++++++++++++-
>>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>  drivers/gpu/drm/i915/i915_vma.c               |  1 +
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>>  include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>>  11 files changed, 110 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> index 927a87e5ec59..7e264566b51f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> @@ -11,6 +11,7 @@
>>  #include "pxp/intel_pxp.h"
>>
>>  #include "i915_drv.h"
>> +#include "i915_gem_context.h"
>>  #include "i915_gem_create.h"
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>> @@ -243,6 +244,7 @@ struct create_ext {
>>         unsigned int n_placements;
>>         unsigned int placement_mask;
>>         unsigned long flags;
>> +       u32 vm_id;
>>  };
>>
>>  static void repr_placements(char *buf, size_t size,
>> @@ -392,9 +394,24 @@ static int ext_set_protected(struct
>> i915_user_extension __user *base, void *data
>>         return 0;
>>  }
>>
>> +static int ext_set_vm_private(struct i915_user_extension __user
>> *base,
>> +                             void *data)
>> +{
>> +       struct drm_i915_gem_create_ext_vm_private ext;
>> +       struct create_ext *ext_data = data;
>> +
>> +       if (copy_from_user(&ext, base, sizeof(ext)))
>> +               return -EFAULT;
>> +
>> +       ext_data->vm_id = ext.vm_id;
>> +
>> +       return 0;
>> +}
>> +
>>  static const i915_user_extension_fn create_extensions[] = {
>>         [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>         [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>> +       [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>>  };
>>
>>  /**
>> @@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev,
>> void *data,
>>         struct drm_i915_private *i915 = to_i915(dev);
>>         struct drm_i915_gem_create_ext *args = data;
>>         struct create_ext ext_data = { .i915 = i915 };
>> +       struct i915_address_space *vm = NULL;
>>         struct drm_i915_gem_object *obj;
>>         int ret;
>>
>> @@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device
>> *dev, void *data,
>>         if (ret)
>>                 return ret;
>>
>> +       if (ext_data.vm_id) {
>> +               vm = i915_gem_vm_lookup(file->driver_priv,
>> ext_data.vm_id);
>> +               if (unlikely(!vm))
>> +                       return -ENOENT;
>> +       }
>> +
>>         if (!ext_data.n_placements) {
>>                 ext_data.placements[0] =
>>                         intel_memory_region_by_type(i915,
>> INTEL_MEMORY_SYSTEM);
>> @@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device
>> *dev, void *data,
>>                                                 ext_data.placements,
>>                                                 ext_data.n_placements
>> ,
>>                                                 ext_data.flags);
>> -       if (IS_ERR(obj))
>> -               return PTR_ERR(obj);
>> +       if (IS_ERR(obj)) {
>> +               ret = PTR_ERR(obj);
>> +               goto vm_put;
>> +       }
>> +
>> +       if (vm) {
>> +               obj->base.resv = vm->root_obj->base.resv;
>> +               obj->priv_root = i915_gem_object_get(vm->root_obj);
>> +               i915_vm_put(vm);
>> +       }
>>
>>         return i915_gem_publish(obj, file, &args->size, &args-
>> >handle);
>> +vm_put:
>> +       if (vm)
>> +               i915_vm_put(vm);
>> +
>> +       return ret;
>>  }
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..6433173c3e84 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct
>> drm_gem_object *gem_obj, int flags)
>>         struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>         DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>
>> +       if (obj->priv_root) {
>> +               drm_dbg(obj->base.dev,
>> +                       "Exporting VM private objects is not
>> allowed\n");
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>>         exp_info.ops = &i915_dmabuf_ops;
>>         exp_info.size = gem_obj->size;
>>         exp_info.flags = flags;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 5cf36a130061..9fe3395ad4d9 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>>
>>         const struct drm_i915_gem_object_ops *ops;
>>
>> +       /* Shared root is object private to a VM; NULL otherwise */
>> +       struct drm_i915_gem_object *priv_root;
>> +
>>         struct {
>>                 /**
>>                  * @vma.lock: protect the list/tree of vmas
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index 7e1f8b83077f..f1912b12db00 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct
>> ttm_buffer_object *bo)
>>         i915_gem_object_release_memory_region(obj);
>>         mutex_destroy(&obj->ttm.get_io_page.lock);
>>
>> +       if (obj->priv_root)
>> +               i915_gem_object_put(obj->priv_root);
>
>This only works for ttm-based objects. For non-TTM objects on
>integrated, we'll need to mimic the dma-resv individualization from
>TTM.

Ah, earlier I was doing this during VM destruction, but ran into a
problem as vma resources live longer than the VM in the TTM case. So, I
moved it here.
Ok, yah, we probably need to mimic the dma-resv individualization
from TTM, or we can release priv_root during VM destruction for
non-TTM objects?
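
A sketch of the "mimic TTM's dma-resv individualization" option for
non-TTM objects (illustrative only; the helper name and the exact hook on
the object free path are assumptions — only the dma_resv_copy_fences()
pattern mirrors what TTM does in ttm_bo_individualize_resv()):

static void i915_gem_object_individualize_resv(struct drm_i915_gem_object *obj)
{
        if (!obj->priv_root)
                return; /* not a VM-private object */

        /*
         * The object currently shares the VM's dma-resv
         * (vm->root_obj->base.resv). Copy the fences it still depends on
         * into its own embedded reservation object and switch over, so
         * the object can safely outlive the VM's root object.
         */
        dma_resv_lock(&obj->base._resv, NULL);
        if (!dma_resv_copy_fences(&obj->base._resv, obj->base.resv))
                obj->base.resv = &obj->base._resv;
        dma_resv_unlock(&obj->base._resv);

        i915_gem_object_put(obj->priv_root);
        obj->priv_root = NULL;
}
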

>
>> +
>>         if (obj->ttm.created) {
>>                 /*
>>                  * We freely manage the shrinker LRU outide of the
>> mm.pages life
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> index 642cdb559f17..ee6e4c52e80e 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> @@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct
>> i915_address_space *vm)
>>         mutex_unlock(&vm->vm_bind_lock);
>>  }
>>
>> +static inline int i915_gem_vm_priv_lock(struct i915_address_space
>> *vm,
>> +                                       struct i915_gem_ww_ctx *ww)
>> +{
>> +       return i915_gem_object_lock(vm->root_obj, ww);
>> +}
>
>Please make a pass on this patch making sure we provide kerneldoc where
>supposed to.
>
>> +
>> +static inline void i915_gem_vm_priv_unlock(struct i915_address_space
>> *vm)
>> +{
>> +       i915_gem_object_unlock(vm->root_obj);
>> +}
>> +
>>  struct i915_vma *
>>  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
>> release_obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> index 43ceb4dcca6c..3201204c8e74 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma,
>> bool release_obj)
>>
>>         if (!list_empty(&vma->vm_bind_link)) {
>>                 list_del_init(&vma->vm_bind_link);
>> +               list_del_init(&vma->non_priv_vm_bind_link);
>>                 i915_vm_bind_it_remove(vma, &vma->vm->va);
>>
>>                 /* Release object */
>> @@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct
>> i915_address_space *vm,
>>                 goto put_obj;
>>         }
>>
>> +       if (obj->priv_root && obj->priv_root != vm->root_obj) {
>> +               ret = -EINVAL;
>> +               goto put_obj;
>> +       }
>> +
>>         ret = i915_gem_vm_bind_lock_interruptible(vm);
>>         if (ret)
>>                 goto put_obj;
>> @@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct
>> i915_address_space *vm,
>>
>>         list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>         i915_vm_bind_it_insert(vma, &vm->va);
>> +       if (!obj->priv_root)
>> +               list_add_tail(&vma->non_priv_vm_bind_link,
>> +                             &vm->non_priv_vm_bind_list);
>
>I guess I'll find more details in the execbuf patches, but would it
>work to keep the non-private objects on the vm_bind_list, and just
>never move them to the vm_bound_list, rather than having a separate
>list for them?

The vm_bind/bound_list and the non_priv_vm_bind_list are there for
very different reasons.

The reason for having separate vm_bind_list and vm_bound_list is that
during the execbuf path, we can rebind the unbound mappings by scooping
all unbound vmas back from the bound list into the bind list and binding
them. In fact, this probably can be done with a single vm_bind_list and
an 'eb.bind_list' (local to the execbuf3 ioctl) for rebinding.

The non_priv_vm_bind_list is just an optimization to loop only through
non-priv objects while taking the locks in eb_lock_persistent_vmas(),
as only non-priv objects need that (private objects are locked in a
single shot with vm_priv_lock). A non-priv mapping will also be on the
vm_bind/bound_list.

I think we need to add this as documentation to make it clearer (a rough
sketch follows below).

Niranjana
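
An illustrative sketch of that documentation — the wording is only a
draft summarising the explanation above, not text from the series:

struct i915_address_space {
        /* ... */

        /**
         * @vm_bind_list: Persistent mappings that still need (re)binding.
         * Execbuf3 binds each entry and moves it to @vm_bound_list once
         * the binding has completed.
         */
        struct list_head vm_bind_list;

        /**
         * @vm_bound_list: Persistent mappings that are fully bound.
         * Entries that get unbound (e.g. evicted) are scooped back onto
         * @vm_bind_list via the rebind list on the next execbuf3.
         */
        struct list_head vm_bound_list;

        /**
         * @non_priv_vm_bind_list: Subset of persistent mappings whose
         * objects are not VM-private, kept so execbuf3 can lock each
         * non-private object's own dma-resv; VM-private objects share
         * the VM's dma-resv and are locked in one shot. Entries here are
         * also on @vm_bind_list or @vm_bound_list.
         */
        struct list_head non_priv_vm_bind_list;
};
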

>
>
>>
>>         /* Hold object reference until vm_unbind */
>>         i915_gem_object_get(vma->obj);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 135dc4a76724..df0a8459c3c6 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct
>> i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>         drm_mm_takedown(&vm->mm);
>> +       i915_gem_object_put(vm->root_obj);
>>         GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>         mutex_destroy(&vm->vm_bind_lock);
>>  }
>> @@ -289,6 +290,9 @@ void i915_address_space_init(struct
>> i915_address_space *vm, int subclass)
>>         INIT_LIST_HEAD(&vm->vm_bind_list);
>>         INIT_LIST_HEAD(&vm->vm_bound_list);
>>         mutex_init(&vm->vm_bind_lock);
>> +       INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>> +       vm->root_obj = i915_gem_object_create_internal(vm->i915,
>> PAGE_SIZE);
>> +       GEM_BUG_ON(IS_ERR(vm->root_obj));
>>  }
>>
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index d4a6ce65251d..f538ce9115c9 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -267,6 +267,8 @@ struct i915_address_space {
>>         struct list_head vm_bound_list;
>>         /* va tree of persistent vmas */
>>         struct rb_root_cached va;
>> +       struct list_head non_priv_vm_bind_list;
>> +       struct drm_i915_gem_object *root_obj;
>>
>>         /* Global GTT */
>>         bool is_ggtt:1;
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index d324e29cef0a..f0226581d342 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>         mutex_unlock(&vm->mutex);
>>
>>         INIT_LIST_HEAD(&vma->vm_bind_link);
>> +       INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>>         return vma;
>>
>>  err_unlock:
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> b/drivers/gpu/drm/i915/i915_vma_types.h
>> index b6d179bdbfa0..2298b3d6b7c4 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -290,6 +290,8 @@ struct i915_vma {
>>         struct list_head vm_link;
>>
>>         struct list_head vm_bind_link; /* Link in persistent VMA list
>> */
>> +       /* Link in non-private persistent VMA list */
>> +       struct list_head non_priv_vm_bind_link;
>>
>>         /** Interval tree structures for persistent vma */
>>         struct rb_node rb;
>> diff --git a/include/uapi/drm/i915_drm.h
>> b/include/uapi/drm/i915_drm.h
>> index 26cca49717f8..ce1c6592b0d7 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>>          *
>>          * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>>          * struct drm_i915_gem_create_ext_protected_content.
>> +        *
>> +        * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
>> +        * struct drm_i915_gem_create_ext_vm_private.
>>          */
>>  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>>  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>>         __u64 extensions;
>>  };
>>
>> @@ -3662,6 +3666,32 @@ struct
>> drm_i915_gem_create_ext_protected_content {
>>  /* ID of the protected content session managed by i915 when PXP is
>> active */
>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>
>> +/**
>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>> object
>> + * private to the specified VM.
>> + *
>> + * See struct drm_i915_gem_create_ext.
>> + *
>> + * By default, BOs can be mapped on multiple VMs and can also be
>> dma-buf
>> + * exported. Hence these BOs are referred to as Shared BOs.
>> + * During each execbuf3 submission, the request fence must be added
>> to the
>> + * dma-resv fence list of all shared BOs mapped on the VM.
>> + *
>> + * Unlike Shared BOs, these VM private BOs can only be mapped on the
>> VM they
>> + * are private to and can't be dma-buf exported. All private BOs of
>> a VM share
>> + * the dma-resv object. Hence during each execbuf3 submission, they
>> need only
>> + * one dma-resv fence list updated. Thus, the fast path (where
>> required
>> + * mappings are already bound) submission latency is O(1) w.r.t the
>> number of
>> + * VM private BOs.
>> + */
>> +struct drm_i915_gem_create_ext_vm_private {
>> +       /** @base: Extension link. See struct i915_user_extension. */
>> +       struct i915_user_extension base;
>> +
>> +       /** @vm_id: Id of the VM to which the object is private */
>> +       __u32 vm_id;
>> +};
>> +
>>  /**
>>   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>   *
>
>Thanks,
>Thomas
>
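
As a usage illustration, a minimal userspace sketch of creating a
VM-private BO with the extension quoted above (assumes uapi headers
generated from this series; the vm_id is assumed to come from
DRM_IOCTL_I915_GEM_VM_CREATE, and error handling is trimmed):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static uint32_t create_vm_private_bo(int drm_fd, uint32_t vm_id, uint64_t size)
{
        struct drm_i915_gem_create_ext_vm_private vm_priv = {
                .base.name = I915_GEM_CREATE_EXT_VM_PRIVATE,
                .vm_id = vm_id,
        };
        struct drm_i915_gem_create_ext create = {
                .size = size,
                .extensions = (uintptr_t)&vm_priv,
        };

        if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
                return 0; /* creation failed */

        /* Handle may only be VM_BIND'ed on vm_id and can't be exported. */
        return create.handle;
}
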

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-07 13:27     ` [Intel-gfx] " Christian König
@ 2022-07-08 13:23       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 13:23 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, intel-gfx,
	lionel.g.landwerlin, thomas.hellstrom, dri-devel, jason,
	daniel.vetter, matthew.auld

On Thu, Jul 07, 2022 at 03:27:43PM +0200, Christian König wrote:
>Am 02.07.22 um 00:50 schrieb Niranjana Vishwanathapura:
>>Add uapi allowing user to specify a BO as private to a specified VM
>>during the BO creation.
>>VM private BOs can only be mapped on the specified VM and can't be
>>dma_buf exported. VM private BOs share a single common dma_resv object,
>>hence has a performance advantage requiring a single dma_resv object
>>update in the execbuf path compared to non-private (shared) BOs.
>
>Sounds like you picked up the per VM BO idea from amdgpu here :)
>
>Of hand looks like a good idea, but shouldn't we add a few comments in 
>the common documentation about that?
>
>E.g. something like "Multiple buffer objects sometimes share the same 
>dma_resv object....." to the dma_resv documentation.
>
>Probably best as a separate patch after this here has landed.

:)
Sounds good. We probably need to update the documentation of
drm_gem_object.resv and drm_gem_object._resv here, right?

Doing it in a separate patch after this series lands sounds good to me.
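
Something along these lines for the &drm_gem_object.resv kerneldoc,
just as a rough first draft to be refined in that follow-up patch:

        /**
         * @resv: Pointer to the reservation object in use for this object.
         *
         * This normally points at &drm_gem_object._resv below. Drivers may
         * however let multiple buffer objects share a single reservation
         * object (for example VM private BOs all pointing at the dma_resv
         * of a per-VM root object), so that a single fence list update on
         * command submission covers all of them.
         */
        struct dma_resv *resv;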

Thanks,
Niranjana

>
>Regards,
>Christian.
>
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>---
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
>>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>  drivers/gpu/drm/i915/i915_vma.c               |  1 +
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>>  include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>>  11 files changed, 110 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>index 927a87e5ec59..7e264566b51f 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>@@ -11,6 +11,7 @@
>>  #include "pxp/intel_pxp.h"
>>  #include "i915_drv.h"
>>+#include "i915_gem_context.h"
>>  #include "i915_gem_create.h"
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>>@@ -243,6 +244,7 @@ struct create_ext {
>>  	unsigned int n_placements;
>>  	unsigned int placement_mask;
>>  	unsigned long flags;
>>+	u32 vm_id;
>>  };
>>  static void repr_placements(char *buf, size_t size,
>>@@ -392,9 +394,24 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
>>  	return 0;
>>  }
>>+static int ext_set_vm_private(struct i915_user_extension __user *base,
>>+			      void *data)
>>+{
>>+	struct drm_i915_gem_create_ext_vm_private ext;
>>+	struct create_ext *ext_data = data;
>>+
>>+	if (copy_from_user(&ext, base, sizeof(ext)))
>>+		return -EFAULT;
>>+
>>+	ext_data->vm_id = ext.vm_id;
>>+
>>+	return 0;
>>+}
>>+
>>  static const i915_user_extension_fn create_extensions[] = {
>>  	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>  	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>>+	[I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>>  };
>>  /**
>>@@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  	struct drm_i915_private *i915 = to_i915(dev);
>>  	struct drm_i915_gem_create_ext *args = data;
>>  	struct create_ext ext_data = { .i915 = i915 };
>>+	struct i915_address_space *vm = NULL;
>>  	struct drm_i915_gem_object *obj;
>>  	int ret;
>>@@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  	if (ret)
>>  		return ret;
>>+	if (ext_data.vm_id) {
>>+		vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
>>+		if (unlikely(!vm))
>>+			return -ENOENT;
>>+	}
>>+
>>  	if (!ext_data.n_placements) {
>>  		ext_data.placements[0] =
>>  			intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
>>@@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  						ext_data.placements,
>>  						ext_data.n_placements,
>>  						ext_data.flags);
>>-	if (IS_ERR(obj))
>>-		return PTR_ERR(obj);
>>+	if (IS_ERR(obj)) {
>>+		ret = PTR_ERR(obj);
>>+		goto vm_put;
>>+	}
>>+
>>+	if (vm) {
>>+		obj->base.resv = vm->root_obj->base.resv;
>>+		obj->priv_root = i915_gem_object_get(vm->root_obj);
>>+		i915_vm_put(vm);
>>+	}
>>  	return i915_gem_publish(obj, file, &args->size, &args->handle);
>>+vm_put:
>>+	if (vm)
>>+		i915_vm_put(vm);
>>+
>>+	return ret;
>>  }
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>index f5062d0c6333..6433173c3e84 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>@@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags)
>>  	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>  	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>+	if (obj->priv_root) {
>>+		drm_dbg(obj->base.dev,
>>+			"Exporting VM private objects is not allowed\n");
>>+		return ERR_PTR(-EINVAL);
>>+	}
>>+
>>  	exp_info.ops = &i915_dmabuf_ops;
>>  	exp_info.size = gem_obj->size;
>>  	exp_info.flags = flags;
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>index 5cf36a130061..9fe3395ad4d9 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>>  	const struct drm_i915_gem_object_ops *ops;
>>+	/* Shared root is object private to a VM; NULL otherwise */
>>+	struct drm_i915_gem_object *priv_root;
>>+
>>  	struct {
>>  		/**
>>  		 * @vma.lock: protect the list/tree of vmas
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>index 7e1f8b83077f..f1912b12db00 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>@@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>>  	i915_gem_object_release_memory_region(obj);
>>  	mutex_destroy(&obj->ttm.get_io_page.lock);
>>+	if (obj->priv_root)
>>+		i915_gem_object_put(obj->priv_root);
>>+
>>  	if (obj->ttm.created) {
>>  		/*
>>  		 * We freely manage the shrinker LRU outide of the mm.pages life
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>index 642cdb559f17..ee6e4c52e80e 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>@@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
>>  	mutex_unlock(&vm->vm_bind_lock);
>>  }
>>+static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>>+					struct i915_gem_ww_ctx *ww)
>>+{
>>+	return i915_gem_object_lock(vm->root_obj, ww);
>>+}
>>+
>>+static inline void i915_gem_vm_priv_unlock(struct i915_address_space *vm)
>>+{
>>+	i915_gem_object_unlock(vm->root_obj);
>>+}
>>+
>>  struct i915_vma *
>>  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>index 43ceb4dcca6c..3201204c8e74 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>@@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>  	if (!list_empty(&vma->vm_bind_link)) {
>>  		list_del_init(&vma->vm_bind_link);
>>+		list_del_init(&vma->non_priv_vm_bind_link);
>>  		i915_vm_bind_it_remove(vma, &vma->vm->va);
>>  		/* Release object */
>>@@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>  		goto put_obj;
>>  	}
>>+	if (obj->priv_root && obj->priv_root != vm->root_obj) {
>>+		ret = -EINVAL;
>>+		goto put_obj;
>>+	}
>>+
>>  	ret = i915_gem_vm_bind_lock_interruptible(vm);
>>  	if (ret)
>>  		goto put_obj;
>>@@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>  	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>  	i915_vm_bind_it_insert(vma, &vm->va);
>>+	if (!obj->priv_root)
>>+		list_add_tail(&vma->non_priv_vm_bind_link,
>>+			      &vm->non_priv_vm_bind_list);
>>  	/* Hold object reference until vm_unbind */
>>  	i915_gem_object_get(vma->obj);
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>index 135dc4a76724..df0a8459c3c6 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>@@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>  	drm_mm_takedown(&vm->mm);
>>+	i915_gem_object_put(vm->root_obj);
>>  	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>  	mutex_destroy(&vm->vm_bind_lock);
>>  }
>>@@ -289,6 +290,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>  	INIT_LIST_HEAD(&vm->vm_bind_list);
>>  	INIT_LIST_HEAD(&vm->vm_bound_list);
>>  	mutex_init(&vm->vm_bind_lock);
>>+	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>+	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>>+	GEM_BUG_ON(IS_ERR(vm->root_obj));
>>  }
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>index d4a6ce65251d..f538ce9115c9 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>@@ -267,6 +267,8 @@ struct i915_address_space {
>>  	struct list_head vm_bound_list;
>>  	/* va tree of persistent vmas */
>>  	struct rb_root_cached va;
>>+	struct list_head non_priv_vm_bind_list;
>>+	struct drm_i915_gem_object *root_obj;
>>  	/* Global GTT */
>>  	bool is_ggtt:1;
>>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>index d324e29cef0a..f0226581d342 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>  	mutex_unlock(&vm->mutex);
>>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>>+	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>>  	return vma;
>>  err_unlock:
>>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>>index b6d179bdbfa0..2298b3d6b7c4 100644
>>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>@@ -290,6 +290,8 @@ struct i915_vma {
>>  	struct list_head vm_link;
>>  	struct list_head vm_bind_link; /* Link in persistent VMA list */
>>+	/* Link in non-private persistent VMA list */
>>+	struct list_head non_priv_vm_bind_link;
>>  	/** Interval tree structures for persistent vma */
>>  	struct rb_node rb;
>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>index 26cca49717f8..ce1c6592b0d7 100644
>>--- a/include/uapi/drm/i915_drm.h
>>+++ b/include/uapi/drm/i915_drm.h
>>@@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>>  	 *
>>  	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>>  	 * struct drm_i915_gem_create_ext_protected_content.
>>+	 *
>>+	 * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
>>+	 * struct drm_i915_gem_create_ext_vm_private.
>>  	 */
>>  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>>  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
>>+#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>>  	__u64 extensions;
>>  };
>>@@ -3662,6 +3666,32 @@ struct drm_i915_gem_create_ext_protected_content {
>>  /* ID of the protected content session managed by i915 when PXP is active */
>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>+/**
>>+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>>+ * private to the specified VM.
>>+ *
>>+ * See struct drm_i915_gem_create_ext.
>>+ *
>>+ * By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>+ * exported. Hence these BOs are referred to as Shared BOs.
>>+ * During each execbuf3 submission, the request fence must be added to the
>>+ * dma-resv fence list of all shared BOs mapped on the VM.
>>+ *
>>+ * Unlike Shared BOs, these VM private BOs can only be mapped on the VM they
>>+ * are private to and can't be dma-buf exported. All private BOs of a VM share
>>+ * the dma-resv object. Hence during each execbuf3 submission, they need only
>>+ * one dma-resv fence list updated. Thus, the fast path (where required
>>+ * mappings are already bound) submission latency is O(1) w.r.t the number of
>>+ * VM private BOs.
>>+ */
>>+struct drm_i915_gem_create_ext_vm_private {
>>+	/** @base: Extension link. See struct i915_user_extension. */
>>+	struct i915_user_extension base;
>>+
>>+	/** @vm_id: Id of the VM to which the object is private */
>>+	__u32 vm_id;
>>+};
>>+
>>  /**
>>   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>   *
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
@ 2022-07-08 13:23       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 13:23 UTC (permalink / raw)
  To: Christian König
  Cc: paulo.r.zanoni, intel-gfx, thomas.hellstrom, dri-devel,
	daniel.vetter, matthew.auld

On Thu, Jul 07, 2022 at 03:27:43PM +0200, Christian König wrote:
>Am 02.07.22 um 00:50 schrieb Niranjana Vishwanathapura:
>>Add uapi allowing user to specify a BO as private to a specified VM
>>during the BO creation.
>>VM private BOs can only be mapped on the specified VM and can't be
>>dma_buf exported. VM private BOs share a single common dma_resv object,
>>hence has a performance advantage requiring a single dma_resv object
>>update in the execbuf path compared to non-private (shared) BOs.
>
>Sounds like you picked up the per VM BO idea from amdgpu here :)
>
>Of hand looks like a good idea, but shouldn't we add a few comments in 
>the common documentation about that?
>
>E.g. something like "Multiple buffer objects sometimes share the same 
>dma_resv object....." to the dma_resv documentation.
>
>Probably best as a separate patch after this here has landed.

:)
Sounds good. We probably need to update the documentation of
drm_gem_object.resv and drm_gem_object._resv here, right?

Doing it in a separate patch after this series lands sounds good to me.
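
For the documentation side, the expected UMD usage of the new extension
would be roughly the below (pseudo-snippet only; fd, vm_id and bo_size
are assumed to come from earlier setup, error handling omitted):

        struct drm_i915_gem_create_ext_vm_private vm_priv = {
                .base = { .name = I915_GEM_CREATE_EXT_VM_PRIVATE },
                .vm_id = vm_id, /* from a prior DRM_IOCTL_I915_GEM_VM_CREATE */
        };
        struct drm_i915_gem_create_ext create = {
                .size = bo_size,
                .extensions = (__u64)(uintptr_t)&vm_priv,
        };

        drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
        /* create.handle can now only be vm_bound on vm_id and can't be exported. */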

Thanks,
Niranjana

>
>Regards,
>Christian.
>
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>---
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
>>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>  drivers/gpu/drm/i915/i915_vma.c               |  1 +
>>  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>>  include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>>  11 files changed, 110 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>index 927a87e5ec59..7e264566b51f 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>@@ -11,6 +11,7 @@
>>  #include "pxp/intel_pxp.h"
>>  #include "i915_drv.h"
>>+#include "i915_gem_context.h"
>>  #include "i915_gem_create.h"
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>>@@ -243,6 +244,7 @@ struct create_ext {
>>  	unsigned int n_placements;
>>  	unsigned int placement_mask;
>>  	unsigned long flags;
>>+	u32 vm_id;
>>  };
>>  static void repr_placements(char *buf, size_t size,
>>@@ -392,9 +394,24 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
>>  	return 0;
>>  }
>>+static int ext_set_vm_private(struct i915_user_extension __user *base,
>>+			      void *data)
>>+{
>>+	struct drm_i915_gem_create_ext_vm_private ext;
>>+	struct create_ext *ext_data = data;
>>+
>>+	if (copy_from_user(&ext, base, sizeof(ext)))
>>+		return -EFAULT;
>>+
>>+	ext_data->vm_id = ext.vm_id;
>>+
>>+	return 0;
>>+}
>>+
>>  static const i915_user_extension_fn create_extensions[] = {
>>  	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>  	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>>+	[I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>>  };
>>  /**
>>@@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  	struct drm_i915_private *i915 = to_i915(dev);
>>  	struct drm_i915_gem_create_ext *args = data;
>>  	struct create_ext ext_data = { .i915 = i915 };
>>+	struct i915_address_space *vm = NULL;
>>  	struct drm_i915_gem_object *obj;
>>  	int ret;
>>@@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  	if (ret)
>>  		return ret;
>>+	if (ext_data.vm_id) {
>>+		vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
>>+		if (unlikely(!vm))
>>+			return -ENOENT;
>>+	}
>>+
>>  	if (!ext_data.n_placements) {
>>  		ext_data.placements[0] =
>>  			intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
>>@@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
>>  						ext_data.placements,
>>  						ext_data.n_placements,
>>  						ext_data.flags);
>>-	if (IS_ERR(obj))
>>-		return PTR_ERR(obj);
>>+	if (IS_ERR(obj)) {
>>+		ret = PTR_ERR(obj);
>>+		goto vm_put;
>>+	}
>>+
>>+	if (vm) {
>>+		obj->base.resv = vm->root_obj->base.resv;
>>+		obj->priv_root = i915_gem_object_get(vm->root_obj);
>>+		i915_vm_put(vm);
>>+	}
>>  	return i915_gem_publish(obj, file, &args->size, &args->handle);
>>+vm_put:
>>+	if (vm)
>>+		i915_vm_put(vm);
>>+
>>+	return ret;
>>  }
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>index f5062d0c6333..6433173c3e84 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>@@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags)
>>  	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>  	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>+	if (obj->priv_root) {
>>+		drm_dbg(obj->base.dev,
>>+			"Exporting VM private objects is not allowed\n");
>>+		return ERR_PTR(-EINVAL);
>>+	}
>>+
>>  	exp_info.ops = &i915_dmabuf_ops;
>>  	exp_info.size = gem_obj->size;
>>  	exp_info.flags = flags;
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>index 5cf36a130061..9fe3395ad4d9 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>>  	const struct drm_i915_gem_object_ops *ops;
>>+	/* Shared root is object private to a VM; NULL otherwise */
>>+	struct drm_i915_gem_object *priv_root;
>>+
>>  	struct {
>>  		/**
>>  		 * @vma.lock: protect the list/tree of vmas
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>index 7e1f8b83077f..f1912b12db00 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>@@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>>  	i915_gem_object_release_memory_region(obj);
>>  	mutex_destroy(&obj->ttm.get_io_page.lock);
>>+	if (obj->priv_root)
>>+		i915_gem_object_put(obj->priv_root);
>>+
>>  	if (obj->ttm.created) {
>>  		/*
>>  		 * We freely manage the shrinker LRU outide of the mm.pages life
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>index 642cdb559f17..ee6e4c52e80e 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>@@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
>>  	mutex_unlock(&vm->vm_bind_lock);
>>  }
>>+static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>>+					struct i915_gem_ww_ctx *ww)
>>+{
>>+	return i915_gem_object_lock(vm->root_obj, ww);
>>+}
>>+
>>+static inline void i915_gem_vm_priv_unlock(struct i915_address_space *vm)
>>+{
>>+	i915_gem_object_unlock(vm->root_obj);
>>+}
>>+
>>  struct i915_vma *
>>  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>index 43ceb4dcca6c..3201204c8e74 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>@@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>  	if (!list_empty(&vma->vm_bind_link)) {
>>  		list_del_init(&vma->vm_bind_link);
>>+		list_del_init(&vma->non_priv_vm_bind_link);
>>  		i915_vm_bind_it_remove(vma, &vma->vm->va);
>>  		/* Release object */
>>@@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>  		goto put_obj;
>>  	}
>>+	if (obj->priv_root && obj->priv_root != vm->root_obj) {
>>+		ret = -EINVAL;
>>+		goto put_obj;
>>+	}
>>+
>>  	ret = i915_gem_vm_bind_lock_interruptible(vm);
>>  	if (ret)
>>  		goto put_obj;
>>@@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>  	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>  	i915_vm_bind_it_insert(vma, &vm->va);
>>+	if (!obj->priv_root)
>>+		list_add_tail(&vma->non_priv_vm_bind_link,
>>+			      &vm->non_priv_vm_bind_list);
>>  	/* Hold object reference until vm_unbind */
>>  	i915_gem_object_get(vma->obj);
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>index 135dc4a76724..df0a8459c3c6 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>@@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>  	drm_mm_takedown(&vm->mm);
>>+	i915_gem_object_put(vm->root_obj);
>>  	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>  	mutex_destroy(&vm->vm_bind_lock);
>>  }
>>@@ -289,6 +290,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>  	INIT_LIST_HEAD(&vm->vm_bind_list);
>>  	INIT_LIST_HEAD(&vm->vm_bound_list);
>>  	mutex_init(&vm->vm_bind_lock);
>>+	INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>+	vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>>+	GEM_BUG_ON(IS_ERR(vm->root_obj));
>>  }
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>index d4a6ce65251d..f538ce9115c9 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>@@ -267,6 +267,8 @@ struct i915_address_space {
>>  	struct list_head vm_bound_list;
>>  	/* va tree of persistent vmas */
>>  	struct rb_root_cached va;
>>+	struct list_head non_priv_vm_bind_list;
>>+	struct drm_i915_gem_object *root_obj;
>>  	/* Global GTT */
>>  	bool is_ggtt:1;
>>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>index d324e29cef0a..f0226581d342 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>  	mutex_unlock(&vm->mutex);
>>  	INIT_LIST_HEAD(&vma->vm_bind_link);
>>+	INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>>  	return vma;
>>  err_unlock:
>>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>>index b6d179bdbfa0..2298b3d6b7c4 100644
>>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>@@ -290,6 +290,8 @@ struct i915_vma {
>>  	struct list_head vm_link;
>>  	struct list_head vm_bind_link; /* Link in persistent VMA list */
>>+	/* Link in non-private persistent VMA list */
>>+	struct list_head non_priv_vm_bind_link;
>>  	/** Interval tree structures for persistent vma */
>>  	struct rb_node rb;
>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>index 26cca49717f8..ce1c6592b0d7 100644
>>--- a/include/uapi/drm/i915_drm.h
>>+++ b/include/uapi/drm/i915_drm.h
>>@@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>>  	 *
>>  	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>>  	 * struct drm_i915_gem_create_ext_protected_content.
>>+	 *
>>+	 * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
>>+	 * struct drm_i915_gem_create_ext_vm_private.
>>  	 */
>>  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>>  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
>>+#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>>  	__u64 extensions;
>>  };
>>@@ -3662,6 +3666,32 @@ struct drm_i915_gem_create_ext_protected_content {
>>  /* ID of the protected content session managed by i915 when PXP is active */
>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>+/**
>>+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>>+ * private to the specified VM.
>>+ *
>>+ * See struct drm_i915_gem_create_ext.
>>+ *
>>+ * By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>+ * exported. Hence these BOs are referred to as Shared BOs.
>>+ * During each execbuf3 submission, the request fence must be added to the
>>+ * dma-resv fence list of all shared BOs mapped on the VM.
>>+ *
>>+ * Unlike Shared BOs, these VM private BOs can only be mapped on the VM they
>>+ * are private to and can't be dma-buf exported. All private BOs of a VM share
>>+ * the dma-resv object. Hence during each execbuf3 submission, they need only
>>+ * one dma-resv fence list updated. Thus, the fast path (where required
>>+ * mappings are already bound) submission latency is O(1) w.r.t the number of
>>+ * VM private BOs.
>>+ */
>>+struct drm_i915_gem_create_ext_vm_private {
>>+	/** @base: Extension link. See struct i915_user_extension. */
>>+	struct i915_user_extension base;
>>+
>>+	/** @vm_id: Id of the VM to which the object is private */
>>+	__u32 vm_id;
>>+};
>>+
>>  /**
>>   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>   *
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-08 13:14       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-08 13:43         ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 13:43 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	 Daniel, christian.koenig

On Fri, 2022-07-08 at 06:14 -0700, Niranjana Vishwanathapura wrote:
> On Thu, Jul 07, 2022 at 03:31:42AM -0700, Hellstrom, Thomas wrote:
> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> > > Add uapi allowing user to specify a BO as private to a specified
> > > VM
> > > during the BO creation.
> > > VM private BOs can only be mapped on the specified VM and can't
> > > be
> > > dma_buf exported. VM private BOs share a single common dma_resv
> > > object,
> > > hence has a performance advantage requiring a single dma_resv
> > > object
> > > update in the execbuf path compared to non-private (shared) BOs.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41
> > > ++++++++++++++++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
> > >  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
> > >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
> > >  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
> > >  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
> > >  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
> > >  drivers/gpu/drm/i915/i915_vma.c               |  1 +
> > >  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
> > >  include/uapi/drm/i915_drm.h                   | 30
> > > ++++++++++++++
> > >  11 files changed, 110 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > index 927a87e5ec59..7e264566b51f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > @@ -11,6 +11,7 @@
> > >  #include "pxp/intel_pxp.h"
> > > 
> > >  #include "i915_drv.h"
> > > +#include "i915_gem_context.h"
> > >  #include "i915_gem_create.h"
> > >  #include "i915_trace.h"
> > >  #include "i915_user_extensions.h"
> > > @@ -243,6 +244,7 @@ struct create_ext {
> > >         unsigned int n_placements;
> > >         unsigned int placement_mask;
> > >         unsigned long flags;
> > > +       u32 vm_id;
> > >  };
> > > 
> > >  static void repr_placements(char *buf, size_t size,
> > > @@ -392,9 +394,24 @@ static int ext_set_protected(struct
> > > i915_user_extension __user *base, void *data
> > >         return 0;
> > >  }
> > > 
> > > +static int ext_set_vm_private(struct i915_user_extension __user
> > > *base,
> > > +                             void *data)
> > > +{
> > > +       struct drm_i915_gem_create_ext_vm_private ext;
> > > +       struct create_ext *ext_data = data;
> > > +
> > > +       if (copy_from_user(&ext, base, sizeof(ext)))
> > > +               return -EFAULT;
> > > +
> > > +       ext_data->vm_id = ext.vm_id;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static const i915_user_extension_fn create_extensions[] = {
> > >         [I915_GEM_CREATE_EXT_MEMORY_REGIONS] =
> > > ext_set_placements,
> > >         [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] =
> > > ext_set_protected,
> > > +       [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
> > >  };
> > > 
> > >  /**
> > > @@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device
> > > *dev,
> > > void *data,
> > >         struct drm_i915_private *i915 = to_i915(dev);
> > >         struct drm_i915_gem_create_ext *args = data;
> > >         struct create_ext ext_data = { .i915 = i915 };
> > > +       struct i915_address_space *vm = NULL;
> > >         struct drm_i915_gem_object *obj;
> > >         int ret;
> > > 
> > > @@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device
> > > *dev, void *data,
> > >         if (ret)
> > >                 return ret;
> > > 
> > > +       if (ext_data.vm_id) {
> > > +               vm = i915_gem_vm_lookup(file->driver_priv,
> > > ext_data.vm_id);
> > > +               if (unlikely(!vm))
> > > +                       return -ENOENT;
> > > +       }
> > > +
> > >         if (!ext_data.n_placements) {
> > >                 ext_data.placements[0] =
> > >                         intel_memory_region_by_type(i915,
> > > INTEL_MEMORY_SYSTEM);
> > > @@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device
> > > *dev, void *data,
> > >                                                
> > > ext_data.placements,
> > >                                                
> > > ext_data.n_placements
> > > ,
> > >                                                 ext_data.flags);
> > > -       if (IS_ERR(obj))
> > > -               return PTR_ERR(obj);
> > > +       if (IS_ERR(obj)) {
> > > +               ret = PTR_ERR(obj);
> > > +               goto vm_put;
> > > +       }
> > > +
> > > +       if (vm) {
> > > +               obj->base.resv = vm->root_obj->base.resv;
> > > +               obj->priv_root = i915_gem_object_get(vm-
> > > >root_obj);
> > > +               i915_vm_put(vm);
> > > +       }
> > > 
> > >         return i915_gem_publish(obj, file, &args->size, &args-
> > > > handle);
> > > +vm_put:
> > > +       if (vm)
> > > +               i915_vm_put(vm);
> > > +
> > > +       return ret;
> > >  }
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > index f5062d0c6333..6433173c3e84 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > @@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct
> > > drm_gem_object *gem_obj, int flags)
> > >         struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
> > >         DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> > > 
> > > +       if (obj->priv_root) {
> > > +               drm_dbg(obj->base.dev,
> > > +                       "Exporting VM private objects is not
> > > allowed\n");
> > > +               return ERR_PTR(-EINVAL);
> > > +       }
> > > +
> > >         exp_info.ops = &i915_dmabuf_ops;
> > >         exp_info.size = gem_obj->size;
> > >         exp_info.flags = flags;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > index 5cf36a130061..9fe3395ad4d9 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > @@ -241,6 +241,9 @@ struct drm_i915_gem_object {
> > > 
> > >         const struct drm_i915_gem_object_ops *ops;
> > > 
> > > +       /* Shared root is object private to a VM; NULL otherwise
> > > */
> > > +       struct drm_i915_gem_object *priv_root;
> > > +
> > >         struct {
> > >                 /**
> > >                  * @vma.lock: protect the list/tree of vmas
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > index 7e1f8b83077f..f1912b12db00 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > @@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct
> > > ttm_buffer_object *bo)
> > >         i915_gem_object_release_memory_region(obj);
> > >         mutex_destroy(&obj->ttm.get_io_page.lock);
> > > 
> > > +       if (obj->priv_root)
> > > +               i915_gem_object_put(obj->priv_root);
> > 
> > This only works for ttm-based objects. For non-TTM objects on
> > integrated, we'll need to mimic the dma-resv individualization from
> > TTM.
> 
> Ah, earlier I was doing this during VM destruction, but ran into
> problem as vma resources lives longer than VM in TTM case. So, I
> moved it here.
> Ok, yah, we probably need to mimic the dma-resv individualization
> from TTM, or, we can release priv_root during VM destruction for
> non-TTM objects?
> 
> > 
> > > +
> > >         if (obj->ttm.created) {
> > >                 /*
> > >                  * We freely manage the shrinker LRU outide of
> > > the
> > > mm.pages life
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > index 642cdb559f17..ee6e4c52e80e 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > @@ -26,6 +26,17 @@ static inline void
> > > i915_gem_vm_bind_unlock(struct
> > > i915_address_space *vm)
> > >         mutex_unlock(&vm->vm_bind_lock);
> > >  }
> > > 
> > > +static inline int i915_gem_vm_priv_lock(struct
> > > i915_address_space
> > > *vm,
> > > +                                       struct i915_gem_ww_ctx
> > > *ww)
> > > +{
> > > +       return i915_gem_object_lock(vm->root_obj, ww);
> > > +}
> > 
> > Please make a pass on this patch making sure we provide kerneldoc
> > where
> > supposed to.
> > 
> > > +
> > > +static inline void i915_gem_vm_priv_unlock(struct
> > > i915_address_space
> > > *vm)
> > > +{
> > > +       i915_gem_object_unlock(vm->root_obj);
> > > +}
> > > +
> > >  struct i915_vma *
> > >  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64
> > > va);
> > >  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
> > > release_obj);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > index 43ceb4dcca6c..3201204c8e74 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > @@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma
> > > *vma,
> > > bool release_obj)
> > > 
> > >         if (!list_empty(&vma->vm_bind_link)) {
> > >                 list_del_init(&vma->vm_bind_link);
> > > +               list_del_init(&vma->non_priv_vm_bind_link);
> > >                 i915_vm_bind_it_remove(vma, &vma->vm->va);
> > > 
> > >                 /* Release object */
> > > @@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct
> > > i915_address_space *vm,
> > >                 goto put_obj;
> > >         }
> > > 
> > > +       if (obj->priv_root && obj->priv_root != vm->root_obj) {
> > > +               ret = -EINVAL;
> > > +               goto put_obj;
> > > +       }
> > > +
> > >         ret = i915_gem_vm_bind_lock_interruptible(vm);
> > >         if (ret)
> > >                 goto put_obj;
> > > @@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct
> > > i915_address_space *vm,
> > > 
> > >         list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> > >         i915_vm_bind_it_insert(vma, &vm->va);
> > > +       if (!obj->priv_root)
> > > +               list_add_tail(&vma->non_priv_vm_bind_link,
> > > +                             &vm->non_priv_vm_bind_list);
> > 
> > I guess I'll find more details in the execbuf patches, but would it
> > work to keep the non-private objects on the vm_bind_list, and just
> > never move them to the vm_bound_list, rather than having a separate
> > list for them?
> 
> The vm_bind/bound_list and the non_priv_vm_bind_list are there for
> very different reasons.
> 
> The reason for having separate vm_bind_list and vm_bound_list is that
> during the execbuf path, we can rebind the unbound mappings by
> scooping
> all unbound vmas back from bound list into the bind list and binding
> them. In fact, this probably can be done with a single vm_bind_list
> and
> a 'eb.bind_list' (local to execbuf3 ioctl) for rebinding.
> 
> The non_priv_vm_bind_list is just an optimization to loop only
> through
> non-priv objects while taking the locks in eb_lock_persistent_vmas()
> as only non-priv objects needs that (private objects are locked in a
> single shot with vm_priv_lock). A non-priv mapping will also be in
> the
> vm_bind/bound_list.
> 
> I think, we need to add this as documentation to be more clear.

OK, I had understood it as: private objects sit on either the vm_bind_list
or the vm_bound_list depending on whether they need rebinding, while
shared objects sit only on the non_priv_vm_bind_list and are always
locked, validated and fenced...

Need to take a deeper look...

/Thomas



> 
> Niranjana
> 
> > 
> > 
> > > 
> > >         /* Hold object reference until vm_unbind */
> > >         i915_gem_object_get(vma->obj);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index 135dc4a76724..df0a8459c3c6 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct
> > > i915_address_space *vm,
> > >  void i915_address_space_fini(struct i915_address_space *vm)
> > >  {
> > >         drm_mm_takedown(&vm->mm);
> > > +       i915_gem_object_put(vm->root_obj);
> > >         GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> > >         mutex_destroy(&vm->vm_bind_lock);
> > >  }
> > > @@ -289,6 +290,9 @@ void i915_address_space_init(struct
> > > i915_address_space *vm, int subclass)
> > >         INIT_LIST_HEAD(&vm->vm_bind_list);
> > >         INIT_LIST_HEAD(&vm->vm_bound_list);
> > >         mutex_init(&vm->vm_bind_lock);
> > > +       INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> > > +       vm->root_obj = i915_gem_object_create_internal(vm->i915,
> > > PAGE_SIZE);
> > > +       GEM_BUG_ON(IS_ERR(vm->root_obj));
> > >  }
> > > 
> > >  void *__px_vaddr(struct drm_i915_gem_object *p)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > index d4a6ce65251d..f538ce9115c9 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > @@ -267,6 +267,8 @@ struct i915_address_space {
> > >         struct list_head vm_bound_list;
> > >         /* va tree of persistent vmas */
> > >         struct rb_root_cached va;
> > > +       struct list_head non_priv_vm_bind_list;
> > > +       struct drm_i915_gem_object *root_obj;
> > > 
> > >         /* Global GTT */
> > >         bool is_ggtt:1;
> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > b/drivers/gpu/drm/i915/i915_vma.c
> > > index d324e29cef0a..f0226581d342 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > @@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
> > >         mutex_unlock(&vm->mutex);
> > > 
> > >         INIT_LIST_HEAD(&vma->vm_bind_link);
> > > +       INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> > >         return vma;
> > > 
> > >  err_unlock:
> > > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> > > b/drivers/gpu/drm/i915/i915_vma_types.h
> > > index b6d179bdbfa0..2298b3d6b7c4 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma_types.h
> > > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> > > @@ -290,6 +290,8 @@ struct i915_vma {
> > >         struct list_head vm_link;
> > > 
> > >         struct list_head vm_bind_link; /* Link in persistent VMA
> > > list
> > > */
> > > +       /* Link in non-private persistent VMA list */
> > > +       struct list_head non_priv_vm_bind_link;
> > > 
> > >         /** Interval tree structures for persistent vma */
> > >         struct rb_node rb;
> > > diff --git a/include/uapi/drm/i915_drm.h
> > > b/include/uapi/drm/i915_drm.h
> > > index 26cca49717f8..ce1c6592b0d7 100644
> > > --- a/include/uapi/drm/i915_drm.h
> > > +++ b/include/uapi/drm/i915_drm.h
> > > @@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
> > >          *
> > >          * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
> > >          * struct drm_i915_gem_create_ext_protected_content.
> > > +        *
> > > +        * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
> > > +        * struct drm_i915_gem_create_ext_vm_private.
> > >          */
> > >  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> > >  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
> > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
> > >         __u64 extensions;
> > >  };
> > > 
> > > @@ -3662,6 +3666,32 @@ struct
> > > drm_i915_gem_create_ext_protected_content {
> > >  /* ID of the protected content session managed by i915 when PXP
> > > is
> > > active */
> > >  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
> > > 
> > > +/**
> > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make
> > > the
> > > object
> > > + * private to the specified VM.
> > > + *
> > > + * See struct drm_i915_gem_create_ext.
> > > + *
> > > + * By default, BOs can be mapped on multiple VMs and can also be
> > > dma-buf
> > > + * exported. Hence these BOs are referred to as Shared BOs.
> > > + * During each execbuf3 submission, the request fence must be
> > > added
> > > to the
> > > + * dma-resv fence list of all shared BOs mapped on the VM.
> > > + *
> > > + * Unlike Shared BOs, these VM private BOs can only be mapped on
> > > the
> > > VM they
> > > + * are private to and can't be dma-buf exported. All private BOs
> > > of
> > > a VM share
> > > + * the dma-resv object. Hence during each execbuf3 submission,
> > > they
> > > need only
> > > + * one dma-resv fence list updated. Thus, the fast path (where
> > > required
> > > + * mappings are already bound) submission latency is O(1) w.r.t
> > > the
> > > number of
> > > + * VM private BOs.
> > > + */
> > > +struct drm_i915_gem_create_ext_vm_private {
> > > +       /** @base: Extension link. See struct
> > > i915_user_extension. */
> > > +       struct i915_user_extension base;
> > > +
> > > +       /** @vm_id: Id of the VM to which the object is private
> > > */
> > > +       __u32 vm_id;
> > > +};
> > > +
> > >  /**
> > >   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > >   *
> > 
> > Thanks,
> > Thomas
> > 


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
@ 2022-07-08 13:43         ` Hellstrom, Thomas
  0 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 13:43 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Auld, Matthew, Vetter,
	 Daniel, christian.koenig

On Fri, 2022-07-08 at 06:14 -0700, Niranjana Vishwanathapura wrote:
> On Thu, Jul 07, 2022 at 03:31:42AM -0700, Hellstrom, Thomas wrote:
> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> > > Add uapi allowing user to specify a BO as private to a specified
> > > VM
> > > during the BO creation.
> > > VM private BOs can only be mapped on the specified VM and can't
> > > be
> > > dma_buf exported. VM private BOs share a single common dma_resv
> > > object,
> > > hence has a performance advantage requiring a single dma_resv
> > > object
> > > update in the execbuf path compared to non-private (shared) BOs.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41
> > > ++++++++++++++++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
> > >  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
> > >  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
> > >  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
> > >  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
> > >  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
> > >  drivers/gpu/drm/i915/i915_vma.c               |  1 +
> > >  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
> > >  include/uapi/drm/i915_drm.h                   | 30
> > > ++++++++++++++
> > >  11 files changed, 110 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > index 927a87e5ec59..7e264566b51f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > @@ -11,6 +11,7 @@
> > >  #include "pxp/intel_pxp.h"
> > > 
> > >  #include "i915_drv.h"
> > > +#include "i915_gem_context.h"
> > >  #include "i915_gem_create.h"
> > >  #include "i915_trace.h"
> > >  #include "i915_user_extensions.h"
> > > @@ -243,6 +244,7 @@ struct create_ext {
> > >         unsigned int n_placements;
> > >         unsigned int placement_mask;
> > >         unsigned long flags;
> > > +       u32 vm_id;
> > >  };
> > > 
> > >  static void repr_placements(char *buf, size_t size,
> > > @@ -392,9 +394,24 @@ static int ext_set_protected(struct
> > > i915_user_extension __user *base, void *data
> > >         return 0;
> > >  }
> > > 
> > > +static int ext_set_vm_private(struct i915_user_extension __user
> > > *base,
> > > +                             void *data)
> > > +{
> > > +       struct drm_i915_gem_create_ext_vm_private ext;
> > > +       struct create_ext *ext_data = data;
> > > +
> > > +       if (copy_from_user(&ext, base, sizeof(ext)))
> > > +               return -EFAULT;
> > > +
> > > +       ext_data->vm_id = ext.vm_id;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static const i915_user_extension_fn create_extensions[] = {
> > >         [I915_GEM_CREATE_EXT_MEMORY_REGIONS] =
> > > ext_set_placements,
> > >         [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] =
> > > ext_set_protected,
> > > +       [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
> > >  };
> > > 
> > >  /**
> > > @@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device
> > > *dev,
> > > void *data,
> > >         struct drm_i915_private *i915 = to_i915(dev);
> > >         struct drm_i915_gem_create_ext *args = data;
> > >         struct create_ext ext_data = { .i915 = i915 };
> > > +       struct i915_address_space *vm = NULL;
> > >         struct drm_i915_gem_object *obj;
> > >         int ret;
> > > 
> > > @@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device
> > > *dev, void *data,
> > >         if (ret)
> > >                 return ret;
> > > 
> > > +       if (ext_data.vm_id) {
> > > +               vm = i915_gem_vm_lookup(file->driver_priv,
> > > ext_data.vm_id);
> > > +               if (unlikely(!vm))
> > > +                       return -ENOENT;
> > > +       }
> > > +
> > >         if (!ext_data.n_placements) {
> > >                 ext_data.placements[0] =
> > >                         intel_memory_region_by_type(i915,
> > > INTEL_MEMORY_SYSTEM);
> > > @@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device
> > > *dev, void *data,
> > >                                                
> > > ext_data.placements,
> > >                                                
> > > ext_data.n_placements
> > > ,
> > >                                                 ext_data.flags);
> > > -       if (IS_ERR(obj))
> > > -               return PTR_ERR(obj);
> > > +       if (IS_ERR(obj)) {
> > > +               ret = PTR_ERR(obj);
> > > +               goto vm_put;
> > > +       }
> > > +
> > > +       if (vm) {
> > > +               obj->base.resv = vm->root_obj->base.resv;
> > > +               obj->priv_root = i915_gem_object_get(vm-
> > > >root_obj);
> > > +               i915_vm_put(vm);
> > > +       }
> > > 
> > >         return i915_gem_publish(obj, file, &args->size, &args-
> > > > handle);
> > > +vm_put:
> > > +       if (vm)
> > > +               i915_vm_put(vm);
> > > +
> > > +       return ret;
> > >  }
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > index f5062d0c6333..6433173c3e84 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > @@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct
> > > drm_gem_object *gem_obj, int flags)
> > >         struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
> > >         DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> > > 
> > > +       if (obj->priv_root) {
> > > +               drm_dbg(obj->base.dev,
> > > +                       "Exporting VM private objects is not
> > > allowed\n");
> > > +               return ERR_PTR(-EINVAL);
> > > +       }
> > > +
> > >         exp_info.ops = &i915_dmabuf_ops;
> > >         exp_info.size = gem_obj->size;
> > >         exp_info.flags = flags;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > index 5cf36a130061..9fe3395ad4d9 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > @@ -241,6 +241,9 @@ struct drm_i915_gem_object {
> > > 
> > >         const struct drm_i915_gem_object_ops *ops;
> > > 
> > > +       /* Shared root is object private to a VM; NULL otherwise
> > > */
> > > +       struct drm_i915_gem_object *priv_root;
> > > +
> > >         struct {
> > >                 /**
> > >                  * @vma.lock: protect the list/tree of vmas
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > index 7e1f8b83077f..f1912b12db00 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > > @@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct
> > > ttm_buffer_object *bo)
> > >         i915_gem_object_release_memory_region(obj);
> > >         mutex_destroy(&obj->ttm.get_io_page.lock);
> > > 
> > > +       if (obj->priv_root)
> > > +               i915_gem_object_put(obj->priv_root);
> > 
> > This only works for ttm-based objects. For non-TTM objects on
> > integrated, we'll need to mimic the dma-resv individualization from
> > TTM.
> 
> Ah, earlier I was doing this during VM destruction, but ran into
> problem as vma resources lives longer than VM in TTM case. So, I
> moved it here.
> Ok, yah, we probably need to mimic the dma-resv individualization
> from TTM, or, we can release priv_root during VM destruction for
> non-TTM objects?
> 
> > 
> > > +
> > >         if (obj->ttm.created) {
> > >                 /*
> > >                  * We freely manage the shrinker LRU outide of
> > > the
> > > mm.pages life
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > index 642cdb559f17..ee6e4c52e80e 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> > > @@ -26,6 +26,17 @@ static inline void
> > > i915_gem_vm_bind_unlock(struct
> > > i915_address_space *vm)
> > >         mutex_unlock(&vm->vm_bind_lock);
> > >  }
> > > 
> > > +static inline int i915_gem_vm_priv_lock(struct
> > > i915_address_space
> > > *vm,
> > > +                                       struct i915_gem_ww_ctx
> > > *ww)
> > > +{
> > > +       return i915_gem_object_lock(vm->root_obj, ww);
> > > +}
> > 
> > Please make a pass on this patch making sure we provide kerneldoc
> > where
> > supposed to.
> > 
> > > +
> > > +static inline void i915_gem_vm_priv_unlock(struct
> > > i915_address_space
> > > *vm)
> > > +{
> > > +       i915_gem_object_unlock(vm->root_obj);
> > > +}
> > > +
> > >  struct i915_vma *
> > >  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64
> > > va);
> > >  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool
> > > release_obj);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > index 43ceb4dcca6c..3201204c8e74 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> > > @@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma
> > > *vma,
> > > bool release_obj)
> > > 
> > >         if (!list_empty(&vma->vm_bind_link)) {
> > >                 list_del_init(&vma->vm_bind_link);
> > > +               list_del_init(&vma->non_priv_vm_bind_link);
> > >                 i915_vm_bind_it_remove(vma, &vma->vm->va);
> > > 
> > >                 /* Release object */
> > > @@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct
> > > i915_address_space *vm,
> > >                 goto put_obj;
> > >         }
> > > 
> > > +       if (obj->priv_root && obj->priv_root != vm->root_obj) {
> > > +               ret = -EINVAL;
> > > +               goto put_obj;
> > > +       }
> > > +
> > >         ret = i915_gem_vm_bind_lock_interruptible(vm);
> > >         if (ret)
> > >                 goto put_obj;
> > > @@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct
> > > i915_address_space *vm,
> > > 
> > >         list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> > >         i915_vm_bind_it_insert(vma, &vm->va);
> > > +       if (!obj->priv_root)
> > > +               list_add_tail(&vma->non_priv_vm_bind_link,
> > > +                             &vm->non_priv_vm_bind_list);
> > 
> > I guess I'll find more details in the execbuf patches, but would it
> > work to keep the non-private objects on the vm_bind_list, and just
> > never move them to the vm_bound_list, rather than having a separate
> > list for them?
> 
> The vm_bind/bound_list and the non_priv_vm_bind_list are there for
> very different reasons.
>
> The reason for having separate vm_bind_list and vm_bound_list is that
> during the execbuf path, we can rebind the unbound mappings by scooping
> all unbound vmas back from the bound list into the bind list and
> binding them. In fact, this could probably be done with a single
> vm_bind_list and an 'eb.bind_list' (local to the execbuf3 ioctl) for
> rebinding.
>
> The non_priv_vm_bind_list is just an optimization so that we loop only
> over non-priv objects while taking the locks in
> eb_lock_persistent_vmas(), as only non-priv objects need per-object
> locking (private objects are locked in a single shot with
> vm_priv_lock). A non-priv mapping will also be on the
> vm_bind/bound_list.
>
> I think we need to document this to make it clearer.

OK, I understood it as: private objects were either on the vm_bind_list
or the vm_bound_list depending on whether they needed rebinding or not,
and shared objects were only on the non_priv_vm_bind list and were always
locked, validated and fenced...

Need to take a deeper look...
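
For reference, a rough sketch of how I read the intended execbuf3 locking
with these two mechanisms (eb_lock_persistent_vmas() is only referenced in
this discussion, so the shape below is an assumption, not the actual patch):

static int eb_lock_persistent_vmas(struct i915_execbuffer *eb)
{
	struct i915_address_space *vm = eb->context->vm;
	struct i915_vma *vma;
	int err;

	/* One lock covers all VM-private BOs; they share vm->root_obj's resv. */
	err = i915_gem_vm_priv_lock(vm, &eb->ww);
	if (err)
		return err;

	/* Shared BOs have individual dma-resv objects, lock them one by one. */
	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
			    non_priv_vm_bind_link) {
		err = i915_gem_object_lock(vma->obj, &eb->ww);
		if (err)
			return err;
	}

	return 0;
}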

/Thomas



> 
> Niranjana
> 
> > 
> > 
> > > 
> > >         /* Hold object reference until vm_unbind */
> > >         i915_gem_object_get(vma->obj);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index 135dc4a76724..df0a8459c3c6 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct
> > > i915_address_space *vm,
> > >  void i915_address_space_fini(struct i915_address_space *vm)
> > >  {
> > >         drm_mm_takedown(&vm->mm);
> > > +       i915_gem_object_put(vm->root_obj);
> > >         GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> > >         mutex_destroy(&vm->vm_bind_lock);
> > >  }
> > > @@ -289,6 +290,9 @@ void i915_address_space_init(struct
> > > i915_address_space *vm, int subclass)
> > >         INIT_LIST_HEAD(&vm->vm_bind_list);
> > >         INIT_LIST_HEAD(&vm->vm_bound_list);
> > >         mutex_init(&vm->vm_bind_lock);
> > > +       INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
> > > +       vm->root_obj = i915_gem_object_create_internal(vm->i915,
> > > PAGE_SIZE);
> > > +       GEM_BUG_ON(IS_ERR(vm->root_obj));
> > >  }
> > > 
> > >  void *__px_vaddr(struct drm_i915_gem_object *p)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > index d4a6ce65251d..f538ce9115c9 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > @@ -267,6 +267,8 @@ struct i915_address_space {
> > >         struct list_head vm_bound_list;
> > >         /* va tree of persistent vmas */
> > >         struct rb_root_cached va;
> > > +       struct list_head non_priv_vm_bind_list;
> > > +       struct drm_i915_gem_object *root_obj;
> > > 
> > >         /* Global GTT */
> > >         bool is_ggtt:1;
> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > b/drivers/gpu/drm/i915/i915_vma.c
> > > index d324e29cef0a..f0226581d342 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > @@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
> > >         mutex_unlock(&vm->mutex);
> > > 
> > >         INIT_LIST_HEAD(&vma->vm_bind_link);
> > > +       INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
> > >         return vma;
> > > 
> > >  err_unlock:
> > > diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
> > > b/drivers/gpu/drm/i915/i915_vma_types.h
> > > index b6d179bdbfa0..2298b3d6b7c4 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma_types.h
> > > +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> > > @@ -290,6 +290,8 @@ struct i915_vma {
> > >         struct list_head vm_link;
> > > 
> > >         struct list_head vm_bind_link; /* Link in persistent VMA
> > > list
> > > */
> > > +       /* Link in non-private persistent VMA list */
> > > +       struct list_head non_priv_vm_bind_link;
> > > 
> > >         /** Interval tree structures for persistent vma */
> > >         struct rb_node rb;
> > > diff --git a/include/uapi/drm/i915_drm.h
> > > b/include/uapi/drm/i915_drm.h
> > > index 26cca49717f8..ce1c6592b0d7 100644
> > > --- a/include/uapi/drm/i915_drm.h
> > > +++ b/include/uapi/drm/i915_drm.h
> > > @@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
> > >          *
> > >          * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
> > >          * struct drm_i915_gem_create_ext_protected_content.
> > > +        *
> > > +        * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
> > > +        * struct drm_i915_gem_create_ext_vm_private.
> > >          */
> > >  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> > >  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
> > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
> > >         __u64 extensions;
> > >  };
> > > 
> > > @@ -3662,6 +3666,32 @@ struct
> > > drm_i915_gem_create_ext_protected_content {
> > >  /* ID of the protected content session managed by i915 when PXP
> > > is
> > > active */
> > >  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
> > > 
> > > +/**
> > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make
> > > the
> > > object
> > > + * private to the specified VM.
> > > + *
> > > + * See struct drm_i915_gem_create_ext.
> > > + *
> > > + * By default, BOs can be mapped on multiple VMs and can also be
> > > dma-buf
> > > + * exported. Hence these BOs are referred to as Shared BOs.
> > > + * During each execbuf3 submission, the request fence must be
> > > added
> > > to the
> > > + * dma-resv fence list of all shared BOs mapped on the VM.
> > > + *
> > > + * Unlike Shared BOs, these VM private BOs can only be mapped on
> > > the
> > > VM they
> > > + * are private to and can't be dma-buf exported. All private BOs
> > > of
> > > a VM share
> > > + * the dma-resv object. Hence during each execbuf3 submission,
> > > they
> > > need only
> > > + * one dma-resv fence list updated. Thus, the fast path (where
> > > required
> > > + * mappings are already bound) submission latency is O(1) w.r.t
> > > the
> > > number of
> > > + * VM private BOs.
> > > + */
> > > +struct drm_i915_gem_create_ext_vm_private {
> > > +       /** @base: Extension link. See struct
> > > i915_user_extension. */
> > > +       struct i915_user_extension base;
> > > +
> > > +       /** @vm_id: Id of the VM to which the object is private
> > > */
> > > +       __u32 vm_id;
> > > +};
> > > +
> > >  /**
> > >   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > >   *
> > 
> > Thanks,
> > Thomas
> > 


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-07 14:41     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 13:47       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 13:47 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 07:41:54AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>> works in vm_bind mode. The vm_bind mode only works with
>> this new execbuf3 ioctl.
>>
>> The new execbuf3 ioctl will not have any execlist
>
>I understand this to mean that there is no list of objects to validate
>attached to the drm_i915_gem_execbuffer3 structure, rather than that the
>execlists submission backend is never used. Could we clarify this to
>avoid confusion.

Yah, that's a side effect of overloading the word 'execlist' for multiple
things. Yah, I meant no list of objects to validate. I agree, we need to
clarify that here.
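
To illustrate what "no list of objects" means on the userspace side, here is
a minimal sketch (the ioctl number and exact uapi layout come from this RFC
and are not reproduced here; the field names below are just the ones the
driver code dereferences -- ctx_id, engine_idx, batch_address -- so treat
this as illustrative only):

static int submit_batch(int drm_fd, __u32 ctx_id, __u64 batch_gpu_va)
{
	/* The batch is referenced by the GPU VA it was bound to with
	 * I915_GEM_VM_BIND; there is no execobject list and no relocations.
	 */
	struct drm_i915_gem_execbuffer3 eb3 = {
		.ctx_id        = ctx_id,       /* context whose VM is in vm_bind mode */
		.engine_idx    = 0,            /* index into the context's engine map */
		.batch_address = batch_gpu_va, /* VA previously bound via VM_BIND */
	};

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &eb3);
}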

>
>
>>  support
>> and all the legacy support like relocations etc are removed.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/Makefile                 |    1 +
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029
>> +++++++++++++++++
>>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>>  drivers/gpu/drm/i915/i915_driver.c            |    1 +
>>  include/uapi/drm/i915_drm.h                   |   67 +-
>>  6 files changed, 1104 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile
>> b/drivers/gpu/drm/i915/Makefile
>> index 4e1627e96c6e..38cd1c5bc1a5 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -148,6 +148,7 @@ gem-y += \
>>         gem/i915_gem_dmabuf.o \
>>         gem/i915_gem_domain.o \
>>         gem/i915_gem_execbuffer.o \
>> +       gem/i915_gem_execbuffer3.o \
>>         gem/i915_gem_internal.o \
>>         gem/i915_gem_object.o \
>>         gem/i915_gem_lmem.o \
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index b7b2c14fd9e1..37bb1383ab8f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -782,6 +782,11 @@ static int eb_select_context(struct
>> i915_execbuffer *eb)
>>         if (unlikely(IS_ERR(ctx)))
>>                 return PTR_ERR(ctx);
>>
>> +       if (ctx->vm->vm_bind_mode) {
>> +               i915_gem_context_put(ctx);
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>>         eb->gem_context = ctx;
>>         if (i915_gem_context_has_full_ppgtt(ctx))
>>                 eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> new file mode 100644
>> index 000000000000..13121df72e3d
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> @@ -0,0 +1,1029 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#include <linux/dma-resv.h>
>> +#include <linux/sync_file.h>
>> +#include <linux/uaccess.h>
>> +
>> +#include <drm/drm_syncobj.h>
>> +
>> +#include "gt/intel_context.h"
>> +#include "gt/intel_gpu_commands.h"
>> +#include "gt/intel_gt.h"
>> +#include "gt/intel_gt_pm.h"
>> +#include "gt/intel_ring.h"
>> +
>> +#include "i915_drv.h"
>> +#include "i915_file_private.h"
>> +#include "i915_gem_context.h"
>> +#include "i915_gem_ioctls.h"
>> +#include "i915_gem_vm_bind.h"
>> +#include "i915_trace.h"
>> +
>> +#define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>> +#define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>> +
>> +/* Catch emission of unexpected errors for CI! */
>> +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>> +#undef EINVAL
>> +#define EINVAL ({ \
>> +       DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
>> +       22; \
>> +})
>> +#endif
>> +
>> +/**
>> + * DOC: User command execution with execbuf3 ioctl
>> + *
>> + * A VM in VM_BIND mode will not support older execbuf mode of
>> binding.
>> + * The execbuf ioctl handling in VM_BIND mode differs significantly
>> from the
>> + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>> + * Hence, a new execbuf3 ioctl has been added to support VM_BIND
>> mode. (See
>> + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not
>> accept any
>> + * execlist. Hence, no support for implicit sync.
>> + *
>> + * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND
>> mode only
>> + * works with execbuf3 ioctl for submission.
>> + *
>> + * The execbuf3 ioctl directly specifies the batch addresses instead
>> of as
>> + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also
>> not
>> + * support many of the older features like in/out/submit fences,
>> fence array,
>> + * default gem context etc. (See struct drm_i915_gem_execbuffer3).
>> + *
>> + * In VM_BIND mode, VA allocation is completely managed by the user
>> instead of
>> + * the i915 driver. Hence all VA assignment, eviction are not
>> applicable in
>> + * VM_BIND mode. Also, for determining object activeness, VM_BIND
>> mode will not
>> + * be using the i915_vma active reference tracking. It will instead
>> check the
>> + * dma-resv object's fence list for that.
>> + *
>> + * So, a lot of code supporting execbuf2 ioctl, like relocations, VA
>> evictions,
>> + * vma lookup table, implicit sync, vma active reference tracking
>> etc., are not
>> + * applicable for execbuf3 ioctl.
>> + */
>> +
>> +struct eb_fence {
>> +       struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
>> +       struct dma_fence *dma_fence;
>> +       u64 value;
>> +       struct dma_fence_chain *chain_fence;
>> +};
>> +
>> +struct i915_execbuffer {
>> +       struct drm_i915_private *i915; /** i915 backpointer */
>> +       struct drm_file *file; /** per-file lookup tables and limits
>> */
>> +       struct drm_i915_gem_execbuffer3 *args; /** ioctl parameters
>> */
>> +
>> +       struct intel_gt *gt; /* gt for the execbuf */
>> +       struct intel_context *context; /* logical state for the
>> request */
>> +       struct i915_gem_context *gem_context; /** caller's context */
>> +
>> +       /** our requests to build */
>> +       struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
>> +
>> +       /** used for excl fence in dma_resv objects when > 1 BB
>> submitted */
>> +       struct dma_fence *composite_fence;
>> +
>> +       struct i915_gem_ww_ctx ww;
>> +
>> +       /* number of batches in execbuf IOCTL */
>> +       unsigned int num_batches;
>> +
>> +       u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
>> +       /** identity of the batch obj/vma */
>> +       struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
>> +
>> +       struct eb_fence *fences;
>> +       unsigned long num_fences;
>> +};
>
>Kerneldoc structures please.
>
>It seems we are duplicating a lot of code from i915_execbuffer.c. Did
>you consider
>
>struct i915_execbuffer3 {
>...
>};
>
>struct i915_execbuffer2 {
>        struct i915_execbuffer3 eb3;
>        ...
>        [members that are not common]
>};
>
>Allowing execbuffer2 to share the execbuffer3 code to some extent.
>Not sure about the gain at this point though. My worry would be that, for
>example, fixes might be applied to one file and not the other.

I have added a TODO in the cover letter of this patch series to share
code between execbuf2 and execbuf3.
But I am not sure to what extent. Execbuf3 is much leaner than execbuf2,
and we don't want to degrade it by forcing code sharing with the legacy path.
We can perhaps abstract out some functions which take specific arguments
(instead of 'eb'); that way we can keep these structures separate and still
share some code.
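
As a concrete (hypothetical) example of that idea: eb_throttle() below only
really needs the intel_context, so a shared helper could take that directly
instead of the execbuffer state, with both the execbuf2 and execbuf3 wrappers
calling eb_throttle_common(eb->context):

/* Sketch of the "specific arguments instead of 'eb'" approach; the body is
 * the same throttling loop, just parameterised on what it actually uses.
 */
static struct i915_request *eb_throttle_common(struct intel_context *ce)
{
	struct intel_ring *ring = ce->ring;
	struct intel_timeline *tl = ce->timeline;
	struct i915_request *rq;

	if (intel_ring_update_space(ring) >= PAGE_SIZE)
		return NULL;

	list_for_each_entry(rq, &tl->requests, link) {
		if (rq->ring != ring)
			continue;

		if (__intel_ring_space(rq->postfix, ring->emit, ring->size) >
		    ring->size / 2)
			break;
	}
	if (&rq->link == &tl->requests)
		return NULL;

	return i915_request_get(rq);
}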

>
>> +
>> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
>> +static void eb_unpin_engine(struct i915_execbuffer *eb);
>> +
>> +static int eb_select_context(struct i915_execbuffer *eb)
>> +{
>> +       struct i915_gem_context *ctx;
>> +
>> +       ctx = i915_gem_context_lookup(eb->file->driver_priv, eb-
>> >args->ctx_id);
>> +       if (IS_ERR(ctx))
>> +               return PTR_ERR(ctx);
>> +
>> +       eb->gem_context = ctx;
>> +       return 0;
>> +}
>> +
>> +static struct i915_vma *
>> +eb_find_vma(struct i915_address_space *vm, u64 addr)
>> +{
>> +       u64 va;
>> +
>> +       assert_vm_bind_held(vm);
>> +
>> +       va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
>> +       return i915_gem_vm_bind_lookup_vma(vm, va);
>> +}
>> +
>> +static int eb_lookup_vmas(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i, current_batch = 0;
>> +       struct i915_vma *vma;
>> +
>> +       for (i = 0; i < eb->num_batches; i++) {
>> +               vma = eb_find_vma(eb->context->vm, eb-
>> >batch_addresses[i]);
>> +               if (!vma)
>> +                       return -EINVAL;
>> +
>> +               eb->batches[current_batch] = vma;
>> +               ++current_batch;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
>> +{
>> +}
>> +
>> +static int eb_validate_vmas(struct i915_execbuffer *eb)
>> +{
>> +       int err;
>> +       bool throttle = true;
>> +
>> +retry:
>> +       err = eb_pin_engine(eb, throttle);
>> +       if (err) {
>> +               if (err != -EDEADLK)
>> +                       return err;
>> +
>> +               goto err;
>> +       }
>> +
>> +       /* only throttle once, even if we didn't need to throttle */
>> +       throttle = false;
>> +
>> +err:
>> +       if (err == -EDEADLK) {
>> +               err = i915_gem_ww_ctx_backoff(&eb->ww);
>> +               if (!err)
>> +                       goto retry;
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +/*
>> + * Using two helper loops for the order of which requests / batches
>> are created
>> + * and added the to backend. Requests are created in order from the
>> parent to
>> + * the last child. Requests are added in the reverse order, from the
>> last child
>> + * to parent. This is done for locking reasons as the timeline lock
>> is acquired
>> + * during request creation and released when the request is added to
>> the
>> + * backend. To make lockdep happy (see intel_context_timeline_lock)
>> this must be
>> + * the ordering.
>> + */
>> +#define for_each_batch_create_order(_eb, _i) \
>> +       for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
>> +#define for_each_batch_add_order(_eb, _i) \
>> +       BUILD_BUG_ON(!typecheck(int, _i)); \
>> +       for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
>> +
>> +static int eb_move_to_gpu(struct i915_execbuffer *eb)
>> +{
>> +       /* Unconditionally flush any chipset caches (for streaming
>> writes). */
>> +       intel_gt_chipset_flush(eb->gt);
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_request_submit(struct i915_execbuffer *eb,
>> +                            struct i915_request *rq,
>> +                            struct i915_vma *batch,
>> +                            u64 batch_len)
>> +{
>> +       int err;
>> +
>> +       if (intel_context_nopreempt(rq->context))
>> +               __set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq-
>> >fence.flags);
>> +
>> +       /*
>> +        * After we completed waiting for other engines (using HW
>> semaphores)
>> +        * then we can signal that this request/batch is ready to
>> run. This
>> +        * allows us to determine if the batch is still waiting on
>> the GPU
>> +        * or actually running by checking the breadcrumb.
>> +        */
>> +       if (rq->context->engine->emit_init_breadcrumb) {
>> +               err = rq->context->engine->emit_init_breadcrumb(rq);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       err = rq->context->engine->emit_bb_start(rq,
>> +                                                batch->node.start,
>> +                                                batch_len, 0);
>> +       if (err)
>> +               return err;
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_submit(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +       int err;
>> +
>> +       err = eb_move_to_gpu(eb);
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               if (!eb->requests[i])
>> +                       break;
>> +
>> +               trace_i915_request_queue(eb->requests[i], 0);
>> +               if (!err)
>> +                       err = eb_request_submit(eb, eb->requests[i],
>> +                                               eb->batches[i],
>> +                                               eb->batches[i]-
>> >size);
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +static struct i915_request *eb_throttle(struct i915_execbuffer *eb,
>> struct intel_context *ce)
>> +{
>> +       struct intel_ring *ring = ce->ring;
>> +       struct intel_timeline *tl = ce->timeline;
>> +       struct i915_request *rq;
>> +
>> +       /*
>> +        * Completely unscientific finger-in-the-air estimates for
>> suitable
>> +        * maximum user request size (to avoid blocking) and then
>> backoff.
>> +        */
>> +       if (intel_ring_update_space(ring) >= PAGE_SIZE)
>> +               return NULL;
>> +
>> +       /*
>> +        * Find a request that after waiting upon, there will be at
>> least half
>> +        * the ring available. The hysteresis allows us to compete
>> for the
>> +        * shared ring and should mean that we sleep less often prior
>> to
>> +        * claiming our resources, but not so long that the ring
>> completely
>> +        * drains before we can submit our next request.
>> +        */
>> +       list_for_each_entry(rq, &tl->requests, link) {
>> +               if (rq->ring != ring)
>> +                       continue;
>> +
>> +               if (__intel_ring_space(rq->postfix,
>> +                                      ring->emit, ring->size) >
>> ring->size / 2)
>> +                       break;
>> +       }
>> +       if (&rq->link == &tl->requests)
>> +               return NULL; /* weird, we will check again later for
>> real */
>> +
>> +       return i915_request_get(rq);
>> +}
>> +
>> +static int eb_pin_timeline(struct i915_execbuffer *eb, struct
>> intel_context *ce,
>> +                          bool throttle)
>> +{
>> +       struct intel_timeline *tl;
>> +       struct i915_request *rq = NULL;
>> +
>> +       /*
>> +        * Take a local wakeref for preparing to dispatch the execbuf
>> as
>> +        * we expect to access the hardware fairly frequently in the
>> +        * process, and require the engine to be kept awake between
>> accesses.
>> +        * Upon dispatch, we acquire another prolonged wakeref that
>> we hold
>> +        * until the timeline is idle, which in turn releases the
>> wakeref
>> +        * taken on the engine, and the parent device.
>> +        */
>> +       tl = intel_context_timeline_lock(ce);
>> +       if (IS_ERR(tl))
>> +               return PTR_ERR(tl);
>> +
>> +       intel_context_enter(ce);
>> +       if (throttle)
>> +               rq = eb_throttle(eb, ce);
>> +       intel_context_timeline_unlock(tl);
>> +
>> +       if (rq) {
>> +               bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
>> +               long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>> +
>> +               if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>> +                                     timeout) < 0) {
>> +                       i915_request_put(rq);
>> +
>> +                       /*
>> +                        * Error path, cannot use
>> intel_context_timeline_lock as
>> +                        * that is user interruptable and this clean
>> up step
>> +                        * must be done.
>> +                        */
>> +                       mutex_lock(&ce->timeline->mutex);
>> +                       intel_context_exit(ce);
>> +                       mutex_unlock(&ce->timeline->mutex);
>> +
>> +                       if (nonblock)
>> +                               return -EWOULDBLOCK;
>> +                       else
>> +                               return -EINTR;
>> +               }
>> +               i915_request_put(rq);
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
>> +{
>> +       struct intel_context *ce = eb->context, *child;
>> +       int err;
>> +       int i = 0, j = 0;
>> +
>> +       GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
>> +
>> +       if (unlikely(intel_context_is_banned(ce)))
>> +               return -EIO;
>> +
>> +       /*
>> +        * Pinning the contexts may generate requests in order to
>> acquire
>> +        * GGTT space, so do this first before we reserve a seqno for
>> +        * ourselves.
>> +        */
>> +       err = intel_context_pin_ww(ce, &eb->ww);
>> +       if (err)
>> +               return err;
>> +       for_each_child(ce, child) {
>> +               err = intel_context_pin_ww(child, &eb->ww);
>> +               GEM_BUG_ON(err);        /* perma-pinned should incr a
>> counter */
>> +       }
>> +
>> +       for_each_child(ce, child) {
>> +               err = eb_pin_timeline(eb, child, throttle);
>> +               if (err)
>> +                       goto unwind;
>> +               ++i;
>> +       }
>> +       err = eb_pin_timeline(eb, ce, throttle);
>> +       if (err)
>> +               goto unwind;
>> +
>> +       eb->args->flags |= __EXEC3_ENGINE_PINNED;
>> +       return 0;
>> +
>> +unwind:
>> +       for_each_child(ce, child) {
>> +               if (j++ < i) {
>> +                       mutex_lock(&child->timeline->mutex);
>> +                       intel_context_exit(child);
>> +                       mutex_unlock(&child->timeline->mutex);
>> +               }
>> +       }
>> +       for_each_child(ce, child)
>> +               intel_context_unpin(child);
>> +       intel_context_unpin(ce);
>> +       return err;
>> +}
>> +
>> +static void eb_unpin_engine(struct i915_execbuffer *eb)
>> +{
>> +       struct intel_context *ce = eb->context, *child;
>> +
>> +       if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
>> +               return;
>> +
>> +       eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
>> +
>> +       for_each_child(ce, child) {
>> +               mutex_lock(&child->timeline->mutex);
>> +               intel_context_exit(child);
>> +               mutex_unlock(&child->timeline->mutex);
>> +
>> +               intel_context_unpin(child);
>> +       }
>> +
>> +       mutex_lock(&ce->timeline->mutex);
>> +       intel_context_exit(ce);
>> +       mutex_unlock(&ce->timeline->mutex);
>> +
>> +       intel_context_unpin(ce);
>> +}
>> +
>> +static int
>> +eb_select_engine(struct i915_execbuffer *eb)
>> +{
>> +       struct intel_context *ce, *child;
>> +       unsigned int idx;
>> +       int err;
>> +
>> +       if (!i915_gem_context_user_engines(eb->gem_context))
>> +               return -EINVAL;
>> +
>> +       idx = eb->args->engine_idx;
>> +       ce = i915_gem_context_get_engine(eb->gem_context, idx);
>> +       if (IS_ERR(ce))
>> +               return PTR_ERR(ce);
>> +
>> +       eb->num_batches = ce->parallel.number_children + 1;
>> +
>> +       for_each_child(ce, child)
>> +               intel_context_get(child);
>> +       intel_gt_pm_get(ce->engine->gt);
>> +
>> +       if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>> +               err = intel_context_alloc_state(ce);
>> +               if (err)
>> +                       goto err;
>> +       }
>> +       for_each_child(ce, child) {
>> +               if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>> +                       err = intel_context_alloc_state(child);
>> +                       if (err)
>> +                               goto err;
>> +               }
>> +       }
>> +
>> +       /*
>> +        * ABI: Before userspace accesses the GPU (e.g. execbuffer),
>> report
>> +        * EIO if the GPU is already wedged.
>> +        */
>> +       err = intel_gt_terminally_wedged(ce->engine->gt);
>> +       if (err)
>> +               goto err;
>> +
>> +       if (!i915_vm_tryget(ce->vm)) {
>> +               err = -ENOENT;
>> +               goto err;
>> +       }
>> +
>> +       eb->context = ce;
>> +       eb->gt = ce->engine->gt;
>> +
>> +       /*
>> +        * Make sure engine pool stays alive even if we call
>> intel_context_put
>> +        * during ww handling. The pool is destroyed when last pm
>> reference
>> +        * is dropped, which breaks our -EDEADLK handling.
>> +        */
>> +       return err;
>> +
>> +err:
>> +       intel_gt_pm_put(ce->engine->gt);
>> +       for_each_child(ce, child)
>> +               intel_context_put(child);
>> +       intel_context_put(ce);
>> +       return err;
>> +}
>> +
>> +static void
>> +eb_put_engine(struct i915_execbuffer *eb)
>> +{
>> +       struct intel_context *child;
>> +
>> +       i915_vm_put(eb->context->vm);
>> +       intel_gt_pm_put(eb->gt);
>> +       for_each_child(eb->context, child)
>> +               intel_context_put(child);
>> +       intel_context_put(eb->context);
>> +}
>> +
>> +static void
>> +__free_fence_array(struct eb_fence *fences, unsigned int n)
>> +{
>> +       while (n--) {
>> +               drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>> +               dma_fence_put(fences[n].dma_fence);
>> +               dma_fence_chain_free(fences[n].chain_fence);
>> +       }
>> +       kvfree(fences);
>> +}
>> +
>> +static int add_timeline_fence_array(struct i915_execbuffer *eb)
>> +{
>> +       struct drm_i915_gem_timeline_fence __user *user_fences;
>> +       struct eb_fence *f;
>> +       u64 nfences;
>> +       int err = 0;
>> +
>> +       nfences = eb->args->fence_count;
>> +       if (!nfences)
>> +               return 0;
>> +
>> +       /* Check multiplication overflow for access_ok() and
>> kvmalloc_array() */
>> +       BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
>> +       if (nfences > min_t(unsigned long,
>> +                           ULONG_MAX / sizeof(*user_fences),
>> +                           SIZE_MAX / sizeof(*f)) - eb->num_fences)
>> +               return -EINVAL;
>> +
>> +       user_fences = u64_to_user_ptr(eb->args->timeline_fences);
>> +       if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
>> +               return -EFAULT;
>> +
>> +       f = krealloc(eb->fences,
>> +                    (eb->num_fences + nfences) * sizeof(*f),
>> +                    __GFP_NOWARN | GFP_KERNEL);
>> +       if (!f)
>> +               return -ENOMEM;
>> +
>> +       eb->fences = f;
>> +       f += eb->num_fences;
>> +
>> +       BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
>> +                    ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
>> +
>> +       while (nfences--) {
>> +               struct drm_i915_gem_timeline_fence user_fence;
>> +               struct drm_syncobj *syncobj;
>> +               struct dma_fence *fence = NULL;
>> +               u64 point;
>> +
>> +               if (__copy_from_user(&user_fence,
>> +                                    user_fences++,
>> +                                    sizeof(user_fence)))
>> +                       return -EFAULT;
>> +
>> +               if (user_fence.flags &
>> __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
>> +                       return -EINVAL;
>> +
>> +               syncobj = drm_syncobj_find(eb->file,
>> user_fence.handle);
>> +               if (!syncobj) {
>> +                       DRM_DEBUG("Invalid syncobj handle
>> provided\n");
>> +                       return -ENOENT;
>> +               }
>> +
>> +               fence = drm_syncobj_fence_get(syncobj);
>> +
>> +               if (!fence && user_fence.flags &&
>> +                   !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL))
>> {
>> +                       DRM_DEBUG("Syncobj handle has no fence\n");
>> +                       drm_syncobj_put(syncobj);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               point = user_fence.value;
>> +               if (fence)
>> +                       err = dma_fence_chain_find_seqno(&fence,
>> point);
>> +
>> +               if (err && !(user_fence.flags &
>> I915_TIMELINE_FENCE_SIGNAL)) {
>> +                       DRM_DEBUG("Syncobj handle missing requested
>> point %llu\n", point);
>> +                       dma_fence_put(fence);
>> +                       drm_syncobj_put(syncobj);
>> +                       return err;
>> +               }
>> +
>> +               /*
>> +                * A point might have been signaled already and
>> +                * garbage collected from the timeline. In this case
>> +                * just ignore the point and carry on.
>> +                */
>> +               if (!fence && !(user_fence.flags &
>> I915_TIMELINE_FENCE_SIGNAL)) {
>> +                       drm_syncobj_put(syncobj);
>> +                       continue;
>> +               }
>> +
>> +               /*
>> +                * For timeline syncobjs we need to preallocate
>> chains for
>> +                * later signaling.
>> +                */
>> +               if (point != 0 && user_fence.flags &
>> I915_TIMELINE_FENCE_SIGNAL) {
>> +                       /*
>> +                        * Waiting and signaling the same point (when
>> point !=
>> +                        * 0) would break the timeline.
>> +                        */
>> +                       if (user_fence.flags &
>> I915_TIMELINE_FENCE_WAIT) {
>> +                               DRM_DEBUG("Trying to wait & signal
>> the same timeline point.\n");
>> +                               dma_fence_put(fence);
>> +                               drm_syncobj_put(syncobj);
>> +                               return -EINVAL;
>> +                       }
>> +
>> +                       f->chain_fence = dma_fence_chain_alloc();
>> +                       if (!f->chain_fence) {
>> +                               drm_syncobj_put(syncobj);
>> +                               dma_fence_put(fence);
>> +                               return -ENOMEM;
>> +                       }
>> +               } else {
>> +                       f->chain_fence = NULL;
>> +               }
>> +
>> +               f->syncobj = ptr_pack_bits(syncobj, user_fence.flags,
>> 2);
>> +               f->dma_fence = fence;
>> +               f->value = point;
>> +               f++;
>> +               eb->num_fences++;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void put_fence_array(struct eb_fence *fences, int num_fences)
>> +{
>> +       if (fences)
>> +               __free_fence_array(fences, num_fences);
>> +}
>> +
>> +static int
>> +await_fence_array(struct i915_execbuffer *eb,
>> +                 struct i915_request *rq)
>> +{
>> +       unsigned int n;
>> +       int err;
>> +
>> +       for (n = 0; n < eb->num_fences; n++) {
>> +               struct drm_syncobj *syncobj;
>> +               unsigned int flags;
>> +
>> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
>> &flags, 2);
>> +
>> +               if (!eb->fences[n].dma_fence)
>> +                       continue;
>> +
>> +               err = i915_request_await_dma_fence(rq, eb-
>> >fences[n].dma_fence);
>> +               if (err < 0)
>> +                       return err;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void signal_fence_array(const struct i915_execbuffer *eb,
>> +                              struct dma_fence * const fence)
>> +{
>> +       unsigned int n;
>> +
>> +       for (n = 0; n < eb->num_fences; n++) {
>> +               struct drm_syncobj *syncobj;
>> +               unsigned int flags;
>> +
>> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
>> &flags, 2);
>> +               if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>> +                       continue;
>> +
>> +               if (eb->fences[n].chain_fence) {
>> +                       drm_syncobj_add_point(syncobj,
>> +                                             eb-
>> >fences[n].chain_fence,
>> +                                             fence,
>> +                                             eb->fences[n].value);
>> +                       /*
>> +                        * The chain's ownership is transferred to
>> the
>> +                        * timeline.
>> +                        */
>> +                       eb->fences[n].chain_fence = NULL;
>> +               } else {
>> +                       drm_syncobj_replace_fence(syncobj, fence);
>> +               }
>> +       }
>> +}
>> +
>> +static int parse_timeline_fences(struct i915_execbuffer *eb)
>> +{
>> +       return add_timeline_fence_array(eb);
>> +}
>> +
>> +static int parse_batch_addresses(struct i915_execbuffer *eb)
>> +{
>> +       struct drm_i915_gem_execbuffer3 *args = eb->args;
>> +       u64 __user *batch_addr = u64_to_user_ptr(args-
>> >batch_address);
>> +
>> +       if (copy_from_user(eb->batch_addresses, batch_addr,
>> +                          sizeof(batch_addr[0]) * eb->num_batches))
>> +               return -EFAULT;
>> +
>> +       return 0;
>> +}
>> +
>> +static void retire_requests(struct intel_timeline *tl, struct
>> i915_request *end)
>> +{
>> +       struct i915_request *rq, *rn;
>> +
>> +       list_for_each_entry_safe(rq, rn, &tl->requests, link)
>> +               if (rq == end || !i915_request_retire(rq))
>> +                       break;
>> +}
>> +
>> +static int eb_request_add(struct i915_execbuffer *eb, struct
>> i915_request *rq,
>> +                         int err, bool last_parallel)
>> +{
>> +       struct intel_timeline * const tl = i915_request_timeline(rq);
>> +       struct i915_sched_attr attr = {};
>> +       struct i915_request *prev;
>> +
>> +       lockdep_assert_held(&tl->mutex);
>> +       lockdep_unpin_lock(&tl->mutex, rq->cookie);
>> +
>> +       trace_i915_request_add(rq);
>> +
>> +       prev = __i915_request_commit(rq);
>> +
>> +       /* Check that the context wasn't destroyed before submission
>> */
>> +       if (likely(!intel_context_is_closed(eb->context))) {
>> +               attr = eb->gem_context->sched;
>> +       } else {
>> +               /* Serialise with context_close via the
>> add_to_timeline */
>> +               i915_request_set_error_once(rq, -ENOENT);
>> +               __i915_request_skip(rq);
>> +               err = -ENOENT; /* override any transient errors */
>> +       }
>> +
>> +       if (intel_context_is_parallel(eb->context)) {
>> +               if (err) {
>> +                       __i915_request_skip(rq);
>> +                       set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>> +                               &rq->fence.flags);
>> +               }
>> +               if (last_parallel)
>> +                       set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>> +                               &rq->fence.flags);
>> +       }
>> +
>> +       __i915_request_queue(rq, &attr);
>> +
>> +       /* Try to clean up the client's timeline after submitting the
>> request */
>> +       if (prev)
>> +               retire_requests(tl, prev);
>> +
>> +       mutex_unlock(&tl->mutex);
>> +
>> +       return err;
>> +}
>> +
>> +static int eb_requests_add(struct i915_execbuffer *eb, int err)
>> +{
>> +       int i;
>> +
>> +       /*
>> +        * We iterate in reverse order of creation to release
>> timeline mutexes in
>> +        * same order.
>> +        */
>> +       for_each_batch_add_order(eb, i) {
>> +               struct i915_request *rq = eb->requests[i];
>> +
>> +               if (!rq)
>> +                       continue;
>> +               err |= eb_request_add(eb, rq, err, i == 0);
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +static void eb_requests_get(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               if (!eb->requests[i])
>> +                       break;
>> +
>> +               i915_request_get(eb->requests[i]);
>> +       }
>> +}
>> +
>> +static void eb_requests_put(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               if (!eb->requests[i])
>> +                       break;
>> +
>> +               i915_request_put(eb->requests[i]);
>> +       }
>> +}
>> +
>> +static int
>> +eb_composite_fence_create(struct i915_execbuffer *eb)
>> +{
>> +       struct dma_fence_array *fence_array;
>> +       struct dma_fence **fences;
>> +       unsigned int i;
>> +
>> +       GEM_BUG_ON(!intel_context_is_parent(eb->context));
>> +
>> +       fences = kmalloc_array(eb->num_batches, sizeof(*fences),
>> GFP_KERNEL);
>> +       if (!fences)
>> +               return -ENOMEM;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               fences[i] = &eb->requests[i]->fence;
>> +               __set_bit(I915_FENCE_FLAG_COMPOSITE,
>> +                         &eb->requests[i]->fence.flags);
>> +       }
>> +
>> +       fence_array = dma_fence_array_create(eb->num_batches,
>> +                                            fences,
>> +                                            eb->context-
>> >parallel.fence_context,
>> +                                            eb->context-
>> >parallel.seqno++,
>> +                                            false);
>> +       if (!fence_array) {
>> +               kfree(fences);
>> +               return -ENOMEM;
>> +       }
>> +
>> +       /* Move ownership to the dma_fence_array created above */
>> +       for_each_batch_create_order(eb, i)
>> +               dma_fence_get(fences[i]);
>> +
>> +       eb->composite_fence = &fence_array->base;
>> +
>> +       return 0;
>> +}
>> +
>> +static int
>> +eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
>> +{
>> +       int err;
>> +
>> +       if (unlikely(eb->gem_context->syncobj)) {
>> +               struct dma_fence *fence;
>> +
>> +               fence = drm_syncobj_fence_get(eb->gem_context-
>> >syncobj);
>> +               err = i915_request_await_dma_fence(rq, fence);
>> +               dma_fence_put(fence);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       if (eb->fences) {
>> +               err = await_fence_array(eb, rq);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       if (intel_context_is_parallel(eb->context)) {
>> +               err = eb_composite_fence_create(eb);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static struct intel_context *
>> +eb_find_context(struct i915_execbuffer *eb, unsigned int
>> context_number)
>> +{
>> +       struct intel_context *child;
>> +
>> +       if (likely(context_number == 0))
>> +               return eb->context;
>> +
>> +       for_each_child(eb->context, child)
>> +               if (!--context_number)
>> +                       return child;
>> +
>> +       GEM_BUG_ON("Context not found");
>> +
>> +       return NULL;
>> +}
>> +
>> +static int eb_requests_create(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +       int err;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               /* Allocate a request for this batch buffer nice and
>> early. */
>> +               eb->requests[i] =
>> i915_request_create(eb_find_context(eb, i));
>> +               if (IS_ERR(eb->requests[i])) {
>> +                       err = PTR_ERR(eb->requests[i]);
>> +                       eb->requests[i] = NULL;
>> +                       return err;
>> +               }
>> +
>> +               /*
>> +                * Only the first request added (committed to
>> backend) has to
>> +                * take the in fences into account as all subsequent
>> requests
>> +                * will have fences inserted inbetween them.
>> +                */
>> +               if (i + 1 == eb->num_batches) {
>> +                       err = eb_fences_add(eb, eb->requests[i]);
>> +                       if (err)
>> +                               return err;
>> +               }
>
>One thing I was hoping for with the brand new execbuf3 IOCTL was that
>we could actually make it dma_fence_signalling critical path compliant.
>
>That would mean annotating the dma_fence_signalling critical path just
>after the first request is created and ending the annotation just before
>that same request is added.
>
>The main violators are the memory allocations made when adding
>dependencies in eb_fences_add(), but since those are now fairly limited
>in number, we might be able to pre-allocate that memory before the first
>request is created.
>
>The other main violator would be the multiple batch-buffers. Is this
>mode of operation strictly needed for version 1 or can we ditch it?
>

Hmm... I am not sure about the multiple batch-buffers question. As Mesa is
the primary use case, probably the Mesa folks can answer that?

But, given we have to support it in any case, maybe we can keep that
support and take on the dma_fence_signalling critical path annotation
separately in a subsequent patch series?
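
If/when we do that, the annotation itself is small; roughly like below
(placement is an assumption, and as you note the hard part is moving the
allocations out of the annotated section):

	/* Sketch only: mark the span from first request creation to request
	 * add as a dma-fence signalling critical section so lockdep flags any
	 * allocation or blocking lock taken inside it.
	 */
	bool fence_cookie;

	/* ... dependency setup and memory allocation done before this ... */
	fence_cookie = dma_fence_begin_signalling();

	/* create requests, emit commands, add requests to the backend */

	dma_fence_end_signalling(fence_cookie);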

Niranjana

>
>
>> +
>> +               /*
>> +                * Not really on stack, but we don't want to call
>> +                * kfree on the batch_snapshot when we put it, so use
>> the
>> +                * _onstack interface.
>
>This comment is stale and can be removed.
>
>
>/Thomas
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
@ 2022-07-08 13:47       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 13:47 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Auld, Matthew, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 07:41:54AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>> works in vm_bind mode. The vm_bind mode only works with
>> this new execbuf3 ioctl.
>>
>> The new execbuf3 ioctl will not have any execlist
>
>I understand this to mean that there is no list of objects to validate
>attached to the drm_i915_gem_execbuffer3 structure, rather than that the
>execlists submission backend is never used. Could we clarify this to
>avoid confusion.

Yah, that's a side effect of overloading the word 'execlist' for multiple
things. Yah, I meant no list of objects to validate. I agree, we need to
clarify that here.

>
>
>>  support
>> and all the legacy support like relocations etc are removed.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/Makefile                 |    1 +
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029
>> +++++++++++++++++
>>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>>  drivers/gpu/drm/i915/i915_driver.c            |    1 +
>>  include/uapi/drm/i915_drm.h                   |   67 +-
>>  6 files changed, 1104 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile
>> b/drivers/gpu/drm/i915/Makefile
>> index 4e1627e96c6e..38cd1c5bc1a5 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -148,6 +148,7 @@ gem-y += \
>>         gem/i915_gem_dmabuf.o \
>>         gem/i915_gem_domain.o \
>>         gem/i915_gem_execbuffer.o \
>> +       gem/i915_gem_execbuffer3.o \
>>         gem/i915_gem_internal.o \
>>         gem/i915_gem_object.o \
>>         gem/i915_gem_lmem.o \
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index b7b2c14fd9e1..37bb1383ab8f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -782,6 +782,11 @@ static int eb_select_context(struct
>> i915_execbuffer *eb)
>>         if (unlikely(IS_ERR(ctx)))
>>                 return PTR_ERR(ctx);
>>
>> +       if (ctx->vm->vm_bind_mode) {
>> +               i915_gem_context_put(ctx);
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>>         eb->gem_context = ctx;
>>         if (i915_gem_context_has_full_ppgtt(ctx))
>>                 eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> new file mode 100644
>> index 000000000000..13121df72e3d
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> @@ -0,0 +1,1029 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +#include <linux/dma-resv.h>
>> +#include <linux/sync_file.h>
>> +#include <linux/uaccess.h>
>> +
>> +#include <drm/drm_syncobj.h>
>> +
>> +#include "gt/intel_context.h"
>> +#include "gt/intel_gpu_commands.h"
>> +#include "gt/intel_gt.h"
>> +#include "gt/intel_gt_pm.h"
>> +#include "gt/intel_ring.h"
>> +
>> +#include "i915_drv.h"
>> +#include "i915_file_private.h"
>> +#include "i915_gem_context.h"
>> +#include "i915_gem_ioctls.h"
>> +#include "i915_gem_vm_bind.h"
>> +#include "i915_trace.h"
>> +
>> +#define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>> +#define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>> +
>> +/* Catch emission of unexpected errors for CI! */
>> +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>> +#undef EINVAL
>> +#define EINVAL ({ \
>> +       DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
>> +       22; \
>> +})
>> +#endif
>> +
>> +/**
>> + * DOC: User command execution with execbuf3 ioctl
>> + *
>> + * A VM in VM_BIND mode will not support older execbuf mode of
>> binding.
>> + * The execbuf ioctl handling in VM_BIND mode differs significantly
>> from the
>> + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>> + * Hence, a new execbuf3 ioctl has been added to support VM_BIND
>> mode. (See
>> + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not
>> accept any
>> + * execlist. Hence, no support for implicit sync.
>> + *
>> + * The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND
>> mode only
>> + * works with execbuf3 ioctl for submission.
>> + *
>> + * The execbuf3 ioctl directly specifies the batch addresses instead
>> of as
>> + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will also
>> not
>> + * support many of the older features like in/out/submit fences,
>> fence array,
>> + * default gem context etc. (See struct drm_i915_gem_execbuffer3).
>> + *
>> + * In VM_BIND mode, VA allocation is completely managed by the user
>> instead of
>> + * the i915 driver. Hence all VA assignment, eviction are not
>> applicable in
>> + * VM_BIND mode. Also, for determining object activeness, VM_BIND
>> mode will not
>> + * be using the i915_vma active reference tracking. It will instead
>> check the
>> + * dma-resv object's fence list for that.
>> + *
>> + * So, a lot of code supporting execbuf2 ioctl, like relocations, VA
>> evictions,
>> + * vma lookup table, implicit sync, vma active reference tracking
>> etc., are not
>> + * applicable for execbuf3 ioctl.
>> + */
>> +
>> +struct eb_fence {
>> +       struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
>> +       struct dma_fence *dma_fence;
>> +       u64 value;
>> +       struct dma_fence_chain *chain_fence;
>> +};
>> +
>> +struct i915_execbuffer {
>> +       struct drm_i915_private *i915; /** i915 backpointer */
>> +       struct drm_file *file; /** per-file lookup tables and limits
>> */
>> +       struct drm_i915_gem_execbuffer3 *args; /** ioctl parameters
>> */
>> +
>> +       struct intel_gt *gt; /* gt for the execbuf */
>> +       struct intel_context *context; /* logical state for the
>> request */
>> +       struct i915_gem_context *gem_context; /** caller's context */
>> +
>> +       /** our requests to build */
>> +       struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
>> +
>> +       /** used for excl fence in dma_resv objects when > 1 BB
>> submitted */
>> +       struct dma_fence *composite_fence;
>> +
>> +       struct i915_gem_ww_ctx ww;
>> +
>> +       /* number of batches in execbuf IOCTL */
>> +       unsigned int num_batches;
>> +
>> +       u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
>> +       /** identity of the batch obj/vma */
>> +       struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
>> +
>> +       struct eb_fence *fences;
>> +       unsigned long num_fences;
>> +};
>
>Kerneldoc structures please.
>
>It seems we are duplicating a lot of code from i915_execbuffer.c. Did
>you consider
>
>struct i915_execbuffer3 {
>...
>};
>
>struct i915_execbuffer2 {
>        struct i915_execbuffer3 eb3;
>        ...
>        [members that are not common]
>};
>
>Allowing execbuffer2 to share the execbuffer3 code to some extent.
>Not sure about the gain at this point though. My worry would be that, for
>example, fixes might be applied to one file and not the other.

I have added a TODO in the cover letter of this patch series to share
code between execbuf2 and execbuf3.
But I am not sure to what extent. Execbuf3 is much leaner than execbuf2,
and we don't want to degrade it by forcing code sharing with the legacy path.
We can perhaps abstract out some functions which take specific arguments
(instead of 'eb'); that way we can keep these structures separate and still
share some code.

>
>> +
>> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
>> +static void eb_unpin_engine(struct i915_execbuffer *eb);
>> +
>> +static int eb_select_context(struct i915_execbuffer *eb)
>> +{
>> +       struct i915_gem_context *ctx;
>> +
>> +       ctx = i915_gem_context_lookup(eb->file->driver_priv, eb-
>> >args->ctx_id);
>> +       if (IS_ERR(ctx))
>> +               return PTR_ERR(ctx);
>> +
>> +       eb->gem_context = ctx;
>> +       return 0;
>> +}
>> +
>> +static struct i915_vma *
>> +eb_find_vma(struct i915_address_space *vm, u64 addr)
>> +{
>> +       u64 va;
>> +
>> +       assert_vm_bind_held(vm);
>> +
>> +       va = gen8_noncanonical_addr(addr & PIN_OFFSET_MASK);
>> +       return i915_gem_vm_bind_lookup_vma(vm, va);
>> +}
>> +
>> +static int eb_lookup_vmas(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i, current_batch = 0;
>> +       struct i915_vma *vma;
>> +
>> +       for (i = 0; i < eb->num_batches; i++) {
>> +               vma = eb_find_vma(eb->context->vm, eb-
>> >batch_addresses[i]);
>> +               if (!vma)
>> +                       return -EINVAL;
>> +
>> +               eb->batches[current_batch] = vma;
>> +               ++current_batch;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
>> +{
>> +}
>> +
>> +static int eb_validate_vmas(struct i915_execbuffer *eb)
>> +{
>> +       int err;
>> +       bool throttle = true;
>> +
>> +retry:
>> +       err = eb_pin_engine(eb, throttle);
>> +       if (err) {
>> +               if (err != -EDEADLK)
>> +                       return err;
>> +
>> +               goto err;
>> +       }
>> +
>> +       /* only throttle once, even if we didn't need to throttle */
>> +       throttle = false;
>> +
>> +err:
>> +       if (err == -EDEADLK) {
>> +               err = i915_gem_ww_ctx_backoff(&eb->ww);
>> +               if (!err)
>> +                       goto retry;
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +/*
>> + * Using two helper loops for the order of which requests / batches
>> are created
>> + * and added the to backend. Requests are created in order from the
>> parent to
>> + * the last child. Requests are added in the reverse order, from the
>> last child
>> + * to parent. This is done for locking reasons as the timeline lock
>> is acquired
>> + * during request creation and released when the request is added to
>> the
>> + * backend. To make lockdep happy (see intel_context_timeline_lock)
>> this must be
>> + * the ordering.
>> + */
>> +#define for_each_batch_create_order(_eb, _i) \
>> +       for ((_i) = 0; (_i) < (_eb)->num_batches; ++(_i))
>> +#define for_each_batch_add_order(_eb, _i) \
>> +       BUILD_BUG_ON(!typecheck(int, _i)); \
>> +       for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
>> +
>> +static int eb_move_to_gpu(struct i915_execbuffer *eb)
>> +{
>> +       /* Unconditionally flush any chipset caches (for streaming
>> writes). */
>> +       intel_gt_chipset_flush(eb->gt);
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_request_submit(struct i915_execbuffer *eb,
>> +                            struct i915_request *rq,
>> +                            struct i915_vma *batch,
>> +                            u64 batch_len)
>> +{
>> +       int err;
>> +
>> +       if (intel_context_nopreempt(rq->context))
>> +               __set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq-
>> >fence.flags);
>> +
>> +       /*
>> +        * After we completed waiting for other engines (using HW
>> semaphores)
>> +        * then we can signal that this request/batch is ready to
>> run. This
>> +        * allows us to determine if the batch is still waiting on
>> the GPU
>> +        * or actually running by checking the breadcrumb.
>> +        */
>> +       if (rq->context->engine->emit_init_breadcrumb) {
>> +               err = rq->context->engine->emit_init_breadcrumb(rq);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       err = rq->context->engine->emit_bb_start(rq,
>> +                                                batch->node.start,
>> +                                                batch_len, 0);
>> +       if (err)
>> +               return err;
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_submit(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +       int err;
>> +
>> +       err = eb_move_to_gpu(eb);
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               if (!eb->requests[i])
>> +                       break;
>> +
>> +               trace_i915_request_queue(eb->requests[i], 0);
>> +               if (!err)
>> +                       err = eb_request_submit(eb, eb->requests[i],
>> +                                               eb->batches[i],
>> +                                               eb->batches[i]-
>> >size);
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +static struct i915_request *eb_throttle(struct i915_execbuffer *eb,
>> struct intel_context *ce)
>> +{
>> +       struct intel_ring *ring = ce->ring;
>> +       struct intel_timeline *tl = ce->timeline;
>> +       struct i915_request *rq;
>> +
>> +       /*
>> +        * Completely unscientific finger-in-the-air estimates for
>> suitable
>> +        * maximum user request size (to avoid blocking) and then
>> backoff.
>> +        */
>> +       if (intel_ring_update_space(ring) >= PAGE_SIZE)
>> +               return NULL;
>> +
>> +       /*
>> +        * Find a request that after waiting upon, there will be at
>> least half
>> +        * the ring available. The hysteresis allows us to compete
>> for the
>> +        * shared ring and should mean that we sleep less often prior
>> to
>> +        * claiming our resources, but not so long that the ring
>> completely
>> +        * drains before we can submit our next request.
>> +        */
>> +       list_for_each_entry(rq, &tl->requests, link) {
>> +               if (rq->ring != ring)
>> +                       continue;
>> +
>> +               if (__intel_ring_space(rq->postfix,
>> +                                      ring->emit, ring->size) >
>> ring->size / 2)
>> +                       break;
>> +       }
>> +       if (&rq->link == &tl->requests)
>> +               return NULL; /* weird, we will check again later for
>> real */
>> +
>> +       return i915_request_get(rq);
>> +}
>> +
>> +static int eb_pin_timeline(struct i915_execbuffer *eb, struct
>> intel_context *ce,
>> +                          bool throttle)
>> +{
>> +       struct intel_timeline *tl;
>> +       struct i915_request *rq = NULL;
>> +
>> +       /*
>> +        * Take a local wakeref for preparing to dispatch the execbuf
>> as
>> +        * we expect to access the hardware fairly frequently in the
>> +        * process, and require the engine to be kept awake between
>> accesses.
>> +        * Upon dispatch, we acquire another prolonged wakeref that
>> we hold
>> +        * until the timeline is idle, which in turn releases the
>> wakeref
>> +        * taken on the engine, and the parent device.
>> +        */
>> +       tl = intel_context_timeline_lock(ce);
>> +       if (IS_ERR(tl))
>> +               return PTR_ERR(tl);
>> +
>> +       intel_context_enter(ce);
>> +       if (throttle)
>> +               rq = eb_throttle(eb, ce);
>> +       intel_context_timeline_unlock(tl);
>> +
>> +       if (rq) {
>> +               bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
>> +               long timeout = nonblock ? 0 : MAX_SCHEDULE_TIMEOUT;
>> +
>> +               if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
>> +                                     timeout) < 0) {
>> +                       i915_request_put(rq);
>> +
>> +                       /*
>> +                        * Error path, cannot use
>> intel_context_timeline_lock as
>> +                        * that is user interruptable and this clean
>> up step
>> +                        * must be done.
>> +                        */
>> +                       mutex_lock(&ce->timeline->mutex);
>> +                       intel_context_exit(ce);
>> +                       mutex_unlock(&ce->timeline->mutex);
>> +
>> +                       if (nonblock)
>> +                               return -EWOULDBLOCK;
>> +                       else
>> +                               return -EINTR;
>> +               }
>> +               i915_request_put(rq);
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle)
>> +{
>> +       struct intel_context *ce = eb->context, *child;
>> +       int err;
>> +       int i = 0, j = 0;
>> +
>> +       GEM_BUG_ON(eb->args->flags & __EXEC3_ENGINE_PINNED);
>> +
>> +       if (unlikely(intel_context_is_banned(ce)))
>> +               return -EIO;
>> +
>> +       /*
>> +        * Pinning the contexts may generate requests in order to
>> acquire
>> +        * GGTT space, so do this first before we reserve a seqno for
>> +        * ourselves.
>> +        */
>> +       err = intel_context_pin_ww(ce, &eb->ww);
>> +       if (err)
>> +               return err;
>> +       for_each_child(ce, child) {
>> +               err = intel_context_pin_ww(child, &eb->ww);
>> +               GEM_BUG_ON(err);        /* perma-pinned should incr a
>> counter */
>> +       }
>> +
>> +       for_each_child(ce, child) {
>> +               err = eb_pin_timeline(eb, child, throttle);
>> +               if (err)
>> +                       goto unwind;
>> +               ++i;
>> +       }
>> +       err = eb_pin_timeline(eb, ce, throttle);
>> +       if (err)
>> +               goto unwind;
>> +
>> +       eb->args->flags |= __EXEC3_ENGINE_PINNED;
>> +       return 0;
>> +
>> +unwind:
>> +       for_each_child(ce, child) {
>> +               if (j++ < i) {
>> +                       mutex_lock(&child->timeline->mutex);
>> +                       intel_context_exit(child);
>> +                       mutex_unlock(&child->timeline->mutex);
>> +               }
>> +       }
>> +       for_each_child(ce, child)
>> +               intel_context_unpin(child);
>> +       intel_context_unpin(ce);
>> +       return err;
>> +}
>> +
>> +static void eb_unpin_engine(struct i915_execbuffer *eb)
>> +{
>> +       struct intel_context *ce = eb->context, *child;
>> +
>> +       if (!(eb->args->flags & __EXEC3_ENGINE_PINNED))
>> +               return;
>> +
>> +       eb->args->flags &= ~__EXEC3_ENGINE_PINNED;
>> +
>> +       for_each_child(ce, child) {
>> +               mutex_lock(&child->timeline->mutex);
>> +               intel_context_exit(child);
>> +               mutex_unlock(&child->timeline->mutex);
>> +
>> +               intel_context_unpin(child);
>> +       }
>> +
>> +       mutex_lock(&ce->timeline->mutex);
>> +       intel_context_exit(ce);
>> +       mutex_unlock(&ce->timeline->mutex);
>> +
>> +       intel_context_unpin(ce);
>> +}
>> +
>> +static int
>> +eb_select_engine(struct i915_execbuffer *eb)
>> +{
>> +       struct intel_context *ce, *child;
>> +       unsigned int idx;
>> +       int err;
>> +
>> +       if (!i915_gem_context_user_engines(eb->gem_context))
>> +               return -EINVAL;
>> +
>> +       idx = eb->args->engine_idx;
>> +       ce = i915_gem_context_get_engine(eb->gem_context, idx);
>> +       if (IS_ERR(ce))
>> +               return PTR_ERR(ce);
>> +
>> +       eb->num_batches = ce->parallel.number_children + 1;
>> +
>> +       for_each_child(ce, child)
>> +               intel_context_get(child);
>> +       intel_gt_pm_get(ce->engine->gt);
>> +
>> +       if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>> +               err = intel_context_alloc_state(ce);
>> +               if (err)
>> +                       goto err;
>> +       }
>> +       for_each_child(ce, child) {
>> +               if (!test_bit(CONTEXT_ALLOC_BIT, &child->flags)) {
>> +                       err = intel_context_alloc_state(child);
>> +                       if (err)
>> +                               goto err;
>> +               }
>> +       }
>> +
>> +       /*
>> +        * ABI: Before userspace accesses the GPU (e.g. execbuffer),
>> report
>> +        * EIO if the GPU is already wedged.
>> +        */
>> +       err = intel_gt_terminally_wedged(ce->engine->gt);
>> +       if (err)
>> +               goto err;
>> +
>> +       if (!i915_vm_tryget(ce->vm)) {
>> +               err = -ENOENT;
>> +               goto err;
>> +       }
>> +
>> +       eb->context = ce;
>> +       eb->gt = ce->engine->gt;
>> +
>> +       /*
>> +        * Make sure engine pool stays alive even if we call
>> intel_context_put
>> +        * during ww handling. The pool is destroyed when last pm
>> reference
>> +        * is dropped, which breaks our -EDEADLK handling.
>> +        */
>> +       return err;
>> +
>> +err:
>> +       intel_gt_pm_put(ce->engine->gt);
>> +       for_each_child(ce, child)
>> +               intel_context_put(child);
>> +       intel_context_put(ce);
>> +       return err;
>> +}
>> +
>> +static void
>> +eb_put_engine(struct i915_execbuffer *eb)
>> +{
>> +       struct intel_context *child;
>> +
>> +       i915_vm_put(eb->context->vm);
>> +       intel_gt_pm_put(eb->gt);
>> +       for_each_child(eb->context, child)
>> +               intel_context_put(child);
>> +       intel_context_put(eb->context);
>> +}
>> +
>> +static void
>> +__free_fence_array(struct eb_fence *fences, unsigned int n)
>> +{
>> +       while (n--) {
>> +               drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>> +               dma_fence_put(fences[n].dma_fence);
>> +               dma_fence_chain_free(fences[n].chain_fence);
>> +       }
>> +       kvfree(fences);
>> +}
>> +
>> +static int add_timeline_fence_array(struct i915_execbuffer *eb)
>> +{
>> +       struct drm_i915_gem_timeline_fence __user *user_fences;
>> +       struct eb_fence *f;
>> +       u64 nfences;
>> +       int err = 0;
>> +
>> +       nfences = eb->args->fence_count;
>> +       if (!nfences)
>> +               return 0;
>> +
>> +       /* Check multiplication overflow for access_ok() and
>> kvmalloc_array() */
>> +       BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
>> +       if (nfences > min_t(unsigned long,
>> +                           ULONG_MAX / sizeof(*user_fences),
>> +                           SIZE_MAX / sizeof(*f)) - eb->num_fences)
>> +               return -EINVAL;
>> +
>> +       user_fences = u64_to_user_ptr(eb->args->timeline_fences);
>> +       if (!access_ok(user_fences, nfences * sizeof(*user_fences)))
>> +               return -EFAULT;
>> +
>> +       f = krealloc(eb->fences,
>> +                    (eb->num_fences + nfences) * sizeof(*f),
>> +                    __GFP_NOWARN | GFP_KERNEL);
>> +       if (!f)
>> +               return -ENOMEM;
>> +
>> +       eb->fences = f;
>> +       f += eb->num_fences;
>> +
>> +       BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
>> +                    ~__I915_TIMELINE_FENCE_UNKNOWN_FLAGS);
>> +
>> +       while (nfences--) {
>> +               struct drm_i915_gem_timeline_fence user_fence;
>> +               struct drm_syncobj *syncobj;
>> +               struct dma_fence *fence = NULL;
>> +               u64 point;
>> +
>> +               if (__copy_from_user(&user_fence,
>> +                                    user_fences++,
>> +                                    sizeof(user_fence)))
>> +                       return -EFAULT;
>> +
>> +               if (user_fence.flags &
>> __I915_TIMELINE_FENCE_UNKNOWN_FLAGS)
>> +                       return -EINVAL;
>> +
>> +               syncobj = drm_syncobj_find(eb->file,
>> user_fence.handle);
>> +               if (!syncobj) {
>> +                       DRM_DEBUG("Invalid syncobj handle
>> provided\n");
>> +                       return -ENOENT;
>> +               }
>> +
>> +               fence = drm_syncobj_fence_get(syncobj);
>> +
>> +               if (!fence && user_fence.flags &&
>> +                   !(user_fence.flags & I915_TIMELINE_FENCE_SIGNAL))
>> {
>> +                       DRM_DEBUG("Syncobj handle has no fence\n");
>> +                       drm_syncobj_put(syncobj);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               point = user_fence.value;
>> +               if (fence)
>> +                       err = dma_fence_chain_find_seqno(&fence,
>> point);
>> +
>> +               if (err && !(user_fence.flags &
>> I915_TIMELINE_FENCE_SIGNAL)) {
>> +                       DRM_DEBUG("Syncobj handle missing requested
>> point %llu\n", point);
>> +                       dma_fence_put(fence);
>> +                       drm_syncobj_put(syncobj);
>> +                       return err;
>> +               }
>> +
>> +               /*
>> +                * A point might have been signaled already and
>> +                * garbage collected from the timeline. In this case
>> +                * just ignore the point and carry on.
>> +                */
>> +               if (!fence && !(user_fence.flags &
>> I915_TIMELINE_FENCE_SIGNAL)) {
>> +                       drm_syncobj_put(syncobj);
>> +                       continue;
>> +               }
>> +
>> +               /*
>> +                * For timeline syncobjs we need to preallocate
>> chains for
>> +                * later signaling.
>> +                */
>> +               if (point != 0 && user_fence.flags &
>> I915_TIMELINE_FENCE_SIGNAL) {
>> +                       /*
>> +                        * Waiting and signaling the same point (when
>> point !=
>> +                        * 0) would break the timeline.
>> +                        */
>> +                       if (user_fence.flags &
>> I915_TIMELINE_FENCE_WAIT) {
>> +                               DRM_DEBUG("Trying to wait & signal
>> the same timeline point.\n");
>> +                               dma_fence_put(fence);
>> +                               drm_syncobj_put(syncobj);
>> +                               return -EINVAL;
>> +                       }
>> +
>> +                       f->chain_fence = dma_fence_chain_alloc();
>> +                       if (!f->chain_fence) {
>> +                               drm_syncobj_put(syncobj);
>> +                               dma_fence_put(fence);
>> +                               return -ENOMEM;
>> +                       }
>> +               } else {
>> +                       f->chain_fence = NULL;
>> +               }
>> +
>> +               f->syncobj = ptr_pack_bits(syncobj, user_fence.flags,
>> 2);
>> +               f->dma_fence = fence;
>> +               f->value = point;
>> +               f++;
>> +               eb->num_fences++;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void put_fence_array(struct eb_fence *fences, int num_fences)
>> +{
>> +       if (fences)
>> +               __free_fence_array(fences, num_fences);
>> +}
>> +
>> +static int
>> +await_fence_array(struct i915_execbuffer *eb,
>> +                 struct i915_request *rq)
>> +{
>> +       unsigned int n;
>> +       int err;
>> +
>> +       for (n = 0; n < eb->num_fences; n++) {
>> +               struct drm_syncobj *syncobj;
>> +               unsigned int flags;
>> +
>> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
>> &flags, 2);
>> +
>> +               if (!eb->fences[n].dma_fence)
>> +                       continue;
>> +
>> +               err = i915_request_await_dma_fence(rq, eb-
>> >fences[n].dma_fence);
>> +               if (err < 0)
>> +                       return err;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void signal_fence_array(const struct i915_execbuffer *eb,
>> +                              struct dma_fence * const fence)
>> +{
>> +       unsigned int n;
>> +
>> +       for (n = 0; n < eb->num_fences; n++) {
>> +               struct drm_syncobj *syncobj;
>> +               unsigned int flags;
>> +
>> +               syncobj = ptr_unpack_bits(eb->fences[n].syncobj,
>> &flags, 2);
>> +               if (!(flags & I915_TIMELINE_FENCE_SIGNAL))
>> +                       continue;
>> +
>> +               if (eb->fences[n].chain_fence) {
>> +                       drm_syncobj_add_point(syncobj,
>> +                                             eb-
>> >fences[n].chain_fence,
>> +                                             fence,
>> +                                             eb->fences[n].value);
>> +                       /*
>> +                        * The chain's ownership is transferred to
>> the
>> +                        * timeline.
>> +                        */
>> +                       eb->fences[n].chain_fence = NULL;
>> +               } else {
>> +                       drm_syncobj_replace_fence(syncobj, fence);
>> +               }
>> +       }
>> +}
>> +
>> +static int parse_timeline_fences(struct i915_execbuffer *eb)
>> +{
>> +       return add_timeline_fence_array(eb);
>> +}
>> +
>> +static int parse_batch_addresses(struct i915_execbuffer *eb)
>> +{
>> +       struct drm_i915_gem_execbuffer3 *args = eb->args;
>> +       u64 __user *batch_addr = u64_to_user_ptr(args-
>> >batch_address);
>> +
>> +       if (copy_from_user(eb->batch_addresses, batch_addr,
>> +                          sizeof(batch_addr[0]) * eb->num_batches))
>> +               return -EFAULT;
>> +
>> +       return 0;
>> +}
>> +
>> +static void retire_requests(struct intel_timeline *tl, struct
>> i915_request *end)
>> +{
>> +       struct i915_request *rq, *rn;
>> +
>> +       list_for_each_entry_safe(rq, rn, &tl->requests, link)
>> +               if (rq == end || !i915_request_retire(rq))
>> +                       break;
>> +}
>> +
>> +static int eb_request_add(struct i915_execbuffer *eb, struct
>> i915_request *rq,
>> +                         int err, bool last_parallel)
>> +{
>> +       struct intel_timeline * const tl = i915_request_timeline(rq);
>> +       struct i915_sched_attr attr = {};
>> +       struct i915_request *prev;
>> +
>> +       lockdep_assert_held(&tl->mutex);
>> +       lockdep_unpin_lock(&tl->mutex, rq->cookie);
>> +
>> +       trace_i915_request_add(rq);
>> +
>> +       prev = __i915_request_commit(rq);
>> +
>> +       /* Check that the context wasn't destroyed before submission
>> */
>> +       if (likely(!intel_context_is_closed(eb->context))) {
>> +               attr = eb->gem_context->sched;
>> +       } else {
>> +               /* Serialise with context_close via the
>> add_to_timeline */
>> +               i915_request_set_error_once(rq, -ENOENT);
>> +               __i915_request_skip(rq);
>> +               err = -ENOENT; /* override any transient errors */
>> +       }
>> +
>> +       if (intel_context_is_parallel(eb->context)) {
>> +               if (err) {
>> +                       __i915_request_skip(rq);
>> +                       set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
>> +                               &rq->fence.flags);
>> +               }
>> +               if (last_parallel)
>> +                       set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
>> +                               &rq->fence.flags);
>> +       }
>> +
>> +       __i915_request_queue(rq, &attr);
>> +
>> +       /* Try to clean up the client's timeline after submitting the
>> request */
>> +       if (prev)
>> +               retire_requests(tl, prev);
>> +
>> +       mutex_unlock(&tl->mutex);
>> +
>> +       return err;
>> +}
>> +
>> +static int eb_requests_add(struct i915_execbuffer *eb, int err)
>> +{
>> +       int i;
>> +
>> +       /*
>> +        * We iterate in reverse order of creation to release
>> timeline mutexes in
>> +        * same order.
>> +        */
>> +       for_each_batch_add_order(eb, i) {
>> +               struct i915_request *rq = eb->requests[i];
>> +
>> +               if (!rq)
>> +                       continue;
>> +               err |= eb_request_add(eb, rq, err, i == 0);
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +static void eb_requests_get(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               if (!eb->requests[i])
>> +                       break;
>> +
>> +               i915_request_get(eb->requests[i]);
>> +       }
>> +}
>> +
>> +static void eb_requests_put(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               if (!eb->requests[i])
>> +                       break;
>> +
>> +               i915_request_put(eb->requests[i]);
>> +       }
>> +}
>> +
>> +static int
>> +eb_composite_fence_create(struct i915_execbuffer *eb)
>> +{
>> +       struct dma_fence_array *fence_array;
>> +       struct dma_fence **fences;
>> +       unsigned int i;
>> +
>> +       GEM_BUG_ON(!intel_context_is_parent(eb->context));
>> +
>> +       fences = kmalloc_array(eb->num_batches, sizeof(*fences),
>> GFP_KERNEL);
>> +       if (!fences)
>> +               return -ENOMEM;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               fences[i] = &eb->requests[i]->fence;
>> +               __set_bit(I915_FENCE_FLAG_COMPOSITE,
>> +                         &eb->requests[i]->fence.flags);
>> +       }
>> +
>> +       fence_array = dma_fence_array_create(eb->num_batches,
>> +                                            fences,
>> +                                            eb->context-
>> >parallel.fence_context,
>> +                                            eb->context-
>> >parallel.seqno++,
>> +                                            false);
>> +       if (!fence_array) {
>> +               kfree(fences);
>> +               return -ENOMEM;
>> +       }
>> +
>> +       /* Move ownership to the dma_fence_array created above */
>> +       for_each_batch_create_order(eb, i)
>> +               dma_fence_get(fences[i]);
>> +
>> +       eb->composite_fence = &fence_array->base;
>> +
>> +       return 0;
>> +}
>> +
>> +static int
>> +eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq)
>> +{
>> +       int err;
>> +
>> +       if (unlikely(eb->gem_context->syncobj)) {
>> +               struct dma_fence *fence;
>> +
>> +               fence = drm_syncobj_fence_get(eb->gem_context-
>> >syncobj);
>> +               err = i915_request_await_dma_fence(rq, fence);
>> +               dma_fence_put(fence);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       if (eb->fences) {
>> +               err = await_fence_array(eb, rq);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       if (intel_context_is_parallel(eb->context)) {
>> +               err = eb_composite_fence_create(eb);
>> +               if (err)
>> +                       return err;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static struct intel_context *
>> +eb_find_context(struct i915_execbuffer *eb, unsigned int
>> context_number)
>> +{
>> +       struct intel_context *child;
>> +
>> +       if (likely(context_number == 0))
>> +               return eb->context;
>> +
>> +       for_each_child(eb->context, child)
>> +               if (!--context_number)
>> +                       return child;
>> +
>> +       GEM_BUG_ON("Context not found");
>> +
>> +       return NULL;
>> +}
>> +
>> +static int eb_requests_create(struct i915_execbuffer *eb)
>> +{
>> +       unsigned int i;
>> +       int err;
>> +
>> +       for_each_batch_create_order(eb, i) {
>> +               /* Allocate a request for this batch buffer nice and
>> early. */
>> +               eb->requests[i] =
>> i915_request_create(eb_find_context(eb, i));
>> +               if (IS_ERR(eb->requests[i])) {
>> +                       err = PTR_ERR(eb->requests[i]);
>> +                       eb->requests[i] = NULL;
>> +                       return err;
>> +               }
>> +
>> +               /*
>> +                * Only the first request added (committed to
>> backend) has to
>> +                * take the in fences into account as all subsequent
>> requests
>> +                * will have fences inserted inbetween them.
>> +                */
>> +               if (i + 1 == eb->num_batches) {
>> +                       err = eb_fences_add(eb, eb->requests[i]);
>> +                       if (err)
>> +                               return err;
>> +               }
>
>One thing I was hoping with the brand new execbuf3 IOCTL would be that
>we could actually make it dma_fence_signalling critical path compliant.
>
>That would mean annotate the dma_fence_signalling critical path just
>after the first request is created and ending it just before that same
>request was added.
>
>The main violators are memory allocated when adding dependencies in
>eb_fences_add(), but since those now are sort of limited in number, we
>might be able to pre-allocate that memory before the first request is
>created.
>
>The other main violator would be the multiple batch-buffers. Is this
>mode of operation strictly needed for version 1 or can we ditch it?
>

Hmm...I am not sure about the multiple batch-buffers question. As Mesa is
the primary use case, probably the Mesa folks can answer that?

But, given we have to support it in any case, maybe we can keep that
support and take on the dma_fence_signalling critical path annotation
separately in a subsequent patch series?
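
(If we do pick it up later, I would expect it to look roughly like the sketch
below, using the generic dma_fence_begin_signalling()/dma_fence_end_signalling()
annotations; as you say, any allocation, e.g. for the dependencies added in
eb_fences_add(), would have to be pre-allocated before the annotated span.)

        bool fence_cookie;
        int err;

        /* All memory allocations must have happened before this point. */
        fence_cookie = dma_fence_begin_signalling();

        err = eb_submit(eb);
        err = eb_requests_add(eb, err);

        dma_fence_end_signalling(fence_cookie);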

Niranjana

>
>
>> +
>> +               /*
>> +                * Not really on stack, but we don't want to call
>> +                * kfree on the batch_snapshot when we put it, so use
>> the
>> +                * _onstack interface.
>
>This comment is stale and can be removed.
>
>
>/Thomas
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-08 13:47       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-08 14:37         ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 14:37 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	 Daniel, christian.koenig

Hi,

On Fri, 2022-07-08 at 06:47 -0700, Niranjana Vishwanathapura wrote:
> On Thu, Jul 07, 2022 at 07:41:54AM -0700, Hellstrom, Thomas wrote:
> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
> > > Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
> > > works in vm_bind mode. The vm_bind mode only works with
> > > this new execbuf3 ioctl.
> > > 
> > > The new execbuf3 ioctl will not have any execlist
> > 
> > I understand this that you mean there is no list of objects to
> > validate
> > attached to the drm_i915_gem_execbuffer3 structure rather than that
> > the
> > execlists submission backend is never used. Could we clarify this
> > to
> > avoid confusion.
> 
> Yah, side effect of overloading the word 'execlist' for multiple
> things.
> Yah, I meant, no list of objects to validate. I agree, we need to
> clarify
> that here.
> 
> > 
> > 
> > >  support
> > > and all the legacy support like relocations etc are removed.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/Makefile                 |    1 +
> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029
> > > +++++++++++++++++
> > >  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
> > >  drivers/gpu/drm/i915/i915_driver.c            |    1 +
> > >  include/uapi/drm/i915_drm.h                   |   67 +-
> > >  6 files changed, 1104 insertions(+), 1 deletion(-)
> > >  create mode 100644
> > > drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Makefile
> > > b/drivers/gpu/drm/i915/Makefile
> > > index 4e1627e96c6e..38cd1c5bc1a5 100644
> > > --- a/drivers/gpu/drm/i915/Makefile
> > > +++ b/drivers/gpu/drm/i915/Makefile
> > > @@ -148,6 +148,7 @@ gem-y += \
> > >         gem/i915_gem_dmabuf.o \
> > >         gem/i915_gem_domain.o \
> > >         gem/i915_gem_execbuffer.o \
> > > +       gem/i915_gem_execbuffer3.o \
> > >         gem/i915_gem_internal.o \
> > >         gem/i915_gem_object.o \
> > >         gem/i915_gem_lmem.o \
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > index b7b2c14fd9e1..37bb1383ab8f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > @@ -782,6 +782,11 @@ static int eb_select_context(struct
> > > i915_execbuffer *eb)
> > >         if (unlikely(IS_ERR(ctx)))
> > >                 return PTR_ERR(ctx);
> > > 
> > > +       if (ctx->vm->vm_bind_mode) {
> > > +               i915_gem_context_put(ctx);
> > > +               return -EOPNOTSUPP;
> > > +       }
> > > +
> > >         eb->gem_context = ctx;
> > >         if (i915_gem_context_has_full_ppgtt(ctx))
> > >                 eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > new file mode 100644
> > > index 000000000000..13121df72e3d
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
> > > @@ -0,0 +1,1029 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2022 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/dma-resv.h>
> > > +#include <linux/sync_file.h>
> > > +#include <linux/uaccess.h>
> > > +
> > > +#include <drm/drm_syncobj.h>
> > > +
> > > +#include "gt/intel_context.h"
> > > +#include "gt/intel_gpu_commands.h"
> > > +#include "gt/intel_gt.h"
> > > +#include "gt/intel_gt_pm.h"
> > > +#include "gt/intel_ring.h"
> > > +
> > > +#include "i915_drv.h"
> > > +#include "i915_file_private.h"
> > > +#include "i915_gem_context.h"
> > > +#include "i915_gem_ioctls.h"
> > > +#include "i915_gem_vm_bind.h"
> > > +#include "i915_trace.h"
> > > +
> > > +#define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
> > > +#define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
> > > +
> > > +/* Catch emission of unexpected errors for CI! */
> > > +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> > > +#undef EINVAL
> > > +#define EINVAL ({ \
> > > +       DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__,
> > > __LINE__); \
> > > +       22; \
> > > +})
> > > +#endif
> > > +
> > > +/**
> > > + * DOC: User command execution with execbuf3 ioctl
> > > + *
> > > + * A VM in VM_BIND mode will not support older execbuf mode of
> > > binding.
> > > + * The execbuf ioctl handling in VM_BIND mode differs
> > > significantly
> > > from the
> > > + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> > > + * Hence, a new execbuf3 ioctl has been added to support VM_BIND
> > > mode. (See
> > > + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not
> > > accept any
> > > + * execlist. Hence, no support for implicit sync.
> > > + *
> > > + * The new execbuf3 ioctl only works in VM_BIND mode and the
> > > VM_BIND
> > > mode only
> > > + * works with execbuf3 ioctl for submission.
> > > + *
> > > + * The execbuf3 ioctl directly specifies the batch addresses
> > > instead
> > > of as
> > > + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will
> > > also
> > > not
> > > + * support many of the older features like in/out/submit fences,
> > > fence array,
> > > + * default gem context etc. (See struct
> > > drm_i915_gem_execbuffer3).
> > > + *
> > > + * In VM_BIND mode, VA allocation is completely managed by the
> > > user
> > > instead of
> > > + * the i915 driver. Hence all VA assignment, eviction are not
> > > applicable in
> > > + * VM_BIND mode. Also, for determining object activeness,
> > > VM_BIND
> > > mode will not
> > > + * be using the i915_vma active reference tracking. It will
> > > instead
> > > check the
> > > + * dma-resv object's fence list for that.
> > > + *
> > > + * So, a lot of code supporting execbuf2 ioctl, like
> > > relocations, VA
> > > evictions,
> > > + * vma lookup table, implicit sync, vma active reference
> > > tracking
> > > etc., are not
> > > + * applicable for execbuf3 ioctl.
> > > + */
> > > +
> > > +struct eb_fence {
> > > +       struct drm_syncobj *syncobj; /* Use with ptr_mask_bits()
> > > */
> > > +       struct dma_fence *dma_fence;
> > > +       u64 value;
> > > +       struct dma_fence_chain *chain_fence;
> > > +};
> > > +
> > > +struct i915_execbuffer {
> > > +       struct drm_i915_private *i915; /** i915 backpointer */
> > > +       struct drm_file *file; /** per-file lookup tables and
> > > limits
> > > */
> > > +       struct drm_i915_gem_execbuffer3 *args; /** ioctl
> > > parameters
> > > */
> > > +
> > > +       struct intel_gt *gt; /* gt for the execbuf */
> > > +       struct intel_context *context; /* logical state for the
> > > request */
> > > +       struct i915_gem_context *gem_context; /** caller's
> > > context */
> > > +
> > > +       /** our requests to build */
> > > +       struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
> > > +
> > > +       /** used for excl fence in dma_resv objects when > 1 BB
> > > submitted */
> > > +       struct dma_fence *composite_fence;
> > > +
> > > +       struct i915_gem_ww_ctx ww;
> > > +
> > > +       /* number of batches in execbuf IOCTL */
> > > +       unsigned int num_batches;
> > > +
> > > +       u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
> > > +       /** identity of the batch obj/vma */
> > > +       struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
> > > +
> > > +       struct eb_fence *fences;
> > > +       unsigned long num_fences;
> > > +};
> > 
> > Kerneldoc structures please.
> > 
> > It seems we are duplicating a lot of code from i915_execbuffer.c.
> > Did
> > you consider
> > 
> > struct i915_execbuffer3 {
> > ...
> > };
> > 
> > struct i915_execbuffer2 {
> >        struct i915_execbuffer3 eb3;
> >        ...
> >        [members that are not common]
> > };
> > 
> > Allowing execbuffer2 to share the execbuffer3 code to some extent.
> > Not sure about the gain at this point though. My worry would be
> > that fo
> > r example fixes might be applied to one file and not the other.
> 
> I have added a TODO in the cover letter of this patch series to share
> the code between execbuf2 and execbuf3.
> But, I am not sure to what extent. Execbuf3 is much leaner than
> execbuf2
> and we don't want to make it bad by forcing code sharing with legacy
> path.
> We can perhaps abstract out some functions which takes specific
> arguments
> (instead of 'eb'), that way we can keep these structures separate and
> still
> share some code.


Fully agree we shouldn't make eb3 code more complicated because of eb2.
My question was more about using i915_execbuffer3 and its functions as a
"base class" and subclassing it for eb2, with eb2 adding and implementing
the additional functionality it needs.
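
i.e. roughly (sketch only, eb2 members abbreviated):

struct i915_execbuffer2 {
        struct i915_execbuffer3 base;   /* shared state and helpers */

        /* eb2-only state: object execlist, relocations, ... */
        struct drm_i915_gem_exec_object2 *exec;
        unsigned int buffer_count;
};

Shared helpers would then take a struct i915_execbuffer3 *, and the eb2 call
sites would simply pass &eb2->base.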

But OTOH I just learned we might have been asked by the drm maintainers not
to share any code between those two, so I need to dig up that
discussion.

/Thomas




^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-08 13:43         ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 14:44           ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 14:44 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	 Daniel, christian.koenig

On Fri, 2022-07-08 at 15:43 +0200, Thomas Hellström wrote:
> > The vm_bind/bound_list and the non_priv_vm_bind_list are there for
> > very different reasons.
> > 
> > The reason for having separate vm_bind_list and vm_bound_list is
> > that
> > during the execbuf path, we can rebind the unbound mappings by
> > scooping
> > all unbound vmas back from bound list into the bind list and
> > binding
> > them. In fact, this probably can be done with a single vm_bind_list
> > and
> > a 'eb.bind_list' (local to execbuf3 ioctl) for rebinding.
> > 
> > The non_priv_vm_bind_list is just an optimization to loop only
> > through
> > non-priv objects while taking the locks in
> > eb_lock_persistent_vmas()
> > as only non-priv objects needs that (private objects are locked in
> > a
> > single shot with vm_priv_lock). A non-priv mapping will also be in
> > the
> > vm_bind/bound_list.
> > 
> > I think, we need to add this as documentation to be more clear.
> 
> OK, I understood it as private objects were either on the vm_bind
> list
> or vm_bound_list depending on whether they needed rebinding or not,
> and
> shared objects only on the non_priv_vm_bind list, and were always
> locked, validated and fenced...
> 
> Need to take a deeper look...
> 
> /Thomas
> 
> 
> 
> > 
> > Niranjana
> > 
> > 

Hmm, just a quick thought on this: since the non-private vm-bind
objects all need to be iterated through (locked, fenced and userptr
valid) on each execbuf, and checking for validation (resident and
bound) is a very quick check, we'd never need to add them to the
rebind list at all, right? If so, the rebind list would be exclusive to
vm-private objects.
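
Roughly, in the execbuf path (sketch only; the link member name is made up
here and the pin flags are simplified):

        list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
                            non_priv_vm_bind_link) {
                err = i915_gem_object_lock(vma->obj, &eb->ww);
                if (err)
                        return err;

                /* Cheap check: already resident and bound, nothing to do. */
                if (i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
                        continue;

                err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER);
                if (err)
                        return err;
        }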

Also I don't think the vm_bind list can be execbuf-local, since binding
may not have completed at vma_release time, at which point the objects
need to remain on the vm_bind list until the next execbuf...

/Thomas



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-07 13:11     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 14:51       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 14:51 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 06:11:13AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> For persistent (vm_bind) vmas of userptr BOs, handle the user
>> page pinning by using the i915_gem_object_userptr_submit_init()
>> /done() functions
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 67
>> +++++++++++++++++++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  1 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  1 +
>>  4 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> index 2079f5ca9010..bf13dd6d642e 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> @@ -22,6 +22,7 @@
>>  #include "i915_gem_vm_bind.h"
>>  #include "i915_trace.h"
>>
>> +#define __EXEC3_USERPTR_USED           BIT_ULL(34)
>>  #define __EXEC3_HAS_PIN                        BIT_ULL(33)
>>  #define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>>  #define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>> @@ -147,10 +148,36 @@ static void eb_scoop_unbound_vmas(struct
>> i915_address_space *vm)
>>         spin_unlock(&vm->vm_rebind_lock);
>>  }
>>
>> +static int eb_lookup_persistent_userptr_vmas(struct i915_execbuffer
>> *eb)
>> +{
>> +       struct i915_address_space *vm = eb->context->vm;
>> +       struct i915_vma *last_vma = NULL;
>> +       struct i915_vma *vma;
>> +       int err;
>> +
>> +       assert_vm_bind_held(vm);
>> +
>> +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link) {
>> +               if (i915_gem_object_is_userptr(vma->obj)) {
>> +                       err =
>> i915_gem_object_userptr_submit_init(vma->obj);
>> +                       if (err)
>> +                               return err;
>> +
>> +                       last_vma = vma;
>> +               }
>> +       }
>> +
>
>Don't we need to loop also over non-private userptr objects?

No, as explained in the other thread, non-private BOs will also be
on the vm_bind/bound_list.

>
>
>> +       if (last_vma)
>> +               eb->args->flags |= __EXEC3_USERPTR_USED;
>> +
>> +       return 0;
>> +}
>> +
>>  static int eb_lookup_vmas(struct i915_execbuffer *eb)
>>  {
>>         unsigned int i, current_batch = 0;
>>         struct i915_vma *vma;
>> +       int err = 0;
>>
>>         for (i = 0; i < eb->num_batches; i++) {
>>                 vma = eb_find_vma(eb->context->vm, eb-
>> >batch_addresses[i]);
>> @@ -163,6 +190,10 @@ static int eb_lookup_vmas(struct i915_execbuffer
>> *eb)
>>
>>         eb_scoop_unbound_vmas(eb->context->vm);
>>
>> +       err = eb_lookup_persistent_userptr_vmas(eb);
>> +       if (err)
>> +               return err;
>> +
>>         return 0;
>>  }
>>
>> @@ -358,15 +389,51 @@ static void
>> eb_persistent_vmas_move_to_active(struct i915_execbuffer *eb)
>>
>>  static int eb_move_to_gpu(struct i915_execbuffer *eb)
>>  {
>> +       int err = 0, j;
>> +
>>         assert_vm_bind_held(eb->context->vm);
>>         assert_vm_priv_held(eb->context->vm);
>>
>>         eb_persistent_vmas_move_to_active(eb);
>>
>> +#ifdef CONFIG_MMU_NOTIFIER
>> +       if (!err && (eb->args->flags & __EXEC3_USERPTR_USED)) {
>> +               struct i915_vma *vma;
>> +
>> +               assert_vm_bind_held(eb->context->vm);
>> +               assert_vm_priv_held(eb->context->vm);
>> +
>> +               read_lock(&eb->i915->mm.notifier_lock);
>> +               list_for_each_entry(vma, &eb->context->vm-
>> >vm_bind_list,
>> +                                   vm_bind_link) {
>> +                       if (!i915_gem_object_is_userptr(vma->obj))
>> +                               continue;
>> +
>> +                       err =
>> i915_gem_object_userptr_submit_done(vma->obj);
>> +                       if (err)
>> +                               break;
>> +               }
>> +
>> +               read_unlock(&eb->i915->mm.notifier_lock);
>> +       }
>
>Since we don't loop over the vm_bound_list, there is a need to check
>whether the rebind_list is empty here under the notifier_lock in read
>mode, and in that case, restart from eb_lookup_vmas(). That might also
>eliminate the need for the __EXEC3_USERPTR_USED flag?
>
>That will also catch any objects that were evicted between
>eb_lookup_vmas() where the rebind_list was last checked, and
>i915_gem_vm_priv_lock(), which prohibits further eviction, but if we
>want to catch these earlier (which I think is a good idea), we could
>check that the rebind_list is indeed empty just after taking the
>vm_priv_lock(), and if not, restart from eb_lookup_vmas().

Yah, right, we need to check the rebind_list here and, if it is not empty,
restart from the lookup phase.
It is a bit tricky with userptr here: since the unbind happens during the
submit_init() call, which runs after we scoop the unbound vmas here, the
vmas get re-added to the rebind_list :(.
I think we need a separate 'invalidated_userptr_list' here which we
iterate through for the submit_init() and submit_done() calls (yes,
the __EXEC3_USERPTR_USED flag won't be needed then).
And we should call eb_scoop_unbound_vmas() after calling
eb_lookup_persistent_userptr_vmas(), so that we scoop all unbound
vmas properly.
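
Roughly (list and link member names made up here):

        /* In the userptr invalidation path: */
        list_move_tail(&vma->userptr_invalidated_link,
                       &vm->invalidated_userptr_list);

        /* In execbuf3, only the invalidated userptr vmas need the
         * submit_init() / submit_done() round trip: */
        list_for_each_entry(vma, &vm->invalidated_userptr_list,
                            userptr_invalidated_link) {
                err = i915_gem_object_userptr_submit_init(vma->obj);
                if (err)
                        return err;
        }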

>
>
>> +#endif
>> +
>> +       if (unlikely(err))
>> +               goto err_skip;
>> +
>>         /* Unconditionally flush any chipset caches (for streaming
>> writes). */
>>         intel_gt_chipset_flush(eb->gt);
>>
>>         return 0;
>> +
>> +err_skip:
>> +       for_each_batch_create_order(eb, j) {
>> +               if (!eb->requests[j])
>> +                       break;
>> +
>> +               i915_request_set_error_once(eb->requests[j], err);
>> +       }
>> +       return err;
>>  }
>>
>>  static int eb_request_submit(struct i915_execbuffer *eb,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> index 1a8efa83547f..cae282b91618 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -263,6 +263,12 @@ int i915_gem_vm_bind_obj(struct
>> i915_address_space *vm,
>>                 goto put_obj;
>>         }
>>
>> +       if (i915_gem_object_is_userptr(obj)) {
>> +               ret = i915_gem_object_userptr_submit_init(obj);
>> +               if (ret)
>> +                       goto put_obj;
>> +       }
>> +
>>         ret = i915_gem_vm_bind_lock_interruptible(vm);
>>         if (ret)
>>                 goto put_obj;
>> @@ -295,6 +301,16 @@ int i915_gem_vm_bind_obj(struct
>> i915_address_space *vm,
>>         /* Make it evictable */
>>         __i915_vma_unpin(vma);
>>
>> +#ifdef CONFIG_MMU_NOTIFIER
>> +       if (i915_gem_object_is_userptr(obj)) {
>> +               write_lock(&vm->i915->mm.notifier_lock);
>
>Why do we need the lock in write mode here?

Looks like it was not intentional. Should switch to read_lock here.
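I.e. the same hunk as above with only the lock mode changed (untested,
just for illustration):

#ifdef CONFIG_MMU_NOTIFIER
	if (i915_gem_object_is_userptr(obj)) {
		/* Read mode is enough here, matching the execbuf3 path. */
		read_lock(&vm->i915->mm.notifier_lock);
		ret = i915_gem_object_userptr_submit_done(obj);
		read_unlock(&vm->i915->mm.notifier_lock);
		if (ret)
			goto out_ww;
	}
#endif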

Niranjana

>
>> +               ret = i915_gem_object_userptr_submit_done(obj);
>> +               write_unlock(&vm->i915->mm.notifier_lock);
>> +               if (ret)
>> +                       goto out_ww;
>> +       }
>> +#endif
>> +
>>         list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>         i915_vm_bind_it_insert(vma, &vm->va);
>>         if (!obj->priv_root)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 55d5389b2c6c..4ab3bda644ff 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -295,6 +295,7 @@ void i915_address_space_init(struct
>> i915_address_space *vm, int subclass)
>>         GEM_BUG_ON(IS_ERR(vm->root_obj));
>>         INIT_LIST_HEAD(&vm->vm_rebind_list);
>>         spin_lock_init(&vm->vm_rebind_lock);
>> +       INIT_LIST_HEAD(&vm->invalidate_link);
>>  }
>>
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index fe5485c4a1cd..f9edf11c144f 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -267,6 +267,7 @@ struct i915_address_space {
>>         struct list_head vm_bound_list;
>>         struct list_head vm_rebind_list;
>>         spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>> +       struct list_head invalidate_link;
>>         /* va tree of persistent vmas */
>>         struct rb_root_cached va;
>>         struct list_head non_priv_vm_bind_list;
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-08 12:17     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 14:54       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 14:54 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Fri, Jul 08, 2022 at 05:17:53AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> For persistent (vm_bind) vmas of userptr BOs, handle the user
>> page pinning by using the i915_gem_object_userptr_submit_init()
>> /done() functions
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 67
>> +++++++++++++++++++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 16 +++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  1 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  1 +
>>  4 files changed, 85 insertions(+)
>>
>
>Hmm. I'm also missing the code in userptr invalidate that puts invalidated
>vm-private userptr vmas on the rebind list?

Yah, looks like it got lost in a rebase.
Based on the discussion in the other thread on this patch, it is going
to be a bit different here than adding to the rebind_list.
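Something along these lines on the invalidation side, perhaps (purely
a sketch; the helper name and the list it feeds are assumptions,
nothing here is from the posted series):

/* Hypothetical: called from the userptr MMU-notifier invalidation path. */
static void vm_bind_userptr_invalidate(struct drm_i915_gem_object *obj)
{
	struct i915_vma *vma;

	spin_lock(&obj->vma.lock);
	list_for_each_entry(vma, &obj->vma.list, obj_link) {
		if (!i915_vma_is_persistent(vma))
			continue;

		/* Queue for re-pinning/re-binding at the next execbuf3. */
		spin_lock(&vma->vm->vm_rebind_lock);
		if (list_empty(&vma->vm_rebind_link))
			list_add_tail(&vma->vm_rebind_link,
				      &vma->vm->vm_rebind_list);
		spin_unlock(&vma->vm->vm_rebind_lock);
	}
	spin_unlock(&obj->vma.lock);
}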

Niranjana

>
>/Thomas
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas
  2022-07-07 11:27     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-08 15:06       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-08 15:06 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Thu, Jul 07, 2022 at 04:27:23AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> Treat VM_BIND vmas as persistent and handle them during the
>> request submission in the execbuff path.
>>
>> Support eviction by maintaining a list of evicted persistent vmas
>> for rebinding during next submission.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  3 +
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 12 ++-
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  2 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>  drivers/gpu/drm/i915/i915_gem_gtt.h           | 22 ++++++
>>  drivers/gpu/drm/i915/i915_vma.c               | 32 +++++++-
>>  drivers/gpu/drm/i915/i915_vma.h               | 78
>> +++++++++++++++++--
>>  drivers/gpu/drm/i915/i915_vma_types.h         | 23 ++++++
>>  9 files changed, 163 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index ccec4055fde3..5121f02ba95c 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -38,6 +38,7 @@
>>  #include "i915_gem_mman.h"
>>  #include "i915_gem_object.h"
>>  #include "i915_gem_ttm.h"
>> +#include "i915_gem_vm_bind.h"
>>  #include "i915_memcpy.h"
>>  #include "i915_trace.h"
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> index 849bf3c1061e..eaadf5a6ab09 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> @@ -6,6 +6,7 @@
>>  #ifndef __I915_GEM_VM_BIND_H
>>  #define __I915_GEM_VM_BIND_H
>>
>> +#include <linux/dma-resv.h>
>>  #include "i915_drv.h"
>>
>>  #define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)-
>> >vm_bind_lock)
>> @@ -26,6 +27,8 @@ static inline void i915_gem_vm_bind_unlock(struct
>> i915_address_space *vm)
>>         mutex_unlock(&vm->vm_bind_lock);
>>  }
>>
>> +#define assert_vm_priv_held(vm)   assert_object_held((vm)->root_obj)
>> +
>>  static inline int i915_gem_vm_priv_lock(struct i915_address_space
>> *vm,
>>                                         struct i915_gem_ww_ctx *ww)
>>  {
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> index 96f139cc8060..1a8efa83547f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>> @@ -85,6 +85,13 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma,
>> bool release_obj)
>>  {
>>         assert_vm_bind_held(vma->vm);
>>
>> +       spin_lock(&vma->vm->vm_rebind_lock);
>> +       if (!list_empty(&vma->vm_rebind_link))
>> +               list_del_init(&vma->vm_rebind_link);
>> +       i915_vma_set_purged(vma);
>> +       i915_vma_set_freed(vma);
>
>Following the vma destruction discussion, can we ditch the freed flag
>now?

Yes

>
>> +       spin_unlock(&vma->vm->vm_rebind_lock);
>> +
>>         if (!list_empty(&vma->vm_bind_link)) {
>>                 list_del_init(&vma->vm_bind_link);
>>                 list_del_init(&vma->non_priv_vm_bind_link);
>> @@ -220,6 +227,7 @@ static struct i915_vma *vm_bind_get_vma(struct
>> i915_address_space *vm,
>>
>>         vma->start = va->start;
>>         vma->last = va->start + va->length - 1;
>> +       i915_vma_set_persistent(vma);
>>
>>         return vma;
>>  }
>> @@ -304,8 +312,10 @@ int i915_gem_vm_bind_obj(struct
>> i915_address_space *vm,
>>
>>         i915_vm_bind_put_fence(vma);
>>  put_vma:
>> -       if (ret)
>> +       if (ret) {
>> +               i915_vma_set_freed(vma);
>>                 i915_vma_destroy(vma);
>> +       }
>>
>>         i915_gem_ww_ctx_fini(&ww);
>>  unlock_vm:
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index df0a8459c3c6..55d5389b2c6c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -293,6 +293,8 @@ void i915_address_space_init(struct
>> i915_address_space *vm, int subclass)
>>         INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>         vm->root_obj = i915_gem_object_create_internal(vm->i915,
>> PAGE_SIZE);
>>         GEM_BUG_ON(IS_ERR(vm->root_obj));
>> +       INIT_LIST_HEAD(&vm->vm_rebind_list);
>> +       spin_lock_init(&vm->vm_rebind_lock);
>>  }
>>
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index f538ce9115c9..fe5485c4a1cd 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -265,6 +265,8 @@ struct i915_address_space {
>>         struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>>         struct list_head vm_bind_list;
>>         struct list_head vm_bound_list;
>> +       struct list_head vm_rebind_list;
>> +       spinlock_t vm_rebind_lock;   /* Protects vm_rebind_list */
>>         /* va tree of persistent vmas */
>>         struct rb_root_cached va;
>>         struct list_head non_priv_vm_bind_list;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 8c2f57eb5dda..09b89d1913fc 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -51,4 +51,26 @@ int i915_gem_gtt_insert(struct i915_address_space
>> *vm,
>>
>>  #define PIN_OFFSET_MASK                I915_GTT_PAGE_MASK
>>
>> +static inline int i915_vm_sync(struct i915_address_space *vm)
>
>No need to make this an inline function. Only for performance reasons.
>Kerneldoc.
>
>> +{
>> +       int ret;
>> +
>> +       /* Wait for all requests under this vm to finish */
>> +       ret = dma_resv_wait_timeout(vm->root_obj->base.resv,
>> +                                   DMA_RESV_USAGE_BOOKKEEP, false,
>> +                                   MAX_SCHEDULE_TIMEOUT);
>> +       if (ret < 0)
>> +               return ret;
>> +       else if (ret > 0)
>> +               return 0;
>> +       else
>> +               return -ETIMEDOUT;
>> +}
>> +
>> +static inline bool i915_vm_is_active(const struct i915_address_space
>> *vm)
>> +{
>> +       return !dma_resv_test_signaled(vm->root_obj->base.resv,
>> +                                      DMA_RESV_USAGE_BOOKKEEP);
>> +}
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 6737236b7884..6adb013579be 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -237,6 +237,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>
>>         INIT_LIST_HEAD(&vma->vm_bind_link);
>>         INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>> +       INIT_LIST_HEAD(&vma->vm_rebind_link);
>>         return vma;
>>
>>  err_unlock:
>> @@ -1622,7 +1623,8 @@ void i915_vma_close(struct i915_vma *vma)
>>         if (atomic_dec_and_lock_irqsave(&vma->open_count,
>>                                         &gt->closed_lock,
>>                                         flags)) {
>> -               __vma_close(vma, gt);
>> +               if (!i915_vma_is_persistent(vma))
>> +                       __vma_close(vma, gt);
>
>This change is not needed since persistent vmas shouldn't take part in
>the other vma open-close life-time management.

Yah, looks like a relic from when we did not have a separate execbuf3
ioctl. Should be removed.

>
>>                 spin_unlock_irqrestore(&gt->closed_lock, flags);
>>         }
>>  }
>> @@ -1647,6 +1649,13 @@ static void force_unbind(struct i915_vma *vma)
>>         if (!drm_mm_node_allocated(&vma->node))
>>                 return;
>>
>> +       /*
>> +        * Mark persistent vma as purged to avoid it waiting
>> +        * for VM to be released.
>> +        */
>> +       if (i915_vma_is_persistent(vma))
>> +               i915_vma_set_purged(vma);
>> +
>>         atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>>         WARN_ON(__i915_vma_unbind(vma));
>>         GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
>> @@ -1666,9 +1675,12 @@ static void release_references(struct i915_vma
>> *vma, bool vm_ddestroy)
>>
>>         spin_unlock(&obj->vma.lock);
>>
>> -       i915_gem_vm_bind_lock(vma->vm);
>> -       i915_gem_vm_bind_remove(vma, true);
>> -       i915_gem_vm_bind_unlock(vma->vm);
>> +       if (i915_vma_is_persistent(vma) &&
>> +           !i915_vma_is_freed(vma)) {
>> +               i915_gem_vm_bind_lock(vma->vm);
>> +               i915_gem_vm_bind_remove(vma, true);
>> +               i915_gem_vm_bind_unlock(vma->vm);
>> +       }
>>
>>         spin_lock_irq(&gt->closed_lock);
>>         __i915_vma_remove_closed(vma);
>> @@ -1839,6 +1851,8 @@ int _i915_vma_move_to_active(struct i915_vma
>> *vma,
>>         int err;
>>
>>         assert_object_held(obj);
>> +       if (i915_vma_is_persistent(vma))
>> +               return -EINVAL;
>>
>>         GEM_BUG_ON(!vma->pages);
>>
>> @@ -1999,6 +2013,16 @@ int __i915_vma_unbind(struct i915_vma *vma)
>>         __i915_vma_evict(vma, false);
>>
>>         drm_mm_remove_node(&vma->node); /* pairs with
>> i915_vma_release() */
>> +
>> +       if (i915_vma_is_persistent(vma)) {
>> +               spin_lock(&vma->vm->vm_rebind_lock);
>> +               if (list_empty(&vma->vm_rebind_link) &&
>> +                   !i915_vma_is_purged(vma))
>> +                       list_add_tail(&vma->vm_rebind_link,
>> +                                     &vma->vm->vm_rebind_list);
>> +               spin_unlock(&vma->vm->vm_rebind_lock);
>> +       }
>> +
>>         return 0;
>>  }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h
>> b/drivers/gpu/drm/i915/i915_vma.h
>> index dcb49f79ff7e..6c1369a40e03 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -47,12 +47,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>>
>>  void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned
>> int flags);
>>  #define I915_VMA_RELEASE_MAP BIT(0)
>> -
>> -static inline bool i915_vma_is_active(const struct i915_vma *vma)
>> -{
>> -       return !i915_active_is_idle(&vma->active);
>> -}
>> -
>>  /* do not reserve memory to prevent deadlocks */
>>  #define __EXEC_OBJECT_NO_RESERVE BIT(31)
>>
>> @@ -138,6 +132,48 @@ static inline u32 i915_ggtt_pin_bias(struct
>> i915_vma *vma)
>>         return i915_vm_to_ggtt(vma->vm)->pin_bias;
>>  }
>>
>> +static inline bool i915_vma_is_persistent(const struct i915_vma
>> *vma)
>> +{
>> +       return test_bit(I915_VMA_PERSISTENT_BIT,
>> __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_persistent(struct i915_vma *vma)
>> +{
>> +       set_bit(I915_VMA_PERSISTENT_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_purged(const struct i915_vma *vma)
>> +{
>> +       return test_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_purged(struct i915_vma *vma)
>> +{
>> +       set_bit(I915_VMA_PURGED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_freed(const struct i915_vma *vma)
>> +{
>> +       return test_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline void i915_vma_set_freed(struct i915_vma *vma)
>> +{
>> +       set_bit(I915_VMA_FREED_BIT, __i915_vma_flags(vma));
>> +}
>> +
>> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
>> +{
>> +       if (i915_vma_is_persistent(vma)) {
>> +               if (i915_vma_is_purged(vma))
>> +                       return false;
>> +
>> +               return i915_vm_is_active(vma->vm);
>> +       }
>> +
>> +       return !i915_active_is_idle(&vma->active);
>> +}
>> +
>>  static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>>  {
>>         i915_gem_object_get(vma->obj);
>> @@ -408,8 +444,36 @@ int i915_vma_wait_for_bind(struct i915_vma
>> *vma);
>>
>>  static inline int i915_vma_sync(struct i915_vma *vma)
>>  {
>> +       int ret;
>> +
>>         /* Wait for the asynchronous bindings and pending GPU reads
>> */
>> -       return i915_active_wait(&vma->active);
>> +       ret = i915_active_wait(&vma->active);
>> +       if (ret || !i915_vma_is_persistent(vma) ||
>> i915_vma_is_purged(vma))
>> +               return ret;
>> +
>> +       return i915_vm_sync(vma->vm);
>> +}
>> +
>> +static inline bool i915_vma_is_bind_complete(struct i915_vma *vma)
>> +{
>> +       /* Ensure vma bind is initiated */
>> +       if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
>> +               return false;
>> +
>> +       /* Ensure any binding started is complete */
>> +       if (rcu_access_pointer(vma->active.excl.fence)) {
>> +               struct dma_fence *fence;
>> +
>> +               rcu_read_lock();
>> +               fence = dma_fence_get_rcu_safe(&vma-
>> >active.excl.fence);
>> +               rcu_read_unlock();
>> +               if (fence) {
>> +                       dma_fence_put(fence);
>> +                       return false;
>> +               }
>> +       }
>
>Could we use i915_active_fence_get() instead of first testing the
>pointer and then open-coding i915_active_fence_get()? Also no need to
>inline this.

Yah. Looks like we can also use i915_vma_verify_bind_complete() (just
move it out of I915_DEBUG_GEM).
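For reference, a version built on the existing helper could look
roughly like this (untested sketch, not the actual change):

static bool i915_vma_is_bind_complete(struct i915_vma *vma)
{
	struct dma_fence *fence;

	/* Ensure vma bind is initiated */
	if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
		return false;

	/* Ensure any binding started is complete */
	fence = i915_active_fence_get(&vma->active.excl);
	if (fence) {
		dma_fence_put(fence);
		return false;
	}

	return true;
}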

Niranjana

>
>> +
>> +       return true;
>>  }
>>
>>  /**
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h
>> b/drivers/gpu/drm/i915/i915_vma_types.h
>> index 7d830a6a0b51..405c82e1bc30 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -264,6 +264,28 @@ struct i915_vma {
>>  #define I915_VMA_SCANOUT_BIT   17
>>  #define I915_VMA_SCANOUT       ((int)BIT(I915_VMA_SCANOUT_BIT))
>>
>> +  /**
>> +   * I915_VMA_PERSISTENT_BIT:
>> +   * The vma is persistent (created with VM_BIND call).
>> +   *
>> +   * I915_VMA_PURGED_BIT:
>> +   * The persistent vma is force unbound either due to VM_UNBIND
>> call
>> +   * from UMD or VM is released. Do not check/wait for VM activeness
>> +   * in i915_vma_is_active() and i915_vma_sync() calls.
>> +   *
>> +   * I915_VMA_FREED_BIT:
>> +   * The persistent vma is being released by UMD via VM_UNBIND call.
>> +   * While releasing the vma, do not take VM_BIND lock as VM_UNBIND
>> call
>> +   * already holds the lock.
>> +   */
>> +#define I915_VMA_PERSISTENT_BIT        19
>> +#define I915_VMA_PURGED_BIT    20
>> +#define I915_VMA_FREED_BIT     21
>> +
>> +#define I915_VMA_PERSISTENT    ((int)BIT(I915_VMA_PERSISTENT_BIT))
>> +#define
>> I915_VMA_PURGED                ((int)BIT(I915_VMA_PURGED_BIT))
>> +#define I915_VMA_FREED         ((int)BIT(I915_VMA_FREED_BIT))
>> +
>>         struct i915_active active;
>>
>>  #define I915_VMA_PAGES_BIAS 24
>> @@ -292,6 +314,7 @@ struct i915_vma {
>>         struct list_head vm_bind_link; /* Link in persistent VMA list
>> */
>Protected by..
>
>>         /* Link in non-private persistent VMA list */
>>         struct list_head non_priv_vm_bind_link;
>> +       struct list_head vm_rebind_link; /* Link in vm_rebind_list */
>>
>>         /** Timeline fence for vm_bind completion notification */
>>         struct {
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-08 14:51       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-08 15:20         ` Hellstrom, Thomas
  -1 siblings, 0 replies; 121+ messages in thread
From: Hellstrom, Thomas @ 2022-07-08 15:20 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	 Daniel, christian.koenig

On Fri, 2022-07-08 at 07:51 -0700, Niranjana Vishwanathapura wrote:
> > Since we don't loop over the vm_bound_list, there is a need to
> > check
> > whether the rebind_list is empty here under the notifier_lock in
> > read
> > mode, and in that case, restart from eb_lookup_vmas(). That might
> > also
> > eliminate the need for the __EXEC3_USERPTR_USED flag?
> > 
> > That will also catch any objects that were evicted between
> > eb_lookup_vmas() where the rebind_list was last checked, and
> > i915_gem_vm_priv_lock(), which prohibits further eviction, but if
> > we
> > want to catch these earlier (which I think is a good idea), we
> > could
> > check that the rebind_list is indeed empty just after taking the
> > vm_priv_lock(), and if not, restart from eb_lookup_vmas().
> 
> Yah, right, we need to check rebind_list here and if not empty,
> restart
> from lookup phase.
> It is bit tricky with userptr here as the unbind happens during
> submit_init() call after we scoop unbound vmas here, the vmas gets
> re-added to rebind_list :(.

Ugh.

> I think we need a separate 'invalidated_userptr_list' here and we
> iterate through it for submit_init() and submit_done() calls (yes,
> __EXEC3_USERPTR_USED flag won't be needed then).
> And, we call, eb_scoop_unbound_vmas() after calling
> eb_lookup_persistent_userptr_vmas(), so that we scoop all unbound
> vmas properly.
> 

I'm not sure that will help much, because we'd also need to recheck the
rebind_list and possibly restart after taking the vm_priv_lock, since
objects can be evicted between the scooping and taking the
vm_priv_lock. So then the userptrs will be caught by that check.
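To make that concrete, the recheck after taking the lock could look
roughly like this (fragment only, names as in the posted series,
untested):

	struct i915_address_space *vm = eb->context->vm;
	int err;

	err = i915_gem_vm_priv_lock(vm, &eb->ww);
	if (err)
		return err;

	/* Anything evicted since the lookup ends up on the rebind list. */
	spin_lock(&vm->vm_rebind_lock);
	if (!list_empty(&vm->vm_rebind_list)) {
		spin_unlock(&vm->vm_rebind_lock);
		i915_gem_vm_priv_unlock(vm);
		/* restart from eb_lookup_vmas() */
		return -EAGAIN;
	}
	spin_unlock(&vm->vm_rebind_lock);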

/Thomas



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-08 13:23       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-08 17:32         ` Christian König
  -1 siblings, 0 replies; 121+ messages in thread
From: Christian König @ 2022-07-08 17:32 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, intel-gfx,
	lionel.g.landwerlin, thomas.hellstrom, dri-devel, jason,
	daniel.vetter, matthew.auld

On 08.07.22 at 15:23, Niranjana Vishwanathapura wrote:
> On Thu, Jul 07, 2022 at 03:27:43PM +0200, Christian König wrote:
>> On 02.07.22 at 00:50, Niranjana Vishwanathapura wrote:
>>> Add uapi allowing user to specify a BO as private to a specified VM
>>> during the BO creation.
>>> VM private BOs can only be mapped on the specified VM and can't be
>>> dma_buf exported. VM private BOs share a single common dma_resv object,
>>> hence has a performance advantage requiring a single dma_resv object
>>> update in the execbuf path compared to non-private (shared) BOs.
>>
>> Sounds like you picked up the per VM BO idea from amdgpu here :)
>>
>> Offhand it looks like a good idea, but shouldn't we add a few comments 
>> in the common documentation about that?
>>
>> E.g. something like "Multiple buffer objects sometimes share the same 
>> dma_resv object....." to the dma_resv documentation.
>>
>> Probably best as a separate patch after this here has landed.
>
> :)
> Sounds good. Probably we need to update documentation of
> drm_gem_object.resv and drm_gem_object._resv here, right?

Yes, I would also add a word or two to the dma_resv object. Something 
like "Multiple buffers are sometimes using a single dma_resv object..."

Christian.

>
> Doing it in a separate patch after this series lands sounds good to me.
>
> Thanks,
> Niranjana
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
>>>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>>  drivers/gpu/drm/i915/i915_vma.c               |  1 +
>>>  drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>>>  include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>>>  11 files changed, 110 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> index 927a87e5ec59..7e264566b51f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> @@ -11,6 +11,7 @@
>>>  #include "pxp/intel_pxp.h"
>>>  #include "i915_drv.h"
>>> +#include "i915_gem_context.h"
>>>  #include "i915_gem_create.h"
>>>  #include "i915_trace.h"
>>>  #include "i915_user_extensions.h"
>>> @@ -243,6 +244,7 @@ struct create_ext {
>>>      unsigned int n_placements;
>>>      unsigned int placement_mask;
>>>      unsigned long flags;
>>> +    u32 vm_id;
>>>  };
>>>  static void repr_placements(char *buf, size_t size,
>>> @@ -392,9 +394,24 @@ static int ext_set_protected(struct 
>>> i915_user_extension __user *base, void *data
>>>      return 0;
>>>  }
>>> +static int ext_set_vm_private(struct i915_user_extension __user *base,
>>> +                  void *data)
>>> +{
>>> +    struct drm_i915_gem_create_ext_vm_private ext;
>>> +    struct create_ext *ext_data = data;
>>> +
>>> +    if (copy_from_user(&ext, base, sizeof(ext)))
>>> +        return -EFAULT;
>>> +
>>> +    ext_data->vm_id = ext.vm_id;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>  static const i915_user_extension_fn create_extensions[] = {
>>>      [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>>      [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>>> +    [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>>>  };
>>>  /**
>>> @@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>> *dev, void *data,
>>>      struct drm_i915_private *i915 = to_i915(dev);
>>>      struct drm_i915_gem_create_ext *args = data;
>>>      struct create_ext ext_data = { .i915 = i915 };
>>> +    struct i915_address_space *vm = NULL;
>>>      struct drm_i915_gem_object *obj;
>>>      int ret;
>>> @@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>> *dev, void *data,
>>>      if (ret)
>>>          return ret;
>>> +    if (ext_data.vm_id) {
>>> +        vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
>>> +        if (unlikely(!vm))
>>> +            return -ENOENT;
>>> +    }
>>> +
>>>      if (!ext_data.n_placements) {
>>>          ext_data.placements[0] =
>>>              intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
>>> @@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>> *dev, void *data,
>>>                          ext_data.placements,
>>>                          ext_data.n_placements,
>>>                          ext_data.flags);
>>> -    if (IS_ERR(obj))
>>> -        return PTR_ERR(obj);
>>> +    if (IS_ERR(obj)) {
>>> +        ret = PTR_ERR(obj);
>>> +        goto vm_put;
>>> +    }
>>> +
>>> +    if (vm) {
>>> +        obj->base.resv = vm->root_obj->base.resv;
>>> +        obj->priv_root = i915_gem_object_get(vm->root_obj);
>>> +        i915_vm_put(vm);
>>> +    }
>>>      return i915_gem_publish(obj, file, &args->size, &args->handle);
>>> +vm_put:
>>> +    if (vm)
>>> +        i915_vm_put(vm);
>>> +
>>> +    return ret;
>>>  }
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..6433173c3e84 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -218,6 +218,12 @@ struct dma_buf *i915_gem_prime_export(struct 
>>> drm_gem_object *gem_obj, int flags)
>>>      struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>>      DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>> +    if (obj->priv_root) {
>>> +        drm_dbg(obj->base.dev,
>>> +            "Exporting VM private objects is not allowed\n");
>>> +        return ERR_PTR(-EINVAL);
>>> +    }
>>> +
>>>      exp_info.ops = &i915_dmabuf_ops;
>>>      exp_info.size = gem_obj->size;
>>>      exp_info.flags = flags;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> index 5cf36a130061..9fe3395ad4d9 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> @@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>>>      const struct drm_i915_gem_object_ops *ops;
>>> +    /* Shared root is object private to a VM; NULL otherwise */
>>> +    struct drm_i915_gem_object *priv_root;
>>> +
>>>      struct {
>>>          /**
>>>           * @vma.lock: protect the list/tree of vmas
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> index 7e1f8b83077f..f1912b12db00 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> @@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct 
>>> ttm_buffer_object *bo)
>>>      i915_gem_object_release_memory_region(obj);
>>>      mutex_destroy(&obj->ttm.get_io_page.lock);
>>> +    if (obj->priv_root)
>>> +        i915_gem_object_put(obj->priv_root);
>>> +
>>>      if (obj->ttm.created) {
>>>          /*
>>>           * We freely manage the shrinker LRU outide of the mm.pages 
>>> life
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>> index 642cdb559f17..ee6e4c52e80e 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>> @@ -26,6 +26,17 @@ static inline void i915_gem_vm_bind_unlock(struct 
>>> i915_address_space *vm)
>>>      mutex_unlock(&vm->vm_bind_lock);
>>>  }
>>> +static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>>> +                    struct i915_gem_ww_ctx *ww)
>>> +{
>>> +    return i915_gem_object_lock(vm->root_obj, ww);
>>> +}
>>> +
>>> +static inline void i915_gem_vm_priv_unlock(struct 
>>> i915_address_space *vm)
>>> +{
>>> +    i915_gem_object_unlock(vm->root_obj);
>>> +}
>>> +
>>>  struct i915_vma *
>>>  i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>>  void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> index 43ceb4dcca6c..3201204c8e74 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> @@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma *vma, 
>>> bool release_obj)
>>>      if (!list_empty(&vma->vm_bind_link)) {
>>>          list_del_init(&vma->vm_bind_link);
>>> +        list_del_init(&vma->non_priv_vm_bind_link);
>>>          i915_vm_bind_it_remove(vma, &vma->vm->va);
>>>          /* Release object */
>>> @@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct 
>>> i915_address_space *vm,
>>>          goto put_obj;
>>>      }
>>> +    if (obj->priv_root && obj->priv_root != vm->root_obj) {
>>> +        ret = -EINVAL;
>>> +        goto put_obj;
>>> +    }
>>> +
>>>      ret = i915_gem_vm_bind_lock_interruptible(vm);
>>>      if (ret)
>>>          goto put_obj;
>>> @@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct 
>>> i915_address_space *vm,
>>>      list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>>      i915_vm_bind_it_insert(vma, &vm->va);
>>> +    if (!obj->priv_root)
>>> +        list_add_tail(&vma->non_priv_vm_bind_link,
>>> +                  &vm->non_priv_vm_bind_list);
>>>      /* Hold object reference until vm_unbind */
>>>      i915_gem_object_get(vma->obj);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 135dc4a76724..df0a8459c3c6 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct 
>>> i915_address_space *vm,
>>>  void i915_address_space_fini(struct i915_address_space *vm)
>>>  {
>>>      drm_mm_takedown(&vm->mm);
>>> +    i915_gem_object_put(vm->root_obj);
>>>      GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>>      mutex_destroy(&vm->vm_bind_lock);
>>>  }
>>> @@ -289,6 +290,9 @@ void i915_address_space_init(struct 
>>> i915_address_space *vm, int subclass)
>>>      INIT_LIST_HEAD(&vm->vm_bind_list);
>>>      INIT_LIST_HEAD(&vm->vm_bound_list);
>>>      mutex_init(&vm->vm_bind_lock);
>>> +    INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>> +    vm->root_obj = i915_gem_object_create_internal(vm->i915, 
>>> PAGE_SIZE);
>>> +    GEM_BUG_ON(IS_ERR(vm->root_obj));
>>>  }
>>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index d4a6ce65251d..f538ce9115c9 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -267,6 +267,8 @@ struct i915_address_space {
>>>      struct list_head vm_bound_list;
>>>      /* va tree of persistent vmas */
>>>      struct rb_root_cached va;
>>> +    struct list_head non_priv_vm_bind_list;
>>> +    struct drm_i915_gem_object *root_obj;
>>>      /* Global GTT */
>>>      bool is_ggtt:1;
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>> b/drivers/gpu/drm/i915/i915_vma.c
>>> index d324e29cef0a..f0226581d342 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>      mutex_unlock(&vm->mutex);
>>>      INIT_LIST_HEAD(&vma->vm_bind_link);
>>> +    INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>>>      return vma;
>>>  err_unlock:
>>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h 
>>> b/drivers/gpu/drm/i915/i915_vma_types.h
>>> index b6d179bdbfa0..2298b3d6b7c4 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>> @@ -290,6 +290,8 @@ struct i915_vma {
>>>      struct list_head vm_link;
>>>      struct list_head vm_bind_link; /* Link in persistent VMA list */
>>> +    /* Link in non-private persistent VMA list */
>>> +    struct list_head non_priv_vm_bind_link;
>>>      /** Interval tree structures for persistent vma */
>>>      struct rb_node rb;
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 26cca49717f8..ce1c6592b0d7 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>>>       *
>>>       * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>>>       * struct drm_i915_gem_create_ext_protected_content.
>>> +     *
>>> +     * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
>>> +     * struct drm_i915_gem_create_ext_vm_private.
>>>       */
>>>  #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>>>  #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
>>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>>>      __u64 extensions;
>>>  };
>>> @@ -3662,6 +3666,32 @@ struct 
>>> drm_i915_gem_create_ext_protected_content {
>>>  /* ID of the protected content session managed by i915 when PXP is 
>>> active */
>>>  #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>> +/**
>>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make 
>>> the object
>>> + * private to the specified VM.
>>> + *
>>> + * See struct drm_i915_gem_create_ext.
>>> + *
>>> + * By default, BOs can be mapped on multiple VMs and can also be 
>>> dma-buf
>>> + * exported. Hence these BOs are referred to as Shared BOs.
>>> + * During each execbuf3 submission, the request fence must be added 
>>> to the
>>> + * dma-resv fence list of all shared BOs mapped on the VM.
>>> + *
>>> + * Unlike Shared BOs, these VM private BOs can only be mapped on 
>>> the VM they
>>> + * are private to and can't be dma-buf exported. All private BOs of 
>>> a VM share
>>> + * the dma-resv object. Hence during each execbuf3 submission, they 
>>> need only
>>> + * one dma-resv fence list updated. Thus, the fast path (where 
>>> required
>>> + * mappings are already bound) submission latency is O(1) w.r.t the 
>>> number of
>>> + * VM private BOs.
>>> + */
>>> +struct drm_i915_gem_create_ext_vm_private {
>>> +    /** @base: Extension link. See struct i915_user_extension. */
>>> +    struct i915_user_extension base;
>>> +
>>> +    /** @vm_id: Id of the VM to which the object is private */
>>> +    __u32 vm_id;
>>> +};
>>> +
>>>  /**
>>>   * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>   *
>>


^ permalink raw reply	[flat|nested] 121+ messages in thread
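For reference, a minimal userspace sketch of how the VM-private extension quoted
above would be used, assuming the uapi exactly as proposed in this series.
I915_GEM_CREATE_EXT_VM_PRIVATE and struct drm_i915_gem_create_ext_vm_private
exist only in this RFC (not in mainline headers), the helper name below is made
up, and error handling is omitted.

/*
 * Sketch only: create a PPGTT VM, then create a BO that is private to it
 * by chaining the proposed VM_PRIVATE extension into the create ioctl.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static uint32_t create_vm_private_bo(int drm_fd, uint64_t size, uint32_t *vm_id)
{
	struct drm_i915_gem_vm_control vm_ctl = {};
	struct drm_i915_gem_create_ext_vm_private vm_priv = {};
	struct drm_i915_gem_create_ext create = {};

	/* Create the address space (VM) the BO will be private to. */
	ioctl(drm_fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm_ctl);
	*vm_id = vm_ctl.vm_id;

	/* Chain the VM-private extension into the create ioctl. */
	vm_priv.base.name = I915_GEM_CREATE_EXT_VM_PRIVATE;
	vm_priv.vm_id = vm_ctl.vm_id;

	create.size = size;
	create.extensions = (uintptr_t)&vm_priv;

	/* The returned handle can only be vm_bound on *vm_id and can't be dma-buf exported. */
	ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);

	return create.handle;
}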


* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-08 14:44           ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-09 20:13             ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-09 20:13 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Fri, Jul 08, 2022 at 07:44:54AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-08 at 15:43 +0200, Thomas Hellström wrote:
>> > The vm_bind/bound_list and the non_priv_vm_bind_list are there for
>> > very different reasons.
>> >
>> > The reason for having separate vm_bind_list and vm_bound_list is
>> > that
>> > during the execbuf path, we can rebind the unbound mappings by
>> > scooping
>> > all unbound vmas back from bound list into the bind list and
>> > binding
>> > them. In fact, this probably can be done with a single vm_bind_list
>> > and
>> > a 'eb.bind_list' (local to execbuf3 ioctl) for rebinding.
>> >
>> > The non_priv_vm_bind_list is just an optimization to loop only
>> > through
>> > non-priv objects while taking the locks in
>> > eb_lock_persistent_vmas()
>> > as only non-priv objects needs that (private objects are locked in
>> > a
>> > single shot with vm_priv_lock). A non-priv mapping will also be in
>> > the
>> > vm_bind/bound_list.
>> >
>> > I think, we need to add this as documentation to be more clear.
>>
>> OK, I understood it as private objects were either on the vm_bind
>> list
>> or vm_bound_list depending on whether they needed rebinding or not,
>> and
>> shared objects only on the non_priv_vm_bind list, and were always
>> locked, validated and fenced...
>>
>> Need to take a deeper look...
>>
>> /Thomas
>>
>>
>>
>> >
>> > Niranjana
>> >
>> >
>
>Hmm. Just a quick thought on this, Since the non-private vm-bind
>objects all need to be iterated through (locked and fenced and userptr
>valid) on each execbuf, and checking for validation (resident and
>bound) is a very quick check, then we'd never need to add them to the
>rebind list at all, right? If so the rebind list would be exclusive to
>vm-private objects.

Yah, non-private vm-bind objects all need to be iterated through, locked
and fenced (not required to be validated/userptr validated unless there
is an invalidation).

Yah, we can say that it is a quick check to see if a BO needs rebinding
(is validated), and hence the rebind_list is mainly useful only for
vm-private objects.
But there have been some discussions about optimizing the non-private BO
case by not having to iterate. I am not sure how that will shape up, but
it is something to consider here.

>
>Also I don't think the vm_bind list can be execbuf-local, since binding
>may not have completed at vma_release time, at which point the objects
>need to remain on the vm_bind list until the next execbuf...

Yah, in execbuf3, after scooping all unbound BOs into vm_bind_list and
validating (binding) them, during eb_release_vmas we only move those
BOs to vm_bound_list for which the binding is complete. If not, they
will remain in vm_bind_list. If we make this list execbuf-local, then
we have to add all the BOs for which binding is not complete back into
a rebind_list so that the next execbuf will pick them up. Probably the
current vm_bind/bound_list is the better option.

Niranjana

>
>/Thomas
>
>

^ permalink raw reply	[flat|nested] 121+ messages in thread
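A rough, illustrative sketch of the locking and rebind step described in the
message above. It is not the code from this series: the helper name is
invented, the fixed-VA pin flags and rollback paths are omitted, but it shows
how vm_bind_list, vm_bound_list and non_priv_vm_bind_list are meant to
interact.

/*
 * Illustrative sketch only -- not the implementation from this series.
 * Private BOs share the root object's dma-resv, so one lock covers them
 * all; non-private BOs are locked individually via non_priv_vm_bind_list.
 * Unbound persistent vmas are scooped back onto vm_bind_list, (re)bound,
 * and moved to vm_bound_list once the binding is complete.
 */
static int rebind_persistent_vmas(struct i915_address_space *vm,
				  struct i915_gem_ww_ctx *ww)
{
	struct i915_vma *vma, *next;
	int err;

	/* One lock for every VM-private BO (shared dma-resv). */
	err = i915_gem_vm_priv_lock(vm, ww);
	if (err)
		return err;

	/* Non-private BOs still need to be locked one by one. */
	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
			    non_priv_vm_bind_link) {
		err = i915_gem_object_lock(vma->obj, ww);
		if (err)
			return err;
	}

	/* Scoop anything that lost its binding back onto the bind list. */
	list_for_each_entry_safe(vma, next, &vm->vm_bound_list, vm_bind_link)
		if (!i915_vma_is_bound(vma, I915_VMA_BIND_MASK))
			list_move_tail(&vma->vm_bind_link, &vm->vm_bind_list);

	/* (Re)bind; only fully bound vmas move back to vm_bound_list. */
	list_for_each_entry_safe(vma, next, &vm->vm_bind_list, vm_bind_link) {
		err = i915_vma_pin_ww(vma, ww, 0, 0, PIN_USER);
		if (err)
			return err;
		list_move_tail(&vma->vm_bind_link, &vm->vm_bound_list);
	}

	return 0;
}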


* Re: [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
  2022-07-08 17:32         ` [Intel-gfx] " Christian König
@ 2022-07-09 20:14           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-09 20:14 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, intel-gfx,
	lionel.g.landwerlin, thomas.hellstrom, dri-devel, jason,
	daniel.vetter, matthew.auld

On Fri, Jul 08, 2022 at 07:32:54PM +0200, Christian König wrote:
>Am 08.07.22 um 15:23 schrieb Niranjana Vishwanathapura:
>>On Thu, Jul 07, 2022 at 03:27:43PM +0200, Christian König wrote:
>>>Am 02.07.22 um 00:50 schrieb Niranjana Vishwanathapura:
>>>>Add uapi allowing user to specify a BO as private to a specified VM
>>>>during the BO creation.
>>>>VM private BOs can only be mapped on the specified VM and can't be
>>>>dma_buf exported. VM private BOs share a single common dma_resv object,
>>>>hence has a performance advantage requiring a single dma_resv object
>>>>update in the execbuf path compared to non-private (shared) BOs.
>>>
>>>Sounds like you picked up the per VM BO idea from amdgpu here :)
>>>
>>>Of hand looks like a good idea, but shouldn't we add a few 
>>>comments in the common documentation about that?
>>>
>>>E.g. something like "Multiple buffer objects sometimes share the 
>>>same dma_resv object....." to the dma_resv documentation.
>>>
>>>Probably best as a separate patch after this here has landed.
>>
>>:)
>>Sounds good. Probably we need to update documentation of
>>drm_gem_object.resv and drm_gem_object._resv here, right?
>
>Yes, I would also add a word or two to the dma_resv object. Something 
>like "Multiple buffers are sometimes using a single dma_resv 
>object..."
>

Ok, sounds good.

Niranjana

>Christian.
>
>>
>>Doing it in a separate patch after this series lands sounds good to me.
>>
>>Thanks,
>>Niranjana
>>
>>>
>>>Regards,
>>>Christian.
>>>
>>>>
>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>><niranjana.vishwanathapura@intel.com>
>>>>---
>>>> drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
>>>> drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>>>> .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>>>> drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>>>> .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>>>> drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>>>> drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>>> drivers/gpu/drm/i915/i915_vma.c               |  1 +
>>>> drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>>>> include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>>>> 11 files changed, 110 insertions(+), 2 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>index 927a87e5ec59..7e264566b51f 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>@@ -11,6 +11,7 @@
>>>> #include "pxp/intel_pxp.h"
>>>> #include "i915_drv.h"
>>>>+#include "i915_gem_context.h"
>>>> #include "i915_gem_create.h"
>>>> #include "i915_trace.h"
>>>> #include "i915_user_extensions.h"
>>>>@@ -243,6 +244,7 @@ struct create_ext {
>>>>     unsigned int n_placements;
>>>>     unsigned int placement_mask;
>>>>     unsigned long flags;
>>>>+    u32 vm_id;
>>>> };
>>>> static void repr_placements(char *buf, size_t size,
>>>>@@ -392,9 +394,24 @@ static int ext_set_protected(struct 
>>>>i915_user_extension __user *base, void *data
>>>>     return 0;
>>>> }
>>>>+static int ext_set_vm_private(struct i915_user_extension __user *base,
>>>>+                  void *data)
>>>>+{
>>>>+    struct drm_i915_gem_create_ext_vm_private ext;
>>>>+    struct create_ext *ext_data = data;
>>>>+
>>>>+    if (copy_from_user(&ext, base, sizeof(ext)))
>>>>+        return -EFAULT;
>>>>+
>>>>+    ext_data->vm_id = ext.vm_id;
>>>>+
>>>>+    return 0;
>>>>+}
>>>>+
>>>> static const i915_user_extension_fn create_extensions[] = {
>>>>     [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>>>     [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>>>>+    [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>>>> };
>>>> /**
>>>>@@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>>>*dev, void *data,
>>>>     struct drm_i915_private *i915 = to_i915(dev);
>>>>     struct drm_i915_gem_create_ext *args = data;
>>>>     struct create_ext ext_data = { .i915 = i915 };
>>>>+    struct i915_address_space *vm = NULL;
>>>>     struct drm_i915_gem_object *obj;
>>>>     int ret;
>>>>@@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>>>*dev, void *data,
>>>>     if (ret)
>>>>         return ret;
>>>>+    if (ext_data.vm_id) {
>>>>+        vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
>>>>+        if (unlikely(!vm))
>>>>+            return -ENOENT;
>>>>+    }
>>>>+
>>>>     if (!ext_data.n_placements) {
>>>>         ext_data.placements[0] =
>>>>             intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
>>>>@@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>>>*dev, void *data,
>>>>                         ext_data.placements,
>>>>                         ext_data.n_placements,
>>>>                         ext_data.flags);
>>>>-    if (IS_ERR(obj))
>>>>-        return PTR_ERR(obj);
>>>>+    if (IS_ERR(obj)) {
>>>>+        ret = PTR_ERR(obj);
>>>>+        goto vm_put;
>>>>+    }
>>>>+
>>>>+    if (vm) {
>>>>+        obj->base.resv = vm->root_obj->base.resv;
>>>>+        obj->priv_root = i915_gem_object_get(vm->root_obj);
>>>>+        i915_vm_put(vm);
>>>>+    }
>>>>     return i915_gem_publish(obj, file, &args->size, &args->handle);
>>>>+vm_put:
>>>>+    if (vm)
>>>>+        i915_vm_put(vm);
>>>>+
>>>>+    return ret;
>>>> }
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>>index f5062d0c6333..6433173c3e84 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>>@@ -218,6 +218,12 @@ struct dma_buf 
>>>>*i915_gem_prime_export(struct drm_gem_object *gem_obj, int 
>>>>flags)
>>>>     struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>>>     DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>>>+    if (obj->priv_root) {
>>>>+        drm_dbg(obj->base.dev,
>>>>+            "Exporting VM private objects is not allowed\n");
>>>>+        return ERR_PTR(-EINVAL);
>>>>+    }
>>>>+
>>>>     exp_info.ops = &i915_dmabuf_ops;
>>>>     exp_info.size = gem_obj->size;
>>>>     exp_info.flags = flags;
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>>>index 5cf36a130061..9fe3395ad4d9 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>>>@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>>>>     const struct drm_i915_gem_object_ops *ops;
>>>>+    /* Shared root is object private to a VM; NULL otherwise */
>>>>+    struct drm_i915_gem_object *priv_root;
>>>>+
>>>>     struct {
>>>>         /**
>>>>          * @vma.lock: protect the list/tree of vmas
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>index 7e1f8b83077f..f1912b12db00 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>@@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct 
>>>>ttm_buffer_object *bo)
>>>>     i915_gem_object_release_memory_region(obj);
>>>>     mutex_destroy(&obj->ttm.get_io_page.lock);
>>>>+    if (obj->priv_root)
>>>>+        i915_gem_object_put(obj->priv_root);
>>>>+
>>>>     if (obj->ttm.created) {
>>>>         /*
>>>>          * We freely manage the shrinker LRU outide of the 
>>>>mm.pages life
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>>index 642cdb559f17..ee6e4c52e80e 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>>@@ -26,6 +26,17 @@ static inline void 
>>>>i915_gem_vm_bind_unlock(struct i915_address_space *vm)
>>>>     mutex_unlock(&vm->vm_bind_lock);
>>>> }
>>>>+static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>>>>+                    struct i915_gem_ww_ctx *ww)
>>>>+{
>>>>+    return i915_gem_object_lock(vm->root_obj, ww);
>>>>+}
>>>>+
>>>>+static inline void i915_gem_vm_priv_unlock(struct 
>>>>i915_address_space *vm)
>>>>+{
>>>>+    i915_gem_object_unlock(vm->root_obj);
>>>>+}
>>>>+
>>>> struct i915_vma *
>>>> i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>>> void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>>index 43ceb4dcca6c..3201204c8e74 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>>@@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma 
>>>>*vma, bool release_obj)
>>>>     if (!list_empty(&vma->vm_bind_link)) {
>>>>         list_del_init(&vma->vm_bind_link);
>>>>+        list_del_init(&vma->non_priv_vm_bind_link);
>>>>         i915_vm_bind_it_remove(vma, &vma->vm->va);
>>>>         /* Release object */
>>>>@@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct 
>>>>i915_address_space *vm,
>>>>         goto put_obj;
>>>>     }
>>>>+    if (obj->priv_root && obj->priv_root != vm->root_obj) {
>>>>+        ret = -EINVAL;
>>>>+        goto put_obj;
>>>>+    }
>>>>+
>>>>     ret = i915_gem_vm_bind_lock_interruptible(vm);
>>>>     if (ret)
>>>>         goto put_obj;
>>>>@@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct 
>>>>i915_address_space *vm,
>>>>     list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>>>     i915_vm_bind_it_insert(vma, &vm->va);
>>>>+    if (!obj->priv_root)
>>>>+        list_add_tail(&vma->non_priv_vm_bind_link,
>>>>+                  &vm->non_priv_vm_bind_list);
>>>>     /* Hold object reference until vm_unbind */
>>>>     i915_gem_object_get(vma->obj);
>>>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>index 135dc4a76724..df0a8459c3c6 100644
>>>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>@@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct 
>>>>i915_address_space *vm,
>>>> void i915_address_space_fini(struct i915_address_space *vm)
>>>> {
>>>>     drm_mm_takedown(&vm->mm);
>>>>+    i915_gem_object_put(vm->root_obj);
>>>>     GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>>>     mutex_destroy(&vm->vm_bind_lock);
>>>> }
>>>>@@ -289,6 +290,9 @@ void i915_address_space_init(struct 
>>>>i915_address_space *vm, int subclass)
>>>>     INIT_LIST_HEAD(&vm->vm_bind_list);
>>>>     INIT_LIST_HEAD(&vm->vm_bound_list);
>>>>     mutex_init(&vm->vm_bind_lock);
>>>>+    INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>>>+    vm->root_obj = i915_gem_object_create_internal(vm->i915, 
>>>>PAGE_SIZE);
>>>>+    GEM_BUG_ON(IS_ERR(vm->root_obj));
>>>> }
>>>> void *__px_vaddr(struct drm_i915_gem_object *p)
>>>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>index d4a6ce65251d..f538ce9115c9 100644
>>>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>@@ -267,6 +267,8 @@ struct i915_address_space {
>>>>     struct list_head vm_bound_list;
>>>>     /* va tree of persistent vmas */
>>>>     struct rb_root_cached va;
>>>>+    struct list_head non_priv_vm_bind_list;
>>>>+    struct drm_i915_gem_object *root_obj;
>>>>     /* Global GTT */
>>>>     bool is_ggtt:1;
>>>>diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>b/drivers/gpu/drm/i915/i915_vma.c
>>>>index d324e29cef0a..f0226581d342 100644
>>>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>     mutex_unlock(&vm->mutex);
>>>>     INIT_LIST_HEAD(&vma->vm_bind_link);
>>>>+    INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>>>>     return vma;
>>>> err_unlock:
>>>>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h 
>>>>b/drivers/gpu/drm/i915/i915_vma_types.h
>>>>index b6d179bdbfa0..2298b3d6b7c4 100644
>>>>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>>>>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>>>@@ -290,6 +290,8 @@ struct i915_vma {
>>>>     struct list_head vm_link;
>>>>     struct list_head vm_bind_link; /* Link in persistent VMA list */
>>>>+    /* Link in non-private persistent VMA list */
>>>>+    struct list_head non_priv_vm_bind_link;
>>>>     /** Interval tree structures for persistent vma */
>>>>     struct rb_node rb;
>>>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>>index 26cca49717f8..ce1c6592b0d7 100644
>>>>--- a/include/uapi/drm/i915_drm.h
>>>>+++ b/include/uapi/drm/i915_drm.h
>>>>@@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>>>>      *
>>>>      * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>>>>      * struct drm_i915_gem_create_ext_protected_content.
>>>>+     *
>>>>+     * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
>>>>+     * struct drm_i915_gem_create_ext_vm_private.
>>>>      */
>>>> #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>>>> #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
>>>>+#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>>>>     __u64 extensions;
>>>> };
>>>>@@ -3662,6 +3666,32 @@ struct 
>>>>drm_i915_gem_create_ext_protected_content {
>>>> /* ID of the protected content session managed by i915 when PXP 
>>>>is active */
>>>> #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>>>+/**
>>>>+ * struct drm_i915_gem_create_ext_vm_private - Extension to 
>>>>make the object
>>>>+ * private to the specified VM.
>>>>+ *
>>>>+ * See struct drm_i915_gem_create_ext.
>>>>+ *
>>>>+ * By default, BOs can be mapped on multiple VMs and can also 
>>>>be dma-buf
>>>>+ * exported. Hence these BOs are referred to as Shared BOs.
>>>>+ * During each execbuf3 submission, the request fence must be 
>>>>added to the
>>>>+ * dma-resv fence list of all shared BOs mapped on the VM.
>>>>+ *
>>>>+ * Unlike Shared BOs, these VM private BOs can only be mapped 
>>>>on the VM they
>>>>+ * are private to and can't be dma-buf exported. All private 
>>>>BOs of a VM share
>>>>+ * the dma-resv object. Hence during each execbuf3 submission, 
>>>>they need only
>>>>+ * one dma-resv fence list updated. Thus, the fast path (where 
>>>>required
>>>>+ * mappings are already bound) submission latency is O(1) w.r.t 
>>>>the number of
>>>>+ * VM private BOs.
>>>>+ */
>>>>+struct drm_i915_gem_create_ext_vm_private {
>>>>+    /** @base: Extension link. See struct i915_user_extension. */
>>>>+    struct i915_user_extension base;
>>>>+
>>>>+    /** @vm_id: Id of the VM to which the object is private */
>>>>+    __u32 vm_id;
>>>>+};
>>>>+
>>>> /**
>>>>  * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>>  *
>>>
>

^ permalink raw reply	[flat|nested] 121+ messages in thread
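To make the single dma-resv update discussed in this sub-thread concrete, here
is a rough sketch (illustrative only, not code from this series) of adding the
request fence when every VM-private BO shares vm->root_obj's reservation
object; the helper name is made up.

/*
 * Illustrative sketch only. Assumes the dma-resv locks are already held
 * and fence slots were reserved with dma_resv_reserve_fences().
 */
static void add_request_fence_sketch(struct i915_address_space *vm,
				     struct dma_fence *fence)
{
	struct i915_vma *vma;

	/* One update covers every VM-private BO: O(1) in their number. */
	dma_resv_add_fence(vm->root_obj->base.resv, fence,
			   DMA_RESV_USAGE_BOOKKEEP);

	/* Shared (non-private) BOs still need a per-object update. */
	list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
			    non_priv_vm_bind_link)
		dma_resv_add_fence(vma->obj->base.resv, fence,
				   DMA_RESV_USAGE_BOOKKEEP);
}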

* Re: [Intel-gfx] [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs
@ 2022-07-09 20:14           ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-09 20:14 UTC (permalink / raw)
  To: Christian König
  Cc: paulo.r.zanoni, intel-gfx, thomas.hellstrom, dri-devel,
	daniel.vetter, matthew.auld

On Fri, Jul 08, 2022 at 07:32:54PM +0200, Christian König wrote:
>Am 08.07.22 um 15:23 schrieb Niranjana Vishwanathapura:
>>On Thu, Jul 07, 2022 at 03:27:43PM +0200, Christian König wrote:
>>>Am 02.07.22 um 00:50 schrieb Niranjana Vishwanathapura:
>>>>Add uapi allowing user to specify a BO as private to a specified VM
>>>>during the BO creation.
>>>>VM private BOs can only be mapped on the specified VM and can't be
>>>>dma_buf exported. VM private BOs share a single common dma_resv object,
>>>>hence has a performance advantage requiring a single dma_resv object
>>>>update in the execbuf path compared to non-private (shared) BOs.
>>>
>>>Sounds like you picked up the per VM BO idea from amdgpu here :)
>>>
>>>Of hand looks like a good idea, but shouldn't we add a few 
>>>comments in the common documentation about that?
>>>
>>>E.g. something like "Multiple buffer objects sometimes share the 
>>>same dma_resv object....." to the dma_resv documentation.
>>>
>>>Probably best as a separate patch after this here has landed.
>>
>>:)
>>Sounds good. Probably we need to update documentation of
>>drm_gem_object.resv and drm_gem_object._resv here, right?
>
>Yes, I would also add a word or two to the dma_resv object. Something 
>like "Multiple buffers are sometimes using a single dma_resv 
>object..."
>

Ok, sounds good.

Niranjana

>Christian.
>
>>
>>Doing it in a separate patch after this series lands sounds good to me.
>>
>>Thanks,
>>Niranjana
>>
>>>
>>>Regards,
>>>Christian.
>>>
>>>>
>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>><niranjana.vishwanathapura@intel.com>
>>>>---
>>>> drivers/gpu/drm/i915/gem/i915_gem_create.c    | 41 ++++++++++++++++++-
>>>> drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  6 +++
>>>> .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 ++
>>>> drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   | 11 +++++
>>>> .../drm/i915/gem/i915_gem_vm_bind_object.c    |  9 ++++
>>>> drivers/gpu/drm/i915/gt/intel_gtt.c           |  4 ++
>>>> drivers/gpu/drm/i915/gt/intel_gtt.h           |  2 +
>>>> drivers/gpu/drm/i915/i915_vma.c               |  1 +
>>>> drivers/gpu/drm/i915/i915_vma_types.h         |  2 +
>>>> include/uapi/drm/i915_drm.h                   | 30 ++++++++++++++
>>>> 11 files changed, 110 insertions(+), 2 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>index 927a87e5ec59..7e264566b51f 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>>>@@ -11,6 +11,7 @@
>>>> #include "pxp/intel_pxp.h"
>>>> #include "i915_drv.h"
>>>>+#include "i915_gem_context.h"
>>>> #include "i915_gem_create.h"
>>>> #include "i915_trace.h"
>>>> #include "i915_user_extensions.h"
>>>>@@ -243,6 +244,7 @@ struct create_ext {
>>>>     unsigned int n_placements;
>>>>     unsigned int placement_mask;
>>>>     unsigned long flags;
>>>>+    u32 vm_id;
>>>> };
>>>> static void repr_placements(char *buf, size_t size,
>>>>@@ -392,9 +394,24 @@ static int ext_set_protected(struct 
>>>>i915_user_extension __user *base, void *data
>>>>     return 0;
>>>> }
>>>>+static int ext_set_vm_private(struct i915_user_extension __user *base,
>>>>+                  void *data)
>>>>+{
>>>>+    struct drm_i915_gem_create_ext_vm_private ext;
>>>>+    struct create_ext *ext_data = data;
>>>>+
>>>>+    if (copy_from_user(&ext, base, sizeof(ext)))
>>>>+        return -EFAULT;
>>>>+
>>>>+    ext_data->vm_id = ext.vm_id;
>>>>+
>>>>+    return 0;
>>>>+}
>>>>+
>>>> static const i915_user_extension_fn create_extensions[] = {
>>>>     [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>>>     [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>>>>+    [I915_GEM_CREATE_EXT_VM_PRIVATE] = ext_set_vm_private,
>>>> };
>>>> /**
>>>>@@ -410,6 +427,7 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>>>*dev, void *data,
>>>>     struct drm_i915_private *i915 = to_i915(dev);
>>>>     struct drm_i915_gem_create_ext *args = data;
>>>>     struct create_ext ext_data = { .i915 = i915 };
>>>>+    struct i915_address_space *vm = NULL;
>>>>     struct drm_i915_gem_object *obj;
>>>>     int ret;
>>>>@@ -423,6 +441,12 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>>>*dev, void *data,
>>>>     if (ret)
>>>>         return ret;
>>>>+    if (ext_data.vm_id) {
>>>>+        vm = i915_gem_vm_lookup(file->driver_priv, ext_data.vm_id);
>>>>+        if (unlikely(!vm))
>>>>+            return -ENOENT;
>>>>+    }
>>>>+
>>>>     if (!ext_data.n_placements) {
>>>>         ext_data.placements[0] =
>>>>             intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
>>>>@@ -449,8 +473,21 @@ i915_gem_create_ext_ioctl(struct drm_device 
>>>>*dev, void *data,
>>>>                         ext_data.placements,
>>>>                         ext_data.n_placements,
>>>>                         ext_data.flags);
>>>>-    if (IS_ERR(obj))
>>>>-        return PTR_ERR(obj);
>>>>+    if (IS_ERR(obj)) {
>>>>+        ret = PTR_ERR(obj);
>>>>+        goto vm_put;
>>>>+    }
>>>>+
>>>>+    if (vm) {
>>>>+        obj->base.resv = vm->root_obj->base.resv;
>>>>+        obj->priv_root = i915_gem_object_get(vm->root_obj);
>>>>+        i915_vm_put(vm);
>>>>+    }
>>>>     return i915_gem_publish(obj, file, &args->size, &args->handle);
>>>>+vm_put:
>>>>+    if (vm)
>>>>+        i915_vm_put(vm);
>>>>+
>>>>+    return ret;
>>>> }
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>>index f5062d0c6333..6433173c3e84 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>>@@ -218,6 +218,12 @@ struct dma_buf 
>>>>*i915_gem_prime_export(struct drm_gem_object *gem_obj, int 
>>>>flags)
>>>>     struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>>>     DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>>>+    if (obj->priv_root) {
>>>>+        drm_dbg(obj->base.dev,
>>>>+            "Exporting VM private objects is not allowed\n");
>>>>+        return ERR_PTR(-EINVAL);
>>>>+    }
>>>>+
>>>>     exp_info.ops = &i915_dmabuf_ops;
>>>>     exp_info.size = gem_obj->size;
>>>>     exp_info.flags = flags;
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>>>index 5cf36a130061..9fe3395ad4d9 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>>>@@ -241,6 +241,9 @@ struct drm_i915_gem_object {
>>>>     const struct drm_i915_gem_object_ops *ops;
>>>>+    /* Shared root is object private to a VM; NULL otherwise */
>>>>+    struct drm_i915_gem_object *priv_root;
>>>>+
>>>>     struct {
>>>>         /**
>>>>          * @vma.lock: protect the list/tree of vmas
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>index 7e1f8b83077f..f1912b12db00 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>@@ -1152,6 +1152,9 @@ void i915_ttm_bo_destroy(struct 
>>>>ttm_buffer_object *bo)
>>>>     i915_gem_object_release_memory_region(obj);
>>>>     mutex_destroy(&obj->ttm.get_io_page.lock);
>>>>+    if (obj->priv_root)
>>>>+        i915_gem_object_put(obj->priv_root);
>>>>+
>>>>     if (obj->ttm.created) {
>>>>         /*
>>>>          * We freely manage the shrinker LRU outide of the 
>>>>mm.pages life
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>>index 642cdb559f17..ee6e4c52e80e 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>>@@ -26,6 +26,17 @@ static inline void 
>>>>i915_gem_vm_bind_unlock(struct i915_address_space *vm)
>>>>     mutex_unlock(&vm->vm_bind_lock);
>>>> }
>>>>+static inline int i915_gem_vm_priv_lock(struct i915_address_space *vm,
>>>>+                    struct i915_gem_ww_ctx *ww)
>>>>+{
>>>>+    return i915_gem_object_lock(vm->root_obj, ww);
>>>>+}
>>>>+
>>>>+static inline void i915_gem_vm_priv_unlock(struct 
>>>>i915_address_space *vm)
>>>>+{
>>>>+    i915_gem_object_unlock(vm->root_obj);
>>>>+}
>>>>+
>>>> struct i915_vma *
>>>> i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>>> void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>>index 43ceb4dcca6c..3201204c8e74 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>>@@ -85,6 +85,7 @@ void i915_gem_vm_bind_remove(struct i915_vma 
>>>>*vma, bool release_obj)
>>>>     if (!list_empty(&vma->vm_bind_link)) {
>>>>         list_del_init(&vma->vm_bind_link);
>>>>+        list_del_init(&vma->non_priv_vm_bind_link);
>>>>         i915_vm_bind_it_remove(vma, &vma->vm->va);
>>>>         /* Release object */
>>>>@@ -185,6 +186,11 @@ int i915_gem_vm_bind_obj(struct 
>>>>i915_address_space *vm,
>>>>         goto put_obj;
>>>>     }
>>>>+    if (obj->priv_root && obj->priv_root != vm->root_obj) {
>>>>+        ret = -EINVAL;
>>>>+        goto put_obj;
>>>>+    }
>>>>+
>>>>     ret = i915_gem_vm_bind_lock_interruptible(vm);
>>>>     if (ret)
>>>>         goto put_obj;
>>>>@@ -211,6 +217,9 @@ int i915_gem_vm_bind_obj(struct 
>>>>i915_address_space *vm,
>>>>     list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>>>     i915_vm_bind_it_insert(vma, &vm->va);
>>>>+    if (!obj->priv_root)
>>>>+        list_add_tail(&vma->non_priv_vm_bind_link,
>>>>+                  &vm->non_priv_vm_bind_list);
>>>>     /* Hold object reference until vm_unbind */
>>>>     i915_gem_object_get(vma->obj);
>>>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>index 135dc4a76724..df0a8459c3c6 100644
>>>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>@@ -176,6 +176,7 @@ int i915_vm_lock_objects(struct 
>>>>i915_address_space *vm,
>>>> void i915_address_space_fini(struct i915_address_space *vm)
>>>> {
>>>>     drm_mm_takedown(&vm->mm);
>>>>+    i915_gem_object_put(vm->root_obj);
>>>>     GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>>>     mutex_destroy(&vm->vm_bind_lock);
>>>> }
>>>>@@ -289,6 +290,9 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>>>     INIT_LIST_HEAD(&vm->vm_bind_list);
>>>>     INIT_LIST_HEAD(&vm->vm_bound_list);
>>>>     mutex_init(&vm->vm_bind_lock);
>>>>+    INIT_LIST_HEAD(&vm->non_priv_vm_bind_list);
>>>>+    vm->root_obj = i915_gem_object_create_internal(vm->i915, PAGE_SIZE);
>>>>+    GEM_BUG_ON(IS_ERR(vm->root_obj));
>>>> }
>>>> void *__px_vaddr(struct drm_i915_gem_object *p)
>>>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>index d4a6ce65251d..f538ce9115c9 100644
>>>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>@@ -267,6 +267,8 @@ struct i915_address_space {
>>>>     struct list_head vm_bound_list;
>>>>     /* va tree of persistent vmas */
>>>>     struct rb_root_cached va;
>>>>+    struct list_head non_priv_vm_bind_list;
>>>>+    struct drm_i915_gem_object *root_obj;
>>>>     /* Global GTT */
>>>>     bool is_ggtt:1;
>>>>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>>>index d324e29cef0a..f0226581d342 100644
>>>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>@@ -236,6 +236,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>     mutex_unlock(&vm->mutex);
>>>>     INIT_LIST_HEAD(&vma->vm_bind_link);
>>>>+    INIT_LIST_HEAD(&vma->non_priv_vm_bind_link);
>>>>     return vma;
>>>> err_unlock:
>>>>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>>>>index b6d179bdbfa0..2298b3d6b7c4 100644
>>>>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>>>>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>>>@@ -290,6 +290,8 @@ struct i915_vma {
>>>>     struct list_head vm_link;
>>>>     struct list_head vm_bind_link; /* Link in persistent VMA list */
>>>>+    /* Link in non-private persistent VMA list */
>>>>+    struct list_head non_priv_vm_bind_link;
>>>>     /** Interval tree structures for persistent vma */
>>>>     struct rb_node rb;
>>>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>>index 26cca49717f8..ce1c6592b0d7 100644
>>>>--- a/include/uapi/drm/i915_drm.h
>>>>+++ b/include/uapi/drm/i915_drm.h
>>>>@@ -3542,9 +3542,13 @@ struct drm_i915_gem_create_ext {
>>>>      *
>>>>      * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
>>>>      * struct drm_i915_gem_create_ext_protected_content.
>>>>+     *
>>>>+     * For I915_GEM_CREATE_EXT_VM_PRIVATE usage see
>>>>+     * struct drm_i915_gem_create_ext_vm_private.
>>>>      */
>>>> #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>>>> #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
>>>>+#define I915_GEM_CREATE_EXT_VM_PRIVATE 2
>>>>     __u64 extensions;
>>>> };
>>>>@@ -3662,6 +3666,32 @@ struct drm_i915_gem_create_ext_protected_content {
>>>> /* ID of the protected content session managed by i915 when PXP is active */
>>>> #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
>>>>+/**
>>>>+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>>>>+ * private to the specified VM.
>>>>+ *
>>>>+ * See struct drm_i915_gem_create_ext.
>>>>+ *
>>>>+ * By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>>>+ * exported. Hence these BOs are referred to as Shared BOs.
>>>>+ * During each execbuf3 submission, the request fence must be added to the
>>>>+ * dma-resv fence list of all shared BOs mapped on the VM.
>>>>+ *
>>>>+ * Unlike Shared BOs, these VM private BOs can only be mapped on the VM they
>>>>+ * are private to and can't be dma-buf exported. All private BOs of a VM share
>>>>+ * the dma-resv object. Hence during each execbuf3 submission, they need only
>>>>+ * one dma-resv fence list updated. Thus, the fast path (where required
>>>>+ * mappings are already bound) submission latency is O(1) w.r.t the number of
>>>>+ * VM private BOs.
>>>>+ */
>>>>+struct drm_i915_gem_create_ext_vm_private {
>>>>+    /** @base: Extension link. See struct i915_user_extension. */
>>>>+    struct i915_user_extension base;
>>>>+
>>>>+    /** @vm_id: Id of the VM to which the object is private */
>>>>+    __u32 vm_id;
>>>>+};
>>>>+
>>>> /**
>>>>  * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>>  *
>>>
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl
  2022-07-08 14:37         ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-09 20:23           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-09 20:23 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Fri, Jul 08, 2022 at 07:37:30AM -0700, Hellstrom, Thomas wrote:
>Hi,
>
>On Fri, 2022-07-08 at 06:47 -0700, Niranjana Vishwanathapura wrote:
>> On Thu, Jul 07, 2022 at 07:41:54AM -0700, Hellstrom, Thomas wrote:
>> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> > > Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only
>> > > works in vm_bind mode. The vm_bind mode only works with
>> > > this new execbuf3 ioctl.
>> > >
>> > > The new execbuf3 ioctl will not have any execlist
>> >
>> > I understand this that you mean there is no list of objects to
>> > validate
>> > attached to the drm_i915_gem_execbuffer3 structure rather than that
>> > the
>> > execlists submission backend is never used. Could we clarify this
>> > to
>> > avoid confusion.
>>
>> Yah, side effect of overloading the word 'execlist' for multiple
>> things.
>> Yah, I meant, no list of objects to validate. I agree, we need to
>> clarify
>> that here.
>>
>> >
>> >
>> > >  support
>> > > and all the legacy support like relocations etc are removed.
>> > >
>> > > Signed-off-by: Niranjana Vishwanathapura
>> > > <niranjana.vishwanathapura@intel.com>
>> > > ---
>> > >  drivers/gpu/drm/i915/Makefile                 |    1 +
>> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |    5 +
>> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 1029
>> > > +++++++++++++++++
>> > >  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |    2 +
>> > >  drivers/gpu/drm/i915/i915_driver.c            |    1 +
>> > >  include/uapi/drm/i915_drm.h                   |   67 +-
>> > >  6 files changed, 1104 insertions(+), 1 deletion(-)
>> > >  create mode 100644
>> > > drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/Makefile
>> > > b/drivers/gpu/drm/i915/Makefile
>> > > index 4e1627e96c6e..38cd1c5bc1a5 100644
>> > > --- a/drivers/gpu/drm/i915/Makefile
>> > > +++ b/drivers/gpu/drm/i915/Makefile
>> > > @@ -148,6 +148,7 @@ gem-y += \
>> > >         gem/i915_gem_dmabuf.o \
>> > >         gem/i915_gem_domain.o \
>> > >         gem/i915_gem_execbuffer.o \
>> > > +       gem/i915_gem_execbuffer3.o \
>> > >         gem/i915_gem_internal.o \
>> > >         gem/i915_gem_object.o \
>> > >         gem/i915_gem_lmem.o \
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> > > index b7b2c14fd9e1..37bb1383ab8f 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> > > @@ -782,6 +782,11 @@ static int eb_select_context(struct
>> > > i915_execbuffer *eb)
>> > >         if (unlikely(IS_ERR(ctx)))
>> > >                 return PTR_ERR(ctx);
>> > >
>> > > +       if (ctx->vm->vm_bind_mode) {
>> > > +               i915_gem_context_put(ctx);
>> > > +               return -EOPNOTSUPP;
>> > > +       }
>> > > +
>> > >         eb->gem_context = ctx;
>> > >         if (i915_gem_context_has_full_ppgtt(ctx))
>> > >                 eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > new file mode 100644
>> > > index 000000000000..13121df72e3d
>> > > --- /dev/null
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > @@ -0,0 +1,1029 @@
>> > > +// SPDX-License-Identifier: MIT
>> > > +/*
>> > > + * Copyright © 2022 Intel Corporation
>> > > + */
>> > > +
>> > > +#include <linux/dma-resv.h>
>> > > +#include <linux/sync_file.h>
>> > > +#include <linux/uaccess.h>
>> > > +
>> > > +#include <drm/drm_syncobj.h>
>> > > +
>> > > +#include "gt/intel_context.h"
>> > > +#include "gt/intel_gpu_commands.h"
>> > > +#include "gt/intel_gt.h"
>> > > +#include "gt/intel_gt_pm.h"
>> > > +#include "gt/intel_ring.h"
>> > > +
>> > > +#include "i915_drv.h"
>> > > +#include "i915_file_private.h"
>> > > +#include "i915_gem_context.h"
>> > > +#include "i915_gem_ioctls.h"
>> > > +#include "i915_gem_vm_bind.h"
>> > > +#include "i915_trace.h"
>> > > +
>> > > +#define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>> > > +#define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>> > > +
>> > > +/* Catch emission of unexpected errors for CI! */
>> > > +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>> > > +#undef EINVAL
>> > > +#define EINVAL ({ \
>> > > +       DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__,
>> > > __LINE__); \
>> > > +       22; \
>> > > +})
>> > > +#endif
>> > > +
>> > > +/**
>> > > + * DOC: User command execution with execbuf3 ioctl
>> > > + *
>> > > + * A VM in VM_BIND mode will not support older execbuf mode of
>> > > binding.
>> > > + * The execbuf ioctl handling in VM_BIND mode differs
>> > > significantly
>> > > from the
>> > > + * older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>> > > + * Hence, a new execbuf3 ioctl has been added to support VM_BIND
>> > > mode. (See
>> > > + * struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not
>> > > accept any
>> > > + * execlist. Hence, no support for implicit sync.
>> > > + *
>> > > + * The new execbuf3 ioctl only works in VM_BIND mode and the
>> > > VM_BIND
>> > > mode only
>> > > + * works with execbuf3 ioctl for submission.
>> > > + *
>> > > + * The execbuf3 ioctl directly specifies the batch addresses
>> > > instead
>> > > of as
>> > > + * object handles as in execbuf2 ioctl. The execbuf3 ioctl will
>> > > also
>> > > not
>> > > + * support many of the older features like in/out/submit fences,
>> > > fence array,
>> > > + * default gem context etc. (See struct
>> > > drm_i915_gem_execbuffer3).
>> > > + *
>> > > + * In VM_BIND mode, VA allocation is completely managed by the
>> > > user
>> > > instead of
>> > > + * the i915 driver. Hence all VA assignment, eviction are not
>> > > applicable in
>> > > + * VM_BIND mode. Also, for determining object activeness,
>> > > VM_BIND
>> > > mode will not
>> > > + * be using the i915_vma active reference tracking. It will
>> > > instead
>> > > check the
>> > > + * dma-resv object's fence list for that.
>> > > + *
>> > > + * So, a lot of code supporting execbuf2 ioctl, like
>> > > relocations, VA
>> > > evictions,
>> > > + * vma lookup table, implicit sync, vma active reference
>> > > tracking
>> > > etc., are not
>> > > + * applicable for execbuf3 ioctl.
>> > > + */
>> > > +
>> > > +struct eb_fence {
>> > > +       struct drm_syncobj *syncobj; /* Use with ptr_mask_bits()
>> > > */
>> > > +       struct dma_fence *dma_fence;
>> > > +       u64 value;
>> > > +       struct dma_fence_chain *chain_fence;
>> > > +};
>> > > +
>> > > +struct i915_execbuffer {
>> > > +       struct drm_i915_private *i915; /** i915 backpointer */
>> > > +       struct drm_file *file; /** per-file lookup tables and
>> > > limits
>> > > */
>> > > +       struct drm_i915_gem_execbuffer3 *args; /** ioctl
>> > > parameters
>> > > */
>> > > +
>> > > +       struct intel_gt *gt; /* gt for the execbuf */
>> > > +       struct intel_context *context; /* logical state for the
>> > > request */
>> > > +       struct i915_gem_context *gem_context; /** caller's
>> > > context */
>> > > +
>> > > +       /** our requests to build */
>> > > +       struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
>> > > +
>> > > +       /** used for excl fence in dma_resv objects when > 1 BB
>> > > submitted */
>> > > +       struct dma_fence *composite_fence;
>> > > +
>> > > +       struct i915_gem_ww_ctx ww;
>> > > +
>> > > +       /* number of batches in execbuf IOCTL */
>> > > +       unsigned int num_batches;
>> > > +
>> > > +       u64 batch_addresses[MAX_ENGINE_INSTANCE + 1];
>> > > +       /** identity of the batch obj/vma */
>> > > +       struct i915_vma *batches[MAX_ENGINE_INSTANCE + 1];
>> > > +
>> > > +       struct eb_fence *fences;
>> > > +       unsigned long num_fences;
>> > > +};
>> >
>> > Kerneldoc structures please.
>> >
>> > It seems we are duplicating a lot of code from i915_execbuffer.c.
>> > Did
>> > you consider
>> >
>> > struct i915_execbuffer3 {
>> > ...
>> > };
>> >
>> > struct i915_execbuffer2 {
>> >        struct i915_execbuffer3 eb3;
>> >        ...
>> >        [members that are not common]
>> > };
>> >
>> > Allowing execbuffer2 to share the execbuffer3 code to some extent.
>> > Not sure about the gain at this point though. My worry would be
>> > that fo
>> > r example fixes might be applied to one file and not the other.
>>
>> I have added a TODO in the cover letter of this patch series to share
>> the code between execbuf2 and execbuf3.
>> But, I am not sure to what extent. Execbuf3 is much leaner than
>> execbuf2
>> and we don't want to make it bad by forcing code sharing with legacy
>> path.
>> We can perhaps abstract out some functions which takes specific
>> arguments
>> (instead of 'eb'), that way we can keep these structures separate and
>> still
>> share some code.
>
>
>Fully agree we shouldn't make eb3 code more complicated because of eb2.
>My question was more of using i915_execbuffer3 and its functions as a
>"base class" and subclass it for eb2, eb2 adding and implementing
>additional functionality it needs.

Yah, this was Daniel's feedback before the new execbuf3 was proposed:
a base eb class derived into execlist-mode and vm_bind-mode eb classes.

I think that this would require a lot of changes in the legacy execbuf2
code, which we are currently not touching for VM_BIND mode. Hence, I am
not sure it is worth it.

>
>But OTOH I just learned we might have been asked not to share any
>code between those two from drm maintainers, so need to dig up that
>discussion.

Yah, that is what I understood from the VM_BIND design feedback.
Here I am only thinking about the possibility of having some common helper
functions that both can make use of (like eb_pin/unpin_engine etc.).
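
For example, a rough sketch of such a helper (illustrative only, not part of
this series; the helper name is made up and it assumes the existing
intel_context_pin_ww() interface):

static struct intel_context *
eb_pin_engine_common(struct intel_context *ce, struct i915_gem_ww_ctx *ww)
{
	int err;

	/* Pin the context under the caller's ww transaction. */
	err = intel_context_pin_ww(ce, ww);
	if (err)
		return ERR_PTR(err);

	return ce;
}

Each caller (execbuf2 or execbuf3) would then keep its own bookkeeping
(flags, timeline handling etc.) around such helpers, so the two 'eb'
structures stay separate.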

Niranjana

>
>/Thomas
>
>
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 07/10] drm/i915/vm_bind: Handle persistent vmas in execbuf3
  2022-07-08 13:03         ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-09 20:25           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-09 20:25 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Auld, Matthew, jason, Vetter,
	Daniel, christian.koenig

On Fri, Jul 08, 2022 at 06:03:28AM -0700, Hellstrom, Thomas wrote:
>On Fri, 2022-07-08 at 05:44 -0700, Niranjana Vishwanathapura wrote:
>> On Thu, Jul 07, 2022 at 07:54:16AM -0700, Hellstrom, Thomas wrote:
>> > On Fri, 2022-07-01 at 15:50 -0700, Niranjana Vishwanathapura wrote:
>> > > Handle persistent (VM_BIND) mappings during the request
>> > > submission
>> > > in the execbuf3 path.
>> > >
>> > > Signed-off-by: Niranjana Vishwanathapura
>> > > <niranjana.vishwanathapura@intel.com>
>> > > ---
>> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 176
>> > > +++++++++++++++++-
>> > >  1 file changed, 175 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > index 13121df72e3d..2079f5ca9010 100644
>> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> > > @@ -22,6 +22,7 @@
>> > >  #include "i915_gem_vm_bind.h"
>> > >  #include "i915_trace.h"
>> > >
>> > > +#define __EXEC3_HAS_PIN                        BIT_ULL(33)
>> > >  #define __EXEC3_ENGINE_PINNED          BIT_ULL(32)
>> > >  #define __EXEC3_INTERNAL_FLAGS         (~0ull << 32)
>> > >
>> > > @@ -45,7 +46,9 @@
>> > >   * execlist. Hence, no support for implicit sync.
>> > >   *
>> > >   * The new execbuf3 ioctl only works in VM_BIND mode and the
>> > > VM_BIND
>> > > mode only
>> > > - * works with execbuf3 ioctl for submission.
>> > > + * works with execbuf3 ioctl for submission. All BOs mapped on
>> > > that
>> > > VM (through
>> > > + * VM_BIND call) at the time of execbuf3 call are deemed
>> > > required
>> > > for that
>> > > + * submission.
>> > >   *
>> > >   * The execbuf3 ioctl directly specifies the batch addresses
>> > > instead
>> > > of as
>> > >   * object handles as in execbuf2 ioctl. The execbuf3 ioctl will
>> > > also
>> > > not
>> > > @@ -61,6 +64,13 @@
>> > >   * So, a lot of code supporting execbuf2 ioctl, like
>> > > relocations, VA
>> > > evictions,
>> > >   * vma lookup table, implicit sync, vma active reference
>> > > tracking
>> > > etc., are not
>> > >   * applicable for execbuf3 ioctl.
>> > > + *
>> > > + * During each execbuf submission, request fence is added to all
>> > > VM_BIND mapped
>> > > + * objects with DMA_RESV_USAGE_BOOKKEEP. The
>> > > DMA_RESV_USAGE_BOOKKEEP
>> > > usage will
>> > > + * prevent over sync (See enum dma_resv_usage). Note that
>> > > DRM_I915_GEM_WAIT and
>> > > + * DRM_I915_GEM_BUSY ioctls do not check for
>> > > DMA_RESV_USAGE_BOOKKEEP
>> > > usage and
>> > > + * hence should not be used for end of batch check. Instead, the
>> > > execbuf3
>> > > + * timeline out fence should be used for end of batch check.
>> > >   */
>> > >
>> > >  struct eb_fence {
>> > > @@ -124,6 +134,19 @@ eb_find_vma(struct i915_address_space *vm,
>> > > u64
>> > > addr)
>> > >         return i915_gem_vm_bind_lookup_vma(vm, va);
>> > >  }
>> > >
>> > > +static void eb_scoop_unbound_vmas(struct i915_address_space *vm)
>> > > +{
>> > > +       struct i915_vma *vma, *vn;
>> > > +
>> > > +       spin_lock(&vm->vm_rebind_lock);
>> > > +       list_for_each_entry_safe(vma, vn, &vm->vm_rebind_list,
>> > > vm_rebind_link) {
>> > > +               list_del_init(&vma->vm_rebind_link);
>> > > +               if (!list_empty(&vma->vm_bind_link))
>> > > +                       list_move_tail(&vma->vm_bind_link, &vm-
>> > > > vm_bind_list);
>> > > +       }
>> > > +       spin_unlock(&vm->vm_rebind_lock);
>> > > +}
>> > > +
>> > >  static int eb_lookup_vmas(struct i915_execbuffer *eb)
>> > >  {
>> > >         unsigned int i, current_batch = 0;
>> > > @@ -138,11 +161,118 @@ static int eb_lookup_vmas(struct
>> > > i915_execbuffer *eb)
>> > >                 ++current_batch;
>> > >         }
>> > >
>> > > +       eb_scoop_unbound_vmas(eb->context->vm);
>> > > +
>> > > +       return 0;
>> > > +}
>> > > +
>> > > +static int eb_lock_vmas(struct i915_execbuffer *eb)
>> > > +{
>> > > +       struct i915_address_space *vm = eb->context->vm;
>> > > +       struct i915_vma *vma;
>> > > +       int err;
>> > > +
>> > > +       err = i915_gem_vm_priv_lock(eb->context->vm, &eb->ww);
>> > > +       if (err)
>> > > +               return err;
>> > > +
>> >
>> > See comment in review for 08/10 about re-checking the rebind list
>> > here.
>> >
>> >
>> >
>> > > +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
>> > > +                           non_priv_vm_bind_link) {
>> > > +               err = i915_gem_object_lock(vma->obj, &eb->ww);
>> > > +               if (err)
>> > > +                       return err;
>> > > +       }
>> > > +
>> > >         return 0;
>> > >  }
>> > >
>> > > +static void eb_release_persistent_vmas(struct i915_execbuffer
>> > > *eb,
>> > > bool final)
>> > > +{
>> > > +       struct i915_address_space *vm = eb->context->vm;
>> > > +       struct i915_vma *vma, *vn;
>> > > +
>> > > +       assert_vm_bind_held(vm);
>> > > +
>> > > +       if (!(eb->args->flags & __EXEC3_HAS_PIN))
>> > > +               return;
>> > > +
>> > > +       assert_vm_priv_held(vm);
>> > > +
>> > > +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
>> > > +               __i915_vma_unpin(vma);
>> > > +
>> > > +       eb->args->flags &= ~__EXEC3_HAS_PIN;
>> > > +       if (!final)
>> > > +               return;
>> > > +
>> > > +       list_for_each_entry_safe(vma, vn, &vm->vm_bind_list,
>> > > vm_bind_link)
>> > > +               if (i915_vma_is_bind_complete(vma))
>> > > +                       list_move_tail(&vma->vm_bind_link, &vm-
>> > > > vm_bound_list);
>> > > +}
>> > > +
>> > >  static void eb_release_vmas(struct i915_execbuffer *eb, bool
>> > > final)
>> > >  {
>> > > +       eb_release_persistent_vmas(eb, final);
>> > > +       eb_unpin_engine(eb);
>> > > +}
>> > > +
>> > > +static int eb_reserve_fence_for_persistent_vmas(struct
>> > > i915_execbuffer *eb)
>> > > +{
>> > > +       struct i915_address_space *vm = eb->context->vm;
>> > > +       struct i915_vma *vma;
>> > > +       int ret;
>> > > +
>> > > +       ret = dma_resv_reserve_fences(vm->root_obj->base.resv,
>> > > 1);
>> > > +       if (ret)
>> > > +               return ret;
>> > > +
>> > > +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
>> > > +                           non_priv_vm_bind_link) {
>> > > +               ret = dma_resv_reserve_fences(vma->obj-
>> > > >base.resv,
>> > > 1);
>> > > +               if (ret)
>> > > +                       return ret;
>> > > +       }
>> > > +
>> > > +       return 0;
>> > > +}
>> > > +
>> > > +static int eb_validate_persistent_vmas(struct i915_execbuffer
>> > > *eb)
>> > > +{
>> > > +       struct i915_address_space *vm = eb->context->vm;
>> > > +       struct i915_vma *vma, *last_pinned_vma = NULL;
>> > > +       int ret = 0;
>> > > +
>> > > +       assert_vm_bind_held(vm);
>> > > +       assert_vm_priv_held(vm);
>> > > +
>> > > +       ret = eb_reserve_fence_for_persistent_vmas(eb);
>> > > +       if (ret)
>> > > +               return ret;
>> > > +
>> > > +       if (list_empty(&vm->vm_bind_list))
>> > > +               return 0;
>> > > +
>> > > +       list_for_each_entry(vma, &vm->vm_bind_list, vm_bind_link)
>> > > {
>> > > +               u64 pin_flags = vma->start | PIN_OFFSET_FIXED |
>> > > PIN_USER;
>> > > +
>> > > +               ret = i915_vma_pin_ww(vma, &eb->ww, 0, 0,
>> > > pin_flags);
>> > > +               if (ret)
>> > > +                       break;
>> > > +
>> > > +               last_pinned_vma = vma;
>> > > +       }
>> > > +
>> > > +       if (ret && last_pinned_vma) {
>> > > +               list_for_each_entry(vma, &vm->vm_bind_list,
>> > > vm_bind_link) {
>> > > +                       __i915_vma_unpin(vma);
>> > > +                       if (vma == last_pinned_vma)
>> > > +                               break;
>> > > +               }
>> > > +       } else if (last_pinned_vma) {
>> > > +               eb->args->flags |= __EXEC3_HAS_PIN;
>> > > +       }
>> > > +
>> > > +       return ret;
>> > >  }
>> > >
>> > >  static int eb_validate_vmas(struct i915_execbuffer *eb)
>> > > @@ -162,8 +292,17 @@ static int eb_validate_vmas(struct
>> > > i915_execbuffer *eb)
>> > >         /* only throttle once, even if we didn't need to throttle
>> > > */
>> > >         throttle = false;
>> > >
>> > > +       err = eb_lock_vmas(eb);
>> > > +       if (err)
>> > > +               goto err;
>> > > +
>> > > +       err = eb_validate_persistent_vmas(eb);
>> > > +       if (err)
>> > > +               goto err;
>> > > +
>> > >  err:
>> > >         if (err == -EDEADLK) {
>> > > +               eb_release_vmas(eb, false);
>> > >                 err = i915_gem_ww_ctx_backoff(&eb->ww);
>> > >                 if (!err)
>> > >                         goto retry;
>> > > @@ -187,8 +326,43 @@ static int eb_validate_vmas(struct
>> > > i915_execbuffer *eb)
>> > >         BUILD_BUG_ON(!typecheck(int, _i)); \
>> > >         for ((_i) = (_eb)->num_batches - 1; (_i) >= 0; --(_i))
>> > >
>> > > +static void __eb_persistent_add_shared_fence(struct
>> > > drm_i915_gem_object *obj,
>> > > +                                            struct dma_fence
>> > > *fence)
>> > > +{
>> > > +       dma_resv_add_fence(obj->base.resv, fence,
>> > > DMA_RESV_USAGE_BOOKKEEP);
>> > > +       obj->write_domain = 0;
>> > > +       obj->read_domains |= I915_GEM_GPU_DOMAINS;
>> > > +       obj->mm.dirty = true;
>> > > +}
>> > > +
>> > > +static void eb_persistent_add_shared_fence(struct
>> > > i915_execbuffer
>> > > *eb)
>> > > +{
>> > > +       struct i915_address_space *vm = eb->context->vm;
>> > > +       struct dma_fence *fence;
>> > > +       struct i915_vma *vma;
>> > > +
>> > > +       fence = eb->composite_fence ? eb->composite_fence :
>> > > +               &eb->requests[0]->fence;
>> > > +
>> > > +       __eb_persistent_add_shared_fence(vm->root_obj, fence);
>> > > +       list_for_each_entry(vma, &vm->non_priv_vm_bind_list,
>> > > +                           non_priv_vm_bind_link)
>> > > +               __eb_persistent_add_shared_fence(vma->obj,
>> > > fence);
>> > > +}
>> > > +
>> > > +static void eb_persistent_vmas_move_to_active(struct
>> > > i915_execbuffer
>> > > *eb)
>> > > +{
>> > > +       /* Add fence to BOs dma-resv fence list */
>> > > +       eb_persistent_add_shared_fence(eb);
>> >
>> > This means we don't add any fences to the vma active trackers.
>> > While
>> > this works fine for TTM delayed buffer destruction, unbinding at
>> > eviction and shrinking wouldn't wait for gpu activity to idle
>> > before
>> > unbinding?
>>
>> Eviction and shrinker will wait for gpu activity to idle before
>> unbinding.
>> The i915_vma_is_active() and i915_vma_sync() have been updated to
>> handle
>> the persistent vmas differently (by checking/waiting for dma-resv
>> fence
>> list).
>
>Ah, yes. Now I see. Still the async unbinding __i915_vma_unbind_async()
>needs update? Should probably add a i915_sw_fence_await_reservation()
>(modified if needed for bookkeeping fences) of the vm's reservation
>object there?

Ah, yah, I missed __i915_vma_unbind_async(). Looks like we need something
like that; I need to take a closer look.
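
For the record, roughly the shape I have in mind (sketch only, untested, and
assuming the per-VM bookkeeping fences sit on vm->root_obj->base.resv as in
this series):

/*
 * Sketch: before tearing down PTEs for a persistent vma in
 * __i915_vma_unbind_async(), make the unbind wait on all bookkeeping
 * fences of the VM's shared reservation object. Assumes the reservation
 * object is locked by the caller; the i915_sw_fence stands in for
 * whatever fence gates the actual unbind.
 */
static int await_vm_bookkeep_fences(struct i915_sw_fence *unbind_fence,
				    struct i915_address_space *vm)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;
	int err;

	dma_resv_for_each_fence(&cursor, vm->root_obj->base.resv,
				DMA_RESV_USAGE_BOOKKEEP, fence) {
		err = i915_sw_fence_await_dma_fence(unbind_fence, fence,
						    0, GFP_KERNEL);
		if (err < 0)
			return err;
	}

	return 0;
}

Whether it is cleaner to use i915_sw_fence_await_reservation() directly for
this is one of the things to check.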

Niranjana

>
>/Thomas
>
>>
>> Niranjana
>>
>> >
>> >
>> > /Thomas
>> >
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes
  2022-07-08 15:20         ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-07-09 20:56           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-09 20:56 UTC (permalink / raw)
  To: Hellstrom, Thomas
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, intel-gfx,
	dri-devel, Landwerlin, Lionel G, jason, Vetter, Daniel,
	christian.koenig, Auld, Matthew

On Fri, Jul 08, 2022 at 03:20:01PM +0000, Hellstrom, Thomas wrote:
>On Fri, 2022-07-08 at 07:51 -0700, Niranjana Vishwanathapura wrote:
>> > Since we don't loop over the vm_bound_list, there is a need to
>> > check
>> > whether the rebind_list is empty here under the notifier_lock in
>> > read
>> > mode, and in that case, restart from eb_lookup_vmas(). That might
>> > also
>> > eliminate the need for the __EXEC3_USERPTR_USED flag?
>> >
>> > That will also catch any objects that were evicted between
>> > eb_lookup_vmas() where the rebind_list was last checked, and
>> > i915_gem_vm_priv_lock(), which prohibits further eviction, but if
>> > we
>> > want to catch these earlier (which I think is a good idea), we
>> > could
>> > check that the rebind_list is indeed empty just after taking the
>> > vm_priv_lock(), and if not, restart from eb_lookup_vmas().
>>
>> Yah, right, we need to check rebind_list here and if not empty,
>> restart
>> from lookup phase.
>> It is a bit tricky with userptr here as the unbind happens during the
>> submit_init() call after we scoop unbound vmas here; the vmas get
>> re-added to rebind_list :(.
>
>Ugh.
>
>> I think we need a separate 'invalidated_userptr_list' here and we
>> iterate through it for submit_init() and submit_done() calls (yes,
>> __EXEC3_USERPTR_USED flag won't be needed then).
>> And, we call, eb_scoop_unbound_vmas() after calling
>> eb_lookup_persistent_userptr_vmas(), so that we scoop all unbound
>> vmas properly.
>>
>
>I'm not sure that will help much, because we'd also need to recheck the
>rebind_list and possibly restart after taking the vm_priv_lock, since
>objects can be evicted between the scooping and taking the
>vm_priv_lock. So then the userptrs will be caught by that check.

Yah, what I mentioned above is in addition to rechecking the rebind_list
and restarting.
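
Roughly what I have in mind for that re-check (sketch only, untested;
-EAGAIN here just stands in for however the restart from eb_lookup_vmas()
ends up being signalled):

/*
 * Sketch: after taking the VM private lock, verify nothing was evicted
 * onto the rebind list in the meantime. If something was, back off so
 * the caller restarts from eb_lookup_vmas() and the newly unbound vmas
 * get scooped and re-pinned.
 */
static int eb_lock_vm_and_recheck(struct i915_execbuffer *eb)
{
	struct i915_address_space *vm = eb->context->vm;
	bool need_restart;
	int err;

	err = i915_gem_vm_priv_lock(vm, &eb->ww);
	if (err)
		return err;

	spin_lock(&vm->vm_rebind_lock);
	need_restart = !list_empty(&vm->vm_rebind_list);
	spin_unlock(&vm->vm_rebind_lock);

	return need_restart ? -EAGAIN : 0;
}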

Niranjana

>
>/Thomas
>
>

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [Intel-gfx] [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-07-18 10:55   ` Tvrtko Ursulin
  2022-07-26  5:07     ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 121+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 10:55 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel
  Cc: daniel.vetter, christian.koenig, thomas.hellstrom,
	paulo.r.zanoni, matthew.auld


On 01/07/2022 23:50, Niranjana Vishwanathapura wrote:
> Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>   drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>   .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233 ++++++++++++++++++
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>   drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>   drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>   drivers/gpu/drm/i915/i915_vma.h               |   2 -
>   drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>   11 files changed, 318 insertions(+), 10 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 522ef9b4aff3..4e1627e96c6e 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -165,6 +165,7 @@ gem-y += \
>   	gem/i915_gem_ttm_move.o \
>   	gem/i915_gem_ttm_pm.o \
>   	gem/i915_gem_userptr.o \
> +	gem/i915_gem_vm_bind_object.o \
>   	gem/i915_gem_wait.o \
>   	gem/i915_gemfs.o
>   i915-y += \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 33673fe7ee0a..927a87e5ec59 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -15,10 +15,10 @@
>   #include "i915_trace.h"
>   #include "i915_user_extensions.h"
>   
> -static u32 object_max_page_size(struct intel_memory_region **placements,
> -				unsigned int n_placements)
> +u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
> +				  unsigned int n_placements)
>   {
> -	u32 max_page_size = 0;
> +	u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>   	int i;
>   
>   	for (i = 0; i < n_placements; i++) {
> @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct intel_memory_region **placements,
>   		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>   	}
>   
> -	GEM_BUG_ON(!max_page_size);
>   	return max_page_size;
>   }
>   
> @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
>   
>   	i915_gem_flush_free_objects(i915);
>   
> -	size = round_up(size, object_max_page_size(placements, n_placements));
> +	size = round_up(size, i915_gem_object_max_page_size(placements,
> +							    n_placements));
>   	if (size == 0)
>   		return ERR_PTR(-EINVAL);
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 6f0a3ce35567..650de2224843 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>   }
>   
>   void i915_gem_init__objects(struct drm_i915_private *i915);
> +u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
> +				  unsigned int n_placements);
>   
>   void i915_objects_module_exit(void);
>   int i915_objects_module_init(void);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> new file mode 100644
> index 000000000000..642cdb559f17
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_VM_BIND_H
> +#define __I915_GEM_VM_BIND_H
> +
> +#include "i915_drv.h"
> +
> +#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
> +
> +static inline void i915_gem_vm_bind_lock(struct i915_address_space *vm)
> +{
> +	mutex_lock(&vm->vm_bind_lock);
> +}
> +
> +static inline int
> +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
> +{
> +	return mutex_lock_interruptible(&vm->vm_bind_lock);
> +}
> +
> +static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
> +{
> +	mutex_unlock(&vm->vm_bind_lock);
> +}
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +			 struct drm_i915_gem_vm_bind *va,
> +			 struct drm_file *file);
> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
> +			   struct drm_i915_gem_vm_unbind *va);
> +
> +#endif /* __I915_GEM_VM_BIND_H */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> new file mode 100644
> index 000000000000..43ceb4dcca6c
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
> @@ -0,0 +1,233 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include <linux/interval_tree_generic.h>
> +
> +#include "gem/i915_gem_vm_bind.h"
> +#include "gt/gen8_engine_cs.h"
> +
> +#include "i915_drv.h"
> +#include "i915_gem_gtt.h"
> +
> +#define START(node) ((node)->start)
> +#define LAST(node) ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
> +		     START, LAST, static inline, i915_vm_bind_it)
> +
> +#undef START
> +#undef LAST
> +
> +/**
> + * DOC: VM_BIND/UNBIND ioctls
> + *
> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> + * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> + * specified address space (VM). Multiple mappings can map to the same physical
> + * pages of an object (aliasing). These mappings (also referred to as persistent
> + * mappings) will be persistent across multiple GPU submissions (execbuf calls)
> + * issued by the UMD, without user having to provide a list of all required
> + * mappings during each submission (as required by older execbuf mode).
> + *
> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> + * signaling the completion of bind/unbind operation.
> + *
> + * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
> + * User has to opt-in for VM_BIND mode of binding for an address space (VM)
> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
> + * done asynchronously, when valid out fence is specified.
> + *
> + * VM_BIND locking order is as below.
> + *
> + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> + *    mapping.
> + *
> + *    In future, when GPU page faults are supported, we can potentially use a
> + *    rwsem instead, so that multiple page fault handlers can take the read
> + *    side lock to lookup the mapping and hence can run in parallel.
> + *    The older execbuf mode of binding do not need this lock.
> + *
> + * 2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs
> + *    to be held while binding/unbinding a vma in the async worker and while
> + *    updating dma-resv fence list of an object. Note that private BOs of a VM
> + *    will all share a dma-resv object.
> + *
> + *    The future system allocator support will use the HMM prescribed locking
> + *    instead.
> + *
> + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
> + */
> +
> +struct i915_vma *
> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
> +{
> +	struct i915_vma *vma, *temp;
> +
> +	assert_vm_bind_held(vm);
> +
> +	vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
> +	/* Working around compiler error, remove later */
> +	if (vma)
> +		temp = i915_vm_bind_it_iter_next(vma, va + vma->size, -1);
> +	return vma;
> +}
> +
> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
> +{
> +	assert_vm_bind_held(vma->vm);
> +
> +	if (!list_empty(&vma->vm_bind_link)) {
> +		list_del_init(&vma->vm_bind_link);
> +		i915_vm_bind_it_remove(vma, &vma->vm->va);
> +
> +		/* Release object */
> +		if (release_obj)
> +			i915_vma_put(vma);
> +	}
> +}
> +
> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
> +			   struct drm_i915_gem_vm_unbind *va)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	int ret;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	ret = i915_gem_vm_bind_lock_interruptible(vm);
> +	if (ret)
> +		return ret;
> +
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +	if (!vma) {
> +		ret = -ENOENT;
> +		goto out_unlock;
> +	}
> +
> +	if (vma->size != va->length)
> +		ret = -EINVAL;
> +	else
> +		i915_gem_vm_bind_remove(vma, false);
> +
> +out_unlock:
> +	i915_gem_vm_bind_unlock(vm);
> +	if (ret || !vma)
> +		return ret;
> +
> +	/* Destroy vma and then release object */
> +	obj = vma->obj;
> +	ret = i915_gem_object_lock(obj, NULL);
> +	if (ret)
> +		return ret;
> +
> +	i915_vma_destroy(vma);
> +	i915_gem_object_unlock(obj);
> +	i915_gem_object_put(obj);
> +
> +	return 0;
> +}
> +
> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
> +					struct drm_i915_gem_object *obj,
> +					struct drm_i915_gem_vm_bind *va)
> +{
> +	struct i915_ggtt_view view;
> +	struct i915_vma *vma;
> +
> +	va->start = gen8_noncanonical_addr(va->start);
> +	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
> +	if (vma)
> +		return ERR_PTR(-EEXIST);
> +
> +	view.type = I915_GGTT_VIEW_PARTIAL;

One pre-requisite, which was known for "years", was to refactor the GGTT 
view code into a generic concept. (GGTT has no place in VM BIND code.) 
It may be just a question of renaming things, or it may end up a bit 
more, but in any case please do include that refactor in this series.

Regards,

Tvrtko

> +	view.partial.offset = va->offset >> PAGE_SHIFT;
> +	view.partial.size = va->length >> PAGE_SHIFT;
> +	vma = i915_vma_instance(obj, vm, &view);
> +	if (IS_ERR(vma))
> +		return vma;
> +
> +	vma->start = va->start;
> +	vma->last = va->start + va->length - 1;
> +
> +	return vma;
> +}
> +
> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
> +			 struct drm_i915_gem_vm_bind *va,
> +			 struct drm_file *file)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma = NULL;
> +	struct i915_gem_ww_ctx ww;
> +	u64 pin_flags;
> +	int ret = 0;
> +
> +	if (!vm->vm_bind_mode)
> +		return -EOPNOTSUPP;
> +
> +	obj = i915_gem_object_lookup(file, va->handle);
> +	if (!obj)
> +		return -ENOENT;
> +
> +	if (!va->length ||
> +	    !IS_ALIGNED(va->offset | va->length,
> +			i915_gem_object_max_page_size(obj->mm.placements,
> +						      obj->mm.n_placements)) ||
> +	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
> +		ret = -EINVAL;
> +		goto put_obj;
> +	}
> +
> +	ret = i915_gem_vm_bind_lock_interruptible(vm);
> +	if (ret)
> +		goto put_obj;
> +
> +	vma = vm_bind_get_vma(vm, obj, va);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto unlock_vm;
> +	}
> +
> +	i915_gem_ww_ctx_init(&ww, true);
> +	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
> +retry:
> +	ret = i915_gem_object_lock(vma->obj, &ww);
> +	if (ret)
> +		goto out_ww;
> +
> +	ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
> +	if (ret)
> +		goto out_ww;
> +
> +	/* Make it evictable */
> +	__i915_vma_unpin(vma);
> +
> +	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
> +	i915_vm_bind_it_insert(vma, &vm->va);
> +
> +	/* Hold object reference until vm_unbind */
> +	i915_gem_object_get(vma->obj);
> +out_ww:
> +	if (ret == -EDEADLK) {
> +		ret = i915_gem_ww_ctx_backoff(&ww);
> +		if (!ret)
> +			goto retry;
> +	}
> +
> +	if (ret)
> +		i915_vma_destroy(vma);
> +
> +	i915_gem_ww_ctx_fini(&ww);
> +unlock_vm:
> +	i915_gem_vm_bind_unlock(vm);
> +put_obj:
> +	i915_gem_object_put(obj);
> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index b67831833c9a..135dc4a76724 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>   void i915_address_space_fini(struct i915_address_space *vm)
>   {
>   	drm_mm_takedown(&vm->mm);
> +	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
> +	mutex_destroy(&vm->vm_bind_lock);
>   }
>   
>   /**
> @@ -282,6 +284,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   
>   	INIT_LIST_HEAD(&vm->bound_list);
>   	INIT_LIST_HEAD(&vm->unbound_list);
> +
> +	vm->va = RB_ROOT_CACHED;
> +	INIT_LIST_HEAD(&vm->vm_bind_list);
> +	INIT_LIST_HEAD(&vm->vm_bound_list);
> +	mutex_init(&vm->vm_bind_lock);
>   }
>   
>   void *__px_vaddr(struct drm_i915_gem_object *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index c812aa9708ae..d4a6ce65251d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -259,6 +259,15 @@ struct i915_address_space {
>   	 */
>   	struct list_head unbound_list;
>   
> +	/**
> +	 * List of VM_BIND objects.
> +	 */
> +	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
> +	struct list_head vm_bind_list;
> +	struct list_head vm_bound_list;
> +	/* va tree of persistent vmas */
> +	struct rb_root_cached va;
> +
>   	/* Global GTT */
>   	bool is_ggtt:1;
>   
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index ccf990dfd99b..776ab7844f60 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -68,6 +68,7 @@
>   #include "gem/i915_gem_ioctls.h"
>   #include "gem/i915_gem_mman.h"
>   #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_vm_bind.h"
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_pm.h"
>   #include "gt/intel_rc6.h"
> @@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>   {
>   	struct drm_i915_gem_vm_bind *args = data;
>   	struct i915_address_space *vm;
> +	int ret;
>   
>   	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>   	if (unlikely(!vm))
>   		return -ENOENT;
>   
> +	ret = i915_gem_vm_bind_obj(vm, args, file);
> +
>   	i915_vm_put(vm);
> -	return -EINVAL;
> +	return ret;
>   }
>   
>   static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
> @@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>   {
>   	struct drm_i915_gem_vm_unbind *args = data;
>   	struct i915_address_space *vm;
> +	int ret;
>   
>   	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>   	if (unlikely(!vm))
>   		return -ENOENT;
>   
> +	ret = i915_gem_vm_unbind_obj(vm, args);
> +
>   	i915_vm_put(vm);
> -	return -EINVAL;
> +	return ret;
>   }
>   
>   static const struct drm_ioctl_desc i915_ioctls[] = {
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 43339ecabd73..d324e29cef0a 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -29,6 +29,7 @@
>   #include "display/intel_frontbuffer.h"
>   #include "gem/i915_gem_lmem.h"
>   #include "gem/i915_gem_tiling.h"
> +#include "gem/i915_gem_vm_bind.h"
>   #include "gt/intel_engine.h"
>   #include "gt/intel_engine_heartbeat.h"
>   #include "gt/intel_gt.h"
> @@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   	spin_unlock(&obj->vma.lock);
>   	mutex_unlock(&vm->mutex);
>   
> +	INIT_LIST_HEAD(&vma->vm_bind_link);
>   	return vma;
>   
>   err_unlock:
> @@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>   {
>   	struct i915_vma *vma;
>   
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>   	GEM_BUG_ON(!kref_read(&vm->ref));
>   
>   	spin_lock(&obj->vma.lock);
> @@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
>   
>   	spin_unlock(&obj->vma.lock);
>   
> +	i915_gem_vm_bind_lock(vma->vm);
> +	i915_gem_vm_bind_remove(vma, true);
> +	i915_gem_vm_bind_unlock(vma->vm);
> +
>   	spin_lock_irq(&gt->closed_lock);
>   	__i915_vma_remove_closed(vma);
>   	spin_unlock_irq(&gt->closed_lock);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..dcb49f79ff7e 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>   {
>   	ptrdiff_t cmp;
>   
> -	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
> -
>   	cmp = ptrdiff(vma->vm, vm);
>   	if (cmp)
>   		return cmp;
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index be6e028c3b57..b6d179bdbfa0 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -289,6 +289,14 @@ struct i915_vma {
>   	/** This object's place on the active/inactive lists */
>   	struct list_head vm_link;
>   
> +	struct list_head vm_bind_link; /* Link in persistent VMA list */
> +
> +	/** Interval tree structures for persistent vma */
> +	struct rb_node rb;
> +	u64 start;
> +	u64 last;
> +	u64 __subtree_last;
> +
>   	struct list_head obj_link; /* Link in the object's VMA list */
>   	struct rb_node obj_node;
>   	struct hlist_node obj_hash;

^ permalink raw reply	[flat|nested] 121+ messages in thread
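
A note on the ww locking idiom used in i915_gem_vm_bind_obj() above: the
bind path takes the VM's vm_bind_lock first, then acquires the object's
dma-resv through an i915_gem_ww_ctx, so that an -EDEADLK from the ww class
can be resolved by backing off and retrying. The control flow, distilled
from the quoted patch, is sketched below. This is only an illustrative
sketch of the pattern, not additional code from the series; the VA lookup,
list insertion and reference handling are elided.

/*
 * Sketch of the acquire/backoff loop from i915_gem_vm_bind_obj() above.
 * On -EDEADLK the acquire context drops all held locks and the whole
 * sequence is retried, which keeps multi-object locking deadlock free.
 */
static int vm_bind_pin_sketch(struct i915_vma *vma, u64 pin_flags)
{
	struct i915_gem_ww_ctx ww;
	int ret;

	i915_gem_ww_ctx_init(&ww, true);	/* interruptible waits */
retry:
	ret = i915_gem_object_lock(vma->obj, &ww);	/* object's dma-resv */
	if (ret)
		goto out;

	ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);	/* bind + pin */
	if (ret)
		goto out;

	__i915_vma_unpin(vma);	/* binding persists, vma stays evictable */
out:
	if (ret == -EDEADLK) {
		ret = i915_gem_ww_ctx_backoff(&ww);	/* drop locks, wait */
		if (!ret)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);
	return ret;
}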

* Re: [Intel-gfx] [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-18 10:55   ` Tvrtko Ursulin
@ 2022-07-26  5:07     ` Niranjana Vishwanathapura
  2022-07-26  8:40       ` Tvrtko Ursulin
  0 siblings, 1 reply; 121+ messages in thread
From: Niranjana Vishwanathapura @ 2022-07-26  5:07 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig

On Mon, Jul 18, 2022 at 11:55:41AM +0100, Tvrtko Ursulin wrote:
>
>On 01/07/2022 23:50, Niranjana Vishwanathapura wrote:
>>Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>>---
>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233 ++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>>  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>>  11 files changed, 318 insertions(+), 10 deletions(-)
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>index 522ef9b4aff3..4e1627e96c6e 100644
>>--- a/drivers/gpu/drm/i915/Makefile
>>+++ b/drivers/gpu/drm/i915/Makefile
>>@@ -165,6 +165,7 @@ gem-y += \
>>  	gem/i915_gem_ttm_move.o \
>>  	gem/i915_gem_ttm_pm.o \
>>  	gem/i915_gem_userptr.o \
>>+	gem/i915_gem_vm_bind_object.o \
>>  	gem/i915_gem_wait.o \
>>  	gem/i915_gemfs.o
>>  i915-y += \
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>index 33673fe7ee0a..927a87e5ec59 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>@@ -15,10 +15,10 @@
>>  #include "i915_trace.h"
>>  #include "i915_user_extensions.h"
>>-static u32 object_max_page_size(struct intel_memory_region **placements,
>>-				unsigned int n_placements)
>>+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
>>+				  unsigned int n_placements)
>>  {
>>-	u32 max_page_size = 0;
>>+	u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>  	int i;
>>  	for (i = 0; i < n_placements; i++) {
>>@@ -28,7 +28,6 @@ static u32 object_max_page_size(struct intel_memory_region **placements,
>>  		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>  	}
>>-	GEM_BUG_ON(!max_page_size);
>>  	return max_page_size;
>>  }
>>@@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private *i915, u64 size,
>>  	i915_gem_flush_free_objects(i915);
>>-	size = round_up(size, object_max_page_size(placements, n_placements));
>>+	size = round_up(size, i915_gem_object_max_page_size(placements,
>>+							    n_placements));
>>  	if (size == 0)
>>  		return ERR_PTR(-EINVAL);
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>index 6f0a3ce35567..650de2224843 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>@@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>  }
>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>>+u32 i915_gem_object_max_page_size(struct intel_memory_region **placements,
>>+				  unsigned int n_placements);
>>  void i915_objects_module_exit(void);
>>  int i915_objects_module_init(void);
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>new file mode 100644
>>index 000000000000..642cdb559f17
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>@@ -0,0 +1,38 @@
>>+/* SPDX-License-Identifier: MIT */
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#ifndef __I915_GEM_VM_BIND_H
>>+#define __I915_GEM_VM_BIND_H
>>+
>>+#include "i915_drv.h"
>>+
>>+#define assert_vm_bind_held(vm)   lockdep_assert_held(&(vm)->vm_bind_lock)
>>+
>>+static inline void i915_gem_vm_bind_lock(struct i915_address_space *vm)
>>+{
>>+	mutex_lock(&vm->vm_bind_lock);
>>+}
>>+
>>+static inline int
>>+i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
>>+{
>>+	return mutex_lock_interruptible(&vm->vm_bind_lock);
>>+}
>>+
>>+static inline void i915_gem_vm_bind_unlock(struct i915_address_space *vm)
>>+{
>>+	mutex_unlock(&vm->vm_bind_lock);
>>+}
>>+
>>+struct i915_vma *
>>+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>+int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>+			 struct drm_i915_gem_vm_bind *va,
>>+			 struct drm_file *file);
>>+int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>>+			   struct drm_i915_gem_vm_unbind *va);
>>+
>>+#endif /* __I915_GEM_VM_BIND_H */
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>new file mode 100644
>>index 000000000000..43ceb4dcca6c
>>--- /dev/null
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>@@ -0,0 +1,233 @@
>>+// SPDX-License-Identifier: MIT
>>+/*
>>+ * Copyright © 2022 Intel Corporation
>>+ */
>>+
>>+#include <linux/interval_tree_generic.h>
>>+
>>+#include "gem/i915_gem_vm_bind.h"
>>+#include "gt/gen8_engine_cs.h"
>>+
>>+#include "i915_drv.h"
>>+#include "i915_gem_gtt.h"
>>+
>>+#define START(node) ((node)->start)
>>+#define LAST(node) ((node)->last)
>>+
>>+INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>>+		     START, LAST, static inline, i915_vm_bind_it)
>>+
>>+#undef START
>>+#undef LAST
>>+
>>+/**
>>+ * DOC: VM_BIND/UNBIND ioctls
>>+ *
>>+ * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>>+ * objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>>+ * specified address space (VM). Multiple mappings can map to the same physical
>>+ * pages of an object (aliasing). These mappings (also referred to as persistent
>>+ * mappings) will be persistent across multiple GPU submissions (execbuf calls)
>>+ * issued by the UMD, without user having to provide a list of all required
>>+ * mappings during each submission (as required by older execbuf mode).
>>+ *
>>+ * The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
>>+ * signaling the completion of bind/unbind operation.
>>+ *
>>+ * VM_BIND feature is advertised to user via I915_PARAM_VM_BIND_VERSION.
>>+ * User has to opt-in for VM_BIND mode of binding for an address space (VM)
>>+ * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>>+ *
>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>>+ * are not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be
>>+ * done asynchronously, when valid out fence is specified.
>>+ *
>>+ * VM_BIND locking order is as below.
>>+ *
>>+ * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
>>+ *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>>+ *    mapping.
>>+ *
>>+ *    In future, when GPU page faults are supported, we can potentially use a
>>+ *    rwsem instead, so that multiple page fault handlers can take the read
>>+ *    side lock to lookup the mapping and hence can run in parallel.
>>+ *    The older execbuf mode of binding do not need this lock.
>>+ *
>>+ * 2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs
>>+ *    to be held while binding/unbinding a vma in the async worker and while
>>+ *    updating dma-resv fence list of an object. Note that private BOs of a VM
>>+ *    will all share a dma-resv object.
>>+ *
>>+ *    The future system allocator support will use the HMM prescribed locking
>>+ *    instead.
>>+ *
>>+ * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
>>+ *    invalidated vmas (due to eviction and userptr invalidation) etc.
>>+ */
>>+
>>+struct i915_vma *
>>+i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>>+{
>>+	struct i915_vma *vma, *temp;
>>+
>>+	assert_vm_bind_held(vm);
>>+
>>+	vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
>>+	/* Working around compiler error, remove later */
>>+	if (vma)
>>+		temp = i915_vm_bind_it_iter_next(vma, va + vma->size, -1);
>>+	return vma;
>>+}
>>+
>>+void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>+{
>>+	assert_vm_bind_held(vma->vm);
>>+
>>+	if (!list_empty(&vma->vm_bind_link)) {
>>+		list_del_init(&vma->vm_bind_link);
>>+		i915_vm_bind_it_remove(vma, &vma->vm->va);
>>+
>>+		/* Release object */
>>+		if (release_obj)
>>+			i915_vma_put(vma);
>>+	}
>>+}
>>+
>>+int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>>+			   struct drm_i915_gem_vm_unbind *va)
>>+{
>>+	struct drm_i915_gem_object *obj;
>>+	struct i915_vma *vma;
>>+	int ret;
>>+
>>+	va->start = gen8_noncanonical_addr(va->start);
>>+	ret = i915_gem_vm_bind_lock_interruptible(vm);
>>+	if (ret)
>>+		return ret;
>>+
>>+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>+	if (!vma) {
>>+		ret = -ENOENT;
>>+		goto out_unlock;
>>+	}
>>+
>>+	if (vma->size != va->length)
>>+		ret = -EINVAL;
>>+	else
>>+		i915_gem_vm_bind_remove(vma, false);
>>+
>>+out_unlock:
>>+	i915_gem_vm_bind_unlock(vm);
>>+	if (ret || !vma)
>>+		return ret;
>>+
>>+	/* Destroy vma and then release object */
>>+	obj = vma->obj;
>>+	ret = i915_gem_object_lock(obj, NULL);
>>+	if (ret)
>>+		return ret;
>>+
>>+	i915_vma_destroy(vma);
>>+	i915_gem_object_unlock(obj);
>>+	i915_gem_object_put(obj);
>>+
>>+	return 0;
>>+}
>>+
>>+static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>>+					struct drm_i915_gem_object *obj,
>>+					struct drm_i915_gem_vm_bind *va)
>>+{
>>+	struct i915_ggtt_view view;
>>+	struct i915_vma *vma;
>>+
>>+	va->start = gen8_noncanonical_addr(va->start);
>>+	vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>+	if (vma)
>>+		return ERR_PTR(-EEXIST);
>>+
>>+	view.type = I915_GGTT_VIEW_PARTIAL;
>
>One pre-requisite, which was known for "years", was to refactor the 
>GGTT view code into a generic concept. (GGTT has no place in VM BIND 
>code.) It may be just a question of renaming things, or it may end up 
>a bit more, but in any case please do include that refactor in this 
>series.
>

Thanks Tvrtko,
Yeah, as mentioned in the other thread, my plan is to rename ggtt_view
to gtt_view. But it requires changes in a lot of places and it's probably
not going to look good in this patch series. So, my take is to do it
after this patch series lands.

Niranjana

>Regards,
>
>Tvrtko
>
>>+	view.partial.offset = va->offset >> PAGE_SHIFT;
>>+	view.partial.size = va->length >> PAGE_SHIFT;
>>+	vma = i915_vma_instance(obj, vm, &view);
>>+	if (IS_ERR(vma))
>>+		return vma;
>>+
>>+	vma->start = va->start;
>>+	vma->last = va->start + va->length - 1;
>>+
>>+	return vma;
>>+}
>>+
>>+int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>+			 struct drm_i915_gem_vm_bind *va,
>>+			 struct drm_file *file)
>>+{
>>+	struct drm_i915_gem_object *obj;
>>+	struct i915_vma *vma = NULL;
>>+	struct i915_gem_ww_ctx ww;
>>+	u64 pin_flags;
>>+	int ret = 0;
>>+
>>+	if (!vm->vm_bind_mode)
>>+		return -EOPNOTSUPP;
>>+
>>+	obj = i915_gem_object_lookup(file, va->handle);
>>+	if (!obj)
>>+		return -ENOENT;
>>+
>>+	if (!va->length ||
>>+	    !IS_ALIGNED(va->offset | va->length,
>>+			i915_gem_object_max_page_size(obj->mm.placements,
>>+						      obj->mm.n_placements)) ||
>>+	    range_overflows_t(u64, va->offset, va->length, obj->base.size)) {
>>+		ret = -EINVAL;
>>+		goto put_obj;
>>+	}
>>+
>>+	ret = i915_gem_vm_bind_lock_interruptible(vm);
>>+	if (ret)
>>+		goto put_obj;
>>+
>>+	vma = vm_bind_get_vma(vm, obj, va);
>>+	if (IS_ERR(vma)) {
>>+		ret = PTR_ERR(vma);
>>+		goto unlock_vm;
>>+	}
>>+
>>+	i915_gem_ww_ctx_init(&ww, true);
>>+	pin_flags = va->start | PIN_OFFSET_FIXED | PIN_USER;
>>+retry:
>>+	ret = i915_gem_object_lock(vma->obj, &ww);
>>+	if (ret)
>>+		goto out_ww;
>>+
>>+	ret = i915_vma_pin_ww(vma, &ww, 0, 0, pin_flags);
>>+	if (ret)
>>+		goto out_ww;
>>+
>>+	/* Make it evictable */
>>+	__i915_vma_unpin(vma);
>>+
>>+	list_add_tail(&vma->vm_bind_link, &vm->vm_bound_list);
>>+	i915_vm_bind_it_insert(vma, &vm->va);
>>+
>>+	/* Hold object reference until vm_unbind */
>>+	i915_gem_object_get(vma->obj);
>>+out_ww:
>>+	if (ret == -EDEADLK) {
>>+		ret = i915_gem_ww_ctx_backoff(&ww);
>>+		if (!ret)
>>+			goto retry;
>>+	}
>>+
>>+	if (ret)
>>+		i915_vma_destroy(vma);
>>+
>>+	i915_gem_ww_ctx_fini(&ww);
>>+unlock_vm:
>>+	i915_gem_vm_bind_unlock(vm);
>>+put_obj:
>>+	i915_gem_object_put(obj);
>>+	return ret;
>>+}
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>index b67831833c9a..135dc4a76724 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>@@ -176,6 +176,8 @@ int i915_vm_lock_objects(struct i915_address_space *vm,
>>  void i915_address_space_fini(struct i915_address_space *vm)
>>  {
>>  	drm_mm_takedown(&vm->mm);
>>+	GEM_BUG_ON(!RB_EMPTY_ROOT(&vm->va.rb_root));
>>+	mutex_destroy(&vm->vm_bind_lock);
>>  }
>>  /**
>>@@ -282,6 +284,11 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>  	INIT_LIST_HEAD(&vm->bound_list);
>>  	INIT_LIST_HEAD(&vm->unbound_list);
>>+
>>+	vm->va = RB_ROOT_CACHED;
>>+	INIT_LIST_HEAD(&vm->vm_bind_list);
>>+	INIT_LIST_HEAD(&vm->vm_bound_list);
>>+	mutex_init(&vm->vm_bind_lock);
>>  }
>>  void *__px_vaddr(struct drm_i915_gem_object *p)
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>index c812aa9708ae..d4a6ce65251d 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>@@ -259,6 +259,15 @@ struct i915_address_space {
>>  	 */
>>  	struct list_head unbound_list;
>>+	/**
>>+	 * List of VM_BIND objects.
>>+	 */
>>+	struct mutex vm_bind_lock;  /* Protects vm_bind lists */
>>+	struct list_head vm_bind_list;
>>+	struct list_head vm_bound_list;
>>+	/* va tree of persistent vmas */
>>+	struct rb_root_cached va;
>>+
>>  	/* Global GTT */
>>  	bool is_ggtt:1;
>>diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>index ccf990dfd99b..776ab7844f60 100644
>>--- a/drivers/gpu/drm/i915/i915_driver.c
>>+++ b/drivers/gpu/drm/i915/i915_driver.c
>>@@ -68,6 +68,7 @@
>>  #include "gem/i915_gem_ioctls.h"
>>  #include "gem/i915_gem_mman.h"
>>  #include "gem/i915_gem_pm.h"
>>+#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_gt.h"
>>  #include "gt/intel_gt_pm.h"
>>  #include "gt/intel_rc6.h"
>>@@ -1783,13 +1784,16 @@ static int i915_gem_vm_bind_ioctl(struct drm_device *dev, void *data,
>>  {
>>  	struct drm_i915_gem_vm_bind *args = data;
>>  	struct i915_address_space *vm;
>>+	int ret;
>>  	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>  	if (unlikely(!vm))
>>  		return -ENOENT;
>>+	ret = i915_gem_vm_bind_obj(vm, args, file);
>>+
>>  	i915_vm_put(vm);
>>-	return -EINVAL;
>>+	return ret;
>>  }
>>  static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>>@@ -1797,13 +1801,16 @@ static int i915_gem_vm_unbind_ioctl(struct drm_device *dev, void *data,
>>  {
>>  	struct drm_i915_gem_vm_unbind *args = data;
>>  	struct i915_address_space *vm;
>>+	int ret;
>>  	vm = i915_gem_vm_lookup(file->driver_priv, args->vm_id);
>>  	if (unlikely(!vm))
>>  		return -ENOENT;
>>+	ret = i915_gem_vm_unbind_obj(vm, args);
>>+
>>  	i915_vm_put(vm);
>>-	return -EINVAL;
>>+	return ret;
>>  }
>>  static const struct drm_ioctl_desc i915_ioctls[] = {
>>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>index 43339ecabd73..d324e29cef0a 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.c
>>+++ b/drivers/gpu/drm/i915/i915_vma.c
>>@@ -29,6 +29,7 @@
>>  #include "display/intel_frontbuffer.h"
>>  #include "gem/i915_gem_lmem.h"
>>  #include "gem/i915_gem_tiling.h"
>>+#include "gem/i915_gem_vm_bind.h"
>>  #include "gt/intel_engine.h"
>>  #include "gt/intel_engine_heartbeat.h"
>>  #include "gt/intel_gt.h"
>>@@ -234,6 +235,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>  	spin_unlock(&obj->vma.lock);
>>  	mutex_unlock(&vm->mutex);
>>+	INIT_LIST_HEAD(&vma->vm_bind_link);
>>  	return vma;
>>  err_unlock:
>>@@ -290,7 +292,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>>  {
>>  	struct i915_vma *vma;
>>-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>  	GEM_BUG_ON(!kref_read(&vm->ref));
>>  	spin_lock(&obj->vma.lock);
>>@@ -1660,6 +1661,10 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy)
>>  	spin_unlock(&obj->vma.lock);
>>+	i915_gem_vm_bind_lock(vma->vm);
>>+	i915_gem_vm_bind_remove(vma, true);
>>+	i915_gem_vm_bind_unlock(vma->vm);
>>+
>>  	spin_lock_irq(&gt->closed_lock);
>>  	__i915_vma_remove_closed(vma);
>>  	spin_unlock_irq(&gt->closed_lock);
>>diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>>index 88ca0bd9c900..dcb49f79ff7e 100644
>>--- a/drivers/gpu/drm/i915/i915_vma.h
>>+++ b/drivers/gpu/drm/i915/i915_vma.h
>>@@ -164,8 +164,6 @@ i915_vma_compare(struct i915_vma *vma,
>>  {
>>  	ptrdiff_t cmp;
>>-	GEM_BUG_ON(view && !i915_is_ggtt_or_dpt(vm));
>>-
>>  	cmp = ptrdiff(vma->vm, vm);
>>  	if (cmp)
>>  		return cmp;
>>diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>>index be6e028c3b57..b6d179bdbfa0 100644
>>--- a/drivers/gpu/drm/i915/i915_vma_types.h
>>+++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>@@ -289,6 +289,14 @@ struct i915_vma {
>>  	/** This object's place on the active/inactive lists */
>>  	struct list_head vm_link;
>>+	struct list_head vm_bind_link; /* Link in persistent VMA list */
>>+
>>+	/** Interval tree structures for persistent vma */
>>+	struct rb_node rb;
>>+	u64 start;
>>+	u64 last;
>>+	u64 __subtree_last;
>>+
>>  	struct list_head obj_link; /* Link in the object's VMA list */
>>  	struct rb_node obj_node;
>>  	struct hlist_node obj_hash;

^ permalink raw reply	[flat|nested] 121+ messages in thread
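
On the data structure side: the VA lookup in the patch builds on
<linux/interval_tree_generic.h>. The INTERVAL_TREE_DEFINE() instantiation
quoted above generates i915_vm_bind_it_insert()/_remove() plus the
_iter_first()/_iter_next() walkers keyed on vma->start and vma->last, and
i915_gem_vm_bind_lookup_vma() is simply a point query over [va, va]. A
range walk over every persistent mapping overlapping a span would look
roughly like the sketch below; the helper and its callback are
illustrative only and not part of the series.

/*
 * Sketch: visit every persistent vma whose [start, last] interval
 * overlaps the query range, using the iterators generated by
 * INTERVAL_TREE_DEFINE() in the patch above.
 */
static void for_each_vm_bind_vma_in_range(struct i915_address_space *vm,
					  u64 start, u64 last,
					  void (*fn)(struct i915_vma *vma))
{
	struct i915_vma *vma;

	assert_vm_bind_held(vm);	/* caller holds vm->vm_bind_lock */

	for (vma = i915_vm_bind_it_iter_first(&vm->va, start, last);
	     vma;
	     vma = i915_vm_bind_it_iter_next(vma, start, last))
		fn(vma);
}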

* Re: [Intel-gfx] [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings
  2022-07-26  5:07     ` Niranjana Vishwanathapura
@ 2022-07-26  8:40       ` Tvrtko Ursulin
  0 siblings, 0 replies; 121+ messages in thread
From: Tvrtko Ursulin @ 2022-07-26  8:40 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, dri-devel, thomas.hellstrom,
	matthew.auld, daniel.vetter, christian.koenig


On 26/07/2022 06:07, Niranjana Vishwanathapura wrote:
> On Mon, Jul 18, 2022 at 11:55:41AM +0100, Tvrtko Ursulin wrote:
>>
>> On 01/07/2022 23:50, Niranjana Vishwanathapura wrote:
>>> Bind and unbind the mappings upon VM_BIND and VM_UNBIND calls.
>>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/Makefile                 |   1 +
>>>  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  10 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>>>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  38 +++
>>>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 233 ++++++++++++++++++
>>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
>>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
>>>  drivers/gpu/drm/i915/i915_driver.c            |  11 +-
>>>  drivers/gpu/drm/i915/i915_vma.c               |   7 +-
>>>  drivers/gpu/drm/i915/i915_vma.h               |   2 -
>>>  drivers/gpu/drm/i915/i915_vma_types.h         |   8 +
>>>  11 files changed, 318 insertions(+), 10 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>>
>>> diff --git a/drivers/gpu/drm/i915/Makefile 
>>> b/drivers/gpu/drm/i915/Makefile
>>> index 522ef9b4aff3..4e1627e96c6e 100644
>>> --- a/drivers/gpu/drm/i915/Makefile
>>> +++ b/drivers/gpu/drm/i915/Makefile
>>> @@ -165,6 +165,7 @@ gem-y += \
>>>      gem/i915_gem_ttm_move.o \
>>>      gem/i915_gem_ttm_pm.o \
>>>      gem/i915_gem_userptr.o \
>>> +    gem/i915_gem_vm_bind_object.o \
>>>      gem/i915_gem_wait.o \
>>>      gem/i915_gemfs.o
>>>  i915-y += \
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> index 33673fe7ee0a..927a87e5ec59 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> @@ -15,10 +15,10 @@
>>>  #include "i915_trace.h"
>>>  #include "i915_user_extensions.h"
>>> -static u32 object_max_page_size(struct intel_memory_region 
>>> **placements,
>>> -                unsigned int n_placements)
>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>> **placements,
>>> +                  unsigned int n_placements)
>>>  {
>>> -    u32 max_page_size = 0;
>>> +    u32 max_page_size = I915_GTT_PAGE_SIZE_4K;
>>>      int i;
>>>      for (i = 0; i < n_placements; i++) {
>>> @@ -28,7 +28,6 @@ static u32 object_max_page_size(struct 
>>> intel_memory_region **placements,
>>>          max_page_size = max_t(u32, max_page_size, mr->min_page_size);
>>>      }
>>> -    GEM_BUG_ON(!max_page_size);
>>>      return max_page_size;
>>>  }
>>> @@ -99,7 +98,8 @@ __i915_gem_object_create_user_ext(struct 
>>> drm_i915_private *i915, u64 size,
>>>      i915_gem_flush_free_objects(i915);
>>> -    size = round_up(size, object_max_page_size(placements, 
>>> n_placements));
>>> +    size = round_up(size, i915_gem_object_max_page_size(placements,
>>> +                                n_placements));
>>>      if (size == 0)
>>>          return ERR_PTR(-EINVAL);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> index 6f0a3ce35567..650de2224843 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> @@ -47,6 +47,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>>  }
>>>  void i915_gem_init__objects(struct drm_i915_private *i915);
>>> +u32 i915_gem_object_max_page_size(struct intel_memory_region 
>>> **placements,
>>> +                  unsigned int n_placements);
>>>  void i915_objects_module_exit(void);
>>>  int i915_objects_module_init(void);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>> new file mode 100644
>>> index 000000000000..642cdb559f17
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>>> @@ -0,0 +1,38 @@
>>> +/* SPDX-License-Identifier: MIT */
>>> +/*
>>> + * Copyright © 2022 Intel Corporation
>>> + */
>>> +
>>> +#ifndef __I915_GEM_VM_BIND_H
>>> +#define __I915_GEM_VM_BIND_H
>>> +
>>> +#include "i915_drv.h"
>>> +
>>> +#define assert_vm_bind_held(vm)   
>>> lockdep_assert_held(&(vm)->vm_bind_lock)
>>> +
>>> +static inline void i915_gem_vm_bind_lock(struct i915_address_space *vm)
>>> +{
>>> +    mutex_lock(&vm->vm_bind_lock);
>>> +}
>>> +
>>> +static inline int
>>> +i915_gem_vm_bind_lock_interruptible(struct i915_address_space *vm)
>>> +{
>>> +    return mutex_lock_interruptible(&vm->vm_bind_lock);
>>> +}
>>> +
>>> +static inline void i915_gem_vm_bind_unlock(struct i915_address_space 
>>> *vm)
>>> +{
>>> +    mutex_unlock(&vm->vm_bind_lock);
>>> +}
>>> +
>>> +struct i915_vma *
>>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va);
>>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj);
>>> +int i915_gem_vm_bind_obj(struct i915_address_space *vm,
>>> +             struct drm_i915_gem_vm_bind *va,
>>> +             struct drm_file *file);
>>> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>>> +               struct drm_i915_gem_vm_unbind *va);
>>> +
>>> +#endif /* __I915_GEM_VM_BIND_H */
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> new file mode 100644
>>> index 000000000000..43ceb4dcca6c
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>> @@ -0,0 +1,233 @@
>>> +// SPDX-License-Identifier: MIT
>>> +/*
>>> + * Copyright © 2022 Intel Corporation
>>> + */
>>> +
>>> +#include <linux/interval_tree_generic.h>
>>> +
>>> +#include "gem/i915_gem_vm_bind.h"
>>> +#include "gt/gen8_engine_cs.h"
>>> +
>>> +#include "i915_drv.h"
>>> +#include "i915_gem_gtt.h"
>>> +
>>> +#define START(node) ((node)->start)
>>> +#define LAST(node) ((node)->last)
>>> +
>>> +INTERVAL_TREE_DEFINE(struct i915_vma, rb, u64, __subtree_last,
>>> +             START, LAST, static inline, i915_vm_bind_it)
>>> +
>>> +#undef START
>>> +#undef LAST
>>> +
>>> +/**
>>> + * DOC: VM_BIND/UNBIND ioctls
>>> + *
>>> + * DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM 
>>> buffer
>>> + * objects (BOs) or sections of a BOs at specified GPU virtual 
>>> addresses on a
>>> + * specified address space (VM). Multiple mappings can map to the 
>>> same physical
>>> + * pages of an object (aliasing). These mappings (also referred to 
>>> as persistent
>>> + * mappings) will be persistent across multiple GPU submissions 
>>> (execbuf calls)
>>> + * issued by the UMD, without user having to provide a list of all 
>>> required
>>> + * mappings during each submission (as required by older execbuf mode).
>>> + *
>>> + * The VM_BIND/UNBIND calls allow UMDs to request a timeline out 
>>> fence for
>>> + * signaling the completion of bind/unbind operation.
>>> + *
>>> + * VM_BIND feature is advertised to user via 
>>> I915_PARAM_VM_BIND_VERSION.
>>> + * User has to opt-in for VM_BIND mode of binding for an address 
>>> space (VM)
>>> + * during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND 
>>> extension.
>>> + *
>>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>> concurrently
>>> + * are not ordered. Furthermore, parts of the VM_BIND/UNBIND 
>>> operations can be
>>> + * done asynchronously, when valid out fence is specified.
>>> + *
>>> + * VM_BIND locking order is as below.
>>> + *
>>> + * 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock 
>>> is taken in
>>> + *    vm_bind/vm_unbind ioctl calls, in the execbuf path and while 
>>> releasing the
>>> + *    mapping.
>>> + *
>>> + *    In future, when GPU page faults are supported, we can 
>>> potentially use a
>>> + *    rwsem instead, so that multiple page fault handlers can take 
>>> the read
>>> + *    side lock to lookup the mapping and hence can run in parallel.
>>> + *    The older execbuf mode of binding do not need this lock.
>>> + *
>>> + * 2) Lock-B: The object's dma-resv lock will protect i915_vma state 
>>> and needs
>>> + *    to be held while binding/unbinding a vma in the async worker 
>>> and while
>>> + *    updating dma-resv fence list of an object. Note that private 
>>> BOs of a VM
>>> + *    will all share a dma-resv object.
>>> + *
>>> + *    The future system allocator support will use the HMM 
>>> prescribed locking
>>> + *    instead.
>>> + *
>>> + * 3) Lock-C: Spinlock/s to protect some of the VM's lists like the 
>>> list of
>>> + *    invalidated vmas (due to eviction and userptr invalidation) etc.
>>> + */
>>> +
>>> +struct i915_vma *
>>> +i915_gem_vm_bind_lookup_vma(struct i915_address_space *vm, u64 va)
>>> +{
>>> +    struct i915_vma *vma, *temp;
>>> +
>>> +    assert_vm_bind_held(vm);
>>> +
>>> +    vma = i915_vm_bind_it_iter_first(&vm->va, va, va);
>>> +    /* Working around compiler error, remove later */
>>> +    if (vma)
>>> +        temp = i915_vm_bind_it_iter_next(vma, va + vma->size, -1);
>>> +    return vma;
>>> +}
>>> +
>>> +void i915_gem_vm_bind_remove(struct i915_vma *vma, bool release_obj)
>>> +{
>>> +    assert_vm_bind_held(vma->vm);
>>> +
>>> +    if (!list_empty(&vma->vm_bind_link)) {
>>> +        list_del_init(&vma->vm_bind_link);
>>> +        i915_vm_bind_it_remove(vma, &vma->vm->va);
>>> +
>>> +        /* Release object */
>>> +        if (release_obj)
>>> +            i915_vma_put(vma);
>>> +    }
>>> +}
>>> +
>>> +int i915_gem_vm_unbind_obj(struct i915_address_space *vm,
>>> +               struct drm_i915_gem_vm_unbind *va)
>>> +{
>>> +    struct drm_i915_gem_object *obj;
>>> +    struct i915_vma *vma;
>>> +    int ret;
>>> +
>>> +    va->start = gen8_noncanonical_addr(va->start);
>>> +    ret = i915_gem_vm_bind_lock_interruptible(vm);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>> +    if (!vma) {
>>> +        ret = -ENOENT;
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    if (vma->size != va->length)
>>> +        ret = -EINVAL;
>>> +    else
>>> +        i915_gem_vm_bind_remove(vma, false);
>>> +
>>> +out_unlock:
>>> +    i915_gem_vm_bind_unlock(vm);
>>> +    if (ret || !vma)
>>> +        return ret;
>>> +
>>> +    /* Destroy vma and then release object */
>>> +    obj = vma->obj;
>>> +    ret = i915_gem_object_lock(obj, NULL);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    i915_vma_destroy(vma);
>>> +    i915_gem_object_unlock(obj);
>>> +    i915_gem_object_put(obj);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static struct i915_vma *vm_bind_get_vma(struct i915_address_space *vm,
>>> +                    struct drm_i915_gem_object *obj,
>>> +                    struct drm_i915_gem_vm_bind *va)
>>> +{
>>> +    struct i915_ggtt_view view;
>>> +    struct i915_vma *vma;
>>> +
>>> +    va->start = gen8_noncanonical_addr(va->start);
>>> +    vma = i915_gem_vm_bind_lookup_vma(vm, va->start);
>>> +    if (vma)
>>> +        return ERR_PTR(-EEXIST);
>>> +
>>> +    view.type = I915_GGTT_VIEW_PARTIAL;
>>
>> One pre-requisite, which was known for "years", was to refactor the 
>> GGTT view code into a generic concept. (GGTT has no place in VM BIND 
>> code.) It may be just a question of renaming things, or it may end up 
>> a bit more, but in any case please do include that refactor in this 
>> series.
>>
> 
> Thanks Tvrtko,
> Yeah, as mentioned in the other thread, my plan is to rename ggtt_view
> to gtt_view. But it requires changes in a lot of places and it's probably

I did not spot the other thread - link or msg-id?

> not going to look good in this patch series. So, my take is to do it
> after this patch series lands.

Well..

Message-ID: <aaca5d74-6e25-d2a2-1c81-db48a8e805e7@linux.intel.com>
Date: Tue, 26 Jan 2021 17:34:15 +0000

"""
...But there would be plenty of more renaming to do, plenty
more view related things are left with "ggtt" in their names.
"""

So I was wrong, it wasn't years, only a year and a half. ;)

Let's not keep up the suboptimal pattern of merging under pressure and 
fixing up later, so please let's do it properly and refactor at the 
beginning of the series. I don't see why it would not look good. It's 
how things are always done, and what doesn't look good, for me, is to 
have ggtt objects in ppgtt.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 121+ messages in thread
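
For context, the refactor being asked for boils down to making the view
concept GTT-agnostic before it is reused for VM_BIND on ppGTT address
spaces. A minimal sketch of the rename Niranjana describes could look as
below; the generic names are assumptions about how such a refactor might
read, not code from this series.

/*
 * Hypothetical sketch of the discussed rename: drop the "ggtt" prefix so
 * the partial-view machinery can also describe ppGTT VM_BIND mappings.
 * Only the fields used by vm_bind_get_vma() above are shown; the names
 * are illustrative.
 */
enum i915_gtt_view_type {		/* was: enum i915_ggtt_view_type */
	I915_GTT_VIEW_NORMAL = 0,
	I915_GTT_VIEW_PARTIAL,		/* was: I915_GGTT_VIEW_PARTIAL */
	/* rotated/remapped views elided */
};

struct i915_gtt_view {			/* was: struct i915_ggtt_view */
	enum i915_gtt_view_type type;
	union {
		struct {
			u64 offset;		/* in pages */
			unsigned int size;	/* in pages */
		} partial;
		/* ... */
	};
};

With that in place, vm_bind_get_vma() would build a struct i915_gtt_view
instead of a struct i915_ggtt_view, and i915_vma_instance() would take the
renamed type, with no functional change intended.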

end of thread, other threads:[~2022-07-26  8:40 UTC | newest]

Thread overview: 121+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-01 22:50 [RFC 00/10] drm/i915/vm_bind: Add VM_BIND functionality Niranjana Vishwanathapura
2022-07-01 22:50 ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 01/10] drm/i915/vm_bind: Introduce VM_BIND ioctl Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-05  9:59   ` Hellstrom, Thomas
2022-07-05  9:59     ` Hellstrom, Thomas
2022-07-07  1:18     ` [Intel-gfx] " Andi Shyti
2022-07-07  1:18       ` Andi Shyti
2022-07-07  5:06       ` Niranjana Vishwanathapura
2022-07-07  5:01     ` Niranjana Vishwanathapura
2022-07-07  5:01       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07  7:32       ` Hellstrom, Thomas
2022-07-07  7:32         ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 12:58         ` Niranjana Vishwanathapura
2022-07-08 12:58           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 02/10] drm/i915/vm_bind: Bind and unbind mappings Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-06 16:21   ` Thomas Hellström
2022-07-06 16:21     ` [Intel-gfx] " Thomas Hellström
2022-07-07  1:41     ` Andi Shyti
2022-07-07  1:41       ` Andi Shyti
2022-07-07  5:48       ` Niranjana Vishwanathapura
2022-07-07  5:43     ` Niranjana Vishwanathapura
2022-07-07  5:43       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07  8:14       ` Thomas Hellström
2022-07-07  8:14         ` [Intel-gfx] " Thomas Hellström
2022-07-08 12:57         ` Niranjana Vishwanathapura
2022-07-08 12:57           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-18 10:55   ` Tvrtko Ursulin
2022-07-26  5:07     ` Niranjana Vishwanathapura
2022-07-26  8:40       ` Tvrtko Ursulin
2022-07-01 22:50 ` [RFC 03/10] drm/i915/vm_bind: Support private and shared BOs Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07 10:31   ` Hellstrom, Thomas
2022-07-07 10:31     ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 13:14     ` Niranjana Vishwanathapura
2022-07-08 13:14       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-08 13:43       ` Hellstrom, Thomas
2022-07-08 13:43         ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 14:44         ` Hellstrom, Thomas
2022-07-08 14:44           ` [Intel-gfx] " Hellstrom, Thomas
2022-07-09 20:13           ` Niranjana Vishwanathapura
2022-07-09 20:13             ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07 13:27   ` Christian König
2022-07-07 13:27     ` [Intel-gfx] " Christian König
2022-07-08 13:23     ` Niranjana Vishwanathapura
2022-07-08 13:23       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-08 17:32       ` Christian König
2022-07-08 17:32         ` [Intel-gfx] " Christian König
2022-07-09 20:14         ` Niranjana Vishwanathapura
2022-07-09 20:14           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 04/10] drm/i915/vm_bind: Add out fence support Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 05/10] drm/i915/vm_bind: Handle persistent vmas Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-04 17:05   ` Zeng, Oak
2022-07-04 17:05     ` Zeng, Oak
2022-07-05  9:20     ` Ramalingam C
2022-07-05  9:20       ` Ramalingam C
2022-07-05 13:50       ` Zeng, Oak
2022-07-05 13:50         ` Zeng, Oak
2022-07-07  6:00         ` Niranjana Vishwanathapura
2022-07-07 11:27   ` Hellstrom, Thomas
2022-07-07 11:27     ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 15:06     ` Niranjana Vishwanathapura
2022-07-08 15:06       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 06/10] drm/i915/vm_bind: Add I915_GEM_EXECBUFFER3 ioctl Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07 14:41   ` Hellstrom, Thomas
2022-07-07 14:41     ` [Intel-gfx] " Hellstrom, Thomas
2022-07-07 19:38     ` Andi Shyti
2022-07-07 19:38       ` Andi Shyti
2022-07-08 12:22       ` Hellstrom, Thomas
2022-07-08 12:22         ` Hellstrom, Thomas
2022-07-08 13:47     ` Niranjana Vishwanathapura
2022-07-08 13:47       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-08 14:37       ` Hellstrom, Thomas
2022-07-08 14:37         ` [Intel-gfx] " Hellstrom, Thomas
2022-07-09 20:23         ` Niranjana Vishwanathapura
2022-07-09 20:23           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 07/10] drm/i915/vm_bind: Handle persistent vmas in execbuf3 Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07 14:54   ` Hellstrom, Thomas
2022-07-07 14:54     ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 12:44     ` Niranjana Vishwanathapura
2022-07-08 12:44       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-08 13:03       ` Hellstrom, Thomas
2022-07-08 13:03         ` [Intel-gfx] " Hellstrom, Thomas
2022-07-09 20:25         ` Niranjana Vishwanathapura
2022-07-09 20:25           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 08/10] drm/i915/vm_bind: userptr dma-resv changes Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-07 13:11   ` Hellstrom, Thomas
2022-07-07 13:11     ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 14:51     ` Niranjana Vishwanathapura
2022-07-08 14:51       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-08 15:20       ` Hellstrom, Thomas
2022-07-08 15:20         ` [Intel-gfx] " Hellstrom, Thomas
2022-07-09 20:56         ` Niranjana Vishwanathapura
2022-07-09 20:56           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-08 12:17   ` Hellstrom, Thomas
2022-07-08 12:17     ` [Intel-gfx] " Hellstrom, Thomas
2022-07-08 14:54     ` Niranjana Vishwanathapura
2022-07-08 14:54       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 09/10] drm/i915/vm_bind: Skip vma_lookup for persistent vmas Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-05  8:57   ` Thomas Hellström
2022-07-05  8:57     ` Thomas Hellström
2022-07-08 12:40     ` Niranjana Vishwanathapura
2022-07-08 12:40       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 22:50 ` [RFC 10/10] drm/i915/vm_bind: Fix vm->vm_bind_mutex and vm->mutex nesting Niranjana Vishwanathapura
2022-07-01 22:50   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-05  8:40   ` Thomas Hellström
2022-07-05  8:40     ` [Intel-gfx] " Thomas Hellström
2022-07-06 16:33     ` Ramalingam C
2022-07-06 16:33       ` [Intel-gfx] " Ramalingam C
2022-07-07  5:56     ` Niranjana Vishwanathapura
2022-07-07  5:56       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-01 23:19 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/vm_bind: Add VM_BIND functionality Patchwork
2022-07-01 23:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-07-01 23:40 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
