All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi
@ 2022-06-17  5:14 ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, chris.p.wilson,
	thomas.hellstrom, oak.zeng, matthew.auld, jason,
	lionel.g.landwerlin, christian.koenig

This is the i915 driver VM_BIND feature design RFC patch series along
with the required uapi definition and description of intended use cases.

v2: Reduce the scope to simple Mesa use case.
    Remove all compute related uapi, vm_bind/unbind queue support and
    only support a timeline out fence instead of an in/out timeline
    fence array.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana Vishwanathapura (3):
  drm/doc/rfc: VM_BIND feature design document
  drm/i915: Update i915 uapi documentation
  drm/doc/rfc: VM_BIND uapi definition

 Documentation/gpu/rfc/i915_vm_bind.h   | 226 +++++++++++++++++++++++
 Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 include/uapi/drm/i915_drm.h            | 205 ++++++++++++++++-----
 4 files changed, 628 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH v2 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi
@ 2022-06-17  5:14 ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	christian.koenig

This is the i915 driver VM_BIND feature design RFC patch series along
with the required uapi definition and description of intended use cases.

v2: Reduce the scope to simple Mesa use case.
    Remove all compute related uapi, vm_bind/unbind queue support and
    only support a timeline out fence instead of an in/out timeline
    fence array.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana Vishwanathapura (3):
  drm/doc/rfc: VM_BIND feature design document
  drm/i915: Update i915 uapi documentation
  drm/doc/rfc: VM_BIND uapi definition

 Documentation/gpu/rfc/i915_vm_bind.h   | 226 +++++++++++++++++++++++
 Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 include/uapi/drm/i915_drm.h            | 205 ++++++++++++++++-----
 4 files changed, 628 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-17  5:14 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-17  5:14   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, chris.p.wilson,
	thomas.hellstrom, oak.zeng, matthew.auld, jason,
	lionel.g.landwerlin, christian.koenig

VM_BIND design document with description of intended use cases.

v2: Reduce the scope to simple Mesa use case.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 2 files changed, 242 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
new file mode 100644
index 000000000000..4ab590ef11fd
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
@@ -0,0 +1,238 @@
+==========================================
+I915 VM_BIND feature design and use cases
+==========================================
+
+VM_BIND feature
+================
+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
+objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
+specified address space (VM). These mappings (also referred to as persistent
+mappings) will be persistent across multiple GPU submissions (execbuf calls)
+issued by the UMD, without user having to provide a list of all required
+mappings during each submission (as required by older execbuf mode).
+
+The VM_BIND/UNBIND calls allow UMDs to request a timeline fence for signaling
+the completion of bind/unbind operation.
+
+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
+User has to opt-in for VM_BIND mode of binding for an address space (VM)
+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+
+Normally, vm_bind/unbind operations will get completed synchronously,
+but if the object is being moved, the binding will happen once that the
+moving is complete and out fence will be signaled after binding is complete.
+The bind/unbind operation can get completed out of submission order.
+
+VM_BIND features include:
+
+* Multiple Virtual Address (VA) mappings can map to the same physical pages
+  of an object (aliasing).
+* VA mapping can map to a partial section of the BO (partial binding).
+* Support capture of persistent mappings in the dump upon GPU error.
+* TLB is flushed upon unbind completion. Batching of TLB flushes in some
+  use cases will be helpful.
+* Support for userptr gem objects (no special uapi is required for this).
+
+Execbuf ioctl in VM_BIND mode
+-------------------------------
+A VM in VM_BIND mode will not support older execbuf mode of binding.
+The execbuf ioctl handling in VM_BIND mode differs significantly from the
+older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+execlist. Hence, no support for implicit sync. It is expected that the below
+work will be able to support requirements of object dependency setting in all
+use cases:
+
+"dma-buf: Add an API for exporting sync files"
+(https://lwn.net/Articles/859290/)
+
+The execbuf3 ioctl directly specifies the batch addresses instead of as
+object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
+support many of the older features like in/out/submit fences, fence array,
+default gem context and many more (See struct drm_i915_gem_execbuffer3).
+
+In VM_BIND mode, VA allocation is completely managed by the user instead of
+the i915 driver. Hence all VA assignment, eviction are not applicable in
+VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+be using the i915_vma active reference tracking. It will instead use dma-resv
+object for that (See `VM_BIND dma_resv usage`_).
+
+So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
+evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
+are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
+should be in a separate file and only functionalities common to these ioctls
+can be the shared code where possible.
+
+VM_PRIVATE objects
+-------------------
+By default, BOs can be mapped on multiple VMs and can also be dma-buf
+exported. Hence these BOs are referred to as Shared BOs.
+During each execbuf submission, the request fence must be added to the
+dma-resv fence list of all shared BOs mapped on the VM.
+
+VM_BIND feature introduces an optimization where user can create BO which
+is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
+BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
+the VM they are private to and can't be dma-buf exported.
+All private BOs of a VM share the dma-resv object. Hence during each execbuf
+submission, they need only one dma-resv fence list updated. Thus, the fast
+path (where required mappings are already bound) submission latency is O(1)
+w.r.t the number of VM private BOs.
+
+VM_BIND locking hirarchy
+-------------------------
+The locking design here supports the older (execlist based) execbuf mode, the
+newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
+system allocator support (See `Shared Virtual Memory (SVM) support`_).
+The older execbuf mode and the newer VM_BIND mode without page faults manages
+residency of backing storage using dma_fence. The VM_BIND mode with page faults
+and the system allocator support do not use any dma_fence at all.
+
+VM_BIND locking order is as below.
+
+1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
+   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+   mapping.
+
+   In future, when GPU page faults are supported, we can potentially use a
+   rwsem instead, so that multiple page fault handlers can take the read side
+   lock to lookup the mapping and hence can run in parallel.
+   The older execbuf mode of binding do not need this lock.
+
+2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
+   be held while binding/unbinding a vma in the async worker and while updating
+   dma-resv fence list of an object. Note that private BOs of a VM will all
+   share a dma-resv object.
+
+   The future system allocator support will use the HMM prescribed locking
+   instead.
+
+3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
+   invalidated vmas (due to eviction and userptr invalidation) etc.
+
+When GPU page faults are supported, the execbuf path do not take any of these
+locks. There we will simply smash the new batch buffer address into the ring and
+then tell the scheduler run that. The lock taking only happens from the page
+fault handler, where we take lock-A in read mode, whichever lock-B we need to
+find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
+system allocator) and some additional locks (lock-D) for taking care of page
+table races. Page fault mode should not need to ever manipulate the vm lists,
+so won't ever need lock-C.
+
+VM_BIND LRU handling
+---------------------
+We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
+performance degradation. We will also need support for bulk LRU movement of
+VM_BIND objects to avoid additional latencies in execbuf path.
+
+The page table pages are similar to VM_BIND mapped objects (See
+`Evictable page table allocations`_) and are maintained per VM and needs to
+be pinned in memory when VM is made active (ie., upon an execbuf call with
+that VM). So, bulk LRU movement of page table pages is also needed.
+
+VM_BIND dma_resv usage
+-----------------------
+Fences needs to be added to all VM_BIND mapped objects. During each execbuf
+submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
+over sync (See enum dma_resv_usage). One can override it with either
+DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during object dependency
+setting (either through explicit or implicit mechanism).
+
+When vm_bind is called for a non-private object while the VM is already
+active, the fences need to be copied from VM's shared dma-resv object
+(common to all private objects of the VM) to this non-private object.
+If this results in performance degradation, then some optimization will
+be needed here. This is not a problem for VM's private objects as they use
+shared dma-resv object which is always updated on each execbuf submission.
+
+Also, in VM_BIND mode, use dma-resv apis for determining object activeness
+(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
+older i915_vma active reference tracking which is deprecated. This should be
+easier to get it working with the current TTM backend.
+
+Mesa use case
+--------------
+VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
+hence improving performance of CPU-bound applications. It also allows us to
+implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
+reducing CPU overhead becomes more impactful.
+
+
+Other VM_BIND use cases
+========================
+
+Long running Compute contexts
+------------------------------
+Usage of dma-fence expects that they complete in reasonable amount of time.
+Compute on the other hand can be long running. Hence it is appropriate for
+compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
+must be limited to in-kernel consumption only.
+
+Where GPU page faults are not available, kernel driver upon buffer invalidation
+will initiate a suspend (preemption) of long running context, finish the
+invalidation, revalidate the BO and then resume the compute context. This is
+done by having a per-context preempt fence which is enabled when someone tries
+to wait on it and triggers the context preemption.
+
+User/Memory Fence
+~~~~~~~~~~~~~~~~~~
+User/Memory fence is a <address, value> pair. To signal the user fence, the
+specified value will be written at the specified virtual address and wakeup the
+waiting process. User fence can be signaled either by the GPU or kernel async
+worker (like upon bind completion). User can wait on a user fence with a new
+user fence wait ioctl.
+
+Here is some prior work on this:
+https://patchwork.freedesktop.org/patch/349417/
+
+Low Latency Submission
+~~~~~~~~~~~~~~~~~~~~~~~
+Allows compute UMD to directly submit GPU jobs instead of through execbuf
+ioctl. This is made possible by VM_BIND is not being synchronized against
+execbuf. VM_BIND allows bind/unbind of mappings required for the directly
+submitted jobs.
+
+Debugger
+---------
+With debug event interface user space process (debugger) is able to keep track
+of and act upon resources created by another process (debugged) and attached
+to GPU via vm_bind interface.
+
+GPU page faults
+----------------
+GPU page faults when supported (in future), will only be supported in the
+VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
+binding will require using dma-fence to ensure residency, the GPU page faults
+mode when supported, will not use any dma-fence as residency is purely managed
+by installing and removing/invalidating page table entries.
+
+Page level hints settings
+--------------------------
+VM_BIND allows any hints setting per mapping instead of per BO.
+Possible hints include read-only mapping, placement and atomicity.
+Sub-BO level placement hint will be even more relevant with
+upcoming GPU on-demand page fault support.
+
+Page level Cache/CLOS settings
+-------------------------------
+VM_BIND allows cache/CLOS settings per mapping instead of per BO.
+
+Evictable page table allocations
+---------------------------------
+Make pagetable allocations evictable and manage them similar to VM_BIND
+mapped objects. Page table pages are similar to persistent mappings of a
+VM (difference here are that the page table pages will not have an i915_vma
+structure and after swapping pages back in, parent page link needs to be
+updated).
+
+Shared Virtual Memory (SVM) support
+------------------------------------
+VM_BIND interface can be used to map system memory directly (without gem BO
+abstraction) using the HMM interface. SVM is only supported with GPU page
+faults enabled.
+
+VM_BIND UAPI
+=============
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..7d10c36b268d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    i915_vm_bind.rst
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
@ 2022-06-17  5:14   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	christian.koenig

VM_BIND design document with description of intended use cases.

v2: Reduce the scope to simple Mesa use case.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 2 files changed, 242 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
new file mode 100644
index 000000000000..4ab590ef11fd
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
@@ -0,0 +1,238 @@
+==========================================
+I915 VM_BIND feature design and use cases
+==========================================
+
+VM_BIND feature
+================
+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
+objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
+specified address space (VM). These mappings (also referred to as persistent
+mappings) will be persistent across multiple GPU submissions (execbuf calls)
+issued by the UMD, without user having to provide a list of all required
+mappings during each submission (as required by older execbuf mode).
+
+The VM_BIND/UNBIND calls allow UMDs to request a timeline fence for signaling
+the completion of bind/unbind operation.
+
+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
+User has to opt-in for VM_BIND mode of binding for an address space (VM)
+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+
+Normally, vm_bind/unbind operations will get completed synchronously,
+but if the object is being moved, the binding will happen once that the
+moving is complete and out fence will be signaled after binding is complete.
+The bind/unbind operation can get completed out of submission order.
+
+VM_BIND features include:
+
+* Multiple Virtual Address (VA) mappings can map to the same physical pages
+  of an object (aliasing).
+* VA mapping can map to a partial section of the BO (partial binding).
+* Support capture of persistent mappings in the dump upon GPU error.
+* TLB is flushed upon unbind completion. Batching of TLB flushes in some
+  use cases will be helpful.
+* Support for userptr gem objects (no special uapi is required for this).
+
+Execbuf ioctl in VM_BIND mode
+-------------------------------
+A VM in VM_BIND mode will not support older execbuf mode of binding.
+The execbuf ioctl handling in VM_BIND mode differs significantly from the
+older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+execlist. Hence, no support for implicit sync. It is expected that the below
+work will be able to support requirements of object dependency setting in all
+use cases:
+
+"dma-buf: Add an API for exporting sync files"
+(https://lwn.net/Articles/859290/)
+
+The execbuf3 ioctl directly specifies the batch addresses instead of as
+object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
+support many of the older features like in/out/submit fences, fence array,
+default gem context and many more (See struct drm_i915_gem_execbuffer3).
+
+In VM_BIND mode, VA allocation is completely managed by the user instead of
+the i915 driver. Hence all VA assignment, eviction are not applicable in
+VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+be using the i915_vma active reference tracking. It will instead use dma-resv
+object for that (See `VM_BIND dma_resv usage`_).
+
+So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
+evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
+are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
+should be in a separate file and only functionalities common to these ioctls
+can be the shared code where possible.
+
+VM_PRIVATE objects
+-------------------
+By default, BOs can be mapped on multiple VMs and can also be dma-buf
+exported. Hence these BOs are referred to as Shared BOs.
+During each execbuf submission, the request fence must be added to the
+dma-resv fence list of all shared BOs mapped on the VM.
+
+VM_BIND feature introduces an optimization where user can create BO which
+is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
+BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
+the VM they are private to and can't be dma-buf exported.
+All private BOs of a VM share the dma-resv object. Hence during each execbuf
+submission, they need only one dma-resv fence list updated. Thus, the fast
+path (where required mappings are already bound) submission latency is O(1)
+w.r.t the number of VM private BOs.
+
+VM_BIND locking hirarchy
+-------------------------
+The locking design here supports the older (execlist based) execbuf mode, the
+newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
+system allocator support (See `Shared Virtual Memory (SVM) support`_).
+The older execbuf mode and the newer VM_BIND mode without page faults manages
+residency of backing storage using dma_fence. The VM_BIND mode with page faults
+and the system allocator support do not use any dma_fence at all.
+
+VM_BIND locking order is as below.
+
+1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
+   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+   mapping.
+
+   In future, when GPU page faults are supported, we can potentially use a
+   rwsem instead, so that multiple page fault handlers can take the read side
+   lock to lookup the mapping and hence can run in parallel.
+   The older execbuf mode of binding do not need this lock.
+
+2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
+   be held while binding/unbinding a vma in the async worker and while updating
+   dma-resv fence list of an object. Note that private BOs of a VM will all
+   share a dma-resv object.
+
+   The future system allocator support will use the HMM prescribed locking
+   instead.
+
+3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
+   invalidated vmas (due to eviction and userptr invalidation) etc.
+
+When GPU page faults are supported, the execbuf path do not take any of these
+locks. There we will simply smash the new batch buffer address into the ring and
+then tell the scheduler run that. The lock taking only happens from the page
+fault handler, where we take lock-A in read mode, whichever lock-B we need to
+find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
+system allocator) and some additional locks (lock-D) for taking care of page
+table races. Page fault mode should not need to ever manipulate the vm lists,
+so won't ever need lock-C.
+
+VM_BIND LRU handling
+---------------------
+We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
+performance degradation. We will also need support for bulk LRU movement of
+VM_BIND objects to avoid additional latencies in execbuf path.
+
+The page table pages are similar to VM_BIND mapped objects (See
+`Evictable page table allocations`_) and are maintained per VM and needs to
+be pinned in memory when VM is made active (ie., upon an execbuf call with
+that VM). So, bulk LRU movement of page table pages is also needed.
+
+VM_BIND dma_resv usage
+-----------------------
+Fences needs to be added to all VM_BIND mapped objects. During each execbuf
+submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
+over sync (See enum dma_resv_usage). One can override it with either
+DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during object dependency
+setting (either through explicit or implicit mechanism).
+
+When vm_bind is called for a non-private object while the VM is already
+active, the fences need to be copied from VM's shared dma-resv object
+(common to all private objects of the VM) to this non-private object.
+If this results in performance degradation, then some optimization will
+be needed here. This is not a problem for VM's private objects as they use
+shared dma-resv object which is always updated on each execbuf submission.
+
+Also, in VM_BIND mode, use dma-resv apis for determining object activeness
+(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
+older i915_vma active reference tracking which is deprecated. This should be
+easier to get it working with the current TTM backend.
+
+Mesa use case
+--------------
+VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
+hence improving performance of CPU-bound applications. It also allows us to
+implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
+reducing CPU overhead becomes more impactful.
+
+
+Other VM_BIND use cases
+========================
+
+Long running Compute contexts
+------------------------------
+Usage of dma-fence expects that they complete in reasonable amount of time.
+Compute on the other hand can be long running. Hence it is appropriate for
+compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
+must be limited to in-kernel consumption only.
+
+Where GPU page faults are not available, kernel driver upon buffer invalidation
+will initiate a suspend (preemption) of long running context, finish the
+invalidation, revalidate the BO and then resume the compute context. This is
+done by having a per-context preempt fence which is enabled when someone tries
+to wait on it and triggers the context preemption.
+
+User/Memory Fence
+~~~~~~~~~~~~~~~~~~
+User/Memory fence is a <address, value> pair. To signal the user fence, the
+specified value will be written at the specified virtual address and wakeup the
+waiting process. User fence can be signaled either by the GPU or kernel async
+worker (like upon bind completion). User can wait on a user fence with a new
+user fence wait ioctl.
+
+Here is some prior work on this:
+https://patchwork.freedesktop.org/patch/349417/
+
+Low Latency Submission
+~~~~~~~~~~~~~~~~~~~~~~~
+Allows compute UMD to directly submit GPU jobs instead of through execbuf
+ioctl. This is made possible by VM_BIND is not being synchronized against
+execbuf. VM_BIND allows bind/unbind of mappings required for the directly
+submitted jobs.
+
+Debugger
+---------
+With debug event interface user space process (debugger) is able to keep track
+of and act upon resources created by another process (debugged) and attached
+to GPU via vm_bind interface.
+
+GPU page faults
+----------------
+GPU page faults when supported (in future), will only be supported in the
+VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
+binding will require using dma-fence to ensure residency, the GPU page faults
+mode when supported, will not use any dma-fence as residency is purely managed
+by installing and removing/invalidating page table entries.
+
+Page level hints settings
+--------------------------
+VM_BIND allows any hints setting per mapping instead of per BO.
+Possible hints include read-only mapping, placement and atomicity.
+Sub-BO level placement hint will be even more relevant with
+upcoming GPU on-demand page fault support.
+
+Page level Cache/CLOS settings
+-------------------------------
+VM_BIND allows cache/CLOS settings per mapping instead of per BO.
+
+Evictable page table allocations
+---------------------------------
+Make pagetable allocations evictable and manage them similar to VM_BIND
+mapped objects. Page table pages are similar to persistent mappings of a
+VM (difference here are that the page table pages will not have an i915_vma
+structure and after swapping pages back in, parent page link needs to be
+updated).
+
+Shared Virtual Memory (SVM) support
+------------------------------------
+VM_BIND interface can be used to map system memory directly (without gem BO
+abstraction) using the HMM interface. SVM is only supported with GPU page
+faults enabled.
+
+VM_BIND UAPI
+=============
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..7d10c36b268d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    i915_vm_bind.rst
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/3] drm/i915: Update i915 uapi documentation
  2022-06-17  5:14 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-17  5:14   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, chris.p.wilson,
	thomas.hellstrom, oak.zeng, matthew.auld, jason,
	lionel.g.landwerlin, christian.koenig

Add some missing i915 upai documentation which the new
i915 VM_BIND feature documentation will be refer to.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 include/uapi/drm/i915_drm.h | 205 ++++++++++++++++++++++++++++--------
 1 file changed, 160 insertions(+), 45 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de49b68b4fc8..f5ce34d447b1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -751,14 +751,27 @@ typedef struct drm_i915_irq_wait {
 
 /* Must be kept compact -- no holes and well documented */
 
-typedef struct drm_i915_getparam {
+/**
+ * struct drm_i915_getparam - Driver parameter query structure.
+ */
+struct drm_i915_getparam {
+	/** @param: Driver parameter to query. */
 	__s32 param;
-	/*
+
+	/**
+	 * @value: Address of memory where queried value should be put.
+	 *
 	 * WARNING: Using pointers instead of fixed-size u64 means we need to write
 	 * compat32 code. Don't repeat this mistake.
 	 */
 	int __user *value;
-} drm_i915_getparam_t;
+};
+
+/**
+ * typedef drm_i915_getparam_t - Driver parameter query structure.
+ * See struct drm_i915_getparam.
+ */
+typedef struct drm_i915_getparam drm_i915_getparam_t;
 
 /* Ioctl to set kernel params:
  */
@@ -1239,76 +1252,119 @@ struct drm_i915_gem_exec_object2 {
 	__u64 rsvd2;
 };
 
+/**
+ * struct drm_i915_gem_exec_fence - An input or output fence for the execbuf
+ * ioctl.
+ *
+ * The request will wait for input fence to signal before submission.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * request.
+ */
 struct drm_i915_gem_exec_fence {
-	/**
-	 * User's handle for a drm_syncobj to wait on or signal.
-	 */
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
 	__u32 handle;
 
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_EXEC_FENCE_WAIT:
+	 * Wait for the input fence before request submission.
+	 *
+	 * I915_EXEC_FENCE_SIGNAL:
+	 * Return request completion fence as output
+	 */
+	__u32 flags;
 #define I915_EXEC_FENCE_WAIT            (1<<0)
 #define I915_EXEC_FENCE_SIGNAL          (1<<1)
 #define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
-	__u32 flags;
 };
 
-/*
- * See drm_i915_gem_execbuffer_ext_timeline_fences.
- */
-#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
-
-/*
+/**
+ * struct drm_i915_gem_execbuffer_ext_timeline_fences - Timeline fences
+ * for execbuf ioctl.
+ *
  * This structure describes an array of drm_syncobj and associated points for
  * timeline variants of drm_syncobj. It is invalid to append this structure to
  * the execbuf if I915_EXEC_FENCE_ARRAY is set.
  */
 struct drm_i915_gem_execbuffer_ext_timeline_fences {
+#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
 
 	/**
-	 * Number of element in the handles_ptr & value_ptr arrays.
+	 * @fence_count: Number of elements in the @handles_ptr & @value_ptr
+	 * arrays.
 	 */
 	__u64 fence_count;
 
 	/**
-	 * Pointer to an array of struct drm_i915_gem_exec_fence of length
-	 * fence_count.
+	 * @handles_ptr: Pointer to an array of struct drm_i915_gem_exec_fence
+	 * of length @fence_count.
 	 */
 	__u64 handles_ptr;
 
 	/**
-	 * Pointer to an array of u64 values of length fence_count. Values
-	 * must be 0 for a binary drm_syncobj. A Value of 0 for a timeline
-	 * drm_syncobj is invalid as it turns a drm_syncobj into a binary one.
+	 * @values_ptr: Pointer to an array of u64 values of length
+	 * @fence_count.
+	 * Values must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
 	 */
 	__u64 values_ptr;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer2 - Structure for DRM_I915_GEM_EXECBUFFER2
+ * ioctl.
+ */
 struct drm_i915_gem_execbuffer2 {
-	/**
-	 * List of gem_exec_object2 structs
-	 */
+	/** @buffers_ptr: Pointer to a list of gem_exec_object2 structs */
 	__u64 buffers_ptr;
+
+	/** @buffer_count: Number of elements in @buffers_ptr array */
 	__u32 buffer_count;
 
-	/** Offset in the batchbuffer to start execution from. */
+	/**
+	 * @batch_start_offset: Offset in the batchbuffer to start execution
+	 * from.
+	 */
 	__u32 batch_start_offset;
-	/** Bytes used in batchbuffer from batch_start_offset */
+
+	/**
+	 * @batch_len: Length in bytes of the batch buffer, starting from the
+	 * @batch_start_offset. If 0, length is assumed to be the batch buffer
+	 * object size.
+	 */
 	__u32 batch_len;
+
+	/** @DR1: deprecated */
 	__u32 DR1;
+
+	/** @DR4: deprecated */
 	__u32 DR4;
+
+	/** @num_cliprects: See @cliprects_ptr */
 	__u32 num_cliprects;
+
 	/**
-	 * This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY
-	 * & I915_EXEC_USE_EXTENSIONS are not set.
+	 * @cliprects_ptr: Kernel clipping was a DRI1 misfeature.
+	 *
+	 * It is invalid to use this field if I915_EXEC_FENCE_ARRAY or
+	 * I915_EXEC_USE_EXTENSIONS flags are not set.
 	 *
 	 * If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array
-	 * of struct drm_i915_gem_exec_fence and num_cliprects is the length
-	 * of the array.
+	 * of &drm_i915_gem_exec_fence and @num_cliprects is the length of the
+	 * array.
 	 *
 	 * If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a
-	 * single struct i915_user_extension and num_cliprects is 0.
+	 * single &i915_user_extension and num_cliprects is 0.
 	 */
 	__u64 cliprects_ptr;
+
+	/** @flags: Execbuf flags */
+	__u64 flags;
 #define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
@@ -1326,10 +1382,6 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6) /* default */
 #define I915_EXEC_CONSTANTS_ABSOLUTE 	(1<<6)
 #define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6) /* gen4/5 only */
-	__u64 flags;
-	__u64 rsvd1; /* now used for context info */
-	__u64 rsvd2;
-};
 
 /** Resets the SO write offset registers for transform feedback on gen7. */
 #define I915_EXEC_GEN7_SOL_RESET	(1<<8)
@@ -1432,9 +1484,23 @@ struct drm_i915_gem_execbuffer2 {
  * drm_i915_gem_execbuffer_ext enum.
  */
 #define I915_EXEC_USE_EXTENSIONS	(1 << 21)
-
 #define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
 
+	/** @rsvd1: Context id */
+	__u64 rsvd1;
+
+	/**
+	 * @rsvd2: in and out sync_file file descriptors.
+	 *
+	 * When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the
+	 * lower 32 bits of this field will have the in sync_file fd (input).
+	 *
+	 * When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this
+	 * field will have the out sync_file fd (output).
+	 */
+	__u64 rsvd2;
+};
+
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
 	(eb2).rsvd1 = context & I915_EXEC_CONTEXT_ID_MASK
@@ -1814,19 +1880,58 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+/**
+ * struct drm_i915_gem_context_create_ext - Structure for creating contexts.
+ */
 struct drm_i915_gem_context_create_ext {
-	__u32 ctx_id; /* output: id of new context*/
+	/** @ctx_id: Id of the created context (output) */
+	__u32 ctx_id;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:
+	 *
+	 * Extensions may be appended to this structure and driver must check
+	 * for those. See @extensions.
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE
+	 *
+	 * Created context will have single timeline.
+	 */
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
 #define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
 	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 * 
+	 * I915_CONTEXT_CREATE_EXT_SETPARAM:
+	 * Context parameter to set or query during context creation.
+	 * See struct drm_i915_gem_context_create_ext_setparam.
+	 * 
+	 * I915_CONTEXT_CREATE_EXT_CLONE:
+	 * This extension has been removed. On the off chance someone somewhere
+	 * has attempted to use it, never re-use this extension number.
+	 */
 	__u64 extensions;
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
 };
 
+/**
+ * struct drm_i915_gem_context_param - Context parameter to set or query.
+ */
 struct drm_i915_gem_context_param {
+	/** @ctx_id: Context id */
 	__u32 ctx_id;
+
+	/** @size: Size of the parameter @value
 	__u32 size;
+
+	/** @param: Parameter to set or query */
 	__u64 param;
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 /* I915_CONTEXT_PARAM_NO_ZEROMAP has been removed.  On the off chance
@@ -1973,6 +2078,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
 /* Must be kept compact -- no holes and well documented */
 
+	/** @value: Context parameter value to be set or queried */
 	__u64 value;
 };
 
@@ -2371,23 +2477,29 @@ struct i915_context_param_engines {
 	struct i915_engine_class_instance engines[N__]; \
 } __attribute__((packed)) name__
 
+/**
+ * struct drm_i915_gem_context_create_ext_setparam - Context parameter
+ * to set or query during context creation.
+ */
 struct drm_i915_gem_context_create_ext_setparam {
-#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
+
+	/**
+	 * @param: Context parameter to set or query.
+	 * See struct drm_i915_gem_context_param.
+	 */
 	struct drm_i915_gem_context_param param;
 };
 
-/* This API has been removed.  On the off chance someone somewhere has
- * attempted to use it, never re-use this extension number.
- */
-#define I915_CONTEXT_CREATE_EXT_CLONE 1
-
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
 };
 
-/*
+/**
+ * struct drm_i915_gem_vm_control - Structure to create or destroy VM.
+ *
  * DRM_I915_GEM_VM_CREATE -
  *
  * Create a new virtual memory address space (ppGTT) for use within a context
@@ -2397,20 +2509,23 @@ struct drm_i915_gem_context_destroy {
  * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
  * returned in the outparam @id.
  *
- * No flags are defined, with all bits reserved and must be zero.
- *
  * An extension chain maybe provided, starting with @extensions, and terminated
  * by the @next_extension being 0. Currently, no extensions are defined.
  *
  * DRM_I915_GEM_VM_DESTROY -
  *
- * Destroys a previously created VM id, specified in @id.
+ * Destroys a previously created VM id, specified in @vm_id.
  *
  * No extensions or flags are allowed currently, and so must be zero.
  */
 struct drm_i915_gem_vm_control {
+	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
+
+	/** @flags: reserved for future usage, currently MBZ */
 	__u32 flags;
+
+	/** @vm_id: Id of the VM created or to be destroyed */
 	__u32 vm_id;
 };
 
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH v2 2/3] drm/i915: Update i915 uapi documentation
@ 2022-06-17  5:14   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	christian.koenig

Add some missing i915 upai documentation which the new
i915 VM_BIND feature documentation will be refer to.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 include/uapi/drm/i915_drm.h | 205 ++++++++++++++++++++++++++++--------
 1 file changed, 160 insertions(+), 45 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de49b68b4fc8..f5ce34d447b1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -751,14 +751,27 @@ typedef struct drm_i915_irq_wait {
 
 /* Must be kept compact -- no holes and well documented */
 
-typedef struct drm_i915_getparam {
+/**
+ * struct drm_i915_getparam - Driver parameter query structure.
+ */
+struct drm_i915_getparam {
+	/** @param: Driver parameter to query. */
 	__s32 param;
-	/*
+
+	/**
+	 * @value: Address of memory where queried value should be put.
+	 *
 	 * WARNING: Using pointers instead of fixed-size u64 means we need to write
 	 * compat32 code. Don't repeat this mistake.
 	 */
 	int __user *value;
-} drm_i915_getparam_t;
+};
+
+/**
+ * typedef drm_i915_getparam_t - Driver parameter query structure.
+ * See struct drm_i915_getparam.
+ */
+typedef struct drm_i915_getparam drm_i915_getparam_t;
 
 /* Ioctl to set kernel params:
  */
@@ -1239,76 +1252,119 @@ struct drm_i915_gem_exec_object2 {
 	__u64 rsvd2;
 };
 
+/**
+ * struct drm_i915_gem_exec_fence - An input or output fence for the execbuf
+ * ioctl.
+ *
+ * The request will wait for input fence to signal before submission.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * request.
+ */
 struct drm_i915_gem_exec_fence {
-	/**
-	 * User's handle for a drm_syncobj to wait on or signal.
-	 */
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
 	__u32 handle;
 
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_EXEC_FENCE_WAIT:
+	 * Wait for the input fence before request submission.
+	 *
+	 * I915_EXEC_FENCE_SIGNAL:
+	 * Return request completion fence as output
+	 */
+	__u32 flags;
 #define I915_EXEC_FENCE_WAIT            (1<<0)
 #define I915_EXEC_FENCE_SIGNAL          (1<<1)
 #define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
-	__u32 flags;
 };
 
-/*
- * See drm_i915_gem_execbuffer_ext_timeline_fences.
- */
-#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
-
-/*
+/**
+ * struct drm_i915_gem_execbuffer_ext_timeline_fences - Timeline fences
+ * for execbuf ioctl.
+ *
  * This structure describes an array of drm_syncobj and associated points for
  * timeline variants of drm_syncobj. It is invalid to append this structure to
  * the execbuf if I915_EXEC_FENCE_ARRAY is set.
  */
 struct drm_i915_gem_execbuffer_ext_timeline_fences {
+#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
 
 	/**
-	 * Number of element in the handles_ptr & value_ptr arrays.
+	 * @fence_count: Number of elements in the @handles_ptr & @value_ptr
+	 * arrays.
 	 */
 	__u64 fence_count;
 
 	/**
-	 * Pointer to an array of struct drm_i915_gem_exec_fence of length
-	 * fence_count.
+	 * @handles_ptr: Pointer to an array of struct drm_i915_gem_exec_fence
+	 * of length @fence_count.
 	 */
 	__u64 handles_ptr;
 
 	/**
-	 * Pointer to an array of u64 values of length fence_count. Values
-	 * must be 0 for a binary drm_syncobj. A Value of 0 for a timeline
-	 * drm_syncobj is invalid as it turns a drm_syncobj into a binary one.
+	 * @values_ptr: Pointer to an array of u64 values of length
+	 * @fence_count.
+	 * Values must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
 	 */
 	__u64 values_ptr;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer2 - Structure for DRM_I915_GEM_EXECBUFFER2
+ * ioctl.
+ */
 struct drm_i915_gem_execbuffer2 {
-	/**
-	 * List of gem_exec_object2 structs
-	 */
+	/** @buffers_ptr: Pointer to a list of gem_exec_object2 structs */
 	__u64 buffers_ptr;
+
+	/** @buffer_count: Number of elements in @buffers_ptr array */
 	__u32 buffer_count;
 
-	/** Offset in the batchbuffer to start execution from. */
+	/**
+	 * @batch_start_offset: Offset in the batchbuffer to start execution
+	 * from.
+	 */
 	__u32 batch_start_offset;
-	/** Bytes used in batchbuffer from batch_start_offset */
+
+	/**
+	 * @batch_len: Length in bytes of the batch buffer, starting from the
+	 * @batch_start_offset. If 0, length is assumed to be the batch buffer
+	 * object size.
+	 */
 	__u32 batch_len;
+
+	/** @DR1: deprecated */
 	__u32 DR1;
+
+	/** @DR4: deprecated */
 	__u32 DR4;
+
+	/** @num_cliprects: See @cliprects_ptr */
 	__u32 num_cliprects;
+
 	/**
-	 * This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY
-	 * & I915_EXEC_USE_EXTENSIONS are not set.
+	 * @cliprects_ptr: Kernel clipping was a DRI1 misfeature.
+	 *
+	 * It is invalid to use this field if I915_EXEC_FENCE_ARRAY or
+	 * I915_EXEC_USE_EXTENSIONS flags are not set.
 	 *
 	 * If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array
-	 * of struct drm_i915_gem_exec_fence and num_cliprects is the length
-	 * of the array.
+	 * of &drm_i915_gem_exec_fence and @num_cliprects is the length of the
+	 * array.
 	 *
 	 * If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a
-	 * single struct i915_user_extension and num_cliprects is 0.
+	 * single &i915_user_extension and num_cliprects is 0.
 	 */
 	__u64 cliprects_ptr;
+
+	/** @flags: Execbuf flags */
+	__u64 flags;
 #define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
@@ -1326,10 +1382,6 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6) /* default */
 #define I915_EXEC_CONSTANTS_ABSOLUTE 	(1<<6)
 #define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6) /* gen4/5 only */
-	__u64 flags;
-	__u64 rsvd1; /* now used for context info */
-	__u64 rsvd2;
-};
 
 /** Resets the SO write offset registers for transform feedback on gen7. */
 #define I915_EXEC_GEN7_SOL_RESET	(1<<8)
@@ -1432,9 +1484,23 @@ struct drm_i915_gem_execbuffer2 {
  * drm_i915_gem_execbuffer_ext enum.
  */
 #define I915_EXEC_USE_EXTENSIONS	(1 << 21)
-
 #define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
 
+	/** @rsvd1: Context id */
+	__u64 rsvd1;
+
+	/**
+	 * @rsvd2: in and out sync_file file descriptors.
+	 *
+	 * When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the
+	 * lower 32 bits of this field will have the in sync_file fd (input).
+	 *
+	 * When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this
+	 * field will have the out sync_file fd (output).
+	 */
+	__u64 rsvd2;
+};
+
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
 	(eb2).rsvd1 = context & I915_EXEC_CONTEXT_ID_MASK
@@ -1814,19 +1880,58 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+/**
+ * struct drm_i915_gem_context_create_ext - Structure for creating contexts.
+ */
 struct drm_i915_gem_context_create_ext {
-	__u32 ctx_id; /* output: id of new context*/
+	/** @ctx_id: Id of the created context (output) */
+	__u32 ctx_id;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:
+	 *
+	 * Extensions may be appended to this structure and driver must check
+	 * for those. See @extensions.
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE
+	 *
+	 * Created context will have single timeline.
+	 */
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
 #define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
 	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 * 
+	 * I915_CONTEXT_CREATE_EXT_SETPARAM:
+	 * Context parameter to set or query during context creation.
+	 * See struct drm_i915_gem_context_create_ext_setparam.
+	 * 
+	 * I915_CONTEXT_CREATE_EXT_CLONE:
+	 * This extension has been removed. On the off chance someone somewhere
+	 * has attempted to use it, never re-use this extension number.
+	 */
 	__u64 extensions;
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
 };
 
+/**
+ * struct drm_i915_gem_context_param - Context parameter to set or query.
+ */
 struct drm_i915_gem_context_param {
+	/** @ctx_id: Context id */
 	__u32 ctx_id;
+
+	/** @size: Size of the parameter @value
 	__u32 size;
+
+	/** @param: Parameter to set or query */
 	__u64 param;
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 /* I915_CONTEXT_PARAM_NO_ZEROMAP has been removed.  On the off chance
@@ -1973,6 +2078,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
 /* Must be kept compact -- no holes and well documented */
 
+	/** @value: Context parameter value to be set or queried */
 	__u64 value;
 };
 
@@ -2371,23 +2477,29 @@ struct i915_context_param_engines {
 	struct i915_engine_class_instance engines[N__]; \
 } __attribute__((packed)) name__
 
+/**
+ * struct drm_i915_gem_context_create_ext_setparam - Context parameter
+ * to set or query during context creation.
+ */
 struct drm_i915_gem_context_create_ext_setparam {
-#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
+
+	/**
+	 * @param: Context parameter to set or query.
+	 * See struct drm_i915_gem_context_param.
+	 */
 	struct drm_i915_gem_context_param param;
 };
 
-/* This API has been removed.  On the off chance someone somewhere has
- * attempted to use it, never re-use this extension number.
- */
-#define I915_CONTEXT_CREATE_EXT_CLONE 1
-
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
 };
 
-/*
+/**
+ * struct drm_i915_gem_vm_control - Structure to create or destroy VM.
+ *
  * DRM_I915_GEM_VM_CREATE -
  *
  * Create a new virtual memory address space (ppGTT) for use within a context
@@ -2397,20 +2509,23 @@ struct drm_i915_gem_context_destroy {
  * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
  * returned in the outparam @id.
  *
- * No flags are defined, with all bits reserved and must be zero.
- *
  * An extension chain maybe provided, starting with @extensions, and terminated
  * by the @next_extension being 0. Currently, no extensions are defined.
  *
  * DRM_I915_GEM_VM_DESTROY -
  *
- * Destroys a previously created VM id, specified in @id.
+ * Destroys a previously created VM id, specified in @vm_id.
  *
  * No extensions or flags are allowed currently, and so must be zero.
  */
 struct drm_i915_gem_vm_control {
+	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
+
+	/** @flags: reserved for future usage, currently MBZ */
 	__u32 flags;
+
+	/** @vm_id: Id of the VM created or to be destroyed */
 	__u32 vm_id;
 };
 
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-17  5:14 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-17  5:14   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: matthew.brost, paulo.r.zanoni, tvrtko.ursulin, chris.p.wilson,
	thomas.hellstrom, oak.zeng, matthew.auld, jason,
	lionel.g.landwerlin, christian.koenig

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 Documentation/gpu/rfc/i915_vm_bind.h | 226 +++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index 000000000000..b7540ddb526d
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND		57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion notification.
+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+	/** @handle: User's handle for a drm_syncobj to signal. */
+	__u32 handle;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However the DG2
+ * and XEHPSDV has 64K page size for device local-memory and has compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start should be 2M aligned, @offset and @length should be 64K aligned.
+ * Also, on those platforms, it is not allowed to bind an device local-memory
+ * object and a system memory object in a single 2M section of VA range.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_GEM_VM_BIND_READONLY:
+	 * Mapping is read-only.
+	 *
+	 * I915_GEM_VM_BIND_CAPTURE:
+	 * Capture this mapping in the dump upon GPU error.
+	 */
+	__u64 flags;
+#define I915_GEM_VM_BIND_READONLY    (1 << 0)
+#define I915_GEM_VM_BIND_CAPTURE     (1 << 1)
+
+	/** @fence: Timeline fence for bind completion signaling */
+	struct drm_i915_gem_vm_bind_fence fence;
+
+	/** @extensions: 0-terminated chain of extensions */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). The specified VA range must match one of the
+ * mappings created with the VM_BIND ioctl. TLB is flushed upon unbind
+ * completion.
+ *
+ * The @start and @length musy specify a unique mapping bound with VM_BIND
+ * ioctl.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/** @flags: Reserved for future usage, currently MBZ */
+	__u64 flags;
+
+	/** @fence: Timeline fence for unbind completion signaling */
+	struct drm_i915_gem_vm_bind_fence fence;
+
+	/** @extensions: 0-terminated chain of extensions */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/**
+	 * @batch_count: Number of batches in @batch_address array.
+	 *
+	 * 0 is invalid. For parallel submission, it should be equal to the
+	 * number of (parallel) engines involved in that submission.
+	 */
+	__u32 batch_count;
+
+	/**
+	 * @batch_address: Array of batch gpu virtual addresses.
+	 *
+	 * If @batch_count is 1, then it is the gpu virtual address of the
+	 * batch buffer. If @batch_count > 1, then it is a pointer to an array
+	 * of batch buffer gpu virtual addresses.
+	 */
+	__u64 batch_address;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_EXEC3_SECURE:
+	 * Request a privileged ("secure") batch buffer/s.
+	 * It is only available for DRM_ROOT_ONLY | DRM_MASTER processes.
+	 */
+	__u64 flags;
+#define I915_EXEC3_SECURE	(1<<0)
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES:
+	 * It has same format as DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES.
+	 * See struct drm_i915_gem_execbuffer_ext_timeline_fences.
+	 */
+	__u64 extensions;
+#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES	0
+};
+
+/**
+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
+ * private to the specified VM.
+ *
+ * See struct drm_i915_gem_create_ext.
+ */
+struct drm_i915_gem_create_ext_vm_private {
+#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @vm_id: Id of the VM to which the object is private */
+	__u32 vm_id;
+};
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-17  5:14   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-17  5:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel, daniel.vetter
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	christian.koenig

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 Documentation/gpu/rfc/i915_vm_bind.h | 226 +++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index 000000000000..b7540ddb526d
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND		57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion notification.
+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+	/** @handle: User's handle for a drm_syncobj to signal. */
+	__u32 handle;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However the DG2
+ * and XEHPSDV has 64K page size for device local-memory and has compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start should be 2M aligned, @offset and @length should be 64K aligned.
+ * Also, on those platforms, it is not allowed to bind an device local-memory
+ * object and a system memory object in a single 2M section of VA range.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_GEM_VM_BIND_READONLY:
+	 * Mapping is read-only.
+	 *
+	 * I915_GEM_VM_BIND_CAPTURE:
+	 * Capture this mapping in the dump upon GPU error.
+	 */
+	__u64 flags;
+#define I915_GEM_VM_BIND_READONLY    (1 << 0)
+#define I915_GEM_VM_BIND_CAPTURE     (1 << 1)
+
+	/** @fence: Timeline fence for bind completion signaling */
+	struct drm_i915_gem_vm_bind_fence fence;
+
+	/** @extensions: 0-terminated chain of extensions */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). The specified VA range must match one of the
+ * mappings created with the VM_BIND ioctl. TLB is flushed upon unbind
+ * completion.
+ *
+ * The @start and @length musy specify a unique mapping bound with VM_BIND
+ * ioctl.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/** @flags: Reserved for future usage, currently MBZ */
+	__u64 flags;
+
+	/** @fence: Timeline fence for unbind completion signaling */
+	struct drm_i915_gem_vm_bind_fence fence;
+
+	/** @extensions: 0-terminated chain of extensions */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/**
+	 * @batch_count: Number of batches in @batch_address array.
+	 *
+	 * 0 is invalid. For parallel submission, it should be equal to the
+	 * number of (parallel) engines involved in that submission.
+	 */
+	__u32 batch_count;
+
+	/**
+	 * @batch_address: Array of batch gpu virtual addresses.
+	 *
+	 * If @batch_count is 1, then it is the gpu virtual address of the
+	 * batch buffer. If @batch_count > 1, then it is a pointer to an array
+	 * of batch buffer gpu virtual addresses.
+	 */
+	__u64 batch_address;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_EXEC3_SECURE:
+	 * Request a privileged ("secure") batch buffer/s.
+	 * It is only available for DRM_ROOT_ONLY | DRM_MASTER processes.
+	 */
+	__u64 flags;
+#define I915_EXEC3_SECURE	(1<<0)
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES:
+	 * It has same format as DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES.
+	 * See struct drm_i915_gem_execbuffer_ext_timeline_fences.
+	 */
+	__u64 extensions;
+#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES	0
+};
+
+/**
+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
+ * private to the specified VM.
+ *
+ * See struct drm_i915_gem_create_ext.
+ */
+struct drm_i915_gem_create_ext_vm_private {
+#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @vm_id: Id of the VM to which the object is private */
+	__u32 vm_id;
+};
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-06-17  5:14 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (3 preceding siblings ...)
  (?)
@ 2022-06-17  5:26 ` Patchwork
  -1 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2022-06-17  5:26 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105267/
State : failure

== Summary ==

Error: make failed
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/i915_driver.o
In file included from ./drivers/gpu/drm/i915/i915_pmu.h:13,
                 from ./drivers/gpu/drm/i915/gt/intel_engine_types.h:21,
                 from ./drivers/gpu/drm/i915/gt/intel_context_types.h:18,
                 from ./drivers/gpu/drm/i915/gem/i915_gem_context_types.h:20,
                 from ./drivers/gpu/drm/i915/i915_request.h:34,
                 from ./drivers/gpu/drm/i915/i915_active.h:13,
                 from ./drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h:12,
                 from ./drivers/gpu/drm/i915/i915_vma.h:33,
                 from drivers/gpu/drm/i915/display/intel_display_types.h:48,
                 from drivers/gpu/drm/i915/i915_driver.c:52:
./include/uapi/drm/i915_drm.h:1934:2: error: "/*" within comment [-Werror=comment]
  /** @param: Parameter to set or query */
   
cc1: all warnings being treated as errors
scripts/Makefile.build:249: recipe for target 'drivers/gpu/drm/i915/i915_driver.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_driver.o] Error 1
scripts/Makefile.build:466: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:466: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:466: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1843: recipe for target 'drivers' failed
make: *** [drivers] Error 2



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 2/3] drm/i915: Update i915 uapi documentation
  2022-06-17  5:14   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-06-17  9:10   ` kernel test robot
  -1 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2022-06-17  9:10 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: llvm, kbuild-all

Hi Niranjana,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-tip/drm-tip]

url:    https://github.com/intel-lab-lkp/linux/commits/Niranjana-Vishwanathapura/drm-doc-rfc-i915-VM_BIND-feature-design-uapi/20220617-131543
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
config: x86_64-randconfig-a005 (https://download.01.org/0day-ci/archive/20220617/202206171655.ypHKJ5Tg-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d764aa7fc6b9cc3fbe960019018f5f9e941eb0a6)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/036987efcc7c66ca2bc7a2dff4da7716c3459480
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Niranjana-Vishwanathapura/drm-doc-rfc-i915-VM_BIND-feature-design-uapi/20220617-131543
        git checkout 036987efcc7c66ca2bc7a2dff4da7716c3459480
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/i915/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/i915_driver.c:52:
   In file included from drivers/gpu/drm/i915/display/intel_display_types.h:48:
   In file included from drivers/gpu/drm/i915/i915_vma.h:33:
   In file included from drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h:12:
   In file included from drivers/gpu/drm/i915/i915_active.h:13:
   In file included from drivers/gpu/drm/i915/i915_request.h:34:
   In file included from drivers/gpu/drm/i915/gem/i915_gem_context_types.h:20:
   In file included from drivers/gpu/drm/i915/gt/intel_context_types.h:18:
   In file included from drivers/gpu/drm/i915/gt/intel_engine_types.h:21:
   In file included from drivers/gpu/drm/i915/i915_pmu.h:13:
>> include/uapi/drm/i915_drm.h:1934:2: warning: '/*' within block comment [-Wcomment]
           /** @param: Parameter to set or query */
           ^
   1 warning generated.
--
   In file included from drivers/gpu/drm/i915/gem/i915_gem_context.c:74:
   In file included from drivers/gpu/drm/i915/gt/gen6_ppgtt.h:9:
   In file included from drivers/gpu/drm/i915/gt/intel_gtt.h:28:
   In file included from drivers/gpu/drm/i915/gt/intel_reset.h:13:
   In file included from drivers/gpu/drm/i915/gt/intel_engine_types.h:21:
   In file included from drivers/gpu/drm/i915/i915_pmu.h:13:
>> include/uapi/drm/i915_drm.h:1934:2: warning: '/*' within block comment [-Wcomment]
           /** @param: Parameter to set or query */
           ^
>> drivers/gpu/drm/i915/gem/i915_gem_context.c:177:12: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           if (args->size)
               ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:370:12: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           if (args->size)
               ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:749:12: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           if (args->size < sizeof(*user) ||
               ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:750:24: error: no member named 'size' in 'struct drm_i915_gem_context_param'
               !IS_ALIGNED(args->size - sizeof(*user), sizeof(*user->engines))) {
                           ~~~~  ^
   include/linux/align.h:13:30: note: expanded from macro 'IS_ALIGNED'
   #define IS_ALIGNED(x, a)                (((x) & ((typeof(x))(a) - 1)) == 0)
                                              ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:750:24: error: no member named 'size' in 'struct drm_i915_gem_context_param'
               !IS_ALIGNED(args->size - sizeof(*user), sizeof(*user->engines))) {
                           ~~~~  ^
   include/linux/align.h:13:44: note: expanded from macro 'IS_ALIGNED'
   #define IS_ALIGNED(x, a)                (((x) & ((typeof(x))(a) - 1)) == 0)
                                                            ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:752:10: error: no member named 'size' in 'struct drm_i915_gem_context_param'
                           args->size);
                           ~~~~  ^
   include/drm/drm_print.h:461:63: note: expanded from macro 'drm_dbg'
           drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_DRIVER, fmt, ##__VA_ARGS__)
                                                                        ^~~~~~~~~~~
   drivers/gpu/drm/i915/gem/i915_gem_context.c:756:27: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           set.num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);
                              ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:821:12: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           if (args->size < sizeof(user_sseu))
               ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:870:8: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           args->size = sizeof(user_sseu);
           ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:883:13: error: no member named 'size' in 'struct drm_i915_gem_context_param'
                   if (args->size)
                       ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:892:13: error: no member named 'size' in 'struct drm_i915_gem_context_param'
                   if (args->size)
                       ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:905:13: error: no member named 'size' in 'struct drm_i915_gem_context_param'
                   if (args->size)
                       ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:934:13: error: no member named 'size' in 'struct drm_i915_gem_context_param'
                   if (args->size)
                       ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:1866:8: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           args->size = 0;
           ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:1981:12: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           if (args->size < sizeof(user_sseu))
               ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:2019:8: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           args->size = sizeof(user_sseu);
           ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:2030:12: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           if (args->size)
               ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:2067:8: error: no member named 'size' in 'struct drm_i915_gem_context_param'
           args->size = 0;
           ~~~~  ^
   drivers/gpu/drm/i915/gem/i915_gem_context.c:2081:13: error: no member named 'size' in 'struct drm_i915_gem_context_param'
                   if (args->size)
                       ~~~~  ^
   fatal error: too many errors emitted, stopping now [-ferror-limit=]
   1 warning and 20 errors generated.


vim +177 drivers/gpu/drm/i915/gem/i915_gem_context.c

6b736de5746a30 drivers/gpu/drm/i915/i915_gem_context.c     Chris Wilson   2019-04-26  171  
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  172  static int validate_priority(struct drm_i915_private *i915,
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  173  			     const struct drm_i915_gem_context_param *args)
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  174  {
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  175  	s64 priority = args->value;
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  176  
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08 @177  	if (args->size)
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  178  		return -EINVAL;
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  179  
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  180  	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  181  		return -ENODEV;
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  182  
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  183  	if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  184  	    priority < I915_CONTEXT_MIN_USER_PRIORITY)
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  185  		return -EINVAL;
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  186  
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  187  	if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  188  	    !capable(CAP_SYS_NICE))
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  189  		return -EPERM;
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  190  
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  191  	return 0;
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  192  }
aaa5957c97592b drivers/gpu/drm/i915/gem/i915_gem_context.c Jason Ekstrand 2021-07-08  193  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-17  5:14   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-06-20 10:43   ` Tvrtko Ursulin
  2022-06-20 16:29     ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2022-06-20 10:43 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, intel-gfx, dri-devel, daniel.vetter
  Cc: christian.koenig, thomas.hellstrom, paulo.r.zanoni,
	chris.p.wilson, matthew.auld


Hi,

On 17/06/2022 06:14, Niranjana Vishwanathapura wrote:
> VM_BIND design document with description of intended use cases.
> 
> v2: Reduce the scope to simple Mesa use case.

since I expressed interest please add me to cc when sending out.

How come the direction changed to simplify all of a sudden? I did not 
spot any discussion to that effect. Was it internal talks?

> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>   Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
>   Documentation/gpu/rfc/index.rst        |   4 +
>   2 files changed, 242 insertions(+)
>   create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
> new file mode 100644
> index 000000000000..4ab590ef11fd
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
> @@ -0,0 +1,238 @@
> +==========================================
> +I915 VM_BIND feature design and use cases
> +==========================================
> +
> +VM_BIND feature
> +================
> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> +specified address space (VM). These mappings (also referred to as persistent
> +mappings) will be persistent across multiple GPU submissions (execbuf calls)
> +issued by the UMD, without user having to provide a list of all required
> +mappings during each submission (as required by older execbuf mode).
> +
> +The VM_BIND/UNBIND calls allow UMDs to request a timeline fence for signaling
> +the completion of bind/unbind operation.
> +
> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
> +User has to opt-in for VM_BIND mode of binding for an address space (VM)
> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> +
> +Normally, vm_bind/unbind operations will get completed synchronously,

To me synchronously, at this point in the text, reads as ioctl will 
return only when the operation is done. Rest of the paragraph however 
disagrees (plus existence of out fence). It is not clear to me what is 
the actual behaviour. Will it be clear to userspace developers reading 
uapi kerneldoc? If it is async, what are the ordering rules in this version?

> +but if the object is being moved, the binding will happen once that the
> +moving is complete and out fence will be signaled after binding is complete.
> +The bind/unbind operation can get completed out of submission order.
> +
> +VM_BIND features include:
> +
> +* Multiple Virtual Address (VA) mappings can map to the same physical pages
> +  of an object (aliasing).
> +* VA mapping can map to a partial section of the BO (partial binding).
> +* Support capture of persistent mappings in the dump upon GPU error.
> +* TLB is flushed upon unbind completion. Batching of TLB flushes in some
> +  use cases will be helpful.
> +* Support for userptr gem objects (no special uapi is required for this).
> +
> +Execbuf ioctl in VM_BIND mode
> +-------------------------------
> +A VM in VM_BIND mode will not support older execbuf mode of binding.
> +The execbuf ioctl handling in VM_BIND mode differs significantly from the
> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
> +execlist. Hence, no support for implicit sync. It is expected that the below
> +work will be able to support requirements of object dependency setting in all
> +use cases:
> +
> +"dma-buf: Add an API for exporting sync files"
> +(https://lwn.net/Articles/859290/)

What does this mean? If execbuf3 does not know about target objects how 
can we add a meaningful fence?

> +
> +The execbuf3 ioctl directly specifies the batch addresses instead of as
> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
> +support many of the older features like in/out/submit fences, fence array,
> +default gem context and many more (See struct drm_i915_gem_execbuffer3).
> +
> +In VM_BIND mode, VA allocation is completely managed by the user instead of
> +the i915 driver. Hence all VA assignment, eviction are not applicable in
> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
> +be using the i915_vma active reference tracking. It will instead use dma-resv
> +object for that (See `VM_BIND dma_resv usage`_).
> +
> +So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
> +evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
> +should be in a separate file and only functionalities common to these ioctls
> +can be the shared code where possible.
> +
> +VM_PRIVATE objects
> +-------------------
> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
> +exported. Hence these BOs are referred to as Shared BOs.
> +During each execbuf submission, the request fence must be added to the
> +dma-resv fence list of all shared BOs mapped on the VM.

Does this tie to my previous question? Design is to add each fence to 
literally _all_ BOs mapped to a VM, on every execbuf3? If so, is that 
definitely needed and for what use case? Mixing implicit and explicit, I 
mean bridging implicit and explicit sync clients?

Regards,

Tvrtko

> +
> +VM_BIND feature introduces an optimization where user can create BO which
> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
> +BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
> +the VM they are private to and can't be dma-buf exported.
> +All private BOs of a VM share the dma-resv object. Hence during each execbuf
> +submission, they need only one dma-resv fence list updated. Thus, the fast
> +path (where required mappings are already bound) submission latency is O(1)
> +w.r.t the number of VM private BOs.
> +
> +VM_BIND locking hirarchy
> +-------------------------
> +The locking design here supports the older (execlist based) execbuf mode, the
> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
> +The older execbuf mode and the newer VM_BIND mode without page faults manages
> +residency of backing storage using dma_fence. The VM_BIND mode with page faults
> +and the system allocator support do not use any dma_fence at all.
> +
> +VM_BIND locking order is as below.
> +
> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> +   mapping.
> +
> +   In future, when GPU page faults are supported, we can potentially use a
> +   rwsem instead, so that multiple page fault handlers can take the read side
> +   lock to lookup the mapping and hence can run in parallel.
> +   The older execbuf mode of binding do not need this lock.
> +
> +2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
> +   be held while binding/unbinding a vma in the async worker and while updating
> +   dma-resv fence list of an object. Note that private BOs of a VM will all
> +   share a dma-resv object.
> +
> +   The future system allocator support will use the HMM prescribed locking
> +   instead.
> +
> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
> +   invalidated vmas (due to eviction and userptr invalidation) etc.
> +
> +When GPU page faults are supported, the execbuf path do not take any of these
> +locks. There we will simply smash the new batch buffer address into the ring and
> +then tell the scheduler run that. The lock taking only happens from the page
> +fault handler, where we take lock-A in read mode, whichever lock-B we need to
> +find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
> +system allocator) and some additional locks (lock-D) for taking care of page
> +table races. Page fault mode should not need to ever manipulate the vm lists,
> +so won't ever need lock-C.
> +
> +VM_BIND LRU handling
> +---------------------
> +We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
> +performance degradation. We will also need support for bulk LRU movement of
> +VM_BIND objects to avoid additional latencies in execbuf path.
> +
> +The page table pages are similar to VM_BIND mapped objects (See
> +`Evictable page table allocations`_) and are maintained per VM and needs to
> +be pinned in memory when VM is made active (ie., upon an execbuf call with
> +that VM). So, bulk LRU movement of page table pages is also needed.
> +
> +VM_BIND dma_resv usage
> +-----------------------
> +Fences needs to be added to all VM_BIND mapped objects. During each execbuf
> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
> +over sync (See enum dma_resv_usage). One can override it with either
> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during object dependency
> +setting (either through explicit or implicit mechanism).
> +
> +When vm_bind is called for a non-private object while the VM is already
> +active, the fences need to be copied from VM's shared dma-resv object
> +(common to all private objects of the VM) to this non-private object.
> +If this results in performance degradation, then some optimization will
> +be needed here. This is not a problem for VM's private objects as they use
> +shared dma-resv object which is always updated on each execbuf submission.
> +
> +Also, in VM_BIND mode, use dma-resv apis for determining object activeness
> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
> +older i915_vma active reference tracking which is deprecated. This should be
> +easier to get it working with the current TTM backend.
> +
> +Mesa use case
> +--------------
> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
> +hence improving performance of CPU-bound applications. It also allows us to
> +implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
> +reducing CPU overhead becomes more impactful.
> +
> +
> +Other VM_BIND use cases
> +========================
> +
> +Long running Compute contexts
> +------------------------------
> +Usage of dma-fence expects that they complete in reasonable amount of time.
> +Compute on the other hand can be long running. Hence it is appropriate for
> +compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
> +must be limited to in-kernel consumption only.
> +
> +Where GPU page faults are not available, kernel driver upon buffer invalidation
> +will initiate a suspend (preemption) of long running context, finish the
> +invalidation, revalidate the BO and then resume the compute context. This is
> +done by having a per-context preempt fence which is enabled when someone tries
> +to wait on it and triggers the context preemption.
> +
> +User/Memory Fence
> +~~~~~~~~~~~~~~~~~~
> +User/Memory fence is a <address, value> pair. To signal the user fence, the
> +specified value will be written at the specified virtual address and wakeup the
> +waiting process. User fence can be signaled either by the GPU or kernel async
> +worker (like upon bind completion). User can wait on a user fence with a new
> +user fence wait ioctl.
> +
> +Here is some prior work on this:
> +https://patchwork.freedesktop.org/patch/349417/
> +
> +Low Latency Submission
> +~~~~~~~~~~~~~~~~~~~~~~~
> +Allows compute UMD to directly submit GPU jobs instead of through execbuf
> +ioctl. This is made possible by VM_BIND is not being synchronized against
> +execbuf. VM_BIND allows bind/unbind of mappings required for the directly
> +submitted jobs.
> +
> +Debugger
> +---------
> +With debug event interface user space process (debugger) is able to keep track
> +of and act upon resources created by another process (debugged) and attached
> +to GPU via vm_bind interface.
> +
> +GPU page faults
> +----------------
> +GPU page faults when supported (in future), will only be supported in the
> +VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
> +binding will require using dma-fence to ensure residency, the GPU page faults
> +mode when supported, will not use any dma-fence as residency is purely managed
> +by installing and removing/invalidating page table entries.
> +
> +Page level hints settings
> +--------------------------
> +VM_BIND allows any hints setting per mapping instead of per BO.
> +Possible hints include read-only mapping, placement and atomicity.
> +Sub-BO level placement hint will be even more relevant with
> +upcoming GPU on-demand page fault support.
> +
> +Page level Cache/CLOS settings
> +-------------------------------
> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
> +
> +Evictable page table allocations
> +---------------------------------
> +Make pagetable allocations evictable and manage them similar to VM_BIND
> +mapped objects. Page table pages are similar to persistent mappings of a
> +VM (difference here are that the page table pages will not have an i915_vma
> +structure and after swapping pages back in, parent page link needs to be
> +updated).
> +
> +Shared Virtual Memory (SVM) support
> +------------------------------------
> +VM_BIND interface can be used to map system memory directly (without gem BO
> +abstraction) using the HMM interface. SVM is only supported with GPU page
> +faults enabled.
> +
> +VM_BIND UAPI
> +=============
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index 91e93a705230..7d10c36b268d 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -23,3 +23,7 @@ host such documentation:
>   .. toctree::
>   
>       i915_scheduler.rst
> +
> +.. toctree::
> +
> +    i915_vm_bind.rst

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-17  5:14   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-20 14:42     ` Zeng, Oak
  -1 siblings, 0 replies; 19+ messages in thread
From: Zeng, Oak @ 2022-06-20 14:42 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana, intel-gfx, dri-devel, Vetter,  Daniel
  Cc: Brost, Matthew, Zanoni, Paulo R, Ursulin, Tvrtko, Wilson,
	Chris P, Hellstrom, Thomas, Auld, Matthew, jason, Landwerlin,
	Lionel G, christian.koenig



Thanks,
Oak

> -----Original Message-----
> From: Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>
> Sent: June 17, 2022 1:15 AM
> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Vetter,
> Daniel <daniel.vetter@intel.com>
> Cc: Hellstrom, Thomas <thomas.hellstrom@intel.com>; Wilson, Chris P
> <chris.p.wilson@intel.com>; jason@jlekstrand.net;
> christian.koenig@amd.com; Brost, Matthew <matthew.brost@intel.com>;
> Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Auld, Matthew
> <matthew.auld@intel.com>; Landwerlin, Lionel G
> <lionel.g.landwerlin@intel.com>; Zanoni, Paulo R
> <paulo.r.zanoni@intel.com>; Zeng, Oak <oak.zeng@intel.com>
> Subject: [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> VM_BIND and related uapi definitions
> 
> v2: Reduce the scope to simple Mesa use case.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 226
> +++++++++++++++++++++++++++
>  1 file changed, 226 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..b7540ddb526d
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,226 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_HAS_VM_BIND
> + *
> + * VM_BIND feature availability.
> + * See typedef drm_i915_getparam_t param.
> + */
> +#define I915_PARAM_HAS_VM_BIND		57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
> any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + *
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND		0x3d
> +#define DRM_I915_GEM_VM_UNBIND		0x3e
> +#define DRM_I915_GEM_EXECBUFFER3	0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND
> 	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND,
> struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND
> 	DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
> 	DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion
> notification.
> + *
> + * A timeline out fence for vm_bind/unbind completion notification.
> + */
> +struct drm_i915_gem_vm_bind_fence {
> +	/** @handle: User's handle for a drm_syncobj to signal. */
> +	__u32 handle;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length should be 4K page aligned. However
> the DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact
> page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start should be 2M aligned, @offset and @length should be 64K aligned.
> + * Also, on those platforms, it is not allowed to bind an device local-memory
> + * object and a system memory object in a single 2M section of VA range.
> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_GEM_VM_BIND_READONLY:
> +	 * Mapping is read-only.
> +	 *
> +	 * I915_GEM_VM_BIND_CAPTURE:
> +	 * Capture this mapping in the dump upon GPU error.
> +	 */
> +	__u64 flags;
> +#define I915_GEM_VM_BIND_READONLY    (1 << 0)

Should we define another flag for DEVICE_ATOMIC? Without this flag, do you imply all the mapping support device atomic operation? 
HW platform also has an implication to device atomic, i.e., some platform don't support device atomics to system memory.

Regards,
Oak

> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 1)
> +
> +	/** @fence: Timeline fence for bind completion signaling */
> +	struct drm_i915_gem_vm_bind_fence fence;
> +
> +	/** @extensions: 0-terminated chain of extensions */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table
> of the
> + * specified address space (VM). The specified VA range must match one of
> the
> + * mappings created with the VM_BIND ioctl. TLB is flushed upon unbind
> + * completion.
> + *
> + * The @start and @length musy specify a unique mapping bound with
> VM_BIND
> + * ioctl.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/** @flags: Reserved for future usage, currently MBZ */
> +	__u64 flags;
> +
> +	/** @fence: Timeline fence for unbind completion signaling */
> +	struct drm_i915_gem_vm_bind_fence fence;
> +
> +	/** @extensions: 0-terminated chain of extensions */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for
> DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
> VM_BIND mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +	/**
> +	 * @ctx_id: Context id
> +	 *
> +	 * Only contexts with user engine map are allowed.
> +	 */
> +	__u32 ctx_id;
> +
> +	/**
> +	 * @engine_idx: Engine index
> +	 *
> +	 * An index in the user engine map of the context specified by
> @ctx_id.
> +	 */
> +	__u32 engine_idx;
> +
> +	/** @rsvd1: Reserved, MBZ */
> +	__u32 rsvd1;
> +
> +	/**
> +	 * @batch_count: Number of batches in @batch_address array.
> +	 *
> +	 * 0 is invalid. For parallel submission, it should be equal to the
> +	 * number of (parallel) engines involved in that submission.
> +	 */
> +	__u32 batch_count;
> +
> +	/**
> +	 * @batch_address: Array of batch gpu virtual addresses.
> +	 *
> +	 * If @batch_count is 1, then it is the gpu virtual address of the
> +	 * batch buffer. If @batch_count > 1, then it is a pointer to an array
> +	 * of batch buffer gpu virtual addresses.
> +	 */
> +	__u64 batch_address;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_EXEC3_SECURE:
> +	 * Request a privileged ("secure") batch buffer/s.
> +	 * It is only available for DRM_ROOT_ONLY | DRM_MASTER
> processes.
> +	 */
> +	__u64 flags;
> +#define I915_EXEC3_SECURE	(1<<0)
> +
> +	/** @rsvd2: Reserved, MBZ */
> +	__u64 rsvd2;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES:
> +	 * It has same format as
> DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES.
> +	 * See struct drm_i915_gem_execbuffer_ext_timeline_fences.
> +	 */
> +	__u64 extensions;
> +#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES	0
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
> +	/** @base: Extension link. See struct i915_user_extension. */
> +	struct i915_user_extension base;
> +
> +	/** @vm_id: Id of the VM to which the object is private */
> +	__u32 vm_id;
> +};
> --
> 2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-20 14:42     ` Zeng, Oak
  0 siblings, 0 replies; 19+ messages in thread
From: Zeng, Oak @ 2022-06-20 14:42 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana, intel-gfx, dri-devel, Vetter,  Daniel
  Cc: Zanoni, Paulo R, Wilson, Chris P, Hellstrom, Thomas, Auld,
	Matthew, christian.koenig



Thanks,
Oak

> -----Original Message-----
> From: Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>
> Sent: June 17, 2022 1:15 AM
> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Vetter,
> Daniel <daniel.vetter@intel.com>
> Cc: Hellstrom, Thomas <thomas.hellstrom@intel.com>; Wilson, Chris P
> <chris.p.wilson@intel.com>; jason@jlekstrand.net;
> christian.koenig@amd.com; Brost, Matthew <matthew.brost@intel.com>;
> Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Auld, Matthew
> <matthew.auld@intel.com>; Landwerlin, Lionel G
> <lionel.g.landwerlin@intel.com>; Zanoni, Paulo R
> <paulo.r.zanoni@intel.com>; Zeng, Oak <oak.zeng@intel.com>
> Subject: [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> VM_BIND and related uapi definitions
> 
> v2: Reduce the scope to simple Mesa use case.
> 
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 226
> +++++++++++++++++++++++++++
>  1 file changed, 226 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..b7540ddb526d
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,226 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_HAS_VM_BIND
> + *
> + * VM_BIND feature availability.
> + * See typedef drm_i915_getparam_t param.
> + */
> +#define I915_PARAM_HAS_VM_BIND		57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
> any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + *
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND		0x3d
> +#define DRM_I915_GEM_VM_UNBIND		0x3e
> +#define DRM_I915_GEM_EXECBUFFER3	0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND
> 	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND,
> struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND
> 	DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
> 	DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion
> notification.
> + *
> + * A timeline out fence for vm_bind/unbind completion notification.
> + */
> +struct drm_i915_gem_vm_bind_fence {
> +	/** @handle: User's handle for a drm_syncobj to signal. */
> +	__u32 handle;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length should be 4K page aligned. However
> the DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact
> page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start should be 2M aligned, @offset and @length should be 64K aligned.
> + * Also, on those platforms, it is not allowed to bind an device local-memory
> + * object and a system memory object in a single 2M section of VA range.
> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_GEM_VM_BIND_READONLY:
> +	 * Mapping is read-only.
> +	 *
> +	 * I915_GEM_VM_BIND_CAPTURE:
> +	 * Capture this mapping in the dump upon GPU error.
> +	 */
> +	__u64 flags;
> +#define I915_GEM_VM_BIND_READONLY    (1 << 0)

Should we define another flag for DEVICE_ATOMIC? Without this flag, do you imply all the mapping support device atomic operation? 
HW platform also has an implication to device atomic, i.e., some platform don't support device atomics to system memory.

Regards,
Oak

> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 1)
> +
> +	/** @fence: Timeline fence for bind completion signaling */
> +	struct drm_i915_gem_vm_bind_fence fence;
> +
> +	/** @extensions: 0-terminated chain of extensions */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table
> of the
> + * specified address space (VM). The specified VA range must match one of
> the
> + * mappings created with the VM_BIND ioctl. TLB is flushed upon unbind
> + * completion.
> + *
> + * The @start and @length musy specify a unique mapping bound with
> VM_BIND
> + * ioctl.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/** @flags: Reserved for future usage, currently MBZ */
> +	__u64 flags;
> +
> +	/** @fence: Timeline fence for unbind completion signaling */
> +	struct drm_i915_gem_vm_bind_fence fence;
> +
> +	/** @extensions: 0-terminated chain of extensions */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for
> DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
> VM_BIND mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +	/**
> +	 * @ctx_id: Context id
> +	 *
> +	 * Only contexts with user engine map are allowed.
> +	 */
> +	__u32 ctx_id;
> +
> +	/**
> +	 * @engine_idx: Engine index
> +	 *
> +	 * An index in the user engine map of the context specified by
> @ctx_id.
> +	 */
> +	__u32 engine_idx;
> +
> +	/** @rsvd1: Reserved, MBZ */
> +	__u32 rsvd1;
> +
> +	/**
> +	 * @batch_count: Number of batches in @batch_address array.
> +	 *
> +	 * 0 is invalid. For parallel submission, it should be equal to the
> +	 * number of (parallel) engines involved in that submission.
> +	 */
> +	__u32 batch_count;
> +
> +	/**
> +	 * @batch_address: Array of batch gpu virtual addresses.
> +	 *
> +	 * If @batch_count is 1, then it is the gpu virtual address of the
> +	 * batch buffer. If @batch_count > 1, then it is a pointer to an array
> +	 * of batch buffer gpu virtual addresses.
> +	 */
> +	__u64 batch_address;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_EXEC3_SECURE:
> +	 * Request a privileged ("secure") batch buffer/s.
> +	 * It is only available for DRM_ROOT_ONLY | DRM_MASTER
> processes.
> +	 */
> +	__u64 flags;
> +#define I915_EXEC3_SECURE	(1<<0)
> +
> +	/** @rsvd2: Reserved, MBZ */
> +	__u64 rsvd2;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES:
> +	 * It has same format as
> DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES.
> +	 * See struct drm_i915_gem_execbuffer_ext_timeline_fences.
> +	 */
> +	__u64 extensions;
> +#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES	0
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
> +	/** @base: Extension link. See struct i915_user_extension. */
> +	struct i915_user_extension base;
> +
> +	/** @vm_id: Id of the VM to which the object is private */
> +	__u32 vm_id;
> +};
> --
> 2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-20 14:42     ` [Intel-gfx] " Zeng, Oak
@ 2022-06-20 15:58       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-20 15:58 UTC (permalink / raw)
  To: Zeng, Oak
  Cc: Brost, Matthew, Zanoni, Paulo R, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, Wilson, Chris P, Hellstrom, Thomas, dri-devel,
	jason, Vetter, Daniel, christian.koenig, Auld, Matthew

On Mon, Jun 20, 2022 at 07:42:25AM -0700, Zeng, Oak wrote:
>
>
>Thanks,
>Oak
>
>> -----Original Message-----
>> From: Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>
>> Sent: June 17, 2022 1:15 AM
>> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Vetter,
>> Daniel <daniel.vetter@intel.com>
>> Cc: Hellstrom, Thomas <thomas.hellstrom@intel.com>; Wilson, Chris P
>> <chris.p.wilson@intel.com>; jason@jlekstrand.net;
>> christian.koenig@amd.com; Brost, Matthew <matthew.brost@intel.com>;
>> Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Auld, Matthew
>> <matthew.auld@intel.com>; Landwerlin, Lionel G
>> <lionel.g.landwerlin@intel.com>; Zanoni, Paulo R
>> <paulo.r.zanoni@intel.com>; Zeng, Oak <oak.zeng@intel.com>
>> Subject: [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
>>
>> VM_BIND and related uapi definitions
>>
>> v2: Reduce the scope to simple Mesa use case.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 226
>> +++++++++++++++++++++++++++
>>  1 file changed, 226 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>> b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index 000000000000..b7540ddb526d
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,226 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND               57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>> any
>> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> + *
>> + */
>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> +
>> +/* VM_BIND related ioctls */
>> +#define DRM_I915_GEM_VM_BIND         0x3d
>> +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> +
>> +#define DRM_IOCTL_I915_GEM_VM_BIND
>>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND,
>> struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND
>>       DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
>>       DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> +
>> +/**
>> + * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion
>> notification.
>> + *
>> + * A timeline out fence for vm_bind/unbind completion notification.
>> + */
>> +struct drm_i915_gem_vm_bind_fence {
>> +     /** @handle: User's handle for a drm_syncobj to signal. */
>> +     __u32 handle;
>> +
>> +     /** @rsvd: Reserved, MBZ */
>> +     __u32 rsvd;
>> +
>> +     /**
>> +      * @value: A point in the timeline.
>> +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> +      * binary one.
>> +      */
>> +     __u64 value;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> + *
>> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
>> GPU
>> + * virtual address (VA) range to the section of an object that should be
>> bound
>> + * in the device page table of the specified address space (VM).
>> + * The VA range specified must be unique (ie., not currently bound) and can
>> + * be mapped to whole object or a section of the object (partial binding).
>> + * Multiple VA mappings can be created to the same section of the object
>> + * (aliasing).
>> + *
>> + * The @start, @offset and @length should be 4K page aligned. However
>> the DG2
>> + * and XEHPSDV has 64K page size for device local-memory and has compact
>> page
>> + * table. On those platforms, for binding device local-memory objects, the
>> + * @start should be 2M aligned, @offset and @length should be 64K aligned.
>> + * Also, on those platforms, it is not allowed to bind an device local-memory
>> + * object and a system memory object in a single 2M section of VA range.
>> + */
>> +struct drm_i915_gem_vm_bind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @handle: Object handle */
>> +     __u32 handle;
>> +
>> +     /** @start: Virtual Address start to bind */
>> +     __u64 start;
>> +
>> +     /** @offset: Offset in object to bind */
>> +     __u64 offset;
>> +
>> +     /** @length: Length of mapping to bind */
>> +     __u64 length;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_GEM_VM_BIND_READONLY:
>> +      * Mapping is read-only.
>> +      *
>> +      * I915_GEM_VM_BIND_CAPTURE:
>> +      * Capture this mapping in the dump upon GPU error.
>> +      */
>> +     __u64 flags;
>> +#define I915_GEM_VM_BIND_READONLY    (1 << 0)
>
>Should we define another flag for DEVICE_ATOMIC? Without this flag, do you imply all the mapping support device atomic operation?
>HW platform also has an implication to device atomic, i.e., some platform don't support device atomics to system memory.
>

Thanks Oak, we can always add required flags later when we want to add the support.

Niranjana

>Regards,
>Oak
>
>> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 1)
>> +
>> +     /** @fence: Timeline fence for bind completion signaling */
>> +     struct drm_i915_gem_vm_bind_fence fence;
>> +
>> +     /** @extensions: 0-terminated chain of extensions */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> + *
>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> + * address (VA) range that should be unbound from the device page table
>> of the
>> + * specified address space (VM). The specified VA range must match one of
>> the
>> + * mappings created with the VM_BIND ioctl. TLB is flushed upon unbind
>> + * completion.
>> + *
>> + * The @start and @length musy specify a unique mapping bound with
>> VM_BIND
>> + * ioctl.
>> + */
>> +struct drm_i915_gem_vm_unbind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @rsvd: Reserved, MBZ */
>> +     __u32 rsvd;
>> +
>> +     /** @start: Virtual Address start to unbind */
>> +     __u64 start;
>> +
>> +     /** @length: Length of mapping to unbind */
>> +     __u64 length;
>> +
>> +     /** @flags: Reserved for future usage, currently MBZ */
>> +     __u64 flags;
>> +
>> +     /** @fence: Timeline fence for unbind completion signaling */
>> +     struct drm_i915_gem_vm_bind_fence fence;
>> +
>> +     /** @extensions: 0-terminated chain of extensions */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_execbuffer3 - Structure for
>> DRM_I915_GEM_EXECBUFFER3
>> + * ioctl.
>> + *
>> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>> VM_BIND mode
>> + * only works with this ioctl for submission.
>> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> + */
>> +struct drm_i915_gem_execbuffer3 {
>> +     /**
>> +      * @ctx_id: Context id
>> +      *
>> +      * Only contexts with user engine map are allowed.
>> +      */
>> +     __u32 ctx_id;
>> +
>> +     /**
>> +      * @engine_idx: Engine index
>> +      *
>> +      * An index in the user engine map of the context specified by
>> @ctx_id.
>> +      */
>> +     __u32 engine_idx;
>> +
>> +     /** @rsvd1: Reserved, MBZ */
>> +     __u32 rsvd1;
>> +
>> +     /**
>> +      * @batch_count: Number of batches in @batch_address array.
>> +      *
>> +      * 0 is invalid. For parallel submission, it should be equal to the
>> +      * number of (parallel) engines involved in that submission.
>> +      */
>> +     __u32 batch_count;
>> +
>> +     /**
>> +      * @batch_address: Array of batch gpu virtual addresses.
>> +      *
>> +      * If @batch_count is 1, then it is the gpu virtual address of the
>> +      * batch buffer. If @batch_count > 1, then it is a pointer to an array
>> +      * of batch buffer gpu virtual addresses.
>> +      */
>> +     __u64 batch_address;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_EXEC3_SECURE:
>> +      * Request a privileged ("secure") batch buffer/s.
>> +      * It is only available for DRM_ROOT_ONLY | DRM_MASTER
>> processes.
>> +      */
>> +     __u64 flags;
>> +#define I915_EXEC3_SECURE    (1<<0)
>> +
>> +     /** @rsvd2: Reserved, MBZ */
>> +     __u64 rsvd2;
>> +
>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES:
>> +      * It has same format as
>> DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES.
>> +      * See struct drm_i915_gem_execbuffer_ext_timeline_fences.
>> +      */
>> +     __u64 extensions;
>> +#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES 0
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>> object
>> + * private to the specified VM.
>> + *
>> + * See struct drm_i915_gem_create_ext.
>> + */
>> +struct drm_i915_gem_create_ext_vm_private {
>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> +     /** @base: Extension link. See struct i915_user_extension. */
>> +     struct i915_user_extension base;
>> +
>> +     /** @vm_id: Id of the VM to which the object is private */
>> +     __u32 vm_id;
>> +};
>> --
>> 2.21.0.rc0.32.g243a4c7e27
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-20 15:58       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-20 15:58 UTC (permalink / raw)
  To: Zeng, Oak
  Cc: Zanoni, Paulo R, intel-gfx, Wilson, Chris P, Hellstrom, Thomas,
	dri-devel, Vetter, Daniel, christian.koenig, Auld, Matthew

On Mon, Jun 20, 2022 at 07:42:25AM -0700, Zeng, Oak wrote:
>
>
>Thanks,
>Oak
>
>> -----Original Message-----
>> From: Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>
>> Sent: June 17, 2022 1:15 AM
>> To: intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Vetter,
>> Daniel <daniel.vetter@intel.com>
>> Cc: Hellstrom, Thomas <thomas.hellstrom@intel.com>; Wilson, Chris P
>> <chris.p.wilson@intel.com>; jason@jlekstrand.net;
>> christian.koenig@amd.com; Brost, Matthew <matthew.brost@intel.com>;
>> Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Auld, Matthew
>> <matthew.auld@intel.com>; Landwerlin, Lionel G
>> <lionel.g.landwerlin@intel.com>; Zanoni, Paulo R
>> <paulo.r.zanoni@intel.com>; Zeng, Oak <oak.zeng@intel.com>
>> Subject: [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition
>>
>> VM_BIND and related uapi definitions
>>
>> v2: Reduce the scope to simple Mesa use case.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> <niranjana.vishwanathapura@intel.com>
>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 226
>> +++++++++++++++++++++++++++
>>  1 file changed, 226 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>> b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index 000000000000..b7540ddb526d
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,226 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND               57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>> any
>> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> + *
>> + */
>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> +
>> +/* VM_BIND related ioctls */
>> +#define DRM_I915_GEM_VM_BIND         0x3d
>> +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> +
>> +#define DRM_IOCTL_I915_GEM_VM_BIND
>>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND,
>> struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND
>>       DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
>>       DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> +
>> +/**
>> + * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion
>> notification.
>> + *
>> + * A timeline out fence for vm_bind/unbind completion notification.
>> + */
>> +struct drm_i915_gem_vm_bind_fence {
>> +     /** @handle: User's handle for a drm_syncobj to signal. */
>> +     __u32 handle;
>> +
>> +     /** @rsvd: Reserved, MBZ */
>> +     __u32 rsvd;
>> +
>> +     /**
>> +      * @value: A point in the timeline.
>> +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> +      * binary one.
>> +      */
>> +     __u64 value;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> + *
>> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
>> GPU
>> + * virtual address (VA) range to the section of an object that should be
>> bound
>> + * in the device page table of the specified address space (VM).
>> + * The VA range specified must be unique (ie., not currently bound) and can
>> + * be mapped to whole object or a section of the object (partial binding).
>> + * Multiple VA mappings can be created to the same section of the object
>> + * (aliasing).
>> + *
>> + * The @start, @offset and @length should be 4K page aligned. However
>> the DG2
>> + * and XEHPSDV has 64K page size for device local-memory and has compact
>> page
>> + * table. On those platforms, for binding device local-memory objects, the
>> + * @start should be 2M aligned, @offset and @length should be 64K aligned.
>> + * Also, on those platforms, it is not allowed to bind an device local-memory
>> + * object and a system memory object in a single 2M section of VA range.
>> + */
>> +struct drm_i915_gem_vm_bind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @handle: Object handle */
>> +     __u32 handle;
>> +
>> +     /** @start: Virtual Address start to bind */
>> +     __u64 start;
>> +
>> +     /** @offset: Offset in object to bind */
>> +     __u64 offset;
>> +
>> +     /** @length: Length of mapping to bind */
>> +     __u64 length;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_GEM_VM_BIND_READONLY:
>> +      * Mapping is read-only.
>> +      *
>> +      * I915_GEM_VM_BIND_CAPTURE:
>> +      * Capture this mapping in the dump upon GPU error.
>> +      */
>> +     __u64 flags;
>> +#define I915_GEM_VM_BIND_READONLY    (1 << 0)
>
>Should we define another flag for DEVICE_ATOMIC? Without this flag, do you imply all the mapping support device atomic operation?
>HW platform also has an implication to device atomic, i.e., some platform don't support device atomics to system memory.
>

Thanks Oak, we can always add required flags later when we want to add the support.

Niranjana

>Regards,
>Oak
>
>> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 1)
>> +
>> +     /** @fence: Timeline fence for bind completion signaling */
>> +     struct drm_i915_gem_vm_bind_fence fence;
>> +
>> +     /** @extensions: 0-terminated chain of extensions */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> + *
>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> + * address (VA) range that should be unbound from the device page table
>> of the
>> + * specified address space (VM). The specified VA range must match one of
>> the
>> + * mappings created with the VM_BIND ioctl. TLB is flushed upon unbind
>> + * completion.
>> + *
>> + * The @start and @length musy specify a unique mapping bound with
>> VM_BIND
>> + * ioctl.
>> + */
>> +struct drm_i915_gem_vm_unbind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @rsvd: Reserved, MBZ */
>> +     __u32 rsvd;
>> +
>> +     /** @start: Virtual Address start to unbind */
>> +     __u64 start;
>> +
>> +     /** @length: Length of mapping to unbind */
>> +     __u64 length;
>> +
>> +     /** @flags: Reserved for future usage, currently MBZ */
>> +     __u64 flags;
>> +
>> +     /** @fence: Timeline fence for unbind completion signaling */
>> +     struct drm_i915_gem_vm_bind_fence fence;
>> +
>> +     /** @extensions: 0-terminated chain of extensions */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_execbuffer3 - Structure for
>> DRM_I915_GEM_EXECBUFFER3
>> + * ioctl.
>> + *
>> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>> VM_BIND mode
>> + * only works with this ioctl for submission.
>> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> + */
>> +struct drm_i915_gem_execbuffer3 {
>> +     /**
>> +      * @ctx_id: Context id
>> +      *
>> +      * Only contexts with user engine map are allowed.
>> +      */
>> +     __u32 ctx_id;
>> +
>> +     /**
>> +      * @engine_idx: Engine index
>> +      *
>> +      * An index in the user engine map of the context specified by
>> @ctx_id.
>> +      */
>> +     __u32 engine_idx;
>> +
>> +     /** @rsvd1: Reserved, MBZ */
>> +     __u32 rsvd1;
>> +
>> +     /**
>> +      * @batch_count: Number of batches in @batch_address array.
>> +      *
>> +      * 0 is invalid. For parallel submission, it should be equal to the
>> +      * number of (parallel) engines involved in that submission.
>> +      */
>> +     __u32 batch_count;
>> +
>> +     /**
>> +      * @batch_address: Array of batch gpu virtual addresses.
>> +      *
>> +      * If @batch_count is 1, then it is the gpu virtual address of the
>> +      * batch buffer. If @batch_count > 1, then it is a pointer to an array
>> +      * of batch buffer gpu virtual addresses.
>> +      */
>> +     __u64 batch_address;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_EXEC3_SECURE:
>> +      * Request a privileged ("secure") batch buffer/s.
>> +      * It is only available for DRM_ROOT_ONLY | DRM_MASTER
>> processes.
>> +      */
>> +     __u64 flags;
>> +#define I915_EXEC3_SECURE    (1<<0)
>> +
>> +     /** @rsvd2: Reserved, MBZ */
>> +     __u64 rsvd2;
>> +
>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES:
>> +      * It has same format as
>> DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES.
>> +      * See struct drm_i915_gem_execbuffer_ext_timeline_fences.
>> +      */
>> +     __u64 extensions;
>> +#define DRM_I915_GEM_EXECBUFFER3_EXT_TIMELINE_FENCES 0
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>> object
>> + * private to the specified VM.
>> + *
>> + * See struct drm_i915_gem_create_ext.
>> + */
>> +struct drm_i915_gem_create_ext_vm_private {
>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> +     /** @base: Extension link. See struct i915_user_extension. */
>> +     struct i915_user_extension base;
>> +
>> +     /** @vm_id: Id of the VM to which the object is private */
>> +     __u32 vm_id;
>> +};
>> --
>> 2.21.0.rc0.32.g243a4c7e27
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-20 10:43   ` Tvrtko Ursulin
@ 2022-06-20 16:29     ` Niranjana Vishwanathapura
  2022-06-21  8:35       ` Tvrtko Ursulin
  0 siblings, 1 reply; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-20 16:29 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, chris.p.wilson, thomas.hellstrom,
	dri-devel, daniel.vetter, christian.koenig, matthew.auld

On Mon, Jun 20, 2022 at 11:43:10AM +0100, Tvrtko Ursulin wrote:
>
>Hi,
>
>On 17/06/2022 06:14, Niranjana Vishwanathapura wrote:
>>VM_BIND design document with description of intended use cases.
>>
>>v2: Reduce the scope to simple Mesa use case.
>
>since I expressed interest please add me to cc when sending out.
>

Hi Tvrtko,
I did include you in the cc list with git send-email, but looks like some patches
in this series has the full cc list, but some don't (you are on cc list of this
patch though). I am not sure why.

>How come the direction changed to simplify all of a sudden? I did not 
>spot any discussion to that effect. Was it internal talks?
>

Yah, some of us had offline discussion involving the Mesa team.
I did update the thread (previous version of this patch series) about that.
Plan was to align our roadmap to focus on the deliverables at this point
without further complicating the uapi. 

>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>---
>>  Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
>>  Documentation/gpu/rfc/index.rst        |   4 +
>>  2 files changed, 242 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
>>
>>diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
>>new file mode 100644
>>index 000000000000..4ab590ef11fd
>>--- /dev/null
>>+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
>>@@ -0,0 +1,238 @@
>>+==========================================
>>+I915 VM_BIND feature design and use cases
>>+==========================================
>>+
>>+VM_BIND feature
>>+================
>>+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>>+objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>>+specified address space (VM). These mappings (also referred to as persistent
>>+mappings) will be persistent across multiple GPU submissions (execbuf calls)
>>+issued by the UMD, without user having to provide a list of all required
>>+mappings during each submission (as required by older execbuf mode).
>>+
>>+The VM_BIND/UNBIND calls allow UMDs to request a timeline fence for signaling
>>+the completion of bind/unbind operation.
>>+
>>+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
>>+User has to opt-in for VM_BIND mode of binding for an address space (VM)
>>+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>>+
>>+Normally, vm_bind/unbind operations will get completed synchronously,
>
>To me synchronously, at this point in the text, reads as ioctl will 
>return only when the operation is done. Rest of the paragraph however 
>disagrees (plus existence of out fence). It is not clear to me what is 
>the actual behaviour. Will it be clear to userspace developers reading 
>uapi kerneldoc? If it is async, what are the ordering rules in this 
>version?
>

Yah, here I am simply stating the i915_vma_pin_ww() behavior which mostly
does the binding synchronously unless there is a moving fence associated
with the object in which case, binding will complete later once that fence
is signaled (hence the out fence).

>>+but if the object is being moved, the binding will happen once that the
>>+moving is complete and out fence will be signaled after binding is complete.
>>+The bind/unbind operation can get completed out of submission order.
>>+
>>+VM_BIND features include:
>>+
>>+* Multiple Virtual Address (VA) mappings can map to the same physical pages
>>+  of an object (aliasing).
>>+* VA mapping can map to a partial section of the BO (partial binding).
>>+* Support capture of persistent mappings in the dump upon GPU error.
>>+* TLB is flushed upon unbind completion. Batching of TLB flushes in some
>>+  use cases will be helpful.
>>+* Support for userptr gem objects (no special uapi is required for this).
>>+
>>+Execbuf ioctl in VM_BIND mode
>>+-------------------------------
>>+A VM in VM_BIND mode will not support older execbuf mode of binding.
>>+The execbuf ioctl handling in VM_BIND mode differs significantly from the
>>+older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>>+Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
>>+struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
>>+execlist. Hence, no support for implicit sync. It is expected that the below
>>+work will be able to support requirements of object dependency setting in all
>>+use cases:
>>+
>>+"dma-buf: Add an API for exporting sync files"
>>+(https://lwn.net/Articles/859290/)
>
>What does this mean? If execbuf3 does not know about target objects 
>how can we add a meaningful fence?
>

Execbuf3 does know about the target objects. It is all the objects
bound to that VM via vm_bind call.

>>+
>>+The execbuf3 ioctl directly specifies the batch addresses instead of as
>>+object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>>+support many of the older features like in/out/submit fences, fence array,
>>+default gem context and many more (See struct drm_i915_gem_execbuffer3).
>>+
>>+In VM_BIND mode, VA allocation is completely managed by the user instead of
>>+the i915 driver. Hence all VA assignment, eviction are not applicable in
>>+VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
>>+be using the i915_vma active reference tracking. It will instead use dma-resv
>>+object for that (See `VM_BIND dma_resv usage`_).
>>+
>>+So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
>>+evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
>>+are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
>>+should be in a separate file and only functionalities common to these ioctls
>>+can be the shared code where possible.
>>+
>>+VM_PRIVATE objects
>>+-------------------
>>+By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>+exported. Hence these BOs are referred to as Shared BOs.
>>+During each execbuf submission, the request fence must be added to the
>>+dma-resv fence list of all shared BOs mapped on the VM.
>
>Does this tie to my previous question? Design is to add each fence to 
>literally _all_ BOs mapped to a VM, on every execbuf3? If so, is that 
>definitely needed and for what use case? Mixing implicit and explicit, 
>I mean bridging implicit and explicit sync clients?
>

Yes. It is similar to how legacy execbuf2 does. ie., add request fence
to all of the target BOs. Only difference is in execbuf2 case, target
objects are the objects in execlist, whereas in execbuf2, it is all
the BOs mapped to that VM via vm_bind call. It is needed as UMD says
that it is needed by vm_bind'ing the BO before the execbuf3 call.

Niranjana

>Regards,
>
>Tvrtko
>
>>+
>>+VM_BIND feature introduces an optimization where user can create BO which
>>+is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
>>+BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
>>+the VM they are private to and can't be dma-buf exported.
>>+All private BOs of a VM share the dma-resv object. Hence during each execbuf
>>+submission, they need only one dma-resv fence list updated. Thus, the fast
>>+path (where required mappings are already bound) submission latency is O(1)
>>+w.r.t the number of VM private BOs.
>>+
>>+VM_BIND locking hirarchy
>>+-------------------------
>>+The locking design here supports the older (execlist based) execbuf mode, the
>>+newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
>>+system allocator support (See `Shared Virtual Memory (SVM) support`_).
>>+The older execbuf mode and the newer VM_BIND mode without page faults manages
>>+residency of backing storage using dma_fence. The VM_BIND mode with page faults
>>+and the system allocator support do not use any dma_fence at all.
>>+
>>+VM_BIND locking order is as below.
>>+
>>+1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
>>+   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>>+   mapping.
>>+
>>+   In future, when GPU page faults are supported, we can potentially use a
>>+   rwsem instead, so that multiple page fault handlers can take the read side
>>+   lock to lookup the mapping and hence can run in parallel.
>>+   The older execbuf mode of binding do not need this lock.
>>+
>>+2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
>>+   be held while binding/unbinding a vma in the async worker and while updating
>>+   dma-resv fence list of an object. Note that private BOs of a VM will all
>>+   share a dma-resv object.
>>+
>>+   The future system allocator support will use the HMM prescribed locking
>>+   instead.
>>+
>>+3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
>>+   invalidated vmas (due to eviction and userptr invalidation) etc.
>>+
>>+When GPU page faults are supported, the execbuf path do not take any of these
>>+locks. There we will simply smash the new batch buffer address into the ring and
>>+then tell the scheduler run that. The lock taking only happens from the page
>>+fault handler, where we take lock-A in read mode, whichever lock-B we need to
>>+find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
>>+system allocator) and some additional locks (lock-D) for taking care of page
>>+table races. Page fault mode should not need to ever manipulate the vm lists,
>>+so won't ever need lock-C.
>>+
>>+VM_BIND LRU handling
>>+---------------------
>>+We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
>>+performance degradation. We will also need support for bulk LRU movement of
>>+VM_BIND objects to avoid additional latencies in execbuf path.
>>+
>>+The page table pages are similar to VM_BIND mapped objects (See
>>+`Evictable page table allocations`_) and are maintained per VM and needs to
>>+be pinned in memory when VM is made active (ie., upon an execbuf call with
>>+that VM). So, bulk LRU movement of page table pages is also needed.
>>+
>>+VM_BIND dma_resv usage
>>+-----------------------
>>+Fences needs to be added to all VM_BIND mapped objects. During each execbuf
>>+submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
>>+over sync (See enum dma_resv_usage). One can override it with either
>>+DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during object dependency
>>+setting (either through explicit or implicit mechanism).
>>+
>>+When vm_bind is called for a non-private object while the VM is already
>>+active, the fences need to be copied from VM's shared dma-resv object
>>+(common to all private objects of the VM) to this non-private object.
>>+If this results in performance degradation, then some optimization will
>>+be needed here. This is not a problem for VM's private objects as they use
>>+shared dma-resv object which is always updated on each execbuf submission.
>>+
>>+Also, in VM_BIND mode, use dma-resv apis for determining object activeness
>>+(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
>>+older i915_vma active reference tracking which is deprecated. This should be
>>+easier to get it working with the current TTM backend.
>>+
>>+Mesa use case
>>+--------------
>>+VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
>>+hence improving performance of CPU-bound applications. It also allows us to
>>+implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
>>+reducing CPU overhead becomes more impactful.
>>+
>>+
>>+Other VM_BIND use cases
>>+========================
>>+
>>+Long running Compute contexts
>>+------------------------------
>>+Usage of dma-fence expects that they complete in reasonable amount of time.
>>+Compute on the other hand can be long running. Hence it is appropriate for
>>+compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
>>+must be limited to in-kernel consumption only.
>>+
>>+Where GPU page faults are not available, kernel driver upon buffer invalidation
>>+will initiate a suspend (preemption) of long running context, finish the
>>+invalidation, revalidate the BO and then resume the compute context. This is
>>+done by having a per-context preempt fence which is enabled when someone tries
>>+to wait on it and triggers the context preemption.
>>+
>>+User/Memory Fence
>>+~~~~~~~~~~~~~~~~~~
>>+User/Memory fence is a <address, value> pair. To signal the user fence, the
>>+specified value will be written at the specified virtual address and wakeup the
>>+waiting process. User fence can be signaled either by the GPU or kernel async
>>+worker (like upon bind completion). User can wait on a user fence with a new
>>+user fence wait ioctl.
>>+
>>+Here is some prior work on this:
>>+https://patchwork.freedesktop.org/patch/349417/
>>+
>>+Low Latency Submission
>>+~~~~~~~~~~~~~~~~~~~~~~~
>>+Allows compute UMD to directly submit GPU jobs instead of through execbuf
>>+ioctl. This is made possible by VM_BIND is not being synchronized against
>>+execbuf. VM_BIND allows bind/unbind of mappings required for the directly
>>+submitted jobs.
>>+
>>+Debugger
>>+---------
>>+With debug event interface user space process (debugger) is able to keep track
>>+of and act upon resources created by another process (debugged) and attached
>>+to GPU via vm_bind interface.
>>+
>>+GPU page faults
>>+----------------
>>+GPU page faults when supported (in future), will only be supported in the
>>+VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
>>+binding will require using dma-fence to ensure residency, the GPU page faults
>>+mode when supported, will not use any dma-fence as residency is purely managed
>>+by installing and removing/invalidating page table entries.
>>+
>>+Page level hints settings
>>+--------------------------
>>+VM_BIND allows any hints setting per mapping instead of per BO.
>>+Possible hints include read-only mapping, placement and atomicity.
>>+Sub-BO level placement hint will be even more relevant with
>>+upcoming GPU on-demand page fault support.
>>+
>>+Page level Cache/CLOS settings
>>+-------------------------------
>>+VM_BIND allows cache/CLOS settings per mapping instead of per BO.
>>+
>>+Evictable page table allocations
>>+---------------------------------
>>+Make pagetable allocations evictable and manage them similar to VM_BIND
>>+mapped objects. Page table pages are similar to persistent mappings of a
>>+VM (difference here are that the page table pages will not have an i915_vma
>>+structure and after swapping pages back in, parent page link needs to be
>>+updated).
>>+
>>+Shared Virtual Memory (SVM) support
>>+------------------------------------
>>+VM_BIND interface can be used to map system memory directly (without gem BO
>>+abstraction) using the HMM interface. SVM is only supported with GPU page
>>+faults enabled.
>>+
>>+VM_BIND UAPI
>>+=============
>>+
>>+.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
>>diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
>>index 91e93a705230..7d10c36b268d 100644
>>--- a/Documentation/gpu/rfc/index.rst
>>+++ b/Documentation/gpu/rfc/index.rst
>>@@ -23,3 +23,7 @@ host such documentation:
>>  .. toctree::
>>      i915_scheduler.rst
>>+
>>+.. toctree::
>>+
>>+    i915_vm_bind.rst

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-20 16:29     ` Niranjana Vishwanathapura
@ 2022-06-21  8:35       ` Tvrtko Ursulin
  2022-06-21 14:43         ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2022-06-21  8:35 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, chris.p.wilson, thomas.hellstrom,
	dri-devel, daniel.vetter, christian.koenig, matthew.auld


On 20/06/2022 17:29, Niranjana Vishwanathapura wrote:
> On Mon, Jun 20, 2022 at 11:43:10AM +0100, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 17/06/2022 06:14, Niranjana Vishwanathapura wrote:
>>> VM_BIND design document with description of intended use cases.
>>>
>>> v2: Reduce the scope to simple Mesa use case.
>>
>> since I expressed interest please add me to cc when sending out.
>>
> 
> Hi Tvrtko,
> I did include you in the cc list with git send-email, but looks like 
> some patches
> in this series has the full cc list, but some don't (you are on cc list 
> of this
> patch though). I am not sure why.

Odd, I'm not on CC on the (only for me) copy I found in the mailing list.

>> How come the direction changed to simplify all of a sudden? I did not 
>> spot any discussion to that effect. Was it internal talks?
>>
> 
> Yah, some of us had offline discussion involving the Mesa team.
> I did update the thread (previous version of this patch series) about that.
> Plan was to align our roadmap to focus on the deliverables at this point
> without further complicating the uapi.
>>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> ---
>>>  Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
>>>  Documentation/gpu/rfc/index.rst        |   4 +
>>>  2 files changed, 242 insertions(+)
>>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
>>>
>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst 
>>> b/Documentation/gpu/rfc/i915_vm_bind.rst
>>> new file mode 100644
>>> index 000000000000..4ab590ef11fd
>>> --- /dev/null
>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
>>> @@ -0,0 +1,238 @@
>>> +==========================================
>>> +I915 VM_BIND feature design and use cases
>>> +==========================================
>>> +
>>> +VM_BIND feature
>>> +================
>>> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>>> +objects (BOs) or sections of a BOs at specified GPU virtual 
>>> addresses on a
>>> +specified address space (VM). These mappings (also referred to as 
>>> persistent
>>> +mappings) will be persistent across multiple GPU submissions 
>>> (execbuf calls)
>>> +issued by the UMD, without user having to provide a list of all 
>>> required
>>> +mappings during each submission (as required by older execbuf mode).
>>> +
>>> +The VM_BIND/UNBIND calls allow UMDs to request a timeline fence for 
>>> signaling
>>> +the completion of bind/unbind operation.
>>> +
>>> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
>>> +User has to opt-in for VM_BIND mode of binding for an address space 
>>> (VM)
>>> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>>> +
>>> +Normally, vm_bind/unbind operations will get completed synchronously,
>>
>> To me synchronously, at this point in the text, reads as ioctl will 
>> return only when the operation is done. Rest of the paragraph however 
>> disagrees (plus existence of out fence). It is not clear to me what is 
>> the actual behaviour. Will it be clear to userspace developers reading 
>> uapi kerneldoc? If it is async, what are the ordering rules in this 
>> version?
>>
> 
> Yah, here I am simply stating the i915_vma_pin_ww() behavior which mostly
> does the binding synchronously unless there is a moving fence associated
> with the object in which case, binding will complete later once that fence
> is signaled (hence the out fence).

So from userspace point of view it is fully asynchronous and out of 
order? I'd suggest spelling that out in the uapi kerneldoc.

>>> +but if the object is being moved, the binding will happen once that the
>>> +moving is complete and out fence will be signaled after binding is 
>>> complete.
>>> +The bind/unbind operation can get completed out of submission order.
>>> +
>>> +VM_BIND features include:
>>> +
>>> +* Multiple Virtual Address (VA) mappings can map to the same 
>>> physical pages
>>> +  of an object (aliasing).
>>> +* VA mapping can map to a partial section of the BO (partial binding).
>>> +* Support capture of persistent mappings in the dump upon GPU error.
>>> +* TLB is flushed upon unbind completion. Batching of TLB flushes in 
>>> some
>>> +  use cases will be helpful.
>>> +* Support for userptr gem objects (no special uapi is required for 
>>> this).
>>> +
>>> +Execbuf ioctl in VM_BIND mode
>>> +-------------------------------
>>> +A VM in VM_BIND mode will not support older execbuf mode of binding.
>>> +The execbuf ioctl handling in VM_BIND mode differs significantly 
>>> from the
>>> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>>> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. 
>>> (See
>>> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept 
>>> any
>>> +execlist. Hence, no support for implicit sync. It is expected that 
>>> the below
>>> +work will be able to support requirements of object dependency 
>>> setting in all
>>> +use cases:
>>> +
>>> +"dma-buf: Add an API for exporting sync files"
>>> +(https://lwn.net/Articles/859290/)
>>
>> What does this mean? If execbuf3 does not know about target objects 
>> how can we add a meaningful fence?
>>
> 
> Execbuf3 does know about the target objects. It is all the objects
> bound to that VM via vm_bind call.
> 
>>> +
>>> +The execbuf3 ioctl directly specifies the batch addresses instead of as
>>> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>>> +support many of the older features like in/out/submit fences, fence 
>>> array,
>>> +default gem context and many more (See struct 
>>> drm_i915_gem_execbuffer3).
>>> +
>>> +In VM_BIND mode, VA allocation is completely managed by the user 
>>> instead of
>>> +the i915 driver. Hence all VA assignment, eviction are not 
>>> applicable in
>>> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode 
>>> will not
>>> +be using the i915_vma active reference tracking. It will instead use 
>>> dma-resv
>>> +object for that (See `VM_BIND dma_resv usage`_).
>>> +
>>> +So, a lot of existing code supporting execbuf2 ioctl, like 
>>> relocations, VA
>>> +evictions, vma lookup table, implicit sync, vma active reference 
>>> tracking etc.,
>>> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific 
>>> handling
>>> +should be in a separate file and only functionalities common to 
>>> these ioctls
>>> +can be the shared code where possible.
>>> +
>>> +VM_PRIVATE objects
>>> +-------------------
>>> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>> +exported. Hence these BOs are referred to as Shared BOs.
>>> +During each execbuf submission, the request fence must be added to the
>>> +dma-resv fence list of all shared BOs mapped on the VM.
>>
>> Does this tie to my previous question? Design is to add each fence to 
>> literally _all_ BOs mapped to a VM, on every execbuf3? If so, is that 
>> definitely needed and for what use case? Mixing implicit and explicit, 
>> I mean bridging implicit and explicit sync clients?
>>
> 
> Yes. It is similar to how legacy execbuf2 does. ie., add request fence
> to all of the target BOs. Only difference is in execbuf2 case, target
> objects are the objects in execlist, whereas in execbuf2, it is all
> the BOs mapped to that VM via vm_bind call. It is needed as UMD says
> that it is needed by vm_bind'ing the BO before the execbuf3 call.

Sorry I did not understand why it is needed, the last sentence that is, 
what did that suppose to mean?

Regards,

Tvrtko

> Niranjana
> 
>> Regards,
>>
>> Tvrtko
>>
>>> +
>>> +VM_BIND feature introduces an optimization where user can create BO 
>>> which
>>> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag 
>>> during
>>> +BO creation. Unlike Shared BOs, these VM private BOs can only be 
>>> mapped on
>>> +the VM they are private to and can't be dma-buf exported.
>>> +All private BOs of a VM share the dma-resv object. Hence during each 
>>> execbuf
>>> +submission, they need only one dma-resv fence list updated. Thus, 
>>> the fast
>>> +path (where required mappings are already bound) submission latency 
>>> is O(1)
>>> +w.r.t the number of VM private BOs.
>>> +
>>> +VM_BIND locking hirarchy
>>> +-------------------------
>>> +The locking design here supports the older (execlist based) execbuf 
>>> mode, the
>>> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and 
>>> possible future
>>> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
>>> +The older execbuf mode and the newer VM_BIND mode without page 
>>> faults manages
>>> +residency of backing storage using dma_fence. The VM_BIND mode with 
>>> page faults
>>> +and the system allocator support do not use any dma_fence at all.
>>> +
>>> +VM_BIND locking order is as below.
>>> +
>>> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is 
>>> taken in
>>> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while 
>>> releasing the
>>> +   mapping.
>>> +
>>> +   In future, when GPU page faults are supported, we can potentially 
>>> use a
>>> +   rwsem instead, so that multiple page fault handlers can take the 
>>> read side
>>> +   lock to lookup the mapping and hence can run in parallel.
>>> +   The older execbuf mode of binding do not need this lock.
>>> +
>>> +2) Lock-B: The object's dma-resv lock will protect i915_vma state 
>>> and needs to
>>> +   be held while binding/unbinding a vma in the async worker and 
>>> while updating
>>> +   dma-resv fence list of an object. Note that private BOs of a VM 
>>> will all
>>> +   share a dma-resv object.
>>> +
>>> +   The future system allocator support will use the HMM prescribed 
>>> locking
>>> +   instead.
>>> +
>>> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the 
>>> list of
>>> +   invalidated vmas (due to eviction and userptr invalidation) etc.
>>> +
>>> +When GPU page faults are supported, the execbuf path do not take any 
>>> of these
>>> +locks. There we will simply smash the new batch buffer address into 
>>> the ring and
>>> +then tell the scheduler run that. The lock taking only happens from 
>>> the page
>>> +fault handler, where we take lock-A in read mode, whichever lock-B 
>>> we need to
>>> +find the backing storage (dma_resv lock for gem objects, and 
>>> hmm/core mm for
>>> +system allocator) and some additional locks (lock-D) for taking care 
>>> of page
>>> +table races. Page fault mode should not need to ever manipulate the 
>>> vm lists,
>>> +so won't ever need lock-C.
>>> +
>>> +VM_BIND LRU handling
>>> +---------------------
>>> +We need to ensure VM_BIND mapped objects are properly LRU tagged to 
>>> avoid
>>> +performance degradation. We will also need support for bulk LRU 
>>> movement of
>>> +VM_BIND objects to avoid additional latencies in execbuf path.
>>> +
>>> +The page table pages are similar to VM_BIND mapped objects (See
>>> +`Evictable page table allocations`_) and are maintained per VM and 
>>> needs to
>>> +be pinned in memory when VM is made active (ie., upon an execbuf 
>>> call with
>>> +that VM). So, bulk LRU movement of page table pages is also needed.
>>> +
>>> +VM_BIND dma_resv usage
>>> +-----------------------
>>> +Fences needs to be added to all VM_BIND mapped objects. During each 
>>> execbuf
>>> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to 
>>> prevent
>>> +over sync (See enum dma_resv_usage). One can override it with either
>>> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during object 
>>> dependency
>>> +setting (either through explicit or implicit mechanism).
>>> +
>>> +When vm_bind is called for a non-private object while the VM is already
>>> +active, the fences need to be copied from VM's shared dma-resv object
>>> +(common to all private objects of the VM) to this non-private object.
>>> +If this results in performance degradation, then some optimization will
>>> +be needed here. This is not a problem for VM's private objects as 
>>> they use
>>> +shared dma-resv object which is always updated on each execbuf 
>>> submission.
>>> +
>>> +Also, in VM_BIND mode, use dma-resv apis for determining object 
>>> activeness
>>> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do 
>>> not use the
>>> +older i915_vma active reference tracking which is deprecated. This 
>>> should be
>>> +easier to get it working with the current TTM backend.
>>> +
>>> +Mesa use case
>>> +--------------
>>> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan 
>>> and Iris),
>>> +hence improving performance of CPU-bound applications. It also 
>>> allows us to
>>> +implement Vulkan's Sparse Resources. With increasing GPU hardware 
>>> performance,
>>> +reducing CPU overhead becomes more impactful.
>>> +
>>> +
>>> +Other VM_BIND use cases
>>> +========================
>>> +
>>> +Long running Compute contexts
>>> +------------------------------
>>> +Usage of dma-fence expects that they complete in reasonable amount 
>>> of time.
>>> +Compute on the other hand can be long running. Hence it is 
>>> appropriate for
>>> +compute to use user/memory fence (See `User/Memory Fence`_) and 
>>> dma-fence usage
>>> +must be limited to in-kernel consumption only.
>>> +
>>> +Where GPU page faults are not available, kernel driver upon buffer 
>>> invalidation
>>> +will initiate a suspend (preemption) of long running context, finish 
>>> the
>>> +invalidation, revalidate the BO and then resume the compute context. 
>>> This is
>>> +done by having a per-context preempt fence which is enabled when 
>>> someone tries
>>> +to wait on it and triggers the context preemption.
>>> +
>>> +User/Memory Fence
>>> +~~~~~~~~~~~~~~~~~~
>>> +User/Memory fence is a <address, value> pair. To signal the user 
>>> fence, the
>>> +specified value will be written at the specified virtual address and 
>>> wakeup the
>>> +waiting process. User fence can be signaled either by the GPU or 
>>> kernel async
>>> +worker (like upon bind completion). User can wait on a user fence 
>>> with a new
>>> +user fence wait ioctl.
>>> +
>>> +Here is some prior work on this:
>>> +https://patchwork.freedesktop.org/patch/349417/
>>> +
>>> +Low Latency Submission
>>> +~~~~~~~~~~~~~~~~~~~~~~~
>>> +Allows compute UMD to directly submit GPU jobs instead of through 
>>> execbuf
>>> +ioctl. This is made possible by VM_BIND is not being synchronized 
>>> against
>>> +execbuf. VM_BIND allows bind/unbind of mappings required for the 
>>> directly
>>> +submitted jobs.
>>> +
>>> +Debugger
>>> +---------
>>> +With debug event interface user space process (debugger) is able to 
>>> keep track
>>> +of and act upon resources created by another process (debugged) and 
>>> attached
>>> +to GPU via vm_bind interface.
>>> +
>>> +GPU page faults
>>> +----------------
>>> +GPU page faults when supported (in future), will only be supported 
>>> in the
>>> +VM_BIND mode. While both the older execbuf mode and the newer 
>>> VM_BIND mode of
>>> +binding will require using dma-fence to ensure residency, the GPU 
>>> page faults
>>> +mode when supported, will not use any dma-fence as residency is 
>>> purely managed
>>> +by installing and removing/invalidating page table entries.
>>> +
>>> +Page level hints settings
>>> +--------------------------
>>> +VM_BIND allows any hints setting per mapping instead of per BO.
>>> +Possible hints include read-only mapping, placement and atomicity.
>>> +Sub-BO level placement hint will be even more relevant with
>>> +upcoming GPU on-demand page fault support.
>>> +
>>> +Page level Cache/CLOS settings
>>> +-------------------------------
>>> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
>>> +
>>> +Evictable page table allocations
>>> +---------------------------------
>>> +Make pagetable allocations evictable and manage them similar to VM_BIND
>>> +mapped objects. Page table pages are similar to persistent mappings 
>>> of a
>>> +VM (difference here are that the page table pages will not have an 
>>> i915_vma
>>> +structure and after swapping pages back in, parent page link needs 
>>> to be
>>> +updated).
>>> +
>>> +Shared Virtual Memory (SVM) support
>>> +------------------------------------
>>> +VM_BIND interface can be used to map system memory directly (without 
>>> gem BO
>>> +abstraction) using the HMM interface. SVM is only supported with GPU 
>>> page
>>> +faults enabled.
>>> +
>>> +VM_BIND UAPI
>>> +=============
>>> +
>>> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
>>> diff --git a/Documentation/gpu/rfc/index.rst 
>>> b/Documentation/gpu/rfc/index.rst
>>> index 91e93a705230..7d10c36b268d 100644
>>> --- a/Documentation/gpu/rfc/index.rst
>>> +++ b/Documentation/gpu/rfc/index.rst
>>> @@ -23,3 +23,7 @@ host such documentation:
>>>  .. toctree::
>>>      i915_scheduler.rst
>>> +
>>> +.. toctree::
>>> +
>>> +    i915_vm_bind.rst

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-21  8:35       ` Tvrtko Ursulin
@ 2022-06-21 14:43         ` Niranjana Vishwanathapura
  2022-06-22  8:17           ` Tvrtko Ursulin
  0 siblings, 1 reply; 19+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-21 14:43 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: paulo.r.zanoni, intel-gfx, chris.p.wilson, thomas.hellstrom,
	dri-devel, daniel.vetter, christian.koenig, matthew.auld

On Tue, Jun 21, 2022 at 09:35:16AM +0100, Tvrtko Ursulin wrote:
>
>On 20/06/2022 17:29, Niranjana Vishwanathapura wrote:
>>On Mon, Jun 20, 2022 at 11:43:10AM +0100, Tvrtko Ursulin wrote:
>>>
>>>Hi,
>>>
>>>On 17/06/2022 06:14, Niranjana Vishwanathapura wrote:
>>>>VM_BIND design document with description of intended use cases.
>>>>
>>>>v2: Reduce the scope to simple Mesa use case.
>>>
>>>since I expressed interest please add me to cc when sending out.
>>>
>>
>>Hi Tvrtko,
>>I did include you in the cc list with git send-email, but looks like 
>>some patches
>>in this series has the full cc list, but some don't (you are on cc 
>>list of this
>>patch though). I am not sure why.
>
>Odd, I'm not on CC on the (only for me) copy I found in the mailing list.
>
>>>How come the direction changed to simplify all of a sudden? I did 
>>>not spot any discussion to that effect. Was it internal talks?
>>>
>>
>>Yah, some of us had offline discussion involving the Mesa team.
>>I did update the thread (previous version of this patch series) about that.
>>Plan was to align our roadmap to focus on the deliverables at this point
>>without further complicating the uapi.
>>>>
>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>><niranjana.vishwanathapura@intel.com>
>>>>---
>>>> Documentation/gpu/rfc/i915_vm_bind.rst | 238 +++++++++++++++++++++++++
>>>> Documentation/gpu/rfc/index.rst        |   4 +
>>>> 2 files changed, 242 insertions(+)
>>>> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
>>>>
>>>>diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst 
>>>>b/Documentation/gpu/rfc/i915_vm_bind.rst
>>>>new file mode 100644
>>>>index 000000000000..4ab590ef11fd
>>>>--- /dev/null
>>>>+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
>>>>@@ -0,0 +1,238 @@
>>>>+==========================================
>>>>+I915 VM_BIND feature design and use cases
>>>>+==========================================
>>>>+
>>>>+VM_BIND feature
>>>>+================
>>>>+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>>>>+objects (BOs) or sections of a BOs at specified GPU virtual 
>>>>addresses on a
>>>>+specified address space (VM). These mappings (also referred to 
>>>>as persistent
>>>>+mappings) will be persistent across multiple GPU submissions 
>>>>(execbuf calls)
>>>>+issued by the UMD, without user having to provide a list of all 
>>>>required
>>>>+mappings during each submission (as required by older execbuf mode).
>>>>+
>>>>+The VM_BIND/UNBIND calls allow UMDs to request a timeline fence 
>>>>for signaling
>>>>+the completion of bind/unbind operation.
>>>>+
>>>>+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
>>>>+User has to opt-in for VM_BIND mode of binding for an address 
>>>>space (VM)
>>>>+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>>>>+
>>>>+Normally, vm_bind/unbind operations will get completed synchronously,
>>>
>>>To me synchronously, at this point in the text, reads as ioctl 
>>>will return only when the operation is done. Rest of the paragraph 
>>>however disagrees (plus existence of out fence). It is not clear 
>>>to me what is the actual behaviour. Will it be clear to userspace 
>>>developers reading uapi kerneldoc? If it is async, what are the 
>>>ordering rules in this version?
>>>
>>
>>Yah, here I am simply stating the i915_vma_pin_ww() behavior which mostly
>>does the binding synchronously unless there is a moving fence associated
>>with the object in which case, binding will complete later once that fence
>>is signaled (hence the out fence).
>
>So from userspace point of view it is fully asynchronous and out of 
>order? I'd suggest spelling that out in the uapi kerneldoc.
>

Yah. I can see how some i915 details I provided here can be confusing.
Ok, will remove and spell it out that user must anticipate fully async
out of order completions.

>>>>+but if the object is being moved, the binding will happen once that the
>>>>+moving is complete and out fence will be signaled after binding 
>>>>is complete.
>>>>+The bind/unbind operation can get completed out of submission order.
>>>>+
>>>>+VM_BIND features include:
>>>>+
>>>>+* Multiple Virtual Address (VA) mappings can map to the same 
>>>>physical pages
>>>>+  of an object (aliasing).
>>>>+* VA mapping can map to a partial section of the BO (partial binding).
>>>>+* Support capture of persistent mappings in the dump upon GPU error.
>>>>+* TLB is flushed upon unbind completion. Batching of TLB 
>>>>flushes in some
>>>>+  use cases will be helpful.
>>>>+* Support for userptr gem objects (no special uapi is required 
>>>>for this).
>>>>+
>>>>+Execbuf ioctl in VM_BIND mode
>>>>+-------------------------------
>>>>+A VM in VM_BIND mode will not support older execbuf mode of binding.
>>>>+The execbuf ioctl handling in VM_BIND mode differs 
>>>>significantly from the
>>>>+older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>>>>+Hence, a new execbuf3 ioctl has been added to support VM_BIND 
>>>>mode. (See
>>>>+struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not 
>>>>accept any
>>>>+execlist. Hence, no support for implicit sync. It is expected 
>>>>that the below
>>>>+work will be able to support requirements of object dependency 
>>>>setting in all
>>>>+use cases:
>>>>+
>>>>+"dma-buf: Add an API for exporting sync files"
>>>>+(https://lwn.net/Articles/859290/)
>>>
>>>What does this mean? If execbuf3 does not know about target 
>>>objects how can we add a meaningful fence?
>>>
>>
>>Execbuf3 does know about the target objects. It is all the objects
>>bound to that VM via vm_bind call.
>>
>>>>+
>>>>+The execbuf3 ioctl directly specifies the batch addresses instead of as
>>>>+object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>>>>+support many of the older features like in/out/submit fences, 
>>>>fence array,
>>>>+default gem context and many more (See struct 
>>>>drm_i915_gem_execbuffer3).
>>>>+
>>>>+In VM_BIND mode, VA allocation is completely managed by the 
>>>>user instead of
>>>>+the i915 driver. Hence all VA assignment, eviction are not 
>>>>applicable in
>>>>+VM_BIND mode. Also, for determining object activeness, VM_BIND 
>>>>mode will not
>>>>+be using the i915_vma active reference tracking. It will 
>>>>instead use dma-resv
>>>>+object for that (See `VM_BIND dma_resv usage`_).
>>>>+
>>>>+So, a lot of existing code supporting execbuf2 ioctl, like 
>>>>relocations, VA
>>>>+evictions, vma lookup table, implicit sync, vma active 
>>>>reference tracking etc.,
>>>>+are not applicable for execbuf3 ioctl. Hence, all execbuf3 
>>>>specific handling
>>>>+should be in a separate file and only functionalities common to 
>>>>these ioctls
>>>>+can be the shared code where possible.
>>>>+
>>>>+VM_PRIVATE objects
>>>>+-------------------
>>>>+By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>>>+exported. Hence these BOs are referred to as Shared BOs.
>>>>+During each execbuf submission, the request fence must be added to the
>>>>+dma-resv fence list of all shared BOs mapped on the VM.
>>>
>>>Does this tie to my previous question? Design is to add each fence 
>>>to literally _all_ BOs mapped to a VM, on every execbuf3? If so, 
>>>is that definitely needed and for what use case? Mixing implicit 
>>>and explicit, I mean bridging implicit and explicit sync clients?
>>>
>>
>>Yes. It is similar to how legacy execbuf2 does. ie., add request fence
>>to all of the target BOs. Only difference is in execbuf2 case, target
>>objects are the objects in execlist, whereas in execbuf2, it is all
>>the BOs mapped to that VM via vm_bind call. It is needed as UMD says
>>that it is needed by vm_bind'ing the BO before the execbuf3 call.
>
>Sorry I did not understand why it is needed, the last sentence that 
>is, what did that suppose to mean?
>

I am seeing there is a typo in my above comment. It should have been,
"wherewas in execbuf3, it is all the BOs mapped to that VM via vm_bind call".

We need all the BO's dma-resv fence list should be properly updated
as we depend on it for gem_wait ioctl etc. Also note that we are moving
away from i915_vma active tracking mechanism and instead will be checking
the BO's dma-resv fence list to check if BO is active or not.
So, we need the BO's dma-resv fence list properly updated.
As for execbuf3, all the vm_bind BOs are target BOs, we need to update
the dma-resv fence list for all of them (private or shared).

Niranjana

>Regards,
>
>Tvrtko
>
>>Niranjana
>>
>>>Regards,
>>>
>>>Tvrtko
>>>
>>>>+
>>>>+VM_BIND feature introduces an optimization where user can 
>>>>create BO which
>>>>+is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE 
>>>>flag during
>>>>+BO creation. Unlike Shared BOs, these VM private BOs can only 
>>>>be mapped on
>>>>+the VM they are private to and can't be dma-buf exported.
>>>>+All private BOs of a VM share the dma-resv object. Hence during 
>>>>each execbuf
>>>>+submission, they need only one dma-resv fence list updated. 
>>>>Thus, the fast
>>>>+path (where required mappings are already bound) submission 
>>>>latency is O(1)
>>>>+w.r.t the number of VM private BOs.
>>>>+
>>>>+VM_BIND locking hirarchy
>>>>+-------------------------
>>>>+The locking design here supports the older (execlist based) 
>>>>execbuf mode, the
>>>>+newer VM_BIND mode, the VM_BIND mode with GPU page faults and 
>>>>possible future
>>>>+system allocator support (See `Shared Virtual Memory (SVM) support`_).
>>>>+The older execbuf mode and the newer VM_BIND mode without page 
>>>>faults manages
>>>>+residency of backing storage using dma_fence. The VM_BIND mode 
>>>>with page faults
>>>>+and the system allocator support do not use any dma_fence at all.
>>>>+
>>>>+VM_BIND locking order is as below.
>>>>+
>>>>+1) Lock-A: A vm_bind mutex will protect vm_bind lists. This 
>>>>lock is taken in
>>>>+   vm_bind/vm_unbind ioctl calls, in the execbuf path and while 
>>>>releasing the
>>>>+   mapping.
>>>>+
>>>>+   In future, when GPU page faults are supported, we can 
>>>>potentially use a
>>>>+   rwsem instead, so that multiple page fault handlers can take 
>>>>the read side
>>>>+   lock to lookup the mapping and hence can run in parallel.
>>>>+   The older execbuf mode of binding do not need this lock.
>>>>+
>>>>+2) Lock-B: The object's dma-resv lock will protect i915_vma 
>>>>state and needs to
>>>>+   be held while binding/unbinding a vma in the async worker 
>>>>and while updating
>>>>+   dma-resv fence list of an object. Note that private BOs of a 
>>>>VM will all
>>>>+   share a dma-resv object.
>>>>+
>>>>+   The future system allocator support will use the HMM 
>>>>prescribed locking
>>>>+   instead.
>>>>+
>>>>+3) Lock-C: Spinlock/s to protect some of the VM's lists like 
>>>>the list of
>>>>+   invalidated vmas (due to eviction and userptr invalidation) etc.
>>>>+
>>>>+When GPU page faults are supported, the execbuf path do not 
>>>>take any of these
>>>>+locks. There we will simply smash the new batch buffer address 
>>>>into the ring and
>>>>+then tell the scheduler run that. The lock taking only happens 
>>>>from the page
>>>>+fault handler, where we take lock-A in read mode, whichever 
>>>>lock-B we need to
>>>>+find the backing storage (dma_resv lock for gem objects, and 
>>>>hmm/core mm for
>>>>+system allocator) and some additional locks (lock-D) for taking 
>>>>care of page
>>>>+table races. Page fault mode should not need to ever manipulate 
>>>>the vm lists,
>>>>+so won't ever need lock-C.
>>>>+
>>>>+VM_BIND LRU handling
>>>>+---------------------
>>>>+We need to ensure VM_BIND mapped objects are properly LRU 
>>>>tagged to avoid
>>>>+performance degradation. We will also need support for bulk LRU 
>>>>movement of
>>>>+VM_BIND objects to avoid additional latencies in execbuf path.
>>>>+
>>>>+The page table pages are similar to VM_BIND mapped objects (See
>>>>+`Evictable page table allocations`_) and are maintained per VM 
>>>>and needs to
>>>>+be pinned in memory when VM is made active (ie., upon an 
>>>>execbuf call with
>>>>+that VM). So, bulk LRU movement of page table pages is also needed.
>>>>+
>>>>+VM_BIND dma_resv usage
>>>>+-----------------------
>>>>+Fences needs to be added to all VM_BIND mapped objects. During 
>>>>each execbuf
>>>>+submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage 
>>>>to prevent
>>>>+over sync (See enum dma_resv_usage). One can override it with either
>>>>+DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during object 
>>>>dependency
>>>>+setting (either through explicit or implicit mechanism).
>>>>+
>>>>+When vm_bind is called for a non-private object while the VM is already
>>>>+active, the fences need to be copied from VM's shared dma-resv object
>>>>+(common to all private objects of the VM) to this non-private object.
>>>>+If this results in performance degradation, then some optimization will
>>>>+be needed here. This is not a problem for VM's private objects 
>>>>as they use
>>>>+shared dma-resv object which is always updated on each execbuf 
>>>>submission.
>>>>+
>>>>+Also, in VM_BIND mode, use dma-resv apis for determining object 
>>>>activeness
>>>>+(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and 
>>>>do not use the
>>>>+older i915_vma active reference tracking which is deprecated. 
>>>>This should be
>>>>+easier to get it working with the current TTM backend.
>>>>+
>>>>+Mesa use case
>>>>+--------------
>>>>+VM_BIND can potentially reduce the CPU overhead in Mesa (both 
>>>>Vulkan and Iris),
>>>>+hence improving performance of CPU-bound applications. It also 
>>>>allows us to
>>>>+implement Vulkan's Sparse Resources. With increasing GPU 
>>>>hardware performance,
>>>>+reducing CPU overhead becomes more impactful.
>>>>+
>>>>+
>>>>+Other VM_BIND use cases
>>>>+========================
>>>>+
>>>>+Long running Compute contexts
>>>>+------------------------------
>>>>+Usage of dma-fence expects that they complete in reasonable 
>>>>amount of time.
>>>>+Compute on the other hand can be long running. Hence it is 
>>>>appropriate for
>>>>+compute to use user/memory fence (See `User/Memory Fence`_) and 
>>>>dma-fence usage
>>>>+must be limited to in-kernel consumption only.
>>>>+
>>>>+Where GPU page faults are not available, kernel driver upon 
>>>>buffer invalidation
>>>>+will initiate a suspend (preemption) of long running context, 
>>>>finish the
>>>>+invalidation, revalidate the BO and then resume the compute 
>>>>context. This is
>>>>+done by having a per-context preempt fence which is enabled 
>>>>when someone tries
>>>>+to wait on it and triggers the context preemption.
>>>>+
>>>>+User/Memory Fence
>>>>+~~~~~~~~~~~~~~~~~~
>>>>+User/Memory fence is a <address, value> pair. To signal the 
>>>>user fence, the
>>>>+specified value will be written at the specified virtual 
>>>>address and wakeup the
>>>>+waiting process. User fence can be signaled either by the GPU 
>>>>or kernel async
>>>>+worker (like upon bind completion). User can wait on a user 
>>>>fence with a new
>>>>+user fence wait ioctl.
>>>>+
>>>>+Here is some prior work on this:
>>>>+https://patchwork.freedesktop.org/patch/349417/
>>>>+
>>>>+Low Latency Submission
>>>>+~~~~~~~~~~~~~~~~~~~~~~~
>>>>+Allows compute UMD to directly submit GPU jobs instead of 
>>>>through execbuf
>>>>+ioctl. This is made possible by VM_BIND is not being 
>>>>synchronized against
>>>>+execbuf. VM_BIND allows bind/unbind of mappings required for 
>>>>the directly
>>>>+submitted jobs.
>>>>+
>>>>+Debugger
>>>>+---------
>>>>+With debug event interface user space process (debugger) is 
>>>>able to keep track
>>>>+of and act upon resources created by another process (debugged) 
>>>>and attached
>>>>+to GPU via vm_bind interface.
>>>>+
>>>>+GPU page faults
>>>>+----------------
>>>>+GPU page faults when supported (in future), will only be 
>>>>supported in the
>>>>+VM_BIND mode. While both the older execbuf mode and the newer 
>>>>VM_BIND mode of
>>>>+binding will require using dma-fence to ensure residency, the 
>>>>GPU page faults
>>>>+mode when supported, will not use any dma-fence as residency is 
>>>>purely managed
>>>>+by installing and removing/invalidating page table entries.
>>>>+
>>>>+Page level hints settings
>>>>+--------------------------
>>>>+VM_BIND allows any hints setting per mapping instead of per BO.
>>>>+Possible hints include read-only mapping, placement and atomicity.
>>>>+Sub-BO level placement hint will be even more relevant with
>>>>+upcoming GPU on-demand page fault support.
>>>>+
>>>>+Page level Cache/CLOS settings
>>>>+-------------------------------
>>>>+VM_BIND allows cache/CLOS settings per mapping instead of per BO.
>>>>+
>>>>+Evictable page table allocations
>>>>+---------------------------------
>>>>+Make pagetable allocations evictable and manage them similar to VM_BIND
>>>>+mapped objects. Page table pages are similar to persistent 
>>>>mappings of a
>>>>+VM (difference here are that the page table pages will not have 
>>>>an i915_vma
>>>>+structure and after swapping pages back in, parent page link 
>>>>needs to be
>>>>+updated).
>>>>+
>>>>+Shared Virtual Memory (SVM) support
>>>>+------------------------------------
>>>>+VM_BIND interface can be used to map system memory directly 
>>>>(without gem BO
>>>>+abstraction) using the HMM interface. SVM is only supported 
>>>>with GPU page
>>>>+faults enabled.
>>>>+
>>>>+VM_BIND UAPI
>>>>+=============
>>>>+
>>>>+.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
>>>>diff --git a/Documentation/gpu/rfc/index.rst 
>>>>b/Documentation/gpu/rfc/index.rst
>>>>index 91e93a705230..7d10c36b268d 100644
>>>>--- a/Documentation/gpu/rfc/index.rst
>>>>+++ b/Documentation/gpu/rfc/index.rst
>>>>@@ -23,3 +23,7 @@ host such documentation:
>>>> .. toctree::
>>>>     i915_scheduler.rst
>>>>+
>>>>+.. toctree::
>>>>+
>>>>+    i915_vm_bind.rst

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-21 14:43         ` Niranjana Vishwanathapura
@ 2022-06-22  8:17           ` Tvrtko Ursulin
  0 siblings, 0 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2022-06-22  8:17 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, chris.p.wilson, thomas.hellstrom,
	dri-devel, daniel.vetter, christian.koenig, matthew.auld


On 21/06/2022 15:43, Niranjana Vishwanathapura wrote:
> On Tue, Jun 21, 2022 at 09:35:16AM +0100, Tvrtko Ursulin wrote:
>>
>> On 20/06/2022 17:29, Niranjana Vishwanathapura wrote:
>>> On Mon, Jun 20, 2022 at 11:43:10AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 17/06/2022 06:14, Niranjana Vishwanathapura wrote:
>>>>> VM_BIND design document with description of intended use cases.
>>>>>
>>>>> v2: Reduce the scope to simple Mesa use case.
>>>>
>>>> since I expressed interest please add me to cc when sending out.
>>>>
>>>
>>> Hi Tvrtko,
>>> I did include you in the cc list with git send-email, but looks like 
>>> some patches
>>> in this series has the full cc list, but some don't (you are on cc 
>>> list of this
>>> patch though). I am not sure why.
>>
>> Odd, I'm not on CC on the (only for me) copy I found in the mailing list.
>>
>>>> How come the direction changed to simplify all of a sudden? I did 
>>>> not spot any discussion to that effect. Was it internal talks?
>>>>
>>>
>>> Yah, some of us had offline discussion involving the Mesa team.
>>> I did update the thread (previous version of this patch series) about 
>>> that.
>>> Plan was to align our roadmap to focus on the deliverables at this point
>>> without further complicating the uapi.
>>>>>
>>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>>> <niranjana.vishwanathapura@intel.com>
>>>>> ---
>>>>>  Documentation/gpu/rfc/i915_vm_bind.rst | 238 
>>>>> +++++++++++++++++++++++++
>>>>>  Documentation/gpu/rfc/index.rst        |   4 +
>>>>>  2 files changed, 242 insertions(+)
>>>>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
>>>>>
>>>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst 
>>>>> b/Documentation/gpu/rfc/i915_vm_bind.rst
>>>>> new file mode 100644
>>>>> index 000000000000..4ab590ef11fd
>>>>> --- /dev/null
>>>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
>>>>> @@ -0,0 +1,238 @@
>>>>> +==========================================
>>>>> +I915 VM_BIND feature design and use cases
>>>>> +==========================================
>>>>> +
>>>>> +VM_BIND feature
>>>>> +================
>>>>> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM 
>>>>> buffer
>>>>> +objects (BOs) or sections of a BOs at specified GPU virtual 
>>>>> addresses on a
>>>>> +specified address space (VM). These mappings (also referred to as 
>>>>> persistent
>>>>> +mappings) will be persistent across multiple GPU submissions 
>>>>> (execbuf calls)
>>>>> +issued by the UMD, without user having to provide a list of all 
>>>>> required
>>>>> +mappings during each submission (as required by older execbuf mode).
>>>>> +
>>>>> +The VM_BIND/UNBIND calls allow UMDs to request a timeline fence 
>>>>> for signaling
>>>>> +the completion of bind/unbind operation.
>>>>> +
>>>>> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
>>>>> +User has to opt-in for VM_BIND mode of binding for an address 
>>>>> space (VM)
>>>>> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND 
>>>>> extension.
>>>>> +
>>>>> +Normally, vm_bind/unbind operations will get completed synchronously,
>>>>
>>>> To me synchronously, at this point in the text, reads as ioctl will 
>>>> return only when the operation is done. Rest of the paragraph 
>>>> however disagrees (plus existence of out fence). It is not clear to 
>>>> me what is the actual behaviour. Will it be clear to userspace 
>>>> developers reading uapi kerneldoc? If it is async, what are the 
>>>> ordering rules in this version?
>>>>
>>>
>>> Yah, here I am simply stating the i915_vma_pin_ww() behavior which 
>>> mostly
>>> does the binding synchronously unless there is a moving fence associated
>>> with the object in which case, binding will complete later once that 
>>> fence
>>> is signaled (hence the out fence).
>>
>> So from userspace point of view it is fully asynchronous and out of 
>> order? I'd suggest spelling that out in the uapi kerneldoc.
>>
> 
> Yah. I can see how some i915 details I provided here can be confusing.
> Ok, will remove and spell it out that user must anticipate fully async
> out of order completions.
> 
>>>>> +but if the object is being moved, the binding will happen once 
>>>>> that the
>>>>> +moving is complete and out fence will be signaled after binding is 
>>>>> complete.
>>>>> +The bind/unbind operation can get completed out of submission order.
>>>>> +
>>>>> +VM_BIND features include:
>>>>> +
>>>>> +* Multiple Virtual Address (VA) mappings can map to the same 
>>>>> physical pages
>>>>> +  of an object (aliasing).
>>>>> +* VA mapping can map to a partial section of the BO (partial 
>>>>> binding).
>>>>> +* Support capture of persistent mappings in the dump upon GPU error.
>>>>> +* TLB is flushed upon unbind completion. Batching of TLB flushes 
>>>>> in some
>>>>> +  use cases will be helpful.
>>>>> +* Support for userptr gem objects (no special uapi is required for 
>>>>> this).
>>>>> +
>>>>> +Execbuf ioctl in VM_BIND mode
>>>>> +-------------------------------
>>>>> +A VM in VM_BIND mode will not support older execbuf mode of binding.
>>>>> +The execbuf ioctl handling in VM_BIND mode differs significantly 
>>>>> from the
>>>>> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>>>>> +Hence, a new execbuf3 ioctl has been added to support VM_BIND 
>>>>> mode. (See
>>>>> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not 
>>>>> accept any
>>>>> +execlist. Hence, no support for implicit sync. It is expected that 
>>>>> the below
>>>>> +work will be able to support requirements of object dependency 
>>>>> setting in all
>>>>> +use cases:
>>>>> +
>>>>> +"dma-buf: Add an API for exporting sync files"
>>>>> +(https://lwn.net/Articles/859290/)
>>>>
>>>> What does this mean? If execbuf3 does not know about target objects 
>>>> how can we add a meaningful fence?
>>>>
>>>
>>> Execbuf3 does know about the target objects. It is all the objects
>>> bound to that VM via vm_bind call.
>>>
>>>>> +
>>>>> +The execbuf3 ioctl directly specifies the batch addresses instead 
>>>>> of as
>>>>> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>>>>> +support many of the older features like in/out/submit fences, 
>>>>> fence array,
>>>>> +default gem context and many more (See struct 
>>>>> drm_i915_gem_execbuffer3).
>>>>> +
>>>>> +In VM_BIND mode, VA allocation is completely managed by the user 
>>>>> instead of
>>>>> +the i915 driver. Hence all VA assignment, eviction are not 
>>>>> applicable in
>>>>> +VM_BIND mode. Also, for determining object activeness, VM_BIND 
>>>>> mode will not
>>>>> +be using the i915_vma active reference tracking. It will instead 
>>>>> use dma-resv
>>>>> +object for that (See `VM_BIND dma_resv usage`_).
>>>>> +
>>>>> +So, a lot of existing code supporting execbuf2 ioctl, like 
>>>>> relocations, VA
>>>>> +evictions, vma lookup table, implicit sync, vma active reference 
>>>>> tracking etc.,
>>>>> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 
>>>>> specific handling
>>>>> +should be in a separate file and only functionalities common to 
>>>>> these ioctls
>>>>> +can be the shared code where possible.
>>>>> +
>>>>> +VM_PRIVATE objects
>>>>> +-------------------
>>>>> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
>>>>> +exported. Hence these BOs are referred to as Shared BOs.
>>>>> +During each execbuf submission, the request fence must be added to 
>>>>> the
>>>>> +dma-resv fence list of all shared BOs mapped on the VM.
>>>>
>>>> Does this tie to my previous question? Design is to add each fence 
>>>> to literally _all_ BOs mapped to a VM, on every execbuf3? If so, is 
>>>> that definitely needed and for what use case? Mixing implicit and 
>>>> explicit, I mean bridging implicit and explicit sync clients?
>>>>
>>>
>>> Yes. It is similar to how legacy execbuf2 does. ie., add request fence
>>> to all of the target BOs. Only difference is in execbuf2 case, target
>>> objects are the objects in execlist, whereas in execbuf2, it is all
>>> the BOs mapped to that VM via vm_bind call. It is needed as UMD says
>>> that it is needed by vm_bind'ing the BO before the execbuf3 call.
>>
>> Sorry I did not understand why it is needed, the last sentence that 
>> is, what did that suppose to mean?
>>
> 
> I am seeing there is a typo in my above comment. It should have been,
> "wherewas in execbuf3, it is all the BOs mapped to that VM via vm_bind 
> call".
> 
> We need all the BO's dma-resv fence list should be properly updated
> as we depend on it for gem_wait ioctl etc. Also note that we are moving
> away from i915_vma active tracking mechanism and instead will be checking
> the BO's dma-resv fence list to check if BO is active or not.
> So, we need the BO's dma-resv fence list properly updated.
> As for execbuf3, all the vm_bind BOs are target BOs, we need to update
> the dma-resv fence list for all of them (private or shared).

Why do we care about gem_wait on a random BO handle if userspace is 
supposed to explicitly manage things? Perhaps the key is in the "etc" 
part - so what is etc? Or maybe I am not seeing something in the 
activity tracking angle? If it is eviction then why it wouldn't be 
possible to just not evict anything from a vm if a vm is busy?

Anyway my concern is that inserting a fence to _all_ objects in a VM on 
_every_ execbuf feels it could be quite costly. So there should be a 
strong reason to do it which needs to be documented.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-06-22  8:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-17  5:14 [PATCH v2 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi Niranjana Vishwanathapura
2022-06-17  5:14 ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-17  5:14 ` [PATCH v2 1/3] drm/doc/rfc: VM_BIND feature design document Niranjana Vishwanathapura
2022-06-17  5:14   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-20 10:43   ` Tvrtko Ursulin
2022-06-20 16:29     ` Niranjana Vishwanathapura
2022-06-21  8:35       ` Tvrtko Ursulin
2022-06-21 14:43         ` Niranjana Vishwanathapura
2022-06-22  8:17           ` Tvrtko Ursulin
2022-06-17  5:14 ` [PATCH v2 2/3] drm/i915: Update i915 uapi documentation Niranjana Vishwanathapura
2022-06-17  5:14   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-17  9:10   ` kernel test robot
2022-06-17  5:14 ` [PATCH v2 3/3] drm/doc/rfc: VM_BIND uapi definition Niranjana Vishwanathapura
2022-06-17  5:14   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-20 14:42   ` Zeng, Oak
2022-06-20 14:42     ` [Intel-gfx] " Zeng, Oak
2022-06-20 15:58     ` Niranjana Vishwanathapura
2022-06-20 15:58       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-17  5:26 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/doc/rfc: i915 VM_BIND feature design + uapi Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.