[PATCH v7 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v7 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi
@ 2022-06-26  1:49 ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, chris.p.wilson, thomas.hellstrom, oak.zeng,
	matthew.auld, jason, daniel.vetter, christian.koenig

This is the i915 driver VM_BIND feature design RFC patch series along
with the required uapi definition and description of intended use cases.

v2: Reduce the scope to simple Mesa use case.
    Remove all compute related uapi, vm_bind/unbind queue support and
    only support a timeline out fence instead of an in/out timeline
    fence array.
v3: Expand documentation on dma-resv usage, TLB flushing, execbuf3 and
    VM_UNBIND. Add FENCE_VALID and TLB_FLUSH flags.
v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
    uapi documentation for vm_bind/unbind.
v5: Update TLB flushing documentation.
    Add version support to stage implementation.
v6: Define and use drm_i915_gem_timeline_fence structure for
    execbuf3 and vm_bind/unbind timeline fences.
v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
    Update documentation on async vm_bind/unbind and versioning.
    Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
    batch_count field and I915_EXEC3_SECURE flag.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana Vishwanathapura (3):
  drm/doc/rfc: VM_BIND feature design document
  drm/i915: Update i915 uapi documentation
  drm/doc/rfc: VM_BIND uapi definition

 Documentation/gpu/rfc/i915_vm_bind.h   | 280 +++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_vm_bind.rst | 246 ++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 include/uapi/drm/i915_drm.h            | 205 ++++++++++++++----
 4 files changed, 690 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH v7 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi
@ 2022-06-26  1:49 ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	daniel.vetter, christian.koenig

This is the i915 driver VM_BIND feature design RFC patch series along
with the required uapi definition and description of intended use cases.

v2: Reduce the scope to simple Mesa use case.
    Remove all compute related uapi, vm_bind/unbind queue support and
    only support a timeline out fence instead of an in/out timeline
    fence array.
v3: Expand documentation on dma-resv usage, TLB flushing, execbuf3 and
    VM_UNBIND. Add FENCE_VALID and TLB_FLUSH flags.
v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
    uapi documentation for vm_bind/unbind.
v5: Update TLB flushing documentation.
    Add version support to stage implementation.
v6: Define and use drm_i915_gem_timeline_fence structure for
    execbuf3 and vm_bind/unbind timeline fences.
v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
    Update documentation on async vm_bind/unbind and versioning.
    Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
    batch_count field and I915_EXEC3_SECURE flag.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana Vishwanathapura (3):
  drm/doc/rfc: VM_BIND feature design document
  drm/i915: Update i915 uapi documentation
  drm/doc/rfc: VM_BIND uapi definition

 Documentation/gpu/rfc/i915_vm_bind.h   | 280 +++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_vm_bind.rst | 246 ++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 include/uapi/drm/i915_drm.h            | 205 ++++++++++++++----
 4 files changed, 690 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-26  1:49   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, chris.p.wilson, thomas.hellstrom, oak.zeng,
	matthew.auld, jason, daniel.vetter, christian.koenig

VM_BIND design document with description of intended use cases.

v2: Reduce the scope to simple Mesa use case.
v3: Expand documentation on dma-resv usage, TLB flushing and
    execbuf3.
v4: Remove vm_bind tlb flush request support.
v5: Update TLB flushing documentation.
v6: Update out of order completion documentation.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 2 files changed, 250 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
new file mode 100644
index 000000000000..032ee32b885c
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
@@ -0,0 +1,246 @@
+==========================================
+I915 VM_BIND feature design and use cases
+==========================================
+
+VM_BIND feature
+================
+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
+objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
+specified address space (VM). These mappings (also referred to as persistent
+mappings) will be persistent across multiple GPU submissions (execbuf calls)
+issued by the UMD, without user having to provide a list of all required
+mappings during each submission (as required by older execbuf mode).
+
+The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+signaling the completion of bind/unbind operation.
+
+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
+User has to opt-in for VM_BIND mode of binding for an address space (VM)
+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+
+VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
+not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
+asynchronously, when valid out fence is specified.
+
+VM_BIND features include:
+
+* Multiple Virtual Address (VA) mappings can map to the same physical pages
+  of an object (aliasing).
+* VA mapping can map to a partial section of the BO (partial binding).
+* Support capture of persistent mappings in the dump upon GPU error.
+* Support for userptr gem objects (no special uapi is required for this).
+
+TLB flush consideration
+------------------------
+The i915 driver flushes the TLB for each submission and when an object's
+pages are released. The VM_BIND/UNBIND operation will not do any additional
+TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
+submissions on that VM and will not be in the working set for currently running
+batches (which would require additional TLB flushes, which is not supported).
+
+Execbuf ioctl in VM_BIND mode
+-------------------------------
+A VM in VM_BIND mode will not support older execbuf mode of binding.
+The execbuf ioctl handling in VM_BIND mode differs significantly from the
+older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+execlist. Hence, no support for implicit sync. It is expected that the below
+work will be able to support requirements of object dependency setting in all
+use cases:
+
+"dma-buf: Add an API for exporting sync files"
+(https://lwn.net/Articles/859290/)
+
+The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
+works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
+VM_BIND call) at the time of execbuf3 call are deemed required for that
+submission.
+
+The execbuf3 ioctl directly specifies the batch addresses instead of as
+object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
+support many of the older features like in/out/submit fences, fence array,
+default gem context and many more (See struct drm_i915_gem_execbuffer3).
+
+In VM_BIND mode, VA allocation is completely managed by the user instead of
+the i915 driver. Hence all VA assignment, eviction are not applicable in
+VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+be using the i915_vma active reference tracking. It will instead use dma-resv
+object for that (See `VM_BIND dma_resv usage`_).
+
+So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
+evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
+are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
+should be in a separate file and only functionalities common to these ioctls
+can be the shared code where possible.
+
+VM_PRIVATE objects
+-------------------
+By default, BOs can be mapped on multiple VMs and can also be dma-buf
+exported. Hence these BOs are referred to as Shared BOs.
+During each execbuf submission, the request fence must be added to the
+dma-resv fence list of all shared BOs mapped on the VM.
+
+VM_BIND feature introduces an optimization where user can create BO which
+is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
+BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
+the VM they are private to and can't be dma-buf exported.
+All private BOs of a VM share the dma-resv object. Hence during each execbuf
+submission, they need only one dma-resv fence list updated. Thus, the fast
+path (where required mappings are already bound) submission latency is O(1)
+w.r.t the number of VM private BOs.
+
+VM_BIND locking hirarchy
+-------------------------
+The locking design here supports the older (execlist based) execbuf mode, the
+newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
+system allocator support (See `Shared Virtual Memory (SVM) support`_).
+The older execbuf mode and the newer VM_BIND mode without page faults manages
+residency of backing storage using dma_fence. The VM_BIND mode with page faults
+and the system allocator support do not use any dma_fence at all.
+
+VM_BIND locking order is as below.
+
+1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
+   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+   mapping.
+
+   In future, when GPU page faults are supported, we can potentially use a
+   rwsem instead, so that multiple page fault handlers can take the read side
+   lock to lookup the mapping and hence can run in parallel.
+   The older execbuf mode of binding do not need this lock.
+
+2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
+   be held while binding/unbinding a vma in the async worker and while updating
+   dma-resv fence list of an object. Note that private BOs of a VM will all
+   share a dma-resv object.
+
+   The future system allocator support will use the HMM prescribed locking
+   instead.
+
+3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
+   invalidated vmas (due to eviction and userptr invalidation) etc.
+
+When GPU page faults are supported, the execbuf path do not take any of these
+locks. There we will simply smash the new batch buffer address into the ring and
+then tell the scheduler run that. The lock taking only happens from the page
+fault handler, where we take lock-A in read mode, whichever lock-B we need to
+find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
+system allocator) and some additional locks (lock-D) for taking care of page
+table races. Page fault mode should not need to ever manipulate the vm lists,
+so won't ever need lock-C.
+
+VM_BIND LRU handling
+---------------------
+We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
+performance degradation. We will also need support for bulk LRU movement of
+VM_BIND objects to avoid additional latencies in execbuf path.
+
+The page table pages are similar to VM_BIND mapped objects (See
+`Evictable page table allocations`_) and are maintained per VM and needs to
+be pinned in memory when VM is made active (ie., upon an execbuf call with
+that VM). So, bulk LRU movement of page table pages is also needed.
+
+VM_BIND dma_resv usage
+-----------------------
+Fences needs to be added to all VM_BIND mapped objects. During each execbuf
+submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
+over sync (See enum dma_resv_usage). One can override it with either
+DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
+dependency setting.
+
+Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
+DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
+check. Instead, the execbuf3 out fence should be used for end of batch check
+(See struct drm_i915_gem_execbuffer3).
+
+Also, in VM_BIND mode, use dma-resv apis for determining object activeness
+(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
+older i915_vma active reference tracking which is deprecated. This should be
+easier to get it working with the current TTM backend.
+
+Mesa use case
+--------------
+VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
+hence improving performance of CPU-bound applications. It also allows us to
+implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
+reducing CPU overhead becomes more impactful.
+
+
+Other VM_BIND use cases
+========================
+
+Long running Compute contexts
+------------------------------
+Usage of dma-fence expects that they complete in reasonable amount of time.
+Compute on the other hand can be long running. Hence it is appropriate for
+compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
+must be limited to in-kernel consumption only.
+
+Where GPU page faults are not available, kernel driver upon buffer invalidation
+will initiate a suspend (preemption) of long running context, finish the
+invalidation, revalidate the BO and then resume the compute context. This is
+done by having a per-context preempt fence which is enabled when someone tries
+to wait on it and triggers the context preemption.
+
+User/Memory Fence
+~~~~~~~~~~~~~~~~~~
+User/Memory fence is a <address, value> pair. To signal the user fence, the
+specified value will be written at the specified virtual address and wakeup the
+waiting process. User fence can be signaled either by the GPU or kernel async
+worker (like upon bind completion). User can wait on a user fence with a new
+user fence wait ioctl.
+
+Here is some prior work on this:
+https://patchwork.freedesktop.org/patch/349417/
+
+Low Latency Submission
+~~~~~~~~~~~~~~~~~~~~~~~
+Allows compute UMD to directly submit GPU jobs instead of through execbuf
+ioctl. This is made possible by VM_BIND is not being synchronized against
+execbuf. VM_BIND allows bind/unbind of mappings required for the directly
+submitted jobs.
+
+Debugger
+---------
+With debug event interface user space process (debugger) is able to keep track
+of and act upon resources created by another process (debugged) and attached
+to GPU via vm_bind interface.
+
+GPU page faults
+----------------
+GPU page faults when supported (in future), will only be supported in the
+VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
+binding will require using dma-fence to ensure residency, the GPU page faults
+mode when supported, will not use any dma-fence as residency is purely managed
+by installing and removing/invalidating page table entries.
+
+Page level hints settings
+--------------------------
+VM_BIND allows any hints setting per mapping instead of per BO.
+Possible hints include read-only mapping, placement and atomicity.
+Sub-BO level placement hint will be even more relevant with
+upcoming GPU on-demand page fault support.
+
+Page level Cache/CLOS settings
+-------------------------------
+VM_BIND allows cache/CLOS settings per mapping instead of per BO.
+
+Evictable page table allocations
+---------------------------------
+Make pagetable allocations evictable and manage them similar to VM_BIND
+mapped objects. Page table pages are similar to persistent mappings of a
+VM (difference here are that the page table pages will not have an i915_vma
+structure and after swapping pages back in, parent page link needs to be
+updated).
+
+Shared Virtual Memory (SVM) support
+------------------------------------
+VM_BIND interface can be used to map system memory directly (without gem BO
+abstraction) using the HMM interface. SVM is only supported with GPU page
+faults enabled.
+
+VM_BIND UAPI
+=============
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..7d10c36b268d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    i915_vm_bind.rst
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
@ 2022-06-26  1:49   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	daniel.vetter, christian.koenig

VM_BIND design document with description of intended use cases.

v2: Reduce the scope to simple Mesa use case.
v3: Expand documentation on dma-resv usage, TLB flushing and
    execbuf3.
v4: Remove vm_bind tlb flush request support.
v5: Update TLB flushing documentation.
v6: Update out of order completion documentation.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst        |   4 +
 2 files changed, 250 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst

diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
new file mode 100644
index 000000000000..032ee32b885c
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
@@ -0,0 +1,246 @@
+==========================================
+I915 VM_BIND feature design and use cases
+==========================================
+
+VM_BIND feature
+================
+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
+objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
+specified address space (VM). These mappings (also referred to as persistent
+mappings) will be persistent across multiple GPU submissions (execbuf calls)
+issued by the UMD, without user having to provide a list of all required
+mappings during each submission (as required by older execbuf mode).
+
+The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
+signaling the completion of bind/unbind operation.
+
+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
+User has to opt-in for VM_BIND mode of binding for an address space (VM)
+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+
+VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
+not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
+asynchronously, when valid out fence is specified.
+
+VM_BIND features include:
+
+* Multiple Virtual Address (VA) mappings can map to the same physical pages
+  of an object (aliasing).
+* VA mapping can map to a partial section of the BO (partial binding).
+* Support capture of persistent mappings in the dump upon GPU error.
+* Support for userptr gem objects (no special uapi is required for this).
+
+TLB flush consideration
+------------------------
+The i915 driver flushes the TLB for each submission and when an object's
+pages are released. The VM_BIND/UNBIND operation will not do any additional
+TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
+submissions on that VM and will not be in the working set for currently running
+batches (which would require additional TLB flushes, which is not supported).
+
+Execbuf ioctl in VM_BIND mode
+-------------------------------
+A VM in VM_BIND mode will not support older execbuf mode of binding.
+The execbuf ioctl handling in VM_BIND mode differs significantly from the
+older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
+Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
+struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
+execlist. Hence, no support for implicit sync. It is expected that the below
+work will be able to support requirements of object dependency setting in all
+use cases:
+
+"dma-buf: Add an API for exporting sync files"
+(https://lwn.net/Articles/859290/)
+
+The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
+works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
+VM_BIND call) at the time of execbuf3 call are deemed required for that
+submission.
+
+The execbuf3 ioctl directly specifies the batch addresses instead of as
+object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
+support many of the older features like in/out/submit fences, fence array,
+default gem context and many more (See struct drm_i915_gem_execbuffer3).
+
+In VM_BIND mode, VA allocation is completely managed by the user instead of
+the i915 driver. Hence all VA assignment, eviction are not applicable in
+VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
+be using the i915_vma active reference tracking. It will instead use dma-resv
+object for that (See `VM_BIND dma_resv usage`_).
+
+So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
+evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
+are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
+should be in a separate file and only functionalities common to these ioctls
+can be the shared code where possible.
+
+VM_PRIVATE objects
+-------------------
+By default, BOs can be mapped on multiple VMs and can also be dma-buf
+exported. Hence these BOs are referred to as Shared BOs.
+During each execbuf submission, the request fence must be added to the
+dma-resv fence list of all shared BOs mapped on the VM.
+
+VM_BIND feature introduces an optimization where user can create BO which
+is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
+BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
+the VM they are private to and can't be dma-buf exported.
+All private BOs of a VM share the dma-resv object. Hence during each execbuf
+submission, they need only one dma-resv fence list updated. Thus, the fast
+path (where required mappings are already bound) submission latency is O(1)
+w.r.t the number of VM private BOs.
+
+VM_BIND locking hirarchy
+-------------------------
+The locking design here supports the older (execlist based) execbuf mode, the
+newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
+system allocator support (See `Shared Virtual Memory (SVM) support`_).
+The older execbuf mode and the newer VM_BIND mode without page faults manages
+residency of backing storage using dma_fence. The VM_BIND mode with page faults
+and the system allocator support do not use any dma_fence at all.
+
+VM_BIND locking order is as below.
+
+1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
+   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
+   mapping.
+
+   In future, when GPU page faults are supported, we can potentially use a
+   rwsem instead, so that multiple page fault handlers can take the read side
+   lock to lookup the mapping and hence can run in parallel.
+   The older execbuf mode of binding do not need this lock.
+
+2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
+   be held while binding/unbinding a vma in the async worker and while updating
+   dma-resv fence list of an object. Note that private BOs of a VM will all
+   share a dma-resv object.
+
+   The future system allocator support will use the HMM prescribed locking
+   instead.
+
+3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
+   invalidated vmas (due to eviction and userptr invalidation) etc.
+
+When GPU page faults are supported, the execbuf path do not take any of these
+locks. There we will simply smash the new batch buffer address into the ring and
+then tell the scheduler run that. The lock taking only happens from the page
+fault handler, where we take lock-A in read mode, whichever lock-B we need to
+find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
+system allocator) and some additional locks (lock-D) for taking care of page
+table races. Page fault mode should not need to ever manipulate the vm lists,
+so won't ever need lock-C.
+
+VM_BIND LRU handling
+---------------------
+We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
+performance degradation. We will also need support for bulk LRU movement of
+VM_BIND objects to avoid additional latencies in execbuf path.
+
+The page table pages are similar to VM_BIND mapped objects (See
+`Evictable page table allocations`_) and are maintained per VM and needs to
+be pinned in memory when VM is made active (ie., upon an execbuf call with
+that VM). So, bulk LRU movement of page table pages is also needed.
+
+VM_BIND dma_resv usage
+-----------------------
+Fences needs to be added to all VM_BIND mapped objects. During each execbuf
+submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
+over sync (See enum dma_resv_usage). One can override it with either
+DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
+dependency setting.
+
+Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
+DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
+check. Instead, the execbuf3 out fence should be used for end of batch check
+(See struct drm_i915_gem_execbuffer3).
+
+Also, in VM_BIND mode, use dma-resv apis for determining object activeness
+(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
+older i915_vma active reference tracking which is deprecated. This should be
+easier to get it working with the current TTM backend.
+
+Mesa use case
+--------------
+VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
+hence improving performance of CPU-bound applications. It also allows us to
+implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
+reducing CPU overhead becomes more impactful.
+
+
+Other VM_BIND use cases
+========================
+
+Long running Compute contexts
+------------------------------
+Usage of dma-fence expects that they complete in reasonable amount of time.
+Compute on the other hand can be long running. Hence it is appropriate for
+compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
+must be limited to in-kernel consumption only.
+
+Where GPU page faults are not available, kernel driver upon buffer invalidation
+will initiate a suspend (preemption) of long running context, finish the
+invalidation, revalidate the BO and then resume the compute context. This is
+done by having a per-context preempt fence which is enabled when someone tries
+to wait on it and triggers the context preemption.
+
+User/Memory Fence
+~~~~~~~~~~~~~~~~~~
+User/Memory fence is a <address, value> pair. To signal the user fence, the
+specified value will be written at the specified virtual address and wakeup the
+waiting process. User fence can be signaled either by the GPU or kernel async
+worker (like upon bind completion). User can wait on a user fence with a new
+user fence wait ioctl.
+
+Here is some prior work on this:
+https://patchwork.freedesktop.org/patch/349417/
+
+Low Latency Submission
+~~~~~~~~~~~~~~~~~~~~~~~
+Allows compute UMD to directly submit GPU jobs instead of through execbuf
+ioctl. This is made possible by VM_BIND is not being synchronized against
+execbuf. VM_BIND allows bind/unbind of mappings required for the directly
+submitted jobs.
+
+Debugger
+---------
+With debug event interface user space process (debugger) is able to keep track
+of and act upon resources created by another process (debugged) and attached
+to GPU via vm_bind interface.
+
+GPU page faults
+----------------
+GPU page faults when supported (in future), will only be supported in the
+VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
+binding will require using dma-fence to ensure residency, the GPU page faults
+mode when supported, will not use any dma-fence as residency is purely managed
+by installing and removing/invalidating page table entries.
+
+Page level hints settings
+--------------------------
+VM_BIND allows any hints setting per mapping instead of per BO.
+Possible hints include read-only mapping, placement and atomicity.
+Sub-BO level placement hint will be even more relevant with
+upcoming GPU on-demand page fault support.
+
+Page level Cache/CLOS settings
+-------------------------------
+VM_BIND allows cache/CLOS settings per mapping instead of per BO.
+
+Evictable page table allocations
+---------------------------------
+Make pagetable allocations evictable and manage them similar to VM_BIND
+mapped objects. Page table pages are similar to persistent mappings of a
+VM (difference here are that the page table pages will not have an i915_vma
+structure and after swapping pages back in, parent page link needs to be
+updated).
+
+Shared Virtual Memory (SVM) support
+------------------------------------
+VM_BIND interface can be used to map system memory directly (without gem BO
+abstraction) using the HMM interface. SVM is only supported with GPU page
+faults enabled.
+
+VM_BIND UAPI
+=============
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..7d10c36b268d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    i915_vm_bind.rst
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v6 2/3] drm/i915: Update i915 uapi documentation
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-26  1:49   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, chris.p.wilson, thomas.hellstrom, oak.zeng,
	matthew.auld, jason, daniel.vetter, christian.koenig

Add some missing i915 upai documentation which the new
i915 VM_BIND feature documentation will be refer to.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/uapi/drm/i915_drm.h | 205 ++++++++++++++++++++++++++++--------
 1 file changed, 160 insertions(+), 45 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de49b68b4fc8..4afe95d8b98b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -751,14 +751,27 @@ typedef struct drm_i915_irq_wait {
 
 /* Must be kept compact -- no holes and well documented */
 
-typedef struct drm_i915_getparam {
+/**
+ * struct drm_i915_getparam - Driver parameter query structure.
+ */
+struct drm_i915_getparam {
+	/** @param: Driver parameter to query. */
 	__s32 param;
-	/*
+
+	/**
+	 * @value: Address of memory where queried value should be put.
+	 *
 	 * WARNING: Using pointers instead of fixed-size u64 means we need to write
 	 * compat32 code. Don't repeat this mistake.
 	 */
 	int __user *value;
-} drm_i915_getparam_t;
+};
+
+/**
+ * typedef drm_i915_getparam_t - Driver parameter query structure.
+ * See struct drm_i915_getparam.
+ */
+typedef struct drm_i915_getparam drm_i915_getparam_t;
 
 /* Ioctl to set kernel params:
  */
@@ -1239,76 +1252,119 @@ struct drm_i915_gem_exec_object2 {
 	__u64 rsvd2;
 };
 
+/**
+ * struct drm_i915_gem_exec_fence - An input or output fence for the execbuf
+ * ioctl.
+ *
+ * The request will wait for input fence to signal before submission.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * request.
+ */
 struct drm_i915_gem_exec_fence {
-	/**
-	 * User's handle for a drm_syncobj to wait on or signal.
-	 */
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
 	__u32 handle;
 
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_EXEC_FENCE_WAIT:
+	 * Wait for the input fence before request submission.
+	 *
+	 * I915_EXEC_FENCE_SIGNAL:
+	 * Return request completion fence as output
+	 */
+	__u32 flags;
 #define I915_EXEC_FENCE_WAIT            (1<<0)
 #define I915_EXEC_FENCE_SIGNAL          (1<<1)
 #define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
-	__u32 flags;
 };
 
-/*
- * See drm_i915_gem_execbuffer_ext_timeline_fences.
- */
-#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
-
-/*
+/**
+ * struct drm_i915_gem_execbuffer_ext_timeline_fences - Timeline fences
+ * for execbuf ioctl.
+ *
  * This structure describes an array of drm_syncobj and associated points for
  * timeline variants of drm_syncobj. It is invalid to append this structure to
  * the execbuf if I915_EXEC_FENCE_ARRAY is set.
  */
 struct drm_i915_gem_execbuffer_ext_timeline_fences {
+#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
 
 	/**
-	 * Number of element in the handles_ptr & value_ptr arrays.
+	 * @fence_count: Number of elements in the @handles_ptr & @value_ptr
+	 * arrays.
 	 */
 	__u64 fence_count;
 
 	/**
-	 * Pointer to an array of struct drm_i915_gem_exec_fence of length
-	 * fence_count.
+	 * @handles_ptr: Pointer to an array of struct drm_i915_gem_exec_fence
+	 * of length @fence_count.
 	 */
 	__u64 handles_ptr;
 
 	/**
-	 * Pointer to an array of u64 values of length fence_count. Values
-	 * must be 0 for a binary drm_syncobj. A Value of 0 for a timeline
-	 * drm_syncobj is invalid as it turns a drm_syncobj into a binary one.
+	 * @values_ptr: Pointer to an array of u64 values of length
+	 * @fence_count.
+	 * Values must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
 	 */
 	__u64 values_ptr;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer2 - Structure for DRM_I915_GEM_EXECBUFFER2
+ * ioctl.
+ */
 struct drm_i915_gem_execbuffer2 {
-	/**
-	 * List of gem_exec_object2 structs
-	 */
+	/** @buffers_ptr: Pointer to a list of gem_exec_object2 structs */
 	__u64 buffers_ptr;
+
+	/** @buffer_count: Number of elements in @buffers_ptr array */
 	__u32 buffer_count;
 
-	/** Offset in the batchbuffer to start execution from. */
+	/**
+	 * @batch_start_offset: Offset in the batchbuffer to start execution
+	 * from.
+	 */
 	__u32 batch_start_offset;
-	/** Bytes used in batchbuffer from batch_start_offset */
+
+	/**
+	 * @batch_len: Length in bytes of the batch buffer, starting from the
+	 * @batch_start_offset. If 0, length is assumed to be the batch buffer
+	 * object size.
+	 */
 	__u32 batch_len;
+
+	/** @DR1: deprecated */
 	__u32 DR1;
+
+	/** @DR4: deprecated */
 	__u32 DR4;
+
+	/** @num_cliprects: See @cliprects_ptr */
 	__u32 num_cliprects;
+
 	/**
-	 * This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY
-	 * & I915_EXEC_USE_EXTENSIONS are not set.
+	 * @cliprects_ptr: Kernel clipping was a DRI1 misfeature.
+	 *
+	 * It is invalid to use this field if I915_EXEC_FENCE_ARRAY or
+	 * I915_EXEC_USE_EXTENSIONS flags are not set.
 	 *
 	 * If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array
-	 * of struct drm_i915_gem_exec_fence and num_cliprects is the length
-	 * of the array.
+	 * of &drm_i915_gem_exec_fence and @num_cliprects is the length of the
+	 * array.
 	 *
 	 * If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a
-	 * single struct i915_user_extension and num_cliprects is 0.
+	 * single &i915_user_extension and num_cliprects is 0.
 	 */
 	__u64 cliprects_ptr;
+
+	/** @flags: Execbuf flags */
+	__u64 flags;
 #define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
@@ -1326,10 +1382,6 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6) /* default */
 #define I915_EXEC_CONSTANTS_ABSOLUTE 	(1<<6)
 #define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6) /* gen4/5 only */
-	__u64 flags;
-	__u64 rsvd1; /* now used for context info */
-	__u64 rsvd2;
-};
 
 /** Resets the SO write offset registers for transform feedback on gen7. */
 #define I915_EXEC_GEN7_SOL_RESET	(1<<8)
@@ -1432,9 +1484,23 @@ struct drm_i915_gem_execbuffer2 {
  * drm_i915_gem_execbuffer_ext enum.
  */
 #define I915_EXEC_USE_EXTENSIONS	(1 << 21)
-
 #define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
 
+	/** @rsvd1: Context id */
+	__u64 rsvd1;
+
+	/**
+	 * @rsvd2: in and out sync_file file descriptors.
+	 *
+	 * When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the
+	 * lower 32 bits of this field will have the in sync_file fd (input).
+	 *
+	 * When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this
+	 * field will have the out sync_file fd (output).
+	 */
+	__u64 rsvd2;
+};
+
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
 	(eb2).rsvd1 = context & I915_EXEC_CONTEXT_ID_MASK
@@ -1814,19 +1880,58 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+/**
+ * struct drm_i915_gem_context_create_ext - Structure for creating contexts.
+ */
 struct drm_i915_gem_context_create_ext {
-	__u32 ctx_id; /* output: id of new context*/
+	/** @ctx_id: Id of the created context (output) */
+	__u32 ctx_id;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:
+	 *
+	 * Extensions may be appended to this structure and driver must check
+	 * for those. See @extensions.
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE
+	 *
+	 * Created context will have single timeline.
+	 */
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
 #define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
 	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * I915_CONTEXT_CREATE_EXT_SETPARAM:
+	 * Context parameter to set or query during context creation.
+	 * See struct drm_i915_gem_context_create_ext_setparam.
+	 *
+	 * I915_CONTEXT_CREATE_EXT_CLONE:
+	 * This extension has been removed. On the off chance someone somewhere
+	 * has attempted to use it, never re-use this extension number.
+	 */
 	__u64 extensions;
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
 };
 
+/**
+ * struct drm_i915_gem_context_param - Context parameter to set or query.
+ */
 struct drm_i915_gem_context_param {
+	/** @ctx_id: Context id */
 	__u32 ctx_id;
+
+	/** @size: Size of the parameter @value */
 	__u32 size;
+
+	/** @param: Parameter to set or query */
 	__u64 param;
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 /* I915_CONTEXT_PARAM_NO_ZEROMAP has been removed.  On the off chance
@@ -1973,6 +2078,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
 /* Must be kept compact -- no holes and well documented */
 
+	/** @value: Context parameter value to be set or queried */
 	__u64 value;
 };
 
@@ -2371,23 +2477,29 @@ struct i915_context_param_engines {
 	struct i915_engine_class_instance engines[N__]; \
 } __attribute__((packed)) name__
 
+/**
+ * struct drm_i915_gem_context_create_ext_setparam - Context parameter
+ * to set or query during context creation.
+ */
 struct drm_i915_gem_context_create_ext_setparam {
-#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
+
+	/**
+	 * @param: Context parameter to set or query.
+	 * See struct drm_i915_gem_context_param.
+	 */
 	struct drm_i915_gem_context_param param;
 };
 
-/* This API has been removed.  On the off chance someone somewhere has
- * attempted to use it, never re-use this extension number.
- */
-#define I915_CONTEXT_CREATE_EXT_CLONE 1
-
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
 };
 
-/*
+/**
+ * struct drm_i915_gem_vm_control - Structure to create or destroy VM.
+ *
  * DRM_I915_GEM_VM_CREATE -
  *
  * Create a new virtual memory address space (ppGTT) for use within a context
@@ -2397,20 +2509,23 @@ struct drm_i915_gem_context_destroy {
  * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
  * returned in the outparam @id.
  *
- * No flags are defined, with all bits reserved and must be zero.
- *
  * An extension chain maybe provided, starting with @extensions, and terminated
  * by the @next_extension being 0. Currently, no extensions are defined.
  *
  * DRM_I915_GEM_VM_DESTROY -
  *
- * Destroys a previously created VM id, specified in @id.
+ * Destroys a previously created VM id, specified in @vm_id.
  *
  * No extensions or flags are allowed currently, and so must be zero.
  */
 struct drm_i915_gem_vm_control {
+	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
+
+	/** @flags: reserved for future usage, currently MBZ */
 	__u32 flags;
+
+	/** @vm_id: Id of the VM created or to be destroyed */
 	__u32 vm_id;
 };
 
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH v6 2/3] drm/i915: Update i915 uapi documentation
@ 2022-06-26  1:49   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	daniel.vetter, christian.koenig

Add some missing i915 upai documentation which the new
i915 VM_BIND feature documentation will be refer to.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/uapi/drm/i915_drm.h | 205 ++++++++++++++++++++++++++++--------
 1 file changed, 160 insertions(+), 45 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de49b68b4fc8..4afe95d8b98b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -751,14 +751,27 @@ typedef struct drm_i915_irq_wait {
 
 /* Must be kept compact -- no holes and well documented */
 
-typedef struct drm_i915_getparam {
+/**
+ * struct drm_i915_getparam - Driver parameter query structure.
+ */
+struct drm_i915_getparam {
+	/** @param: Driver parameter to query. */
 	__s32 param;
-	/*
+
+	/**
+	 * @value: Address of memory where queried value should be put.
+	 *
 	 * WARNING: Using pointers instead of fixed-size u64 means we need to write
 	 * compat32 code. Don't repeat this mistake.
 	 */
 	int __user *value;
-} drm_i915_getparam_t;
+};
+
+/**
+ * typedef drm_i915_getparam_t - Driver parameter query structure.
+ * See struct drm_i915_getparam.
+ */
+typedef struct drm_i915_getparam drm_i915_getparam_t;
 
 /* Ioctl to set kernel params:
  */
@@ -1239,76 +1252,119 @@ struct drm_i915_gem_exec_object2 {
 	__u64 rsvd2;
 };
 
+/**
+ * struct drm_i915_gem_exec_fence - An input or output fence for the execbuf
+ * ioctl.
+ *
+ * The request will wait for input fence to signal before submission.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * request.
+ */
 struct drm_i915_gem_exec_fence {
-	/**
-	 * User's handle for a drm_syncobj to wait on or signal.
-	 */
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
 	__u32 handle;
 
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_EXEC_FENCE_WAIT:
+	 * Wait for the input fence before request submission.
+	 *
+	 * I915_EXEC_FENCE_SIGNAL:
+	 * Return request completion fence as output
+	 */
+	__u32 flags;
 #define I915_EXEC_FENCE_WAIT            (1<<0)
 #define I915_EXEC_FENCE_SIGNAL          (1<<1)
 #define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
-	__u32 flags;
 };
 
-/*
- * See drm_i915_gem_execbuffer_ext_timeline_fences.
- */
-#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
-
-/*
+/**
+ * struct drm_i915_gem_execbuffer_ext_timeline_fences - Timeline fences
+ * for execbuf ioctl.
+ *
  * This structure describes an array of drm_syncobj and associated points for
  * timeline variants of drm_syncobj. It is invalid to append this structure to
  * the execbuf if I915_EXEC_FENCE_ARRAY is set.
  */
 struct drm_i915_gem_execbuffer_ext_timeline_fences {
+#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
 
 	/**
-	 * Number of element in the handles_ptr & value_ptr arrays.
+	 * @fence_count: Number of elements in the @handles_ptr & @value_ptr
+	 * arrays.
 	 */
 	__u64 fence_count;
 
 	/**
-	 * Pointer to an array of struct drm_i915_gem_exec_fence of length
-	 * fence_count.
+	 * @handles_ptr: Pointer to an array of struct drm_i915_gem_exec_fence
+	 * of length @fence_count.
 	 */
 	__u64 handles_ptr;
 
 	/**
-	 * Pointer to an array of u64 values of length fence_count. Values
-	 * must be 0 for a binary drm_syncobj. A Value of 0 for a timeline
-	 * drm_syncobj is invalid as it turns a drm_syncobj into a binary one.
+	 * @values_ptr: Pointer to an array of u64 values of length
+	 * @fence_count.
+	 * Values must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
 	 */
 	__u64 values_ptr;
 };
 
+/**
+ * struct drm_i915_gem_execbuffer2 - Structure for DRM_I915_GEM_EXECBUFFER2
+ * ioctl.
+ */
 struct drm_i915_gem_execbuffer2 {
-	/**
-	 * List of gem_exec_object2 structs
-	 */
+	/** @buffers_ptr: Pointer to a list of gem_exec_object2 structs */
 	__u64 buffers_ptr;
+
+	/** @buffer_count: Number of elements in @buffers_ptr array */
 	__u32 buffer_count;
 
-	/** Offset in the batchbuffer to start execution from. */
+	/**
+	 * @batch_start_offset: Offset in the batchbuffer to start execution
+	 * from.
+	 */
 	__u32 batch_start_offset;
-	/** Bytes used in batchbuffer from batch_start_offset */
+
+	/**
+	 * @batch_len: Length in bytes of the batch buffer, starting from the
+	 * @batch_start_offset. If 0, length is assumed to be the batch buffer
+	 * object size.
+	 */
 	__u32 batch_len;
+
+	/** @DR1: deprecated */
 	__u32 DR1;
+
+	/** @DR4: deprecated */
 	__u32 DR4;
+
+	/** @num_cliprects: See @cliprects_ptr */
 	__u32 num_cliprects;
+
 	/**
-	 * This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY
-	 * & I915_EXEC_USE_EXTENSIONS are not set.
+	 * @cliprects_ptr: Kernel clipping was a DRI1 misfeature.
+	 *
+	 * It is invalid to use this field if I915_EXEC_FENCE_ARRAY or
+	 * I915_EXEC_USE_EXTENSIONS flags are not set.
 	 *
 	 * If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array
-	 * of struct drm_i915_gem_exec_fence and num_cliprects is the length
-	 * of the array.
+	 * of &drm_i915_gem_exec_fence and @num_cliprects is the length of the
+	 * array.
 	 *
 	 * If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a
-	 * single struct i915_user_extension and num_cliprects is 0.
+	 * single &i915_user_extension and num_cliprects is 0.
 	 */
 	__u64 cliprects_ptr;
+
+	/** @flags: Execbuf flags */
+	__u64 flags;
 #define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
@@ -1326,10 +1382,6 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6) /* default */
 #define I915_EXEC_CONSTANTS_ABSOLUTE 	(1<<6)
 #define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6) /* gen4/5 only */
-	__u64 flags;
-	__u64 rsvd1; /* now used for context info */
-	__u64 rsvd2;
-};
 
 /** Resets the SO write offset registers for transform feedback on gen7. */
 #define I915_EXEC_GEN7_SOL_RESET	(1<<8)
@@ -1432,9 +1484,23 @@ struct drm_i915_gem_execbuffer2 {
  * drm_i915_gem_execbuffer_ext enum.
  */
 #define I915_EXEC_USE_EXTENSIONS	(1 << 21)
-
 #define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
 
+	/** @rsvd1: Context id */
+	__u64 rsvd1;
+
+	/**
+	 * @rsvd2: in and out sync_file file descriptors.
+	 *
+	 * When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the
+	 * lower 32 bits of this field will have the in sync_file fd (input).
+	 *
+	 * When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this
+	 * field will have the out sync_file fd (output).
+	 */
+	__u64 rsvd2;
+};
+
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
 	(eb2).rsvd1 = context & I915_EXEC_CONTEXT_ID_MASK
@@ -1814,19 +1880,58 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+/**
+ * struct drm_i915_gem_context_create_ext - Structure for creating contexts.
+ */
 struct drm_i915_gem_context_create_ext {
-	__u32 ctx_id; /* output: id of new context*/
+	/** @ctx_id: Id of the created context (output) */
+	__u32 ctx_id;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:
+	 *
+	 * Extensions may be appended to this structure and driver must check
+	 * for those. See @extensions.
+	 *
+	 * I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE
+	 *
+	 * Created context will have single timeline.
+	 */
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
 #define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
 	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * I915_CONTEXT_CREATE_EXT_SETPARAM:
+	 * Context parameter to set or query during context creation.
+	 * See struct drm_i915_gem_context_create_ext_setparam.
+	 *
+	 * I915_CONTEXT_CREATE_EXT_CLONE:
+	 * This extension has been removed. On the off chance someone somewhere
+	 * has attempted to use it, never re-use this extension number.
+	 */
 	__u64 extensions;
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
 };
 
+/**
+ * struct drm_i915_gem_context_param - Context parameter to set or query.
+ */
 struct drm_i915_gem_context_param {
+	/** @ctx_id: Context id */
 	__u32 ctx_id;
+
+	/** @size: Size of the parameter @value */
 	__u32 size;
+
+	/** @param: Parameter to set or query */
 	__u64 param;
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 /* I915_CONTEXT_PARAM_NO_ZEROMAP has been removed.  On the off chance
@@ -1973,6 +2078,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
 /* Must be kept compact -- no holes and well documented */
 
+	/** @value: Context parameter value to be set or queried */
 	__u64 value;
 };
 
@@ -2371,23 +2477,29 @@ struct i915_context_param_engines {
 	struct i915_engine_class_instance engines[N__]; \
 } __attribute__((packed)) name__
 
+/**
+ * struct drm_i915_gem_context_create_ext_setparam - Context parameter
+ * to set or query during context creation.
+ */
 struct drm_i915_gem_context_create_ext_setparam {
-#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	/** @base: Extension link. See struct i915_user_extension. */
 	struct i915_user_extension base;
+
+	/**
+	 * @param: Context parameter to set or query.
+	 * See struct drm_i915_gem_context_param.
+	 */
 	struct drm_i915_gem_context_param param;
 };
 
-/* This API has been removed.  On the off chance someone somewhere has
- * attempted to use it, never re-use this extension number.
- */
-#define I915_CONTEXT_CREATE_EXT_CLONE 1
-
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
 };
 
-/*
+/**
+ * struct drm_i915_gem_vm_control - Structure to create or destroy VM.
+ *
  * DRM_I915_GEM_VM_CREATE -
  *
  * Create a new virtual memory address space (ppGTT) for use within a context
@@ -2397,20 +2509,23 @@ struct drm_i915_gem_context_destroy {
  * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
  * returned in the outparam @id.
  *
- * No flags are defined, with all bits reserved and must be zero.
- *
  * An extension chain maybe provided, starting with @extensions, and terminated
  * by the @next_extension being 0. Currently, no extensions are defined.
  *
  * DRM_I915_GEM_VM_DESTROY -
  *
- * Destroys a previously created VM id, specified in @id.
+ * Destroys a previously created VM id, specified in @vm_id.
  *
  * No extensions or flags are allowed currently, and so must be zero.
  */
 struct drm_i915_gem_vm_control {
+	/** @extensions: Zero-terminated chain of extensions. */
 	__u64 extensions;
+
+	/** @flags: reserved for future usage, currently MBZ */
 	__u32 flags;
+
+	/** @vm_id: Id of the VM created or to be destroyed */
 	__u32 vm_id;
 };
 
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-26  1:49   ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: matthew.brost, paulo.r.zanoni, lionel.g.landwerlin,
	tvrtko.ursulin, chris.p.wilson, thomas.hellstrom, oak.zeng,
	matthew.auld, jason, daniel.vetter, christian.koenig

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.
v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
    documentation for vm_bind/unbind.
v5: Remove TLB flush requirement on VM_UNBIND.
    Add version support to stage implementation.
v6: Define and use drm_i915_gem_timeline_fence structure for
    all timeline fences.
v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
    Update documentation on async vm_bind/unbind and versioning.
    Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
    batch_count field and I915_EXEC3_SECURE flag.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
 1 file changed, 280 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index 000000000000..a93e08bceee6
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,280 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_VM_BIND_VERSION
+ *
+ * VM_BIND feature version supported.
+ * See typedef drm_i915_getparam_t param.
+ *
+ * Specifies the VM_BIND feature version supported.
+ * The following versions of VM_BIND have been defined:
+ *
+ * 0: No VM_BIND support.
+ *
+ * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
+ *    previously with VM_BIND, the ioctl will not support unbinding multiple
+ *    mappings or splitting them. Similarly, VM_BIND calls will not replace
+ *    any existing mappings.
+ *
+ * 2: The restrictions on unbinding partial or multiple mappings is
+ *    lifted, Similarly, binding will replace any mappings in the given range.
+ *
+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
+ */
+#define I915_PARAM_VM_BIND_VERSION	57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+
+/**
+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
+ *
+ * The operation will wait for input fence to signal.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * operation.
+ */
+struct drm_i915_gem_timeline_fence {
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
+	__u32 handle;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_TIMELINE_FENCE_WAIT:
+	 * Wait for the input fence before the operation.
+	 *
+	 * I915_TIMELINE_FENCE_SIGNAL:
+	 * Return operation completion fence as output.
+	 */
+	__u32 flags;
+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length must be 4K page aligned. However the DG2
+ * and XEHPSDV has 64K page size for device local-memory and has compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start must be 2M aligned, @offset and @length must be 64K aligned.
+ * Also, for such mappings, i915 will reserve the whole 2M range for it so as
+ * to not allow multiple mappings in that 2M range (Compact page tables do not
+ * allow 64K page and 4K page bindings in the same 2M range).
+ *
+ * Error code -EINVAL will be returned if @start, @offset and @length are not
+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
+ * -ENOSPC will be returned if the VA range specified can't be reserved.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_GEM_VM_BIND_READONLY:
+	 * Mapping is read-only.
+	 *
+	 * I915_GEM_VM_BIND_CAPTURE:
+	 * Capture this mapping in the dump upon GPU error.
+	 */
+	__u64 flags;
+#define I915_GEM_VM_BIND_READONLY	(1 << 1)
+#define I915_GEM_VM_BIND_CAPTURE	(1 << 2)
+
+	/**
+	 * @fence: Timeline fence for bind completion signaling.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). VM_UNBIND will force unbind the specified
+ * range from device page table without waiting for any GPU job to complete.
+ * It is UMDs responsibility to ensure the mapping is no longer in use before
+ * calling VM_UNBIND.
+ *
+ * If the specified mapping is not found, the ioctl will simply return without
+ * any error.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for unbind completion signaling.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/**
+	 * @batch_address: Batch gpu virtual address/es.
+	 *
+	 * For normal submission, it is the gpu virtual address of the batch
+	 * buffer. For parallel submission, it is a pointer to an array of
+	 * batch buffer gpu virtual addresses with array size equal to the
+	 * number of (parallel) engines involved in that submission (See
+	 * struct i915_context_engines_parallel_submit).
+	 */
+	__u64 batch_address;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/** @fence_count: Number of fences in @timeline_fences array. */
+	__u32 fence_count;
+
+	/**
+	 * @timeline_fences: Pointer to an array of timeline fences.
+	 *
+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
+	 */
+	__u64 timeline_fences;
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
+ * private to the specified VM.
+ *
+ * See struct drm_i915_gem_create_ext.
+ */
+struct drm_i915_gem_create_ext_vm_private {
+#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @vm_id: Id of the VM to which the object is private */
+	__u32 vm_id;
+};
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-26  1:49   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-26  1:49 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: paulo.r.zanoni, chris.p.wilson, thomas.hellstrom, matthew.auld,
	daniel.vetter, christian.koenig

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.
v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
    documentation for vm_bind/unbind.
v5: Remove TLB flush requirement on VM_UNBIND.
    Add version support to stage implementation.
v6: Define and use drm_i915_gem_timeline_fence structure for
    all timeline fences.
v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
    Update documentation on async vm_bind/unbind and versioning.
    Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
    batch_count field and I915_EXEC3_SECURE flag.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
 1 file changed, 280 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index 000000000000..a93e08bceee6
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,280 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_VM_BIND_VERSION
+ *
+ * VM_BIND feature version supported.
+ * See typedef drm_i915_getparam_t param.
+ *
+ * Specifies the VM_BIND feature version supported.
+ * The following versions of VM_BIND have been defined:
+ *
+ * 0: No VM_BIND support.
+ *
+ * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
+ *    previously with VM_BIND, the ioctl will not support unbinding multiple
+ *    mappings or splitting them. Similarly, VM_BIND calls will not replace
+ *    any existing mappings.
+ *
+ * 2: The restrictions on unbinding partial or multiple mappings is
+ *    lifted, Similarly, binding will replace any mappings in the given range.
+ *
+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
+ */
+#define I915_PARAM_VM_BIND_VERSION	57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND		0x3d
+#define DRM_I915_GEM_VM_UNBIND		0x3e
+#define DRM_I915_GEM_EXECBUFFER3	0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+
+/**
+ * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
+ *
+ * The operation will wait for input fence to signal.
+ *
+ * The returned output fence will be signaled after the completion of the
+ * operation.
+ */
+struct drm_i915_gem_timeline_fence {
+	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
+	__u32 handle;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_TIMELINE_FENCE_WAIT:
+	 * Wait for the input fence before the operation.
+	 *
+	 * I915_TIMELINE_FENCE_SIGNAL:
+	 * Return operation completion fence as output.
+	 */
+	__u32 flags;
+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
+
+	/**
+	 * @value: A point in the timeline.
+	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+	 * binary one.
+	 */
+	__u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length must be 4K page aligned. However the DG2
+ * and XEHPSDV has 64K page size for device local-memory and has compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start must be 2M aligned, @offset and @length must be 64K aligned.
+ * Also, for such mappings, i915 will reserve the whole 2M range for it so as
+ * to not allow multiple mappings in that 2M range (Compact page tables do not
+ * allow 64K page and 4K page bindings in the same 2M range).
+ *
+ * Error code -EINVAL will be returned if @start, @offset and @length are not
+ * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
+ * -ENOSPC will be returned if the VA range specified can't be reserved.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_BIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_GEM_VM_BIND_READONLY:
+	 * Mapping is read-only.
+	 *
+	 * I915_GEM_VM_BIND_CAPTURE:
+	 * Capture this mapping in the dump upon GPU error.
+	 */
+	__u64 flags;
+#define I915_GEM_VM_BIND_READONLY	(1 << 1)
+#define I915_GEM_VM_BIND_CAPTURE	(1 << 2)
+
+	/**
+	 * @fence: Timeline fence for bind completion signaling.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
+ *
+ * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
+ * address (VA) range that should be unbound from the device page table of the
+ * specified address space (VM). VM_UNBIND will force unbind the specified
+ * range from device page table without waiting for any GPU job to complete.
+ * It is UMDs responsibility to ensure the mapping is no longer in use before
+ * calling VM_UNBIND.
+ *
+ * If the specified mapping is not found, the ioctl will simply return without
+ * any error.
+ *
+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
+ * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
+ * asynchronously, if valid @fence is specified.
+ */
+struct drm_i915_gem_vm_unbind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @rsvd: Reserved, MBZ */
+	__u32 rsvd;
+
+	/** @start: Virtual Address start to unbind */
+	__u64 start;
+
+	/** @length: Length of mapping to unbind */
+	__u64 length;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+
+	/**
+	 * @fence: Timeline fence for unbind completion signaling.
+	 *
+	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
+	 * is invalid, and an error will be returned.
+	 */
+	struct drm_i915_gem_timeline_fence fence;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
+ * ioctl.
+ *
+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
+ * only works with this ioctl for submission.
+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
+ */
+struct drm_i915_gem_execbuffer3 {
+	/**
+	 * @ctx_id: Context id
+	 *
+	 * Only contexts with user engine map are allowed.
+	 */
+	__u32 ctx_id;
+
+	/**
+	 * @engine_idx: Engine index
+	 *
+	 * An index in the user engine map of the context specified by @ctx_id.
+	 */
+	__u32 engine_idx;
+
+	/**
+	 * @batch_address: Batch gpu virtual address/es.
+	 *
+	 * For normal submission, it is the gpu virtual address of the batch
+	 * buffer. For parallel submission, it is a pointer to an array of
+	 * batch buffer gpu virtual addresses with array size equal to the
+	 * number of (parallel) engines involved in that submission (See
+	 * struct i915_context_engines_parallel_submit).
+	 */
+	__u64 batch_address;
+
+	/** @flags: Currently reserved, MBZ */
+	__u64 flags;
+
+	/** @rsvd1: Reserved, MBZ */
+	__u32 rsvd1;
+
+	/** @fence_count: Number of fences in @timeline_fences array. */
+	__u32 fence_count;
+
+	/**
+	 * @timeline_fences: Pointer to an array of timeline fences.
+	 *
+	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
+	 */
+	__u64 timeline_fences;
+
+	/** @rsvd2: Reserved, MBZ */
+	__u64 rsvd2;
+
+	/**
+	 * @extensions: Zero-terminated chain of extensions.
+	 *
+	 * For future extensions. See struct i915_user_extension.
+	 */
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
+ * private to the specified VM.
+ *
+ * See struct drm_i915_gem_create_ext.
+ */
+struct drm_i915_gem_create_ext_vm_private {
+#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @vm_id: Id of the VM to which the object is private */
+	__u32 vm_id;
+};
-- 
2.21.0.rc0.32.g243a4c7e27


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (3 preceding siblings ...)
  (?)
@ 2022-06-26  2:03 ` Patchwork
  -1 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2022-06-26  2:03 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105635/
State : warning

== Summary ==

Error: dim checkpatch failed
66c4704a6b86 drm/doc/rfc: VM_BIND feature design document
-:18: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#18: 
new file mode 100644

-:23: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#23: FILE: Documentation/gpu/rfc/i915_vm_bind.rst:1:
+==========================================

total: 0 errors, 2 warnings, 0 checks, 253 lines checked
d6790e27d31d drm/i915: Update i915 uapi documentation
-:44: WARNING:NEW_TYPEDEFS: do not add new typedefs
#44: FILE: include/uapi/drm/i915_drm.h:774:
+typedef struct drm_i915_getparam drm_i915_getparam_t;

total: 0 errors, 1 warnings, 0 checks, 337 lines checked
dbf1da3aeb67 drm/doc/rfc: VM_BIND uapi definition
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:27: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#27: 
new file mode 100644

-:77: WARNING:LONG_LINE: line length of 126 exceeds 100 columns
#77: FILE: Documentation/gpu/rfc/i915_vm_bind.h:46:
+#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)

-:78: WARNING:LONG_LINE: line length of 128 exceeds 100 columns
#78: FILE: Documentation/gpu/rfc/i915_vm_bind.h:47:
+#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)

-:79: WARNING:LONG_LINE: line length of 134 exceeds 100 columns
#79: FILE: Documentation/gpu/rfc/i915_vm_bind.h:48:
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

total: 0 errors, 4 warnings, 0 checks, 280 lines checked



^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (4 preceding siblings ...)
  (?)
@ 2022-06-26  2:03 ` Patchwork
  -1 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2022-06-26  2:03 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105635/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (5 preceding siblings ...)
  (?)
@ 2022-06-26  2:25 ` Patchwork
  -1 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2022-06-26  2:25 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 7777 bytes --]

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105635/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11805 -> Patchwork_105635v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/index.html

Participating hosts (37 -> 38)
------------------------------

  Additional (1): fi-tgl-u2 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105635v1:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@uncore:
    - {bat-adln-1}:       NOTRUN -> [DMESG-FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/bat-adln-1/igt@i915_selftest@live@uncore.html

  
Known issues
------------

  Here are the changes found in Patchwork_105635v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-tgl-u2:          NOTRUN -> [SKIP][2] ([i915#2190])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@i915_selftest@live@gem:
    - fi-blb-e6850:       NOTRUN -> [DMESG-FAIL][3] ([i915#4528])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-blb-e6850/igt@i915_selftest@live@gem.html

  * igt@i915_selftest@live@objects:
    - fi-bdw-5557u:       [PASS][4] -> [INCOMPLETE][5] ([i915#6000])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/fi-bdw-5557u/igt@i915_selftest@live@objects.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-bdw-5557u/igt@i915_selftest@live@objects.html

  * igt@i915_selftest@live@requests:
    - fi-pnv-d510:        [PASS][6] -> [DMESG-FAIL][7] ([i915#4528])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/fi-pnv-d510/igt@i915_selftest@live@requests.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-pnv-d510/igt@i915_selftest@live@requests.html

  * igt@kms_busy@basic@flip:
    - fi-tgl-u2:          NOTRUN -> [DMESG-WARN][8] ([i915#402])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@kms_busy@basic@flip.html

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-tgl-u2:          NOTRUN -> [SKIP][9] ([fdo#109284] / [fdo#111827]) +7 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-tgl-u2:          NOTRUN -> [SKIP][10] ([i915#4103])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-tgl-u2:          NOTRUN -> [SKIP][11] ([fdo#109285])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-tgl-u2:          NOTRUN -> [SKIP][12] ([i915#3555])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-userptr:
    - fi-tgl-u2:          NOTRUN -> [SKIP][13] ([fdo#109295] / [i915#3301])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-tgl-u2/igt@prime_vgem@basic-userptr.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_heartbeat:
    - {fi-ehl-2}:         [DMESG-FAIL][14] ([i915#5334]) -> [PASS][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/fi-ehl-2/igt@i915_selftest@live@gt_heartbeat.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-ehl-2/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-5:          [DMESG-FAIL][16] ([i915#4494] / [i915#4957]) -> [PASS][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/bat-dg1-5/igt@i915_selftest@live@hangcheck.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/bat-dg1-5/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [DMESG-FAIL][18] ([i915#4528]) -> [PASS][19]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@i915_selftest@live@sanitycheck:
    - {bat-adln-1}:       [DMESG-FAIL][20] -> [PASS][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/bat-adln-1/igt@i915_selftest@live@sanitycheck.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/bat-adln-1/igt@i915_selftest@live@sanitycheck.html

  * igt@kms_force_connector_basic@force-connector-state:
    - {bat-adlp-6}:       [DMESG-WARN][22] ([i915#3576]) -> [PASS][23] +3 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/bat-adlp-6/igt@kms_force_connector_basic@force-connector-state.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/bat-adlp-6/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cfl-8109u:       [DMESG-WARN][24] ([i915#62]) -> [PASS][25] +13 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#6000]: https://gitlab.freedesktop.org/drm/intel/issues/6000
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62


Build changes
-------------

  * Linux: CI_DRM_11805 -> Patchwork_105635v1

  CI-20190529: 20190529
  CI_DRM_11805: 2a406c5f1126c1220fdaf841df3ef0ae487cd067 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6542: d38a476ee4b9f9a95d8f452de0d66cc52f7f079b @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105635v1: 2a406c5f1126c1220fdaf841df3ef0ae487cd067 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

d068d4cb8d04 drm/doc/rfc: VM_BIND uapi definition
d146bde11b67 drm/i915: Update i915 uapi documentation
bfac334e42dc drm/doc/rfc: VM_BIND feature design document

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/index.html

[-- Attachment #2: Type: text/html, Size: 8963 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-06-27 16:12   ` Daniel Vetter
  -1 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2022-06-27 16:12 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: paulo.r.zanoni, intel-gfx, chris.p.wilson, thomas.hellstrom,
	dri-devel, daniel.vetter, christian.koenig, matthew.auld

On Sat, Jun 25, 2022 at 06:49:14PM -0700, Niranjana Vishwanathapura wrote:
> VM_BIND design document with description of intended use cases.
> 
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand documentation on dma-resv usage, TLB flushing and
>     execbuf3.
> v4: Remove vm_bind tlb flush request support.
> v5: Update TLB flushing documentation.
> v6: Update out of order completion documentation.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Aside on the tlb flush discussion: I think that one doesn't have a big
impact if we need to later on fix it with a flag, and what's currently
specified here is the solution that fits best into the existing semantics.
So feels like a solid path until we have this all up&running and can do
real benchmarks with applications.
-Daniel

> ---
>  Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
>  Documentation/gpu/rfc/index.rst        |   4 +
>  2 files changed, 250 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
> new file mode 100644
> index 000000000000..032ee32b885c
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
> @@ -0,0 +1,246 @@
> +==========================================
> +I915 VM_BIND feature design and use cases
> +==========================================
> +
> +VM_BIND feature
> +================
> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> +specified address space (VM). These mappings (also referred to as persistent
> +mappings) will be persistent across multiple GPU submissions (execbuf calls)
> +issued by the UMD, without user having to provide a list of all required
> +mappings during each submission (as required by older execbuf mode).
> +
> +The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> +signaling the completion of bind/unbind operation.
> +
> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
> +User has to opt-in for VM_BIND mode of binding for an address space (VM)
> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> +
> +VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
> +not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
> +asynchronously, when valid out fence is specified.
> +
> +VM_BIND features include:
> +
> +* Multiple Virtual Address (VA) mappings can map to the same physical pages
> +  of an object (aliasing).
> +* VA mapping can map to a partial section of the BO (partial binding).
> +* Support capture of persistent mappings in the dump upon GPU error.
> +* Support for userptr gem objects (no special uapi is required for this).
> +
> +TLB flush consideration
> +------------------------
> +The i915 driver flushes the TLB for each submission and when an object's
> +pages are released. The VM_BIND/UNBIND operation will not do any additional
> +TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
> +submissions on that VM and will not be in the working set for currently running
> +batches (which would require additional TLB flushes, which is not supported).
> +
> +Execbuf ioctl in VM_BIND mode
> +-------------------------------
> +A VM in VM_BIND mode will not support older execbuf mode of binding.
> +The execbuf ioctl handling in VM_BIND mode differs significantly from the
> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
> +execlist. Hence, no support for implicit sync. It is expected that the below
> +work will be able to support requirements of object dependency setting in all
> +use cases:
> +
> +"dma-buf: Add an API for exporting sync files"
> +(https://lwn.net/Articles/859290/)
> +
> +The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
> +works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
> +VM_BIND call) at the time of execbuf3 call are deemed required for that
> +submission.
> +
> +The execbuf3 ioctl directly specifies the batch addresses instead of as
> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
> +support many of the older features like in/out/submit fences, fence array,
> +default gem context and many more (See struct drm_i915_gem_execbuffer3).
> +
> +In VM_BIND mode, VA allocation is completely managed by the user instead of
> +the i915 driver. Hence all VA assignment, eviction are not applicable in
> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
> +be using the i915_vma active reference tracking. It will instead use dma-resv
> +object for that (See `VM_BIND dma_resv usage`_).
> +
> +So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
> +evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
> +should be in a separate file and only functionalities common to these ioctls
> +can be the shared code where possible.
> +
> +VM_PRIVATE objects
> +-------------------
> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
> +exported. Hence these BOs are referred to as Shared BOs.
> +During each execbuf submission, the request fence must be added to the
> +dma-resv fence list of all shared BOs mapped on the VM.
> +
> +VM_BIND feature introduces an optimization where user can create BO which
> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
> +BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
> +the VM they are private to and can't be dma-buf exported.
> +All private BOs of a VM share the dma-resv object. Hence during each execbuf
> +submission, they need only one dma-resv fence list updated. Thus, the fast
> +path (where required mappings are already bound) submission latency is O(1)
> +w.r.t the number of VM private BOs.
> +
> +VM_BIND locking hirarchy
> +-------------------------
> +The locking design here supports the older (execlist based) execbuf mode, the
> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
> +The older execbuf mode and the newer VM_BIND mode without page faults manages
> +residency of backing storage using dma_fence. The VM_BIND mode with page faults
> +and the system allocator support do not use any dma_fence at all.
> +
> +VM_BIND locking order is as below.
> +
> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> +   mapping.
> +
> +   In future, when GPU page faults are supported, we can potentially use a
> +   rwsem instead, so that multiple page fault handlers can take the read side
> +   lock to lookup the mapping and hence can run in parallel.
> +   The older execbuf mode of binding do not need this lock.
> +
> +2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
> +   be held while binding/unbinding a vma in the async worker and while updating
> +   dma-resv fence list of an object. Note that private BOs of a VM will all
> +   share a dma-resv object.
> +
> +   The future system allocator support will use the HMM prescribed locking
> +   instead.
> +
> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
> +   invalidated vmas (due to eviction and userptr invalidation) etc.
> +
> +When GPU page faults are supported, the execbuf path do not take any of these
> +locks. There we will simply smash the new batch buffer address into the ring and
> +then tell the scheduler run that. The lock taking only happens from the page
> +fault handler, where we take lock-A in read mode, whichever lock-B we need to
> +find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
> +system allocator) and some additional locks (lock-D) for taking care of page
> +table races. Page fault mode should not need to ever manipulate the vm lists,
> +so won't ever need lock-C.
> +
> +VM_BIND LRU handling
> +---------------------
> +We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
> +performance degradation. We will also need support for bulk LRU movement of
> +VM_BIND objects to avoid additional latencies in execbuf path.
> +
> +The page table pages are similar to VM_BIND mapped objects (See
> +`Evictable page table allocations`_) and are maintained per VM and needs to
> +be pinned in memory when VM is made active (ie., upon an execbuf call with
> +that VM). So, bulk LRU movement of page table pages is also needed.
> +
> +VM_BIND dma_resv usage
> +-----------------------
> +Fences needs to be added to all VM_BIND mapped objects. During each execbuf
> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
> +over sync (See enum dma_resv_usage). One can override it with either
> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
> +dependency setting.
> +
> +Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
> +DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
> +check. Instead, the execbuf3 out fence should be used for end of batch check
> +(See struct drm_i915_gem_execbuffer3).
> +
> +Also, in VM_BIND mode, use dma-resv apis for determining object activeness
> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
> +older i915_vma active reference tracking which is deprecated. This should be
> +easier to get it working with the current TTM backend.
> +
> +Mesa use case
> +--------------
> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
> +hence improving performance of CPU-bound applications. It also allows us to
> +implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
> +reducing CPU overhead becomes more impactful.
> +
> +
> +Other VM_BIND use cases
> +========================
> +
> +Long running Compute contexts
> +------------------------------
> +Usage of dma-fence expects that they complete in reasonable amount of time.
> +Compute on the other hand can be long running. Hence it is appropriate for
> +compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
> +must be limited to in-kernel consumption only.
> +
> +Where GPU page faults are not available, kernel driver upon buffer invalidation
> +will initiate a suspend (preemption) of long running context, finish the
> +invalidation, revalidate the BO and then resume the compute context. This is
> +done by having a per-context preempt fence which is enabled when someone tries
> +to wait on it and triggers the context preemption.
> +
> +User/Memory Fence
> +~~~~~~~~~~~~~~~~~~
> +User/Memory fence is a <address, value> pair. To signal the user fence, the
> +specified value will be written at the specified virtual address and wakeup the
> +waiting process. User fence can be signaled either by the GPU or kernel async
> +worker (like upon bind completion). User can wait on a user fence with a new
> +user fence wait ioctl.
> +
> +Here is some prior work on this:
> +https://patchwork.freedesktop.org/patch/349417/
> +
> +Low Latency Submission
> +~~~~~~~~~~~~~~~~~~~~~~~
> +Allows compute UMD to directly submit GPU jobs instead of through execbuf
> +ioctl. This is made possible by VM_BIND is not being synchronized against
> +execbuf. VM_BIND allows bind/unbind of mappings required for the directly
> +submitted jobs.
> +
> +Debugger
> +---------
> +With debug event interface user space process (debugger) is able to keep track
> +of and act upon resources created by another process (debugged) and attached
> +to GPU via vm_bind interface.
> +
> +GPU page faults
> +----------------
> +GPU page faults when supported (in future), will only be supported in the
> +VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
> +binding will require using dma-fence to ensure residency, the GPU page faults
> +mode when supported, will not use any dma-fence as residency is purely managed
> +by installing and removing/invalidating page table entries.
> +
> +Page level hints settings
> +--------------------------
> +VM_BIND allows any hints setting per mapping instead of per BO.
> +Possible hints include read-only mapping, placement and atomicity.
> +Sub-BO level placement hint will be even more relevant with
> +upcoming GPU on-demand page fault support.
> +
> +Page level Cache/CLOS settings
> +-------------------------------
> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
> +
> +Evictable page table allocations
> +---------------------------------
> +Make pagetable allocations evictable and manage them similar to VM_BIND
> +mapped objects. Page table pages are similar to persistent mappings of a
> +VM (difference here are that the page table pages will not have an i915_vma
> +structure and after swapping pages back in, parent page link needs to be
> +updated).
> +
> +Shared Virtual Memory (SVM) support
> +------------------------------------
> +VM_BIND interface can be used to map system memory directly (without gem BO
> +abstraction) using the HMM interface. SVM is only supported with GPU page
> +faults enabled.
> +
> +VM_BIND UAPI
> +=============
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index 91e93a705230..7d10c36b268d 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -23,3 +23,7 @@ host such documentation:
>  .. toctree::
>  
>      i915_scheduler.rst
> +
> +.. toctree::
> +
> +    i915_vm_bind.rst
> -- 
> 2.21.0.rc0.32.g243a4c7e27
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
                   ` (6 preceding siblings ...)
  (?)
@ 2022-06-27 21:34 ` Patchwork
  -1 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2022-06-27 21:34 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 54948 bytes --]

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105635/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11805_full -> Patchwork_105635v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_105635v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_105635v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (13 -> 13)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105635v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_reloc@basic-cpu-read-active:
    - shard-tglb:         [PASS][1] -> [SKIP][2] +4 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb3/igt@gem_exec_reloc@basic-cpu-read-active.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb2/igt@gem_exec_reloc@basic-cpu-read-active.html

  * igt@kms_big_fb@x-tiled-32bpp-rotate-180:
    - shard-tglb:         [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb3/igt@kms_big_fb@x-tiled-32bpp-rotate-180.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb2/igt@kms_big_fb@x-tiled-32bpp-rotate-180.html

  * igt@kms_cursor_legacy@flip-vs-cursor@varying-size:
    - shard-skl:          [PASS][5] -> [FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl10/igt@kms_cursor_legacy@flip-vs-cursor@varying-size.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl3/igt@kms_cursor_legacy@flip-vs-cursor@varying-size.html

  
#### Warnings ####

  * igt@gem_ccs@block-copy-inplace:
    - shard-tglb:         [SKIP][7] ([i915#3555] / [i915#5325]) -> [SKIP][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb3/igt@gem_ccs@block-copy-inplace.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb2/igt@gem_ccs@block-copy-inplace.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_panel_fitting@legacy:
    - {shard-dg1}:        NOTRUN -> [SKIP][9]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-dg1-15/igt@kms_panel_fitting@legacy.html

  
Known issues
------------

  Here are the changes found in Patchwork_105635v1_full that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - shard-glk:          ([PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27], [PASS][28], [PASS][29], [PASS][30], [PASS][31], [PASS][32], [PASS][33], [PASS][34]) -> ([PASS][35], [PASS][36], [PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [FAIL][50], [PASS][51], [PASS][52], [PASS][53], [PASS][54], [PASS][55], [PASS][56], [PASS][57], [PASS][58], [PASS][59]) ([i915#4392])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk9/boot.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk9/boot.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk9/boot.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk8/boot.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk8/boot.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk8/boot.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk8/boot.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk7/boot.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk7/boot.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk7/boot.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk6/boot.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk6/boot.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk6/boot.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk5/boot.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk5/boot.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk5/boot.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk3/boot.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk3/boot.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk3/boot.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk2/boot.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk2/boot.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk2/boot.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk1/boot.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk1/boot.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk1/boot.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk9/boot.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk9/boot.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk8/boot.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk8/boot.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk8/boot.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk7/boot.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk7/boot.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk7/boot.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk6/boot.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk6/boot.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk6/boot.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk5/boot.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk5/boot.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk5/boot.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk3/boot.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk3/boot.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk3/boot.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk2/boot.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk2/boot.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk2/boot.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk1/boot.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk1/boot.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk1/boot.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk9/boot.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk9/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@gem_ccs@block-copy-uncompressed:
    - shard-iclb:         NOTRUN -> [SKIP][60] ([i915#5327])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@gem_ccs@block-copy-uncompressed.html

  * igt@gem_ctx_exec@basic-nohangcheck:
    - shard-tglb:         [PASS][61] -> [FAIL][62] ([i915#6268])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb7/igt@gem_ctx_exec@basic-nohangcheck.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb3/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [PASS][63] -> [FAIL][64] ([i915#5784])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb6/igt@gem_eio@unwedge-stress.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb5/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_balancer@parallel-contexts:
    - shard-iclb:         [PASS][65] -> [SKIP][66] ([i915#4525]) +1 similar issue
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb4/igt@gem_exec_balancer@parallel-contexts.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb5/igt@gem_exec_balancer@parallel-contexts.html

  * igt@gem_exec_fair@basic-none@vecs0:
    - shard-glk:          [PASS][67] -> [FAIL][68] ([i915#2842])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk8/igt@gem_exec_fair@basic-none@vecs0.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk9/igt@gem_exec_fair@basic-none@vecs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-kbl:          [PASS][69] -> [FAIL][70] ([i915#2842]) +1 similar issue
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-kbl7/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl7/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-tglb:         [PASS][71] -> [FAIL][72] ([i915#2842]) +1 similar issue
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb2/igt@gem_exec_fair@basic-throttle@rcs0.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb3/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_flush@basic-batch-kernel-default-cmd:
    - shard-iclb:         NOTRUN -> [SKIP][73] ([fdo#109313])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html

  * igt@gem_lmem_swapping@heavy-verify-multi-ccs:
    - shard-kbl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#4613])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@gem_lmem_swapping@heavy-verify-multi-ccs.html

  * igt@gem_lmem_swapping@parallel-random:
    - shard-iclb:         NOTRUN -> [SKIP][75] ([i915#4613])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@gem_lmem_swapping@parallel-random.html

  * igt@gem_lmem_swapping@parallel-random-verify-ccs:
    - shard-skl:          NOTRUN -> [SKIP][76] ([fdo#109271] / [i915#4613]) +1 similar issue
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl10/igt@gem_lmem_swapping@parallel-random-verify-ccs.html

  * igt@gem_lmem_swapping@verify:
    - shard-apl:          NOTRUN -> [SKIP][77] ([fdo#109271] / [i915#4613])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl3/igt@gem_lmem_swapping@verify.html

  * igt@gem_pxp@verify-pxp-execution-after-suspend-resume:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([i915#4270])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@gem_pxp@verify-pxp-execution-after-suspend-resume.html

  * igt@gem_softpin@evict-snoop:
    - shard-kbl:          NOTRUN -> [SKIP][79] ([fdo#109271]) +39 similar issues
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@gem_softpin@evict-snoop.html

  * igt@gem_userptr_blits@input-checking:
    - shard-skl:          NOTRUN -> [DMESG-WARN][80] ([i915#4991]) +1 similar issue
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl10/igt@gem_userptr_blits@input-checking.html

  * igt@gen7_exec_parse@basic-rejected:
    - shard-iclb:         NOTRUN -> [SKIP][81] ([fdo#109289]) +1 similar issue
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@gen7_exec_parse@basic-rejected.html

  * igt@gen9_exec_parse@bb-start-out:
    - shard-iclb:         NOTRUN -> [SKIP][82] ([i915#2856])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@gen9_exec_parse@bb-start-out.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-tglb:         [PASS][83] -> [TIMEOUT][84] ([i915#3953])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb3/igt@i915_module_load@reload-with-fault-injection.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb2/igt@i915_module_load@reload-with-fault-injection.html

  * igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-iclb:         NOTRUN -> [SKIP][85] ([i915#5286]) +1 similar issue
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-iclb:         NOTRUN -> [SKIP][86] ([fdo#110723])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][87] ([fdo#109271] / [i915#3886])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_rc_ccs:
    - shard-skl:          NOTRUN -> [SKIP][88] ([fdo#109271]) +115 similar issues
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl1/igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_rc_ccs.html

  * igt@kms_ccs@pipe-c-bad-aux-stride-y_tiled_gen12_rc_ccs_cc:
    - shard-iclb:         NOTRUN -> [SKIP][89] ([fdo#109278] / [i915#3886])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_ccs@pipe-c-bad-aux-stride-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][90] ([fdo#109271] / [i915#3886]) +1 similar issue
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl3/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][91] ([fdo#109271] / [i915#3886]) +1 similar issue
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl7/igt@kms_ccs@pipe-c-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-d-random-ccs-data-4_tiled_dg2_rc_ccs_cc:
    - shard-iclb:         NOTRUN -> [SKIP][92] ([fdo#109278]) +6 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_ccs@pipe-d-random-ccs-data-4_tiled_dg2_rc_ccs_cc.html

  * igt@kms_chamelium@dp-hpd-fast:
    - shard-kbl:          NOTRUN -> [SKIP][93] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@kms_chamelium@dp-hpd-fast.html

  * igt@kms_chamelium@hdmi-hpd-after-suspend:
    - shard-skl:          NOTRUN -> [SKIP][94] ([fdo#109271] / [fdo#111827]) +6 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl10/igt@kms_chamelium@hdmi-hpd-after-suspend.html

  * igt@kms_chamelium@vga-frame-dump:
    - shard-apl:          NOTRUN -> [SKIP][95] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl3/igt@kms_chamelium@vga-frame-dump.html

  * igt@kms_chamelium@vga-hpd-fast:
    - shard-iclb:         NOTRUN -> [SKIP][96] ([fdo#109284] / [fdo#111827])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_chamelium@vga-hpd-fast.html

  * igt@kms_content_protection@atomic:
    - shard-apl:          NOTRUN -> [TIMEOUT][97] ([i915#1319])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl3/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@dp-mst-lic-type-1:
    - shard-iclb:         NOTRUN -> [SKIP][98] ([i915#3116])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_content_protection@dp-mst-lic-type-1.html

  * igt@kms_content_protection@lic:
    - shard-kbl:          NOTRUN -> [TIMEOUT][99] ([i915#1319])
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@kms_content_protection@lic.html

  * igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1:
    - shard-apl:          [PASS][100] -> [DMESG-WARN][101] ([i915#180])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl7/igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl8/igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic:
    - shard-skl:          [PASS][102] -> [FAIL][103] ([i915#2346]) +1 similar issue
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl10/igt@kms_cursor_legacy@flip-vs-cursor@atomic.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl3/igt@kms_cursor_legacy@flip-vs-cursor@atomic.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions:
    - shard-glk:          [PASS][104] -> [FAIL][105] ([i915#2346])
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions.html

  * igt@kms_draw_crc@draw-method-rgb565-mmap-cpu-4tiled:
    - shard-iclb:         NOTRUN -> [SKIP][106] ([i915#5287])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_draw_crc@draw-method-rgb565-mmap-cpu-4tiled.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-apl:          [PASS][107] -> [FAIL][108] ([i915#4767])
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl4/igt@kms_fbcon_fbt@fbc-suspend.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl3/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-skl:          [PASS][109] -> [FAIL][110] ([i915#4767])
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl4/igt@kms_fbcon_fbt@psr-suspend.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl3/igt@kms_fbcon_fbt@psr-suspend.html

  * igt@kms_flip@2x-flip-vs-rmfb-interruptible:
    - shard-iclb:         NOTRUN -> [SKIP][111] ([fdo#109274]) +1 similar issue
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_flip@2x-flip-vs-rmfb-interruptible.html

  * igt@kms_flip@2x-flip-vs-wf_vblank-interruptible@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][112] -> [FAIL][113] ([i915#2122])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk2/igt@kms_flip@2x-flip-vs-wf_vblank-interruptible@ab-hdmi-a1-hdmi-a2.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk7/igt@kms_flip@2x-flip-vs-wf_vblank-interruptible@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-edp1:
    - shard-skl:          [PASS][114] -> [FAIL][115] ([i915#79])
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl9/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-edp1.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-edp1.html

  * igt@kms_flip@flip-vs-suspend@a-dp1:
    - shard-kbl:          [PASS][116] -> [DMESG-WARN][117] ([i915#180]) +4 similar issues
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-kbl3/igt@kms_flip@flip-vs-suspend@a-dp1.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl6/igt@kms_flip@flip-vs-suspend@a-dp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-upscaling:
    - shard-glk:          [PASS][118] -> [FAIL][119] ([i915#4911])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk6/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-upscaling.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk8/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-upscaling.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-mmap-gtt:
    - shard-apl:          NOTRUN -> [SKIP][120] ([fdo#109271]) +53 similar issues
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl8/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-move:
    - shard-iclb:         NOTRUN -> [SKIP][121] ([fdo#109280]) +7 similar issues
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-move.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-snb:          NOTRUN -> [SKIP][122] ([fdo#109271]) +7 similar issues
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-snb6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-draw-mmap-wc.html

  * igt@kms_hdr@bpc-switch@pipe-a-dp-1:
    - shard-kbl:          [PASS][123] -> [FAIL][124] ([i915#1188])
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-kbl7/igt@kms_hdr@bpc-switch@pipe-a-dp-1.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl7/igt@kms_hdr@bpc-switch@pipe-a-dp-1.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max:
    - shard-skl:          NOTRUN -> [FAIL][125] ([fdo#108145] / [i915#265])
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max.html

  * igt@kms_psr2_sf@cursor-plane-update-sf:
    - shard-skl:          NOTRUN -> [SKIP][126] ([fdo#109271] / [i915#658])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl1/igt@kms_psr2_sf@cursor-plane-update-sf.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area:
    - shard-kbl:          NOTRUN -> [SKIP][127] ([fdo#109271] / [i915#658])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area.html

  * igt@kms_psr@psr2_cursor_blt:
    - shard-iclb:         [PASS][128] -> [SKIP][129] ([fdo#109441]) +1 similar issue
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb2/igt@kms_psr@psr2_cursor_blt.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb6/igt@kms_psr@psr2_cursor_blt.html

  * igt@kms_psr@psr2_sprite_plane_onoff:
    - shard-iclb:         NOTRUN -> [SKIP][130] ([fdo#109441]) +1 similar issue
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@kms_psr@psr2_sprite_plane_onoff.html

  * igt@kms_writeback@writeback-invalid-parameters:
    - shard-skl:          NOTRUN -> [SKIP][131] ([fdo#109271] / [i915#2437]) +1 similar issue
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl10/igt@kms_writeback@writeback-invalid-parameters.html

  * igt@kms_writeback@writeback-pixel-formats:
    - shard-apl:          NOTRUN -> [SKIP][132] ([fdo#109271] / [i915#2437])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl8/igt@kms_writeback@writeback-pixel-formats.html

  * igt@prime_nv_pcopy@test3_4:
    - shard-iclb:         NOTRUN -> [SKIP][133] ([fdo#109291]) +1 similar issue
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb7/igt@prime_nv_pcopy@test3_4.html

  * igt@sysfs_clients@create:
    - shard-skl:          NOTRUN -> [SKIP][134] ([fdo#109271] / [i915#2994])
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl10/igt@sysfs_clients@create.html

  
#### Possible fixes ####

  * igt@drm_read@short-buffer-block:
    - {shard-rkl}:        [SKIP][135] ([i915#4098]) -> [PASS][136] +1 similar issue
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@drm_read@short-buffer-block.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@drm_read@short-buffer-block.html

  * igt@fbdev@read:
    - {shard-rkl}:        [SKIP][137] ([i915#2582]) -> [PASS][138]
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@fbdev@read.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@fbdev@read.html

  * igt@gem_ctx_exec@basic-nohangcheck:
    - {shard-rkl}:        [FAIL][139] ([i915#6268]) -> [PASS][140]
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@gem_ctx_exec@basic-nohangcheck.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-2/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_ctx_persistence@engines-hostile@vcs1:
    - {shard-dg1}:        [FAIL][141] ([i915#4883]) -> [PASS][142] +1 similar issue
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-dg1-19/igt@gem_ctx_persistence@engines-hostile@vcs1.html
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-dg1-19/igt@gem_ctx_persistence@engines-hostile@vcs1.html

  * igt@gem_ctx_persistence@legacy-engines-hostile@bsd:
    - {shard-rkl}:        [FAIL][143] ([i915#2410]) -> [PASS][144]
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-5/igt@gem_ctx_persistence@legacy-engines-hostile@bsd.html
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-1/igt@gem_ctx_persistence@legacy-engines-hostile@bsd.html

  * igt@gem_eio@kms:
    - shard-tglb:         [FAIL][145] ([i915#5784]) -> [PASS][146]
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb1/igt@gem_eio@kms.html
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb6/igt@gem_eio@kms.html

  * igt@gem_exec_balancer@parallel-bb-first:
    - shard-iclb:         [SKIP][147] ([i915#4525]) -> [PASS][148] +2 similar issues
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb8/igt@gem_exec_balancer@parallel-bb-first.html
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb2/igt@gem_exec_balancer@parallel-bb-first.html

  * igt@gem_exec_endless@dispatch@bcs0:
    - {shard-rkl}:        [SKIP][149] ([i915#6247]) -> [PASS][150]
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-5/igt@gem_exec_endless@dispatch@bcs0.html
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-1/igt@gem_exec_endless@dispatch@bcs0.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-tglb:         [FAIL][151] ([i915#2842]) -> [PASS][152] +1 similar issue
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb6/igt@gem_exec_fair@basic-none-share@rcs0.html
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb5/igt@gem_exec_fair@basic-none-share@rcs0.html
    - shard-iclb:         [FAIL][153] ([i915#2842]) -> [PASS][154]
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb2/igt@gem_exec_fair@basic-none-share@rcs0.html
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb4/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [FAIL][155] ([i915#2842]) -> [PASS][156]
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk2/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk7/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_reloc@basic-write-read-noreloc:
    - {shard-rkl}:        [SKIP][157] ([i915#3281]) -> [PASS][158] +15 similar issues
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-2/igt@gem_exec_reloc@basic-write-read-noreloc.html
   [158]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-5/igt@gem_exec_reloc@basic-write-read-noreloc.html

  * igt@gem_exec_whisper@basic-fds-forked-all:
    - shard-skl:          [INCOMPLETE][159] ([i915#5843]) -> [PASS][160]
   [159]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl9/igt@gem_exec_whisper@basic-fds-forked-all.html
   [160]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl1/igt@gem_exec_whisper@basic-fds-forked-all.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [SKIP][161] ([i915#2190]) -> [PASS][162]
   [161]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb7/igt@gem_huc_copy@huc-copy.html
   [162]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb1/igt@gem_huc_copy@huc-copy.html

  * igt@gem_set_tiling_vs_pwrite:
    - {shard-rkl}:        [SKIP][163] ([i915#3282]) -> [PASS][164] +5 similar issues
   [163]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-2/igt@gem_set_tiling_vs_pwrite.html
   [164]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-5/igt@gem_set_tiling_vs_pwrite.html

  * igt@gem_workarounds@suspend-resume:
    - shard-skl:          [INCOMPLETE][165] ([i915#4939] / [i915#5129]) -> [PASS][166]
   [165]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl10/igt@gem_workarounds@suspend-resume.html
   [166]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl7/igt@gem_workarounds@suspend-resume.html

  * igt@gem_workarounds@suspend-resume-fd:
    - shard-kbl:          [DMESG-WARN][167] ([i915#180]) -> [PASS][168] +2 similar issues
   [167]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-kbl6/igt@gem_workarounds@suspend-resume-fd.html
   [168]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-kbl1/igt@gem_workarounds@suspend-resume-fd.html

  * igt@gen9_exec_parse@valid-registers:
    - {shard-rkl}:        [SKIP][169] ([i915#2527]) -> [PASS][170] +4 similar issues
   [169]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-6/igt@gen9_exec_parse@valid-registers.html
   [170]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-5/igt@gen9_exec_parse@valid-registers.html

  * igt@i915_hangman@gt-engine-error@bcs0:
    - {shard-rkl}:        [SKIP][171] ([i915#6258]) -> [PASS][172]
   [171]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-5/igt@i915_hangman@gt-engine-error@bcs0.html
   [172]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-1/igt@i915_hangman@gt-engine-error@bcs0.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-snb:          [DMESG-WARN][173] ([i915#6201]) -> [PASS][174]
   [173]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-snb4/igt@i915_module_load@reload-with-fault-injection.html
   [174]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-snb6/igt@i915_module_load@reload-with-fault-injection.html

  * igt@i915_pm_backlight@fade:
    - {shard-rkl}:        [SKIP][175] ([i915#3012]) -> [PASS][176] +1 similar issue
   [175]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@i915_pm_backlight@fade.html
   [176]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@i915_pm_backlight@fade.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-iclb:         [FAIL][177] ([i915#454]) -> [PASS][178]
   [177]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb7/igt@i915_pm_dc@dc6-psr.html
   [178]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb1/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a:
    - {shard-dg1}:        [SKIP][179] ([i915#1937]) -> [PASS][180]
   [179]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-dg1-18/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html
   [180]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-dg1-12/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html

  * igt@i915_pm_rpm@modeset-non-lpsp-stress:
    - {shard-dg1}:        [SKIP][181] ([i915#1397]) -> [PASS][182]
   [181]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-dg1-16/igt@i915_pm_rpm@modeset-non-lpsp-stress.html
   [182]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-dg1-18/igt@i915_pm_rpm@modeset-non-lpsp-stress.html

  * igt@i915_selftest@live@gt_pm:
    - {shard-tglu}:       [DMESG-FAIL][183] ([i915#3987]) -> [PASS][184]
   [183]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglu-4/igt@i915_selftest@live@gt_pm.html
   [184]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglu-4/igt@i915_selftest@live@gt_pm.html

  * igt@kms_async_flips@alternate-sync-async-flip@pipe-b-edp-1:
    - shard-skl:          [FAIL][185] ([i915#2521]) -> [PASS][186]
   [185]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl4/igt@kms_async_flips@alternate-sync-async-flip@pipe-b-edp-1.html
   [186]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl3/igt@kms_async_flips@alternate-sync-async-flip@pipe-b-edp-1.html

  * igt@kms_color@pipe-a-ctm-red-to-blue:
    - {shard-rkl}:        [SKIP][187] ([i915#1149] / [i915#1849] / [i915#4070] / [i915#4098]) -> [PASS][188] +2 similar issues
   [187]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_color@pipe-a-ctm-red-to-blue.html
   [188]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_color@pipe-a-ctm-red-to-blue.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size:
    - shard-glk:          [FAIL][189] ([i915#2346]) -> [PASS][190]
   [189]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html
   [190]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-blt-untiled:
    - {shard-rkl}:        [SKIP][191] ([fdo#111314] / [i915#4098] / [i915#4369]) -> [PASS][192] +2 similar issues
   [191]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_draw_crc@draw-method-xrgb2101010-blt-untiled.html
   [192]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_draw_crc@draw-method-xrgb2101010-blt-untiled.html

  * igt@kms_flip@flip-vs-expired-vblank@c-edp1:
    - shard-skl:          [FAIL][193] ([i915#79]) -> [PASS][194] +1 similar issue
   [193]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl7/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
   [194]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl4/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
    - shard-tglb:         [FAIL][195] ([i915#79]) -> [PASS][196]
   [195]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-tglb7/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
   [196]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-tglb6/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html

  * igt@kms_flip@flip-vs-suspend@c-dp1:
    - shard-apl:          [DMESG-WARN][197] ([i915#180]) -> [PASS][198] +2 similar issues
   [197]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl8/igt@kms_flip@flip-vs-suspend@c-dp1.html
   [198]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl8/igt@kms_flip@flip-vs-suspend@c-dp1.html

  * igt@kms_flip@plain-flip-ts-check@c-edp1:
    - shard-skl:          [FAIL][199] ([i915#2122]) -> [PASS][200] +1 similar issue
   [199]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl7/igt@kms_flip@plain-flip-ts-check@c-edp1.html
   [200]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl4/igt@kms_flip@plain-flip-ts-check@c-edp1.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-downscaling:
    - shard-iclb:         [SKIP][201] ([i915#3701]) -> [PASS][202]
   [201]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-downscaling.html
   [202]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb4/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-downscaling.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-pri-indfb-multidraw:
    - {shard-rkl}:        [SKIP][203] ([i915#1849] / [i915#4098]) -> [PASS][204] +18 similar issues
   [203]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_frontbuffer_tracking@fbcpsr-1p-pri-indfb-multidraw.html
   [204]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_frontbuffer_tracking@fbcpsr-1p-pri-indfb-multidraw.html

  * igt@kms_invalid_mode@int-max-clock:
    - {shard-rkl}:        [SKIP][205] ([i915#4278]) -> [PASS][206] +1 similar issue
   [205]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_invalid_mode@int-max-clock.html
   [206]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_invalid_mode@int-max-clock.html

  * igt@kms_plane@plane-position-covered@pipe-b-planes:
    - {shard-rkl}:        [SKIP][207] ([i915#3558]) -> [PASS][208] +1 similar issue
   [207]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_plane@plane-position-covered@pipe-b-planes.html
   [208]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_plane@plane-position-covered@pipe-b-planes.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - {shard-rkl}:        [SKIP][209] ([i915#1849] / [i915#4070] / [i915#4098]) -> [PASS][210]
   [209]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html
   [210]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html

  * igt@kms_properties@crtc-properties-atomic:
    - {shard-rkl}:        [SKIP][211] ([i915#1849]) -> [PASS][212]
   [211]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_properties@crtc-properties-atomic.html
   [212]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_properties@crtc-properties-atomic.html

  * igt@kms_psr@cursor_mmap_cpu:
    - {shard-rkl}:        [SKIP][213] ([i915#1072]) -> [PASS][214] +1 similar issue
   [213]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_psr@cursor_mmap_cpu.html
   [214]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_psr@cursor_mmap_cpu.html

  * igt@kms_psr@psr2_suspend:
    - shard-iclb:         [SKIP][215] ([fdo#109441]) -> [PASS][216] +2 similar issues
   [215]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb8/igt@kms_psr@psr2_suspend.html
   [216]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb2/igt@kms_psr@psr2_suspend.html

  * igt@kms_vblank@pipe-a-ts-continuation-idle-hang:
    - shard-snb:          [SKIP][217] ([fdo#109271]) -> [PASS][218] +1 similar issue
   [217]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-snb6/igt@kms_vblank@pipe-a-ts-continuation-idle-hang.html
   [218]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-snb6/igt@kms_vblank@pipe-a-ts-continuation-idle-hang.html

  * igt@kms_vblank@pipe-b-query-idle:
    - {shard-rkl}:        [SKIP][219] ([i915#1845] / [i915#4098]) -> [PASS][220] +19 similar issues
   [219]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-rkl-1/igt@kms_vblank@pipe-b-query-idle.html
   [220]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-rkl-6/igt@kms_vblank@pipe-b-query-idle.html

  
#### Warnings ####

  * igt@gem_exec_balancer@parallel-ordering:
    - shard-iclb:         [SKIP][221] ([i915#4525]) -> [FAIL][222] ([i915#6117])
   [221]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb3/igt@gem_exec_balancer@parallel-ordering.html
   [222]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb1/igt@gem_exec_balancer@parallel-ordering.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-iclb:         [FAIL][223] ([i915#2842]) -> [FAIL][224] ([i915#2852])
   [223]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb6/igt@gem_exec_fair@basic-none-rrul@rcs0.html
   [224]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb8/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-move:
    - shard-skl:          [SKIP][225] ([fdo#109271]) -> [SKIP][226] ([fdo#109271] / [i915#1888])
   [225]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-move.html
   [226]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-move.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-sf:
    - shard-iclb:         [SKIP][227] ([i915#658]) -> [SKIP][228] ([i915#2920]) +1 similar issue
   [227]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb8/igt@kms_psr2_sf@cursor-plane-move-continuous-sf.html
   [228]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb2/igt@kms_psr2_sf@cursor-plane-move-continuous-sf.html

  * igt@kms_psr2_sf@overlay-plane-move-continuous-sf:
    - shard-iclb:         [SKIP][229] ([i915#2920]) -> [SKIP][230] ([i915#658])
   [229]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb2/igt@kms_psr2_sf@overlay-plane-move-continuous-sf.html
   [230]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb4/igt@kms_psr2_sf@overlay-plane-move-continuous-sf.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area:
    - shard-iclb:         [SKIP][231] ([i915#2920]) -> [SKIP][232] ([fdo#111068] / [i915#658])
   [231]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-iclb2/igt@kms_psr2_sf@plane-move-sf-dmg-area.html
   [232]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-iclb4/igt@kms_psr2_sf@plane-move-sf-dmg-area.html

  * igt@runner@aborted:
    - shard-skl:          ([FAIL][233], [FAIL][234]) ([i915#4312]) -> ([FAIL][235], [FAIL][236], [FAIL][237], [FAIL][238], [FAIL][239]) ([i915#2029] / [i915#3002] / [i915#4312] / [i915#5257])
   [233]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl7/igt@runner@aborted.html
   [234]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-skl9/igt@runner@aborted.html
   [235]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl3/igt@runner@aborted.html
   [236]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl7/igt@runner@aborted.html
   [237]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl3/igt@runner@aborted.html
   [238]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl10/igt@runner@aborted.html
   [239]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-skl4/igt@runner@aborted.html
    - shard-apl:          ([FAIL][240], [FAIL][241], [FAIL][242], [FAIL][243]) ([i915#180] / [i915#3002] / [i915#4312] / [i915#5257]) -> ([FAIL][244], [FAIL][245], [FAIL][246]) ([i915#3002] / [i915#4312] / [i915#5257])
   [240]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl8/igt@runner@aborted.html
   [241]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl3/igt@runner@aborted.html
   [242]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl1/igt@runner@aborted.html
   [243]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11805/shard-apl7/igt@runner@aborted.html
   [244]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl8/igt@runner@aborted.html
   [245]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl8/igt@runner@aborted.html
   [246]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/shard-apl1/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109307]: https://bugs.freedesktop.org/show_bug.cgi?id=109307
  [fdo#109313]: https://bugs.freedesktop.org/show_bug.cgi?id=109313
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110254]: https://bugs.freedesktop.org/show_bug.cgi?id=110254
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111314]: https://bugs.freedesktop.org/show_bug.cgi?id=111314
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1149]: https://gitlab.freedesktop.org/drm/intel/issues/1149
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1911]: https://gitlab.freedesktop.org/drm/intel/issues/1911
  [i915#1937]: https://gitlab.freedesktop.org/drm/intel/issues/1937
  [i915#2029]: https://gitlab.freedesktop.org/drm/intel/issues/2029
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2410]: https://gitlab.freedesktop.org/drm/intel/issues/2410
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2521]: https://gitlab.freedesktop.org/drm/intel/issues/2521
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2530]: https://gitlab.freedesktop.org/drm/intel/issues/2530
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2852]: https://gitlab.freedesktop.org/drm/intel/issues/2852
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002
  [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012
  [i915#3063]: https://gitlab.freedesktop.org/drm/intel/issues/3063
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3318]: https://gitlab.freedesktop.org/drm/intel/issues/3318
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3701]: https://gitlab.freedesktop.org/drm/intel/issues/3701
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3953]: https://gitlab.freedesktop.org/drm/intel/issues/3953
  [i915#3987]: https://gitlab.freedesktop.org/drm/intel/issues/3987
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4278]: https://gitlab.freedesktop.org/drm/intel/issues/4278
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4369]: https://gitlab.freedesktop.org/drm/intel/issues/4369
  [i915#4392]: https://gitlab.freedesktop.org/drm/intel/issues/4392
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#4565]: https://gitlab.freedesktop.org/drm/intel/issues/4565
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4767]: https://gitlab.freedesktop.org/drm/intel/issues/4767
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4842]: https://gitlab.freedesktop.org/drm/intel/issues/4842
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4853]: https://gitlab.freedesktop.org/drm/intel/issues/4853
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4877]: https://gitlab.freedesktop.org/drm/intel/issues/4877
  [i915#4880]: https://gitlab.freedesktop.org/drm/intel/issues/4880
  [i915#4883]: https://gitlab.freedesktop.org/drm/intel/issues/4883
  [i915#4911]: https://gitlab.freedesktop.org/drm/intel/issues/4911
  [i915#4939]: https://gitlab.freedesktop.org/drm/intel/issues/4939
  [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991
  [i915#5129]: https://gitlab.freedesktop.org/drm/intel/issues/5129
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5257]: https://gitlab.freedesktop.org/drm/intel/issues/5257
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5287]: https://gitlab.freedesktop.org/drm/intel/issues/5287
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#5327]: https://gitlab.freedesktop.org/drm/intel/issues/5327
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5439]: https://gitlab.freedesktop.org/drm/intel/issues/5439
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784
  [i915#5843]: https://gitlab.freedesktop.org/drm/intel/issues/5843
  [i915#6076]: https://gitlab.freedesktop.org/drm/intel/issues/6076
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6117]: https://gitlab.freedesktop.org/drm/intel/issues/6117
  [i915#6140]: https://gitlab.freedesktop.org/drm/intel/issues/6140
  [i915#6201]: https://gitlab.freedesktop.org/drm/intel/issues/6201
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#6247]: https://gitlab.freedesktop.org/drm/intel/issues/6247
  [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248
  [i915#6252]: https://gitlab.freedesktop.org/drm/intel/issues/6252
  [i915#6258]: https://gitlab.freedesktop.org/drm/intel/issues/6258
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79


Build changes
-------------

  * Linux: CI_DRM_11805 -> Patchwork_105635v1

  CI-20190529: 20190529
  CI_DRM_11805: 2a406c5f1126c1220fdaf841df3ef0ae487cd067 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6542: d38a476ee4b9f9a95d8f452de0d66cc52f7f079b @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105635v1: 2a406c5f1126c1220fdaf841df3ef0ae487cd067 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105635v1/index.html

[-- Attachment #2: Type: text/html, Size: 60039 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-30  0:33     ` Zanoni, Paulo R
  -1 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30  0:33 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Landwerlin, Lionel G, Ursulin, Tvrtko, Wilson,
	Chris P, Hellstrom, Thomas, Zeng, Oak, Auld, Matthew, jason,
	Vetter, Daniel, christian.koenig

On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> VM_BIND and related uapi definitions
> 
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>     documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
>     Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
>     all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>     Update documentation on async vm_bind/unbind and versioning.
>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>     batch_count field and I915_EXEC3_SECURE flag.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *    previously with VM_BIND, the ioctl will not support unbinding multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the given range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION	57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND		0x3d
> +#define DRM_I915_GEM_VM_UNBIND		0x3e
> +#define DRM_I915_GEM_EXECBUFFER3	0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
> +	__u32 handle;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_TIMELINE_FENCE_WAIT:
> +	 * Wait for the input fence before the operation.
> +	 *
> +	 * I915_TIMELINE_FENCE_SIGNAL:
> +	 * Return operation completion fence as output.
> +	 */
> +	__u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> + * virtual address (VA) range to the section of an object that should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
> + * to not allow multiple mappings in that 2M range (Compact page tables do not
> + * allow 64K page and 4K page bindings in the same 2M range).
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> + * asynchronously, if valid @fence is specified.

Does that mean that if I don't provide @fence, then this ioctl will be
synchronous (i.e., when it returns, the memory will be guaranteed to be
bound)? The text is kinda implying that, but from one of your earlier
replies to Tvrtko, that doesn't seem to be the case. I guess we could
change the text to make this more explicit.

In addition, previously we had the guarantee that an execbuf ioctl
would wait for all the pending vm_bind operations to finish before
doing anything. Do we still have this guarantee or do we have to make
use of the fences now?

> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_GEM_VM_BIND_READONLY:
> +	 * Mapping is read-only.

Can you please explain what happens when we try to write to a range
that's bound as read-only?


> +	 *
> +	 * I915_GEM_VM_BIND_CAPTURE:
> +	 * Capture this mapping in the dump upon GPU error.
> +	 */
> +	__u64 flags;
> +#define I915_GEM_VM_BIND_READONLY	(1 << 1)
> +#define I915_GEM_VM_BIND_CAPTURE	(1 << 2)
> +
> +	/**
> +	 * @fence: Timeline fence for bind completion signaling.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/** @flags: Currently reserved, MBZ */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for unbind completion signaling.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +	/**
> +	 * @ctx_id: Context id
> +	 *
> +	 * Only contexts with user engine map are allowed.
> +	 */
> +	__u32 ctx_id;
> +
> +	/**
> +	 * @engine_idx: Engine index
> +	 *
> +	 * An index in the user engine map of the context specified by @ctx_id.
> +	 */
> +	__u32 engine_idx;
> +
> +	/**
> +	 * @batch_address: Batch gpu virtual address/es.
> +	 *
> +	 * For normal submission, it is the gpu virtual address of the batch
> +	 * buffer. For parallel submission, it is a pointer to an array of
> +	 * batch buffer gpu virtual addresses with array size equal to the
> +	 * number of (parallel) engines involved in that submission (See
> +	 * struct i915_context_engines_parallel_submit).
> +	 */
> +	__u64 batch_address;
> +
> +	/** @flags: Currently reserved, MBZ */
> +	__u64 flags;
> +
> +	/** @rsvd1: Reserved, MBZ */
> +	__u32 rsvd1;
> +
> +	/** @fence_count: Number of fences in @timeline_fences array. */
> +	__u32 fence_count;
> +
> +	/**
> +	 * @timeline_fences: Pointer to an array of timeline fences.
> +	 *
> +	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> +	 */
> +	__u64 timeline_fences;
> +
> +	/** @rsvd2: Reserved, MBZ */
> +	__u64 rsvd2;
> +

Just out of curiosity: if we can extend behavior with @extensions and
even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?

> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
> +	/** @base: Extension link. See struct i915_user_extension. */
> +	struct i915_user_extension base;
> +
> +	/** @vm_id: Id of the VM to which the object is private */
> +	__u32 vm_id;
> +};


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30  0:33     ` Zanoni, Paulo R
  0 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30  0:33 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Wilson, Chris P, Hellstrom, Thomas, Auld, Matthew, Vetter,
	Daniel, christian.koenig

On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> VM_BIND and related uapi definitions
> 
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>     documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
>     Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
>     all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>     Update documentation on async vm_bind/unbind and versioning.
>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>     batch_count field and I915_EXEC3_SECURE flag.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *    previously with VM_BIND, the ioctl will not support unbinding multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the given range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION	57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND	(1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND		0x3d
> +#define DRM_I915_GEM_VM_UNBIND		0x3e
> +#define DRM_I915_GEM_EXECBUFFER3	0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +	/** @handle: User's handle for a drm_syncobj to wait on or signal. */
> +	__u32 handle;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_TIMELINE_FENCE_WAIT:
> +	 * Wait for the input fence before the operation.
> +	 *
> +	 * I915_TIMELINE_FENCE_SIGNAL:
> +	 * Return operation completion fence as output.
> +	 */
> +	__u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> +	/**
> +	 * @value: A point in the timeline.
> +	 * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +	 * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +	 * binary one.
> +	 */
> +	__u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> + * virtual address (VA) range to the section of an object that should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
> + * to not allow multiple mappings in that 2M range (Compact page tables do not
> + * allow 64K page and 4K page bindings in the same 2M range).
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> + * asynchronously, if valid @fence is specified.

Does that mean that if I don't provide @fence, then this ioctl will be
synchronous (i.e., when it returns, the memory will be guaranteed to be
bound)? The text is kinda implying that, but from one of your earlier
replies to Tvrtko, that doesn't seem to be the case. I guess we could
change the text to make this more explicit.

In addition, previously we had the guarantee that an execbuf ioctl
would wait for all the pending vm_bind operations to finish before
doing anything. Do we still have this guarantee or do we have to make
use of the fences now?

> + */
> +struct drm_i915_gem_vm_bind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @handle: Object handle */
> +	__u32 handle;
> +
> +	/** @start: Virtual Address start to bind */
> +	__u64 start;
> +
> +	/** @offset: Offset in object to bind */
> +	__u64 offset;
> +
> +	/** @length: Length of mapping to bind */
> +	__u64 length;
> +
> +	/**
> +	 * @flags: Supported flags are:
> +	 *
> +	 * I915_GEM_VM_BIND_READONLY:
> +	 * Mapping is read-only.

Can you please explain what happens when we try to write to a range
that's bound as read-only?


> +	 *
> +	 * I915_GEM_VM_BIND_CAPTURE:
> +	 * Capture this mapping in the dump upon GPU error.
> +	 */
> +	__u64 flags;
> +#define I915_GEM_VM_BIND_READONLY	(1 << 1)
> +#define I915_GEM_VM_BIND_CAPTURE	(1 << 2)
> +
> +	/**
> +	 * @fence: Timeline fence for bind completion signaling.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> + * address (VA) range that should be unbound from the device page table of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in use before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +	/** @vm_id: VM (address space) id to bind */
> +	__u32 vm_id;
> +
> +	/** @rsvd: Reserved, MBZ */
> +	__u32 rsvd;
> +
> +	/** @start: Virtual Address start to unbind */
> +	__u64 start;
> +
> +	/** @length: Length of mapping to unbind */
> +	__u64 length;
> +
> +	/** @flags: Currently reserved, MBZ */
> +	__u64 flags;
> +
> +	/**
> +	 * @fence: Timeline fence for unbind completion signaling.
> +	 *
> +	 * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +	 * is invalid, and an error will be returned.
> +	 */
> +	struct drm_i915_gem_timeline_fence fence;
> +
> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +	/**
> +	 * @ctx_id: Context id
> +	 *
> +	 * Only contexts with user engine map are allowed.
> +	 */
> +	__u32 ctx_id;
> +
> +	/**
> +	 * @engine_idx: Engine index
> +	 *
> +	 * An index in the user engine map of the context specified by @ctx_id.
> +	 */
> +	__u32 engine_idx;
> +
> +	/**
> +	 * @batch_address: Batch gpu virtual address/es.
> +	 *
> +	 * For normal submission, it is the gpu virtual address of the batch
> +	 * buffer. For parallel submission, it is a pointer to an array of
> +	 * batch buffer gpu virtual addresses with array size equal to the
> +	 * number of (parallel) engines involved in that submission (See
> +	 * struct i915_context_engines_parallel_submit).
> +	 */
> +	__u64 batch_address;
> +
> +	/** @flags: Currently reserved, MBZ */
> +	__u64 flags;
> +
> +	/** @rsvd1: Reserved, MBZ */
> +	__u32 rsvd1;
> +
> +	/** @fence_count: Number of fences in @timeline_fences array. */
> +	__u32 fence_count;
> +
> +	/**
> +	 * @timeline_fences: Pointer to an array of timeline fences.
> +	 *
> +	 * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> +	 */
> +	__u64 timeline_fences;
> +
> +	/** @rsvd2: Reserved, MBZ */
> +	__u64 rsvd2;
> +

Just out of curiosity: if we can extend behavior with @extensions and
even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?

> +	/**
> +	 * @extensions: Zero-terminated chain of extensions.
> +	 *
> +	 * For future extensions. See struct i915_user_extension.
> +	 */
> +	__u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE		2
> +	/** @base: Extension link. See struct i915_user_extension. */
> +	struct i915_user_extension base;
> +
> +	/** @vm_id: Id of the VM to which the object is private */
> +	__u32 vm_id;
> +};


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-30  0:38     ` Zanoni, Paulo R
  -1 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30  0:38 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Brost, Matthew, Landwerlin, Lionel G, Ursulin, Tvrtko, Wilson,
	Chris P, Hellstrom, Thomas, Zeng, Oak, Auld, Matthew, jason,
	Vetter, Daniel, christian.koenig

On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> VM_BIND design document with description of intended use cases.
> 
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand documentation on dma-resv usage, TLB flushing and
>     execbuf3.
> v4: Remove vm_bind tlb flush request support.
> v5: Update TLB flushing documentation.
> v6: Update out of order completion documentation.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
>  Documentation/gpu/rfc/index.rst        |   4 +
>  2 files changed, 250 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
> new file mode 100644
> index 000000000000..032ee32b885c
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
> @@ -0,0 +1,246 @@
> +==========================================
> +I915 VM_BIND feature design and use cases
> +==========================================
> +
> +VM_BIND feature
> +================
> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> +specified address space (VM). These mappings (also referred to as persistent
> +mappings) will be persistent across multiple GPU submissions (execbuf calls)
> +issued by the UMD, without user having to provide a list of all required
> +mappings during each submission (as required by older execbuf mode).
> +
> +The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> +signaling the completion of bind/unbind operation.
> +
> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.

I915_PARAM_VM_BIND_VERSION


> +User has to opt-in for VM_BIND mode of binding for an address space (VM)
> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> +
> +VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
> +not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
> +asynchronously, when valid out fence is specified.
> +
> +VM_BIND features include:
> +
> +* Multiple Virtual Address (VA) mappings can map to the same physical pages
> +  of an object (aliasing).
> +* VA mapping can map to a partial section of the BO (partial binding).
> +* Support capture of persistent mappings in the dump upon GPU error.
> +* Support for userptr gem objects (no special uapi is required for this).
> +
> +TLB flush consideration
> +------------------------
> +The i915 driver flushes the TLB for each submission and when an object's
> +pages are released. The VM_BIND/UNBIND operation will not do any additional
> +TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
> +submissions on that VM and will not be in the working set for currently running
> +batches (which would require additional TLB flushes, which is not supported).
> +
> +Execbuf ioctl in VM_BIND mode
> +-------------------------------
> +A VM in VM_BIND mode will not support older execbuf mode of binding.
> +The execbuf ioctl handling in VM_BIND mode differs significantly from the
> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
> +execlist. Hence, no support for implicit sync. It is expected that the below
> +work will be able to support requirements of object dependency setting in all
> +use cases:
> +
> +"dma-buf: Add an API for exporting sync files"
> +(https://lwn.net/Articles/859290/)
> +
> +The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
> +works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
> +VM_BIND call) at the time of execbuf3 call are deemed required for that
> +submission.
> +
> +The execbuf3 ioctl directly specifies the batch addresses instead of as
> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
> +support many of the older features like in/out/submit fences, fence array,
> +default gem context and many more (See struct drm_i915_gem_execbuffer3).

Just as a note: both Iris and Vulkan use some of these features, so
some rework will be required. From what I can see, all current behavior
we depend on will be supported in some way or another, so hopefully
we'll be fine.


> +
> +In VM_BIND mode, VA allocation is completely managed by the user instead of
> +the i915 driver. Hence all VA assignment, eviction are not applicable in
> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
> +be using the i915_vma active reference tracking. It will instead use dma-resv
> +object for that (See `VM_BIND dma_resv usage`_).
> +
> +So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
> +evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
> +should be in a separate file and only functionalities common to these ioctls
> +can be the shared code where possible.
> +
> +VM_PRIVATE objects
> +-------------------
> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
> +exported. Hence these BOs are referred to as Shared BOs.
> +During each execbuf submission, the request fence must be added to the
> +dma-resv fence list of all shared BOs mapped on the VM.
> +
> +VM_BIND feature introduces an optimization where user can create BO which
> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
> +BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
> +the VM they are private to and can't be dma-buf exported.
> +All private BOs of a VM share the dma-resv object. Hence during each execbuf
> +submission, they need only one dma-resv fence list updated. Thus, the fast
> +path (where required mappings are already bound) submission latency is O(1)
> +w.r.t the number of VM private BOs.
> +
> +VM_BIND locking hirarchy
> +-------------------------
> +The locking design here supports the older (execlist based) execbuf mode, the
> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
> +The older execbuf mode and the newer VM_BIND mode without page faults manages
> +residency of backing storage using dma_fence. The VM_BIND mode with page faults
> +and the system allocator support do not use any dma_fence at all.
> +
> +VM_BIND locking order is as below.
> +
> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> +   mapping.
> +
> +   In future, when GPU page faults are supported, we can potentially use a
> +   rwsem instead, so that multiple page fault handlers can take the read side
> +   lock to lookup the mapping and hence can run in parallel.
> +   The older execbuf mode of binding do not need this lock.
> +
> +2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
> +   be held while binding/unbinding a vma in the async worker and while updating
> +   dma-resv fence list of an object. Note that private BOs of a VM will all
> +   share a dma-resv object.
> +
> +   The future system allocator support will use the HMM prescribed locking
> +   instead.
> +
> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
> +   invalidated vmas (due to eviction and userptr invalidation) etc.
> +
> +When GPU page faults are supported, the execbuf path do not take any of these
> +locks. There we will simply smash the new batch buffer address into the ring and
> +then tell the scheduler run that. The lock taking only happens from the page
> +fault handler, where we take lock-A in read mode, whichever lock-B we need to
> +find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
> +system allocator) and some additional locks (lock-D) for taking care of page
> +table races. Page fault mode should not need to ever manipulate the vm lists,
> +so won't ever need lock-C.
> +
> +VM_BIND LRU handling
> +---------------------
> +We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
> +performance degradation. We will also need support for bulk LRU movement of
> +VM_BIND objects to avoid additional latencies in execbuf path.
> +
> +The page table pages are similar to VM_BIND mapped objects (See
> +`Evictable page table allocations`_) and are maintained per VM and needs to
> +be pinned in memory when VM is made active (ie., upon an execbuf call with
> +that VM). So, bulk LRU movement of page table pages is also needed.
> +
> +VM_BIND dma_resv usage
> +-----------------------
> +Fences needs to be added to all VM_BIND mapped objects. During each execbuf
> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
> +over sync (See enum dma_resv_usage). One can override it with either
> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
> +dependency setting.
> +
> +Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
> +DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
> +check. Instead, the execbuf3 out fence should be used for end of batch check
> +(See struct drm_i915_gem_execbuffer3).

From what I remember Mesa is calling gem_wait and gem_busy on batches
sometimes, so some adjusting will be required.


-

From what I could understand, the general plan seems fine. We'll need
some adjusting in our drivers before we can even try to use this new
API, but hopefully the API will be usable with the current plans. If it
isn't, then we can always change the plan. So, with that said, the plan
is:

Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>


> +
> +Also, in VM_BIND mode, use dma-resv apis for determining object activeness
> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
> +older i915_vma active reference tracking which is deprecated. This should be
> +easier to get it working with the current TTM backend.
> +
> +Mesa use case
> +--------------
> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
> +hence improving performance of CPU-bound applications. It also allows us to
> +implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
> +reducing CPU overhead becomes more impactful.
> +
> +
> +Other VM_BIND use cases
> +========================
> +
> +Long running Compute contexts
> +------------------------------
> +Usage of dma-fence expects that they complete in reasonable amount of time.
> +Compute on the other hand can be long running. Hence it is appropriate for
> +compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
> +must be limited to in-kernel consumption only.
> +
> +Where GPU page faults are not available, kernel driver upon buffer invalidation
> +will initiate a suspend (preemption) of long running context, finish the
> +invalidation, revalidate the BO and then resume the compute context. This is
> +done by having a per-context preempt fence which is enabled when someone tries
> +to wait on it and triggers the context preemption.
> +
> +User/Memory Fence
> +~~~~~~~~~~~~~~~~~~
> +User/Memory fence is a <address, value> pair. To signal the user fence, the
> +specified value will be written at the specified virtual address and wakeup the
> +waiting process. User fence can be signaled either by the GPU or kernel async
> +worker (like upon bind completion). User can wait on a user fence with a new
> +user fence wait ioctl.
> +
> +Here is some prior work on this:
> +https://patchwork.freedesktop.org/patch/349417/
> +
> +Low Latency Submission
> +~~~~~~~~~~~~~~~~~~~~~~~
> +Allows compute UMD to directly submit GPU jobs instead of through execbuf
> +ioctl. This is made possible by VM_BIND is not being synchronized against
> +execbuf. VM_BIND allows bind/unbind of mappings required for the directly
> +submitted jobs.
> +
> +Debugger
> +---------
> +With debug event interface user space process (debugger) is able to keep track
> +of and act upon resources created by another process (debugged) and attached
> +to GPU via vm_bind interface.
> +
> +GPU page faults
> +----------------
> +GPU page faults when supported (in future), will only be supported in the
> +VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
> +binding will require using dma-fence to ensure residency, the GPU page faults
> +mode when supported, will not use any dma-fence as residency is purely managed
> +by installing and removing/invalidating page table entries.
> +
> +Page level hints settings
> +--------------------------
> +VM_BIND allows any hints setting per mapping instead of per BO.
> +Possible hints include read-only mapping, placement and atomicity.
> +Sub-BO level placement hint will be even more relevant with
> +upcoming GPU on-demand page fault support.
> +
> +Page level Cache/CLOS settings
> +-------------------------------
> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
> +
> +Evictable page table allocations
> +---------------------------------
> +Make pagetable allocations evictable and manage them similar to VM_BIND
> +mapped objects. Page table pages are similar to persistent mappings of a
> +VM (difference here are that the page table pages will not have an i915_vma
> +structure and after swapping pages back in, parent page link needs to be
> +updated).
> +
> +Shared Virtual Memory (SVM) support
> +------------------------------------
> +VM_BIND interface can be used to map system memory directly (without gem BO
> +abstraction) using the HMM interface. SVM is only supported with GPU page
> +faults enabled.
> +
> +VM_BIND UAPI
> +=============
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index 91e93a705230..7d10c36b268d 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -23,3 +23,7 @@ host such documentation:
>  .. toctree::
>  
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>      i915_scheduler.rst
> +
> +.. toctree::
> +
> +    i915_vm_bind.rst


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
@ 2022-06-30  0:38     ` Zanoni, Paulo R
  0 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30  0:38 UTC (permalink / raw)
  To: dri-devel, Vishwanathapura, Niranjana, intel-gfx
  Cc: Wilson, Chris P, Hellstrom, Thomas, Auld, Matthew, Vetter,
	Daniel, christian.koenig

On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> VM_BIND design document with description of intended use cases.
> 
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand documentation on dma-resv usage, TLB flushing and
>     execbuf3.
> v4: Remove vm_bind tlb flush request support.
> v5: Update TLB flushing documentation.
> v6: Update out of order completion documentation.
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
>  Documentation/gpu/rfc/index.rst        |   4 +
>  2 files changed, 250 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
> 
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
> new file mode 100644
> index 000000000000..032ee32b885c
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
> @@ -0,0 +1,246 @@
> +==========================================
> +I915 VM_BIND feature design and use cases
> +==========================================
> +
> +VM_BIND feature
> +================
> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
> +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
> +specified address space (VM). These mappings (also referred to as persistent
> +mappings) will be persistent across multiple GPU submissions (execbuf calls)
> +issued by the UMD, without user having to provide a list of all required
> +mappings during each submission (as required by older execbuf mode).
> +
> +The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
> +signaling the completion of bind/unbind operation.
> +
> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.

I915_PARAM_VM_BIND_VERSION


> +User has to opt-in for VM_BIND mode of binding for an address space (VM)
> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
> +
> +VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
> +not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
> +asynchronously, when valid out fence is specified.
> +
> +VM_BIND features include:
> +
> +* Multiple Virtual Address (VA) mappings can map to the same physical pages
> +  of an object (aliasing).
> +* VA mapping can map to a partial section of the BO (partial binding).
> +* Support capture of persistent mappings in the dump upon GPU error.
> +* Support for userptr gem objects (no special uapi is required for this).
> +
> +TLB flush consideration
> +------------------------
> +The i915 driver flushes the TLB for each submission and when an object's
> +pages are released. The VM_BIND/UNBIND operation will not do any additional
> +TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
> +submissions on that VM and will not be in the working set for currently running
> +batches (which would require additional TLB flushes, which is not supported).
> +
> +Execbuf ioctl in VM_BIND mode
> +-------------------------------
> +A VM in VM_BIND mode will not support older execbuf mode of binding.
> +The execbuf ioctl handling in VM_BIND mode differs significantly from the
> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
> +execlist. Hence, no support for implicit sync. It is expected that the below
> +work will be able to support requirements of object dependency setting in all
> +use cases:
> +
> +"dma-buf: Add an API for exporting sync files"
> +(https://lwn.net/Articles/859290/)
> +
> +The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
> +works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
> +VM_BIND call) at the time of execbuf3 call are deemed required for that
> +submission.
> +
> +The execbuf3 ioctl directly specifies the batch addresses instead of as
> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
> +support many of the older features like in/out/submit fences, fence array,
> +default gem context and many more (See struct drm_i915_gem_execbuffer3).

Just as a note: both Iris and Vulkan use some of these features, so
some rework will be required. From what I can see, all current behavior
we depend on will be supported in some way or another, so hopefully
we'll be fine.


> +
> +In VM_BIND mode, VA allocation is completely managed by the user instead of
> +the i915 driver. Hence all VA assignment, eviction are not applicable in
> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
> +be using the i915_vma active reference tracking. It will instead use dma-resv
> +object for that (See `VM_BIND dma_resv usage`_).
> +
> +So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
> +evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
> +should be in a separate file and only functionalities common to these ioctls
> +can be the shared code where possible.
> +
> +VM_PRIVATE objects
> +-------------------
> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
> +exported. Hence these BOs are referred to as Shared BOs.
> +During each execbuf submission, the request fence must be added to the
> +dma-resv fence list of all shared BOs mapped on the VM.
> +
> +VM_BIND feature introduces an optimization where user can create BO which
> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
> +BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
> +the VM they are private to and can't be dma-buf exported.
> +All private BOs of a VM share the dma-resv object. Hence during each execbuf
> +submission, they need only one dma-resv fence list updated. Thus, the fast
> +path (where required mappings are already bound) submission latency is O(1)
> +w.r.t the number of VM private BOs.
> +
> +VM_BIND locking hirarchy
> +-------------------------
> +The locking design here supports the older (execlist based) execbuf mode, the
> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
> +The older execbuf mode and the newer VM_BIND mode without page faults manages
> +residency of backing storage using dma_fence. The VM_BIND mode with page faults
> +and the system allocator support do not use any dma_fence at all.
> +
> +VM_BIND locking order is as below.
> +
> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
> +   mapping.
> +
> +   In future, when GPU page faults are supported, we can potentially use a
> +   rwsem instead, so that multiple page fault handlers can take the read side
> +   lock to lookup the mapping and hence can run in parallel.
> +   The older execbuf mode of binding do not need this lock.
> +
> +2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
> +   be held while binding/unbinding a vma in the async worker and while updating
> +   dma-resv fence list of an object. Note that private BOs of a VM will all
> +   share a dma-resv object.
> +
> +   The future system allocator support will use the HMM prescribed locking
> +   instead.
> +
> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
> +   invalidated vmas (due to eviction and userptr invalidation) etc.
> +
> +When GPU page faults are supported, the execbuf path do not take any of these
> +locks. There we will simply smash the new batch buffer address into the ring and
> +then tell the scheduler run that. The lock taking only happens from the page
> +fault handler, where we take lock-A in read mode, whichever lock-B we need to
> +find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
> +system allocator) and some additional locks (lock-D) for taking care of page
> +table races. Page fault mode should not need to ever manipulate the vm lists,
> +so won't ever need lock-C.
> +
> +VM_BIND LRU handling
> +---------------------
> +We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
> +performance degradation. We will also need support for bulk LRU movement of
> +VM_BIND objects to avoid additional latencies in execbuf path.
> +
> +The page table pages are similar to VM_BIND mapped objects (See
> +`Evictable page table allocations`_) and are maintained per VM and needs to
> +be pinned in memory when VM is made active (ie., upon an execbuf call with
> +that VM). So, bulk LRU movement of page table pages is also needed.
> +
> +VM_BIND dma_resv usage
> +-----------------------
> +Fences needs to be added to all VM_BIND mapped objects. During each execbuf
> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
> +over sync (See enum dma_resv_usage). One can override it with either
> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
> +dependency setting.
> +
> +Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
> +DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
> +check. Instead, the execbuf3 out fence should be used for end of batch check
> +(See struct drm_i915_gem_execbuffer3).

From what I remember Mesa is calling gem_wait and gem_busy on batches
sometimes, so some adjusting will be required.


-

From what I could understand, the general plan seems fine. We'll need
some adjusting in our drivers before we can even try to use this new
API, but hopefully the API will be usable with the current plans. If it
isn't, then we can always change the plan. So, with that said, the plan
is:

Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>


> +
> +Also, in VM_BIND mode, use dma-resv apis for determining object activeness
> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
> +older i915_vma active reference tracking which is deprecated. This should be
> +easier to get it working with the current TTM backend.
> +
> +Mesa use case
> +--------------
> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
> +hence improving performance of CPU-bound applications. It also allows us to
> +implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
> +reducing CPU overhead becomes more impactful.
> +
> +
> +Other VM_BIND use cases
> +========================
> +
> +Long running Compute contexts
> +------------------------------
> +Usage of dma-fence expects that they complete in reasonable amount of time.
> +Compute on the other hand can be long running. Hence it is appropriate for
> +compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
> +must be limited to in-kernel consumption only.
> +
> +Where GPU page faults are not available, kernel driver upon buffer invalidation
> +will initiate a suspend (preemption) of long running context, finish the
> +invalidation, revalidate the BO and then resume the compute context. This is
> +done by having a per-context preempt fence which is enabled when someone tries
> +to wait on it and triggers the context preemption.
> +
> +User/Memory Fence
> +~~~~~~~~~~~~~~~~~~
> +User/Memory fence is a <address, value> pair. To signal the user fence, the
> +specified value will be written at the specified virtual address and wakeup the
> +waiting process. User fence can be signaled either by the GPU or kernel async
> +worker (like upon bind completion). User can wait on a user fence with a new
> +user fence wait ioctl.
> +
> +Here is some prior work on this:
> +https://patchwork.freedesktop.org/patch/349417/
> +
> +Low Latency Submission
> +~~~~~~~~~~~~~~~~~~~~~~~
> +Allows compute UMD to directly submit GPU jobs instead of through execbuf
> +ioctl. This is made possible by VM_BIND is not being synchronized against
> +execbuf. VM_BIND allows bind/unbind of mappings required for the directly
> +submitted jobs.
> +
> +Debugger
> +---------
> +With debug event interface user space process (debugger) is able to keep track
> +of and act upon resources created by another process (debugged) and attached
> +to GPU via vm_bind interface.
> +
> +GPU page faults
> +----------------
> +GPU page faults when supported (in future), will only be supported in the
> +VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
> +binding will require using dma-fence to ensure residency, the GPU page faults
> +mode when supported, will not use any dma-fence as residency is purely managed
> +by installing and removing/invalidating page table entries.
> +
> +Page level hints settings
> +--------------------------
> +VM_BIND allows any hints setting per mapping instead of per BO.
> +Possible hints include read-only mapping, placement and atomicity.
> +Sub-BO level placement hint will be even more relevant with
> +upcoming GPU on-demand page fault support.
> +
> +Page level Cache/CLOS settings
> +-------------------------------
> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
> +
> +Evictable page table allocations
> +---------------------------------
> +Make pagetable allocations evictable and manage them similar to VM_BIND
> +mapped objects. Page table pages are similar to persistent mappings of a
> +VM (difference here are that the page table pages will not have an i915_vma
> +structure and after swapping pages back in, parent page link needs to be
> +updated).
> +
> +Shared Virtual Memory (SVM) support
> +------------------------------------
> +VM_BIND interface can be used to map system memory directly (without gem BO
> +abstraction) using the HMM interface. SVM is only supported with GPU page
> +faults enabled.
> +
> +VM_BIND UAPI
> +=============
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index 91e93a705230..7d10c36b268d 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -23,3 +23,7 @@ host such documentation:
>  .. toctree::
>  
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>      i915_scheduler.rst
> +
> +.. toctree::
> +
> +    i915_vm_bind.rst


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-30  5:11     ` Jason Ekstrand
  -1 siblings, 0 replies; 53+ messages in thread
From: Jason Ekstrand @ 2022-06-30  5:11 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Matthew Brost, Paulo Zanoni, Lionel Landwerlin, Tvrtko Ursulin,
	Intel GFX, Chris Wilson, Thomas Hellstrom, oak.zeng,
	Maling list - DRI developers, Daniel Vetter,
	Christian König, Matthew Auld

[-- Attachment #1: Type: text/plain, Size: 15224 bytes --]

On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathapura@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>     documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
>     Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
>     all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>     Update documentation on async vm_bind/unbind and versioning.
>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>     batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathapura@intel.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *    previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION     57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND           0x3d
> +#define DRM_I915_GEM_VM_UNBIND         0x3e
> +#define DRM_I915_GEM_EXECBUFFER3       0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND             DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND           DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3         DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +       /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +       __u32 handle;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_TIMELINE_FENCE_WAIT:
> +        * Wait for the input fence before the operation.
> +        *
> +        * I915_TIMELINE_FENCE_SIGNAL:
> +        * Return operation completion fence as output.
> +        */
> +       __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +       /**
> +        * @value: A point in the timeline.
> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +        * binary one.
> +        */
> +       __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and
> can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the
> DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact
> page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>

This is not acceptable.  We need 64K granularity.  This includes the
starting address, the BO offset, and the length.  Why?  The tl;dr is that
it's a requirement for about 50% of D3D12 apps if we want them to run on
Linux via D3D12.  A longer explanation follows.  I don't necessarily expect
kernel folks to get all the details but hopefully I'll have left enough of
a map that some of the Intel Mesa folks can help fill in details.

Many modern D3D12 apps have a hard requirement on Tier2 tiled resources.
This is a feature that Intel has supported in the D3D12 driver since
Skylake.  In order to implement this feature, VKD3D requires the various
sparseResidencyImage* and sparseResidency*Sampled Vulkan features.  If we
want those apps to work (there's getting to be quite a few of them), we
need to implement the Vulkan sparse residency features.

What is sparse residency?  I'm glad you asked!  The sparse residency
features allow a client to separately bind each miplevel or array slice of
an image to a chunk of device memory independently, without affecting any
other areas of the image.  Once you get to a high enough miplevel that
everything fits inside a single sparse image block (that's a technical
Vulkan term you can search for in the spec), you can enter a "miptail"
which contains all the remaining miplevels in a single sparse image block.

The term "sparse image block" is what the Vulkan spec uses.  On Intel
hardware and in the docs, it's what we call a "tile".  Specifically, the
image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This is
because Tile4 and legacy X and Y-tiling don't provide any guarantees about
page alignment for slices.  Yf, Ys, and Tile64, on the other hand, align
all slices of the image to a tile boundary, allowing us to map memory to
different slices independently, assuming we have 64K (or 4K for Yf) VM_BIND
granularity.  (4K isn't actually a requirement for SKL-TGL; we can use Ys
all the time which has 64K tiles but there's no reason to not support 4K
alignments on integrated.)

Someone may be tempted to ask, "Can't we wiggle the strides around or
something to make it work?"  I thought about that and no, you can't.  The
problem here is LOD2+.  Sure, you can have a stride such that the image is
a multiple of 2M worth of tiles across.  That'll work fine for LOD0 and
LOD1; both will be 2M aligned.  However, LOD2 won't be and there's no way
to control that.  The hardware will place it to the right of LOD1 by
ROUND_UP(width, tile_width) pixels and there's nothing you can do about
that.  If that position doesn't happen to hit a 2M boundary, you're out of
luck.

I hope that explanation provides enough detail.  Sadly, this is one of
those things which has a lot of moving pieces all over different bits of
the hardware and various APIs and they all have to work together just right
for it to all come out in the end.  But, yeah, we really need 64K aligned
binding if we want VKD3D to work.

--Jason



> + * Also, for such mappings, i915 will reserve the whole 2M range for it
> so as
> + * to not allow multiple mappings in that 2M range (Compact page tables
> do not
> + * allow 64K page and 4K page bindings in the same 2M range).
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are
> not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error
> code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @handle: Object handle */
> +       __u32 handle;
> +
> +       /** @start: Virtual Address start to bind */
> +       __u64 start;
> +
> +       /** @offset: Offset in object to bind */
> +       __u64 offset;
> +
> +       /** @length: Length of mapping to bind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_GEM_VM_BIND_READONLY:
> +        * Mapping is read-only.
> +        *
> +        * I915_GEM_VM_BIND_CAPTURE:
> +        * Capture this mapping in the dump upon GPU error.
> +        */
> +       __u64 flags;
> +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
> +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
> +
> +       /**
> +        * @fence: Timeline fence for bind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
> virtual
> + * address (VA) range that should be unbound from the device page table
> of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to
> complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in use
> before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return
> without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @rsvd: Reserved, MBZ */
> +       __u32 rsvd;
> +
> +       /** @start: Virtual Address start to unbind */
> +       __u64 start;
> +
> +       /** @length: Length of mapping to unbind */
> +       __u64 length;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for unbind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for
> DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND
> mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +       /**
> +        * @ctx_id: Context id
> +        *
> +        * Only contexts with user engine map are allowed.
> +        */
> +       __u32 ctx_id;
> +
> +       /**
> +        * @engine_idx: Engine index
> +        *
> +        * An index in the user engine map of the context specified by
> @ctx_id.
> +        */
> +       __u32 engine_idx;
> +
> +       /**
> +        * @batch_address: Batch gpu virtual address/es.
> +        *
> +        * For normal submission, it is the gpu virtual address of the
> batch
> +        * buffer. For parallel submission, it is a pointer to an array of
> +        * batch buffer gpu virtual addresses with array size equal to the
> +        * number of (parallel) engines involved in that submission (See
> +        * struct i915_context_engines_parallel_submit).
> +        */
> +       __u64 batch_address;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /** @rsvd1: Reserved, MBZ */
> +       __u32 rsvd1;
> +
> +       /** @fence_count: Number of fences in @timeline_fences array. */
> +       __u32 fence_count;
> +
> +       /**
> +        * @timeline_fences: Pointer to an array of timeline fences.
> +        *
> +        * Timeline fences are of format struct
> drm_i915_gem_timeline_fence.
> +        */
> +       __u64 timeline_fences;
> +
> +       /** @rsvd2: Reserved, MBZ */
> +       __u64 rsvd2;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @vm_id: Id of the VM to which the object is private */
> +       __u32 vm_id;
> +};
> --
> 2.21.0.rc0.32.g243a4c7e27
>
>

[-- Attachment #2: Type: text/html, Size: 17235 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30  5:11     ` Jason Ekstrand
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Ekstrand @ 2022-06-30  5:11 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Paulo Zanoni, Intel GFX, Chris Wilson, Thomas Hellstrom,
	Maling list - DRI developers, Daniel Vetter,
	Christian König, Matthew Auld

[-- Attachment #1: Type: text/plain, Size: 15224 bytes --]

On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathapura@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>     documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
>     Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
>     all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>     Update documentation on async vm_bind/unbind and versioning.
>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>     batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathapura@intel.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *    previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION     57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND           0x3d
> +#define DRM_I915_GEM_VM_UNBIND         0x3e
> +#define DRM_I915_GEM_EXECBUFFER3       0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND             DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND           DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3         DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +       /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +       __u32 handle;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_TIMELINE_FENCE_WAIT:
> +        * Wait for the input fence before the operation.
> +        *
> +        * I915_TIMELINE_FENCE_SIGNAL:
> +        * Return operation completion fence as output.
> +        */
> +       __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +       /**
> +        * @value: A point in the timeline.
> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +        * binary one.
> +        */
> +       __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and
> can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the
> DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact
> page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>

This is not acceptable.  We need 64K granularity.  This includes the
starting address, the BO offset, and the length.  Why?  The tl;dr is that
it's a requirement for about 50% of D3D12 apps if we want them to run on
Linux via D3D12.  A longer explanation follows.  I don't necessarily expect
kernel folks to get all the details but hopefully I'll have left enough of
a map that some of the Intel Mesa folks can help fill in details.

Many modern D3D12 apps have a hard requirement on Tier2 tiled resources.
This is a feature that Intel has supported in the D3D12 driver since
Skylake.  In order to implement this feature, VKD3D requires the various
sparseResidencyImage* and sparseResidency*Sampled Vulkan features.  If we
want those apps to work (there's getting to be quite a few of them), we
need to implement the Vulkan sparse residency features.

What is sparse residency?  I'm glad you asked!  The sparse residency
features allow a client to separately bind each miplevel or array slice of
an image to a chunk of device memory independently, without affecting any
other areas of the image.  Once you get to a high enough miplevel that
everything fits inside a single sparse image block (that's a technical
Vulkan term you can search for in the spec), you can enter a "miptail"
which contains all the remaining miplevels in a single sparse image block.

The term "sparse image block" is what the Vulkan spec uses.  On Intel
hardware and in the docs, it's what we call a "tile".  Specifically, the
image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This is
because Tile4 and legacy X and Y-tiling don't provide any guarantees about
page alignment for slices.  Yf, Ys, and Tile64, on the other hand, align
all slices of the image to a tile boundary, allowing us to map memory to
different slices independently, assuming we have 64K (or 4K for Yf) VM_BIND
granularity.  (4K isn't actually a requirement for SKL-TGL; we can use Ys
all the time which has 64K tiles but there's no reason to not support 4K
alignments on integrated.)

Someone may be tempted to ask, "Can't we wiggle the strides around or
something to make it work?"  I thought about that and no, you can't.  The
problem here is LOD2+.  Sure, you can have a stride such that the image is
a multiple of 2M worth of tiles across.  That'll work fine for LOD0 and
LOD1; both will be 2M aligned.  However, LOD2 won't be and there's no way
to control that.  The hardware will place it to the right of LOD1 by
ROUND_UP(width, tile_width) pixels and there's nothing you can do about
that.  If that position doesn't happen to hit a 2M boundary, you're out of
luck.

I hope that explanation provides enough detail.  Sadly, this is one of
those things which has a lot of moving pieces all over different bits of
the hardware and various APIs and they all have to work together just right
for it to all come out in the end.  But, yeah, we really need 64K aligned
binding if we want VKD3D to work.

--Jason



> + * Also, for such mappings, i915 will reserve the whole 2M range for it
> so as
> + * to not allow multiple mappings in that 2M range (Compact page tables
> do not
> + * allow 64K page and 4K page bindings in the same 2M range).
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are
> not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error
> code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @handle: Object handle */
> +       __u32 handle;
> +
> +       /** @start: Virtual Address start to bind */
> +       __u64 start;
> +
> +       /** @offset: Offset in object to bind */
> +       __u64 offset;
> +
> +       /** @length: Length of mapping to bind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_GEM_VM_BIND_READONLY:
> +        * Mapping is read-only.
> +        *
> +        * I915_GEM_VM_BIND_CAPTURE:
> +        * Capture this mapping in the dump upon GPU error.
> +        */
> +       __u64 flags;
> +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
> +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
> +
> +       /**
> +        * @fence: Timeline fence for bind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
> virtual
> + * address (VA) range that should be unbound from the device page table
> of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to
> complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in use
> before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return
> without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @rsvd: Reserved, MBZ */
> +       __u32 rsvd;
> +
> +       /** @start: Virtual Address start to unbind */
> +       __u64 start;
> +
> +       /** @length: Length of mapping to unbind */
> +       __u64 length;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for unbind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for
> DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND
> mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +       /**
> +        * @ctx_id: Context id
> +        *
> +        * Only contexts with user engine map are allowed.
> +        */
> +       __u32 ctx_id;
> +
> +       /**
> +        * @engine_idx: Engine index
> +        *
> +        * An index in the user engine map of the context specified by
> @ctx_id.
> +        */
> +       __u32 engine_idx;
> +
> +       /**
> +        * @batch_address: Batch gpu virtual address/es.
> +        *
> +        * For normal submission, it is the gpu virtual address of the
> batch
> +        * buffer. For parallel submission, it is a pointer to an array of
> +        * batch buffer gpu virtual addresses with array size equal to the
> +        * number of (parallel) engines involved in that submission (See
> +        * struct i915_context_engines_parallel_submit).
> +        */
> +       __u64 batch_address;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /** @rsvd1: Reserved, MBZ */
> +       __u32 rsvd1;
> +
> +       /** @fence_count: Number of fences in @timeline_fences array. */
> +       __u32 fence_count;
> +
> +       /**
> +        * @timeline_fences: Pointer to an array of timeline fences.
> +        *
> +        * Timeline fences are of format struct
> drm_i915_gem_timeline_fence.
> +        */
> +       __u64 timeline_fences;
> +
> +       /** @rsvd2: Reserved, MBZ */
> +       __u64 rsvd2;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @vm_id: Id of the VM to which the object is private */
> +       __u32 vm_id;
> +};
> --
> 2.21.0.rc0.32.g243a4c7e27
>
>

[-- Attachment #2: Type: text/html, Size: 17235 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
  2022-06-30  0:38     ` [Intel-gfx] " Zanoni, Paulo R
@ 2022-06-30  5:39       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  5:39 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: Brost, Matthew, Landwerlin, Lionel G, Ursulin, Tvrtko, intel-gfx,
	Wilson, Chris P, Hellstrom, Thomas, Zeng, Oak, dri-devel, jason,
	Vetter, Daniel, christian.koenig, Auld, Matthew

On Wed, Jun 29, 2022 at 05:38:59PM -0700, Zanoni, Paulo R wrote:
>On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND design document with description of intended use cases.
>>
>> v2: Reduce the scope to simple Mesa use case.
>> v3: Expand documentation on dma-resv usage, TLB flushing and
>>     execbuf3.
>> v4: Remove vm_bind tlb flush request support.
>> v5: Update TLB flushing documentation.
>> v6: Update out of order completion documentation.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
>>  Documentation/gpu/rfc/index.rst        |   4 +
>>  2 files changed, 250 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
>> new file mode 100644
>> index 000000000000..032ee32b885c
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
>> @@ -0,0 +1,246 @@
>> +==========================================
>> +I915 VM_BIND feature design and use cases
>> +==========================================
>> +
>> +VM_BIND feature
>> +================
>> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>> +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>> +specified address space (VM). These mappings (also referred to as persistent
>> +mappings) will be persistent across multiple GPU submissions (execbuf calls)
>> +issued by the UMD, without user having to provide a list of all required
>> +mappings during each submission (as required by older execbuf mode).
>> +
>> +The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
>> +signaling the completion of bind/unbind operation.
>> +
>> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
>
>I915_PARAM_VM_BIND_VERSION

Thanks, will fix.

>
>
>> +User has to opt-in for VM_BIND mode of binding for an address space (VM)
>> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>> +
>> +VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
>> +not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
>> +asynchronously, when valid out fence is specified.
>> +
>> +VM_BIND features include:
>> +
>> +* Multiple Virtual Address (VA) mappings can map to the same physical pages
>> +  of an object (aliasing).
>> +* VA mapping can map to a partial section of the BO (partial binding).
>> +* Support capture of persistent mappings in the dump upon GPU error.
>> +* Support for userptr gem objects (no special uapi is required for this).
>> +
>> +TLB flush consideration
>> +------------------------
>> +The i915 driver flushes the TLB for each submission and when an object's
>> +pages are released. The VM_BIND/UNBIND operation will not do any additional
>> +TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
>> +submissions on that VM and will not be in the working set for currently running
>> +batches (which would require additional TLB flushes, which is not supported).
>> +
>> +Execbuf ioctl in VM_BIND mode
>> +-------------------------------
>> +A VM in VM_BIND mode will not support older execbuf mode of binding.
>> +The execbuf ioctl handling in VM_BIND mode differs significantly from the
>> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
>> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
>> +execlist. Hence, no support for implicit sync. It is expected that the below
>> +work will be able to support requirements of object dependency setting in all
>> +use cases:
>> +
>> +"dma-buf: Add an API for exporting sync files"
>> +(https://lwn.net/Articles/859290/)
>> +
>> +The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
>> +works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
>> +VM_BIND call) at the time of execbuf3 call are deemed required for that
>> +submission.
>> +
>> +The execbuf3 ioctl directly specifies the batch addresses instead of as
>> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>> +support many of the older features like in/out/submit fences, fence array,
>> +default gem context and many more (See struct drm_i915_gem_execbuffer3).
>
>Just as a note: both Iris and Vulkan use some of these features, so
>some rework will be required. From what I can see, all current behavior
>we depend on will be supported in some way or another, so hopefully
>we'll be fine.
>
>
>> +
>> +In VM_BIND mode, VA allocation is completely managed by the user instead of
>> +the i915 driver. Hence all VA assignment, eviction are not applicable in
>> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
>> +be using the i915_vma active reference tracking. It will instead use dma-resv
>> +object for that (See `VM_BIND dma_resv usage`_).
>> +
>> +So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
>> +evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
>> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
>> +should be in a separate file and only functionalities common to these ioctls
>> +can be the shared code where possible.
>> +
>> +VM_PRIVATE objects
>> +-------------------
>> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
>> +exported. Hence these BOs are referred to as Shared BOs.
>> +During each execbuf submission, the request fence must be added to the
>> +dma-resv fence list of all shared BOs mapped on the VM.
>> +
>> +VM_BIND feature introduces an optimization where user can create BO which
>> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
>> +BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
>> +the VM they are private to and can't be dma-buf exported.
>> +All private BOs of a VM share the dma-resv object. Hence during each execbuf
>> +submission, they need only one dma-resv fence list updated. Thus, the fast
>> +path (where required mappings are already bound) submission latency is O(1)
>> +w.r.t the number of VM private BOs.
>> +
>> +VM_BIND locking hirarchy
>> +-------------------------
>> +The locking design here supports the older (execlist based) execbuf mode, the
>> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
>> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
>> +The older execbuf mode and the newer VM_BIND mode without page faults manages
>> +residency of backing storage using dma_fence. The VM_BIND mode with page faults
>> +and the system allocator support do not use any dma_fence at all.
>> +
>> +VM_BIND locking order is as below.
>> +
>> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
>> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>> +   mapping.
>> +
>> +   In future, when GPU page faults are supported, we can potentially use a
>> +   rwsem instead, so that multiple page fault handlers can take the read side
>> +   lock to lookup the mapping and hence can run in parallel.
>> +   The older execbuf mode of binding do not need this lock.
>> +
>> +2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
>> +   be held while binding/unbinding a vma in the async worker and while updating
>> +   dma-resv fence list of an object. Note that private BOs of a VM will all
>> +   share a dma-resv object.
>> +
>> +   The future system allocator support will use the HMM prescribed locking
>> +   instead.
>> +
>> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
>> +   invalidated vmas (due to eviction and userptr invalidation) etc.
>> +
>> +When GPU page faults are supported, the execbuf path do not take any of these
>> +locks. There we will simply smash the new batch buffer address into the ring and
>> +then tell the scheduler run that. The lock taking only happens from the page
>> +fault handler, where we take lock-A in read mode, whichever lock-B we need to
>> +find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
>> +system allocator) and some additional locks (lock-D) for taking care of page
>> +table races. Page fault mode should not need to ever manipulate the vm lists,
>> +so won't ever need lock-C.
>> +
>> +VM_BIND LRU handling
>> +---------------------
>> +We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
>> +performance degradation. We will also need support for bulk LRU movement of
>> +VM_BIND objects to avoid additional latencies in execbuf path.
>> +
>> +The page table pages are similar to VM_BIND mapped objects (See
>> +`Evictable page table allocations`_) and are maintained per VM and needs to
>> +be pinned in memory when VM is made active (ie., upon an execbuf call with
>> +that VM). So, bulk LRU movement of page table pages is also needed.
>> +
>> +VM_BIND dma_resv usage
>> +-----------------------
>> +Fences needs to be added to all VM_BIND mapped objects. During each execbuf
>> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
>> +over sync (See enum dma_resv_usage). One can override it with either
>> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
>> +dependency setting.
>> +
>> +Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
>> +DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
>> +check. Instead, the execbuf3 out fence should be used for end of batch check
>> +(See struct drm_i915_gem_execbuffer3).
>
>From what I remember Mesa is calling gem_wait and gem_busy on batches
>sometimes, so some adjusting will be required.
>
>
>-
>
>From what I could understand, the general plan seems fine. We'll need
>some adjusting in our drivers before we can even try to use this new
>API, but hopefully the API will be usable with the current plans. If it
>isn't, then we can always change the plan. So, with that said, the plan
>is:
>
>Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
>

Thanks,
Niranjana

>
>> +
>> +Also, in VM_BIND mode, use dma-resv apis for determining object activeness
>> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
>> +older i915_vma active reference tracking which is deprecated. This should be
>> +easier to get it working with the current TTM backend.
>> +
>> +Mesa use case
>> +--------------
>> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
>> +hence improving performance of CPU-bound applications. It also allows us to
>> +implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
>> +reducing CPU overhead becomes more impactful.
>> +
>> +
>> +Other VM_BIND use cases
>> +========================
>> +
>> +Long running Compute contexts
>> +------------------------------
>> +Usage of dma-fence expects that they complete in reasonable amount of time.
>> +Compute on the other hand can be long running. Hence it is appropriate for
>> +compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
>> +must be limited to in-kernel consumption only.
>> +
>> +Where GPU page faults are not available, kernel driver upon buffer invalidation
>> +will initiate a suspend (preemption) of long running context, finish the
>> +invalidation, revalidate the BO and then resume the compute context. This is
>> +done by having a per-context preempt fence which is enabled when someone tries
>> +to wait on it and triggers the context preemption.
>> +
>> +User/Memory Fence
>> +~~~~~~~~~~~~~~~~~~
>> +User/Memory fence is a <address, value> pair. To signal the user fence, the
>> +specified value will be written at the specified virtual address and wakeup the
>> +waiting process. User fence can be signaled either by the GPU or kernel async
>> +worker (like upon bind completion). User can wait on a user fence with a new
>> +user fence wait ioctl.
>> +
>> +Here is some prior work on this:
>> +https://patchwork.freedesktop.org/patch/349417/
>> +
>> +Low Latency Submission
>> +~~~~~~~~~~~~~~~~~~~~~~~
>> +Allows compute UMD to directly submit GPU jobs instead of through execbuf
>> +ioctl. This is made possible by VM_BIND is not being synchronized against
>> +execbuf. VM_BIND allows bind/unbind of mappings required for the directly
>> +submitted jobs.
>> +
>> +Debugger
>> +---------
>> +With debug event interface user space process (debugger) is able to keep track
>> +of and act upon resources created by another process (debugged) and attached
>> +to GPU via vm_bind interface.
>> +
>> +GPU page faults
>> +----------------
>> +GPU page faults when supported (in future), will only be supported in the
>> +VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
>> +binding will require using dma-fence to ensure residency, the GPU page faults
>> +mode when supported, will not use any dma-fence as residency is purely managed
>> +by installing and removing/invalidating page table entries.
>> +
>> +Page level hints settings
>> +--------------------------
>> +VM_BIND allows any hints setting per mapping instead of per BO.
>> +Possible hints include read-only mapping, placement and atomicity.
>> +Sub-BO level placement hint will be even more relevant with
>> +upcoming GPU on-demand page fault support.
>> +
>> +Page level Cache/CLOS settings
>> +-------------------------------
>> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
>> +
>> +Evictable page table allocations
>> +---------------------------------
>> +Make pagetable allocations evictable and manage them similar to VM_BIND
>> +mapped objects. Page table pages are similar to persistent mappings of a
>> +VM (difference here are that the page table pages will not have an i915_vma
>> +structure and after swapping pages back in, parent page link needs to be
>> +updated).
>> +
>> +Shared Virtual Memory (SVM) support
>> +------------------------------------
>> +VM_BIND interface can be used to map system memory directly (without gem BO
>> +abstraction) using the HMM interface. SVM is only supported with GPU page
>> +faults enabled.
>> +
>> +VM_BIND UAPI
>> +=============
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
>> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
>> index 91e93a705230..7d10c36b268d 100644
>> --- a/Documentation/gpu/rfc/index.rst
>> +++ b/Documentation/gpu/rfc/index.rst
>> @@ -23,3 +23,7 @@ host such documentation:
>>  .. toctree::
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>      i915_scheduler.rst
>> +
>> +.. toctree::
>> +
>> +    i915_vm_bind.rst
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document
@ 2022-06-30  5:39       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  5:39 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: intel-gfx, Wilson, Chris P, Hellstrom, Thomas, dri-devel, Vetter,
	Daniel, christian.koenig, Auld, Matthew

On Wed, Jun 29, 2022 at 05:38:59PM -0700, Zanoni, Paulo R wrote:
>On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND design document with description of intended use cases.
>>
>> v2: Reduce the scope to simple Mesa use case.
>> v3: Expand documentation on dma-resv usage, TLB flushing and
>>     execbuf3.
>> v4: Remove vm_bind tlb flush request support.
>> v5: Update TLB flushing documentation.
>> v6: Update out of order completion documentation.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.rst | 246 +++++++++++++++++++++++++
>>  Documentation/gpu/rfc/index.rst        |   4 +
>>  2 files changed, 250 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.rst
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst
>> new file mode 100644
>> index 000000000000..032ee32b885c
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.rst
>> @@ -0,0 +1,246 @@
>> +==========================================
>> +I915 VM_BIND feature design and use cases
>> +==========================================
>> +
>> +VM_BIND feature
>> +================
>> +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
>> +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
>> +specified address space (VM). These mappings (also referred to as persistent
>> +mappings) will be persistent across multiple GPU submissions (execbuf calls)
>> +issued by the UMD, without user having to provide a list of all required
>> +mappings during each submission (as required by older execbuf mode).
>> +
>> +The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
>> +signaling the completion of bind/unbind operation.
>> +
>> +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
>
>I915_PARAM_VM_BIND_VERSION

Thanks, will fix.

>
>
>> +User has to opt-in for VM_BIND mode of binding for an address space (VM)
>> +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
>> +
>> +VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
>> +not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
>> +asynchronously, when valid out fence is specified.
>> +
>> +VM_BIND features include:
>> +
>> +* Multiple Virtual Address (VA) mappings can map to the same physical pages
>> +  of an object (aliasing).
>> +* VA mapping can map to a partial section of the BO (partial binding).
>> +* Support capture of persistent mappings in the dump upon GPU error.
>> +* Support for userptr gem objects (no special uapi is required for this).
>> +
>> +TLB flush consideration
>> +------------------------
>> +The i915 driver flushes the TLB for each submission and when an object's
>> +pages are released. The VM_BIND/UNBIND operation will not do any additional
>> +TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
>> +submissions on that VM and will not be in the working set for currently running
>> +batches (which would require additional TLB flushes, which is not supported).
>> +
>> +Execbuf ioctl in VM_BIND mode
>> +-------------------------------
>> +A VM in VM_BIND mode will not support older execbuf mode of binding.
>> +The execbuf ioctl handling in VM_BIND mode differs significantly from the
>> +older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
>> +Hence, a new execbuf3 ioctl has been added to support VM_BIND mode. (See
>> +struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
>> +execlist. Hence, no support for implicit sync. It is expected that the below
>> +work will be able to support requirements of object dependency setting in all
>> +use cases:
>> +
>> +"dma-buf: Add an API for exporting sync files"
>> +(https://lwn.net/Articles/859290/)
>> +
>> +The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
>> +works with execbuf3 ioctl for submission. All BOs mapped on that VM (through
>> +VM_BIND call) at the time of execbuf3 call are deemed required for that
>> +submission.
>> +
>> +The execbuf3 ioctl directly specifies the batch addresses instead of as
>> +object handles as in execbuf2 ioctl. The execbuf3 ioctl will also not
>> +support many of the older features like in/out/submit fences, fence array,
>> +default gem context and many more (See struct drm_i915_gem_execbuffer3).
>
>Just as a note: both Iris and Vulkan use some of these features, so
>some rework will be required. From what I can see, all current behavior
>we depend on will be supported in some way or another, so hopefully
>we'll be fine.
>
>
>> +
>> +In VM_BIND mode, VA allocation is completely managed by the user instead of
>> +the i915 driver. Hence all VA assignment, eviction are not applicable in
>> +VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
>> +be using the i915_vma active reference tracking. It will instead use dma-resv
>> +object for that (See `VM_BIND dma_resv usage`_).
>> +
>> +So, a lot of existing code supporting execbuf2 ioctl, like relocations, VA
>> +evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
>> +are not applicable for execbuf3 ioctl. Hence, all execbuf3 specific handling
>> +should be in a separate file and only functionalities common to these ioctls
>> +can be the shared code where possible.
>> +
>> +VM_PRIVATE objects
>> +-------------------
>> +By default, BOs can be mapped on multiple VMs and can also be dma-buf
>> +exported. Hence these BOs are referred to as Shared BOs.
>> +During each execbuf submission, the request fence must be added to the
>> +dma-resv fence list of all shared BOs mapped on the VM.
>> +
>> +VM_BIND feature introduces an optimization where user can create BO which
>> +is private to a specified VM via I915_GEM_CREATE_EXT_VM_PRIVATE flag during
>> +BO creation. Unlike Shared BOs, these VM private BOs can only be mapped on
>> +the VM they are private to and can't be dma-buf exported.
>> +All private BOs of a VM share the dma-resv object. Hence during each execbuf
>> +submission, they need only one dma-resv fence list updated. Thus, the fast
>> +path (where required mappings are already bound) submission latency is O(1)
>> +w.r.t the number of VM private BOs.
>> +
>> +VM_BIND locking hirarchy
>> +-------------------------
>> +The locking design here supports the older (execlist based) execbuf mode, the
>> +newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
>> +system allocator support (See `Shared Virtual Memory (SVM) support`_).
>> +The older execbuf mode and the newer VM_BIND mode without page faults manages
>> +residency of backing storage using dma_fence. The VM_BIND mode with page faults
>> +and the system allocator support do not use any dma_fence at all.
>> +
>> +VM_BIND locking order is as below.
>> +
>> +1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
>> +   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
>> +   mapping.
>> +
>> +   In future, when GPU page faults are supported, we can potentially use a
>> +   rwsem instead, so that multiple page fault handlers can take the read side
>> +   lock to lookup the mapping and hence can run in parallel.
>> +   The older execbuf mode of binding do not need this lock.
>> +
>> +2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
>> +   be held while binding/unbinding a vma in the async worker and while updating
>> +   dma-resv fence list of an object. Note that private BOs of a VM will all
>> +   share a dma-resv object.
>> +
>> +   The future system allocator support will use the HMM prescribed locking
>> +   instead.
>> +
>> +3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
>> +   invalidated vmas (due to eviction and userptr invalidation) etc.
>> +
>> +When GPU page faults are supported, the execbuf path do not take any of these
>> +locks. There we will simply smash the new batch buffer address into the ring and
>> +then tell the scheduler run that. The lock taking only happens from the page
>> +fault handler, where we take lock-A in read mode, whichever lock-B we need to
>> +find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
>> +system allocator) and some additional locks (lock-D) for taking care of page
>> +table races. Page fault mode should not need to ever manipulate the vm lists,
>> +so won't ever need lock-C.
>> +
>> +VM_BIND LRU handling
>> +---------------------
>> +We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
>> +performance degradation. We will also need support for bulk LRU movement of
>> +VM_BIND objects to avoid additional latencies in execbuf path.
>> +
>> +The page table pages are similar to VM_BIND mapped objects (See
>> +`Evictable page table allocations`_) and are maintained per VM and needs to
>> +be pinned in memory when VM is made active (ie., upon an execbuf call with
>> +that VM). So, bulk LRU movement of page table pages is also needed.
>> +
>> +VM_BIND dma_resv usage
>> +-----------------------
>> +Fences needs to be added to all VM_BIND mapped objects. During each execbuf
>> +submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
>> +over sync (See enum dma_resv_usage). One can override it with either
>> +DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
>> +dependency setting.
>> +
>> +Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
>> +DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
>> +check. Instead, the execbuf3 out fence should be used for end of batch check
>> +(See struct drm_i915_gem_execbuffer3).
>
>From what I remember Mesa is calling gem_wait and gem_busy on batches
>sometimes, so some adjusting will be required.
>
>
>-
>
>From what I could understand, the general plan seems fine. We'll need
>some adjusting in our drivers before we can even try to use this new
>API, but hopefully the API will be usable with the current plans. If it
>isn't, then we can always change the plan. So, with that said, the plan
>is:
>
>Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
>

Thanks,
Niranjana

>
>> +
>> +Also, in VM_BIND mode, use dma-resv apis for determining object activeness
>> +(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
>> +older i915_vma active reference tracking which is deprecated. This should be
>> +easier to get it working with the current TTM backend.
>> +
>> +Mesa use case
>> +--------------
>> +VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
>> +hence improving performance of CPU-bound applications. It also allows us to
>> +implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
>> +reducing CPU overhead becomes more impactful.
>> +
>> +
>> +Other VM_BIND use cases
>> +========================
>> +
>> +Long running Compute contexts
>> +------------------------------
>> +Usage of dma-fence expects that they complete in reasonable amount of time.
>> +Compute on the other hand can be long running. Hence it is appropriate for
>> +compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
>> +must be limited to in-kernel consumption only.
>> +
>> +Where GPU page faults are not available, kernel driver upon buffer invalidation
>> +will initiate a suspend (preemption) of long running context, finish the
>> +invalidation, revalidate the BO and then resume the compute context. This is
>> +done by having a per-context preempt fence which is enabled when someone tries
>> +to wait on it and triggers the context preemption.
>> +
>> +User/Memory Fence
>> +~~~~~~~~~~~~~~~~~~
>> +User/Memory fence is a <address, value> pair. To signal the user fence, the
>> +specified value will be written at the specified virtual address and wakeup the
>> +waiting process. User fence can be signaled either by the GPU or kernel async
>> +worker (like upon bind completion). User can wait on a user fence with a new
>> +user fence wait ioctl.
>> +
>> +Here is some prior work on this:
>> +https://patchwork.freedesktop.org/patch/349417/
>> +
>> +Low Latency Submission
>> +~~~~~~~~~~~~~~~~~~~~~~~
>> +Allows compute UMD to directly submit GPU jobs instead of through execbuf
>> +ioctl. This is made possible by VM_BIND is not being synchronized against
>> +execbuf. VM_BIND allows bind/unbind of mappings required for the directly
>> +submitted jobs.
>> +
>> +Debugger
>> +---------
>> +With debug event interface user space process (debugger) is able to keep track
>> +of and act upon resources created by another process (debugged) and attached
>> +to GPU via vm_bind interface.
>> +
>> +GPU page faults
>> +----------------
>> +GPU page faults when supported (in future), will only be supported in the
>> +VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
>> +binding will require using dma-fence to ensure residency, the GPU page faults
>> +mode when supported, will not use any dma-fence as residency is purely managed
>> +by installing and removing/invalidating page table entries.
>> +
>> +Page level hints settings
>> +--------------------------
>> +VM_BIND allows any hints setting per mapping instead of per BO.
>> +Possible hints include read-only mapping, placement and atomicity.
>> +Sub-BO level placement hint will be even more relevant with
>> +upcoming GPU on-demand page fault support.
>> +
>> +Page level Cache/CLOS settings
>> +-------------------------------
>> +VM_BIND allows cache/CLOS settings per mapping instead of per BO.
>> +
>> +Evictable page table allocations
>> +---------------------------------
>> +Make pagetable allocations evictable and manage them similar to VM_BIND
>> +mapped objects. Page table pages are similar to persistent mappings of a
>> +VM (difference here are that the page table pages will not have an i915_vma
>> +structure and after swapping pages back in, parent page link needs to be
>> +updated).
>> +
>> +Shared Virtual Memory (SVM) support
>> +------------------------------------
>> +VM_BIND interface can be used to map system memory directly (without gem BO
>> +abstraction) using the HMM interface. SVM is only supported with GPU page
>> +faults enabled.
>> +
>> +VM_BIND UAPI
>> +=============
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h
>> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
>> index 91e93a705230..7d10c36b268d 100644
>> --- a/Documentation/gpu/rfc/index.rst
>> +++ b/Documentation/gpu/rfc/index.rst
>> @@ -23,3 +23,7 @@ host such documentation:
>>  .. toctree::
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>      i915_scheduler.rst
>> +
>> +.. toctree::
>> +
>> +    i915_vm_bind.rst
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  0:33     ` [Intel-gfx] " Zanoni, Paulo R
@ 2022-06-30  6:08       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  6:08 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: Brost, Matthew, Landwerlin, Lionel G, Ursulin, Tvrtko, intel-gfx,
	Wilson, Chris P, Hellstrom, Thomas, Zeng, Oak, dri-devel, jason,
	Vetter, Daniel, christian.koenig, Auld, Matthew

On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Reduce the scope to simple Mesa use case.
>> v3: Expand VM_UNBIND documentation and add
>>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>     documentation for vm_bind/unbind.
>> v5: Remove TLB flush requirement on VM_UNBIND.
>>     Add version support to stage implementation.
>> v6: Define and use drm_i915_gem_timeline_fence structure for
>>     all timeline fences.
>> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>>     Update documentation on async vm_bind/unbind and versioning.
>>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>>     batch_count field and I915_EXEC3_SECURE flag.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>>  1 file changed, 280 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index 000000000000..a93e08bceee6
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,280 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_VM_BIND_VERSION
>> + *
>> + * VM_BIND feature version supported.
>> + * See typedef drm_i915_getparam_t param.
>> + *
>> + * Specifies the VM_BIND feature version supported.
>> + * The following versions of VM_BIND have been defined:
>> + *
>> + * 0: No VM_BIND support.
>> + *
>> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
>> + *    previously with VM_BIND, the ioctl will not support unbinding multiple
>> + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
>> + *    any existing mappings.
>> + *
>> + * 2: The restrictions on unbinding partial or multiple mappings is
>> + *    lifted, Similarly, binding will replace any mappings in the given range.
>> + *
>> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>> + */
>> +#define I915_PARAM_VM_BIND_VERSION   57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
>> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> + */
>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> +
>> +/* VM_BIND related ioctls */
>> +#define DRM_I915_GEM_VM_BIND         0x3d
>> +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> +
>> +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> +
>> +/**
>> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> + *
>> + * The operation will wait for input fence to signal.
>> + *
>> + * The returned output fence will be signaled after the completion of the
>> + * operation.
>> + */
>> +struct drm_i915_gem_timeline_fence {
>> +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> +     __u32 handle;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_TIMELINE_FENCE_WAIT:
>> +      * Wait for the input fence before the operation.
>> +      *
>> +      * I915_TIMELINE_FENCE_SIGNAL:
>> +      * Return operation completion fence as output.
>> +      */
>> +     __u32 flags;
>> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> +
>> +     /**
>> +      * @value: A point in the timeline.
>> +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> +      * binary one.
>> +      */
>> +     __u64 value;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> + *
>> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> + * virtual address (VA) range to the section of an object that should be bound
>> + * in the device page table of the specified address space (VM).
>> + * The VA range specified must be unique (ie., not currently bound) and can
>> + * be mapped to whole object or a section of the object (partial binding).
>> + * Multiple VA mappings can be created to the same section of the object
>> + * (aliasing).
>> + *
>> + * The @start, @offset and @length must be 4K page aligned. However the DG2
>> + * and XEHPSDV has 64K page size for device local-memory and has compact page
>> + * table. On those platforms, for binding device local-memory objects, the
>> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>> + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
>> + * to not allow multiple mappings in that 2M range (Compact page tables do not
>> + * allow 64K page and 4K page bindings in the same 2M range).
>> + *
>> + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> + * asynchronously, if valid @fence is specified.
>
>Does that mean that if I don't provide @fence, then this ioctl will be
>synchronous (i.e., when it returns, the memory will be guaranteed to be
>bound)? The text is kinda implying that, but from one of your earlier
>replies to Tvrtko, that doesn't seem to be the case. I guess we could
>change the text to make this more explicit.
>

Yes, I thought, if user doesn't specify the out fence, KMD better make
the ioctl synchronous by waiting until the binding finishes before
returning. Otherwise, UMD has no way to ensure binding is complete and
UMD must pass in out fence for VM_BIND calls.

But latest comment form Daniel on other thread might suggest something else.
Daniel, can you comment?

>In addition, previously we had the guarantee that an execbuf ioctl
>would wait for all the pending vm_bind operations to finish before
>doing anything. Do we still have this guarantee or do we have to make
>use of the fences now?
>

No, we don't have that anymore (execbuf is decoupled from VM_BIND).
Execbuf3 submission will not wait for any previous VM_BIND to finish.
UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
that.

>> + */
>> +struct drm_i915_gem_vm_bind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @handle: Object handle */
>> +     __u32 handle;
>> +
>> +     /** @start: Virtual Address start to bind */
>> +     __u64 start;
>> +
>> +     /** @offset: Offset in object to bind */
>> +     __u64 offset;
>> +
>> +     /** @length: Length of mapping to bind */
>> +     __u64 length;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_GEM_VM_BIND_READONLY:
>> +      * Mapping is read-only.
>
>Can you please explain what happens when we try to write to a range
>that's bound as read-only?
>

It will be mapped as read-only in device page table. Hence any
write access will fail. I would expect a CAT error reported.

I am seeing that currently the page table R/W setting is based
on whether BO is readonly or not (UMDs can request a userptr
BO to readonly). We can make this READONLY here as a subset.
ie., if BO is readonly, the mappings must be readonly. If BO
is not readonly, then the mapping can be either readonly or
not.

But if Mesa doesn't have a use for this, then we can remove
this flag for now.

>
>> +      *
>> +      * I915_GEM_VM_BIND_CAPTURE:
>> +      * Capture this mapping in the dump upon GPU error.
>> +      */
>> +     __u64 flags;
>> +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>> +
>> +     /**
>> +      * @fence: Timeline fence for bind completion signaling.
>> +      *
>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> +      * is invalid, and an error will be returned.
>> +      */
>> +     struct drm_i915_gem_timeline_fence fence;
>> +
>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * For future extensions. See struct i915_user_extension.
>> +      */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> + *
>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> + * address (VA) range that should be unbound from the device page table of the
>> + * specified address space (VM). VM_UNBIND will force unbind the specified
>> + * range from device page table without waiting for any GPU job to complete.
>> + * It is UMDs responsibility to ensure the mapping is no longer in use before
>> + * calling VM_UNBIND.
>> + *
>> + * If the specified mapping is not found, the ioctl will simply return without
>> + * any error.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> + * asynchronously, if valid @fence is specified.
>> + */
>> +struct drm_i915_gem_vm_unbind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @rsvd: Reserved, MBZ */
>> +     __u32 rsvd;
>> +
>> +     /** @start: Virtual Address start to unbind */
>> +     __u64 start;
>> +
>> +     /** @length: Length of mapping to unbind */
>> +     __u64 length;
>> +
>> +     /** @flags: Currently reserved, MBZ */
>> +     __u64 flags;
>> +
>> +     /**
>> +      * @fence: Timeline fence for unbind completion signaling.
>> +      *
>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> +      * is invalid, and an error will be returned.
>> +      */
>> +     struct drm_i915_gem_timeline_fence fence;
>> +
>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * For future extensions. See struct i915_user_extension.
>> +      */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>> + * ioctl.
>> + *
>> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>> + * only works with this ioctl for submission.
>> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> + */
>> +struct drm_i915_gem_execbuffer3 {
>> +     /**
>> +      * @ctx_id: Context id
>> +      *
>> +      * Only contexts with user engine map are allowed.
>> +      */
>> +     __u32 ctx_id;
>> +
>> +     /**
>> +      * @engine_idx: Engine index
>> +      *
>> +      * An index in the user engine map of the context specified by @ctx_id.
>> +      */
>> +     __u32 engine_idx;
>> +
>> +     /**
>> +      * @batch_address: Batch gpu virtual address/es.
>> +      *
>> +      * For normal submission, it is the gpu virtual address of the batch
>> +      * buffer. For parallel submission, it is a pointer to an array of
>> +      * batch buffer gpu virtual addresses with array size equal to the
>> +      * number of (parallel) engines involved in that submission (See
>> +      * struct i915_context_engines_parallel_submit).
>> +      */
>> +     __u64 batch_address;
>> +
>> +     /** @flags: Currently reserved, MBZ */
>> +     __u64 flags;
>> +
>> +     /** @rsvd1: Reserved, MBZ */
>> +     __u32 rsvd1;
>> +
>> +     /** @fence_count: Number of fences in @timeline_fences array. */
>> +     __u32 fence_count;
>> +
>> +     /**
>> +      * @timeline_fences: Pointer to an array of timeline fences.
>> +      *
>> +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>> +      */
>> +     __u64 timeline_fences;
>> +
>> +     /** @rsvd2: Reserved, MBZ */
>> +     __u64 rsvd2;
>> +
>
>Just out of curiosity: if we can extend behavior with @extensions and
>even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>

True. I added it just in case some requests came up that would require
some additional fields. During this review process itself there were
some requests. Adding directly here should have a slight performance
edge over adding it as an extension (one less copy_from_user).

But if folks think this is an overkill, I will remove it.

Niranjana

>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * For future extensions. See struct i915_user_extension.
>> +      */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>> + * private to the specified VM.
>> + *
>> + * See struct drm_i915_gem_create_ext.
>> + */
>> +struct drm_i915_gem_create_ext_vm_private {
>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> +     /** @base: Extension link. See struct i915_user_extension. */
>> +     struct i915_user_extension base;
>> +
>> +     /** @vm_id: Id of the VM to which the object is private */
>> +     __u32 vm_id;
>> +};
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30  6:08       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  6:08 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: intel-gfx, Wilson, Chris P, Hellstrom, Thomas, dri-devel, Vetter,
	Daniel, christian.koenig, Auld, Matthew

On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Reduce the scope to simple Mesa use case.
>> v3: Expand VM_UNBIND documentation and add
>>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>     documentation for vm_bind/unbind.
>> v5: Remove TLB flush requirement on VM_UNBIND.
>>     Add version support to stage implementation.
>> v6: Define and use drm_i915_gem_timeline_fence structure for
>>     all timeline fences.
>> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>>     Update documentation on async vm_bind/unbind and versioning.
>>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>>     batch_count field and I915_EXEC3_SECURE flag.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>>  1 file changed, 280 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index 000000000000..a93e08bceee6
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,280 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_VM_BIND_VERSION
>> + *
>> + * VM_BIND feature version supported.
>> + * See typedef drm_i915_getparam_t param.
>> + *
>> + * Specifies the VM_BIND feature version supported.
>> + * The following versions of VM_BIND have been defined:
>> + *
>> + * 0: No VM_BIND support.
>> + *
>> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
>> + *    previously with VM_BIND, the ioctl will not support unbinding multiple
>> + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
>> + *    any existing mappings.
>> + *
>> + * 2: The restrictions on unbinding partial or multiple mappings is
>> + *    lifted, Similarly, binding will replace any mappings in the given range.
>> + *
>> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>> + */
>> +#define I915_PARAM_VM_BIND_VERSION   57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
>> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> + */
>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> +
>> +/* VM_BIND related ioctls */
>> +#define DRM_I915_GEM_VM_BIND         0x3d
>> +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> +
>> +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> +
>> +/**
>> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> + *
>> + * The operation will wait for input fence to signal.
>> + *
>> + * The returned output fence will be signaled after the completion of the
>> + * operation.
>> + */
>> +struct drm_i915_gem_timeline_fence {
>> +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> +     __u32 handle;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_TIMELINE_FENCE_WAIT:
>> +      * Wait for the input fence before the operation.
>> +      *
>> +      * I915_TIMELINE_FENCE_SIGNAL:
>> +      * Return operation completion fence as output.
>> +      */
>> +     __u32 flags;
>> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> +
>> +     /**
>> +      * @value: A point in the timeline.
>> +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> +      * binary one.
>> +      */
>> +     __u64 value;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> + *
>> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> + * virtual address (VA) range to the section of an object that should be bound
>> + * in the device page table of the specified address space (VM).
>> + * The VA range specified must be unique (ie., not currently bound) and can
>> + * be mapped to whole object or a section of the object (partial binding).
>> + * Multiple VA mappings can be created to the same section of the object
>> + * (aliasing).
>> + *
>> + * The @start, @offset and @length must be 4K page aligned. However the DG2
>> + * and XEHPSDV has 64K page size for device local-memory and has compact page
>> + * table. On those platforms, for binding device local-memory objects, the
>> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>> + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
>> + * to not allow multiple mappings in that 2M range (Compact page tables do not
>> + * allow 64K page and 4K page bindings in the same 2M range).
>> + *
>> + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> + * asynchronously, if valid @fence is specified.
>
>Does that mean that if I don't provide @fence, then this ioctl will be
>synchronous (i.e., when it returns, the memory will be guaranteed to be
>bound)? The text is kinda implying that, but from one of your earlier
>replies to Tvrtko, that doesn't seem to be the case. I guess we could
>change the text to make this more explicit.
>

Yes, I thought, if user doesn't specify the out fence, KMD better make
the ioctl synchronous by waiting until the binding finishes before
returning. Otherwise, UMD has no way to ensure binding is complete and
UMD must pass in out fence for VM_BIND calls.

But latest comment form Daniel on other thread might suggest something else.
Daniel, can you comment?

>In addition, previously we had the guarantee that an execbuf ioctl
>would wait for all the pending vm_bind operations to finish before
>doing anything. Do we still have this guarantee or do we have to make
>use of the fences now?
>

No, we don't have that anymore (execbuf is decoupled from VM_BIND).
Execbuf3 submission will not wait for any previous VM_BIND to finish.
UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
that.

>> + */
>> +struct drm_i915_gem_vm_bind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @handle: Object handle */
>> +     __u32 handle;
>> +
>> +     /** @start: Virtual Address start to bind */
>> +     __u64 start;
>> +
>> +     /** @offset: Offset in object to bind */
>> +     __u64 offset;
>> +
>> +     /** @length: Length of mapping to bind */
>> +     __u64 length;
>> +
>> +     /**
>> +      * @flags: Supported flags are:
>> +      *
>> +      * I915_GEM_VM_BIND_READONLY:
>> +      * Mapping is read-only.
>
>Can you please explain what happens when we try to write to a range
>that's bound as read-only?
>

It will be mapped as read-only in device page table. Hence any
write access will fail. I would expect a CAT error reported.

I am seeing that currently the page table R/W setting is based
on whether BO is readonly or not (UMDs can request a userptr
BO to readonly). We can make this READONLY here as a subset.
ie., if BO is readonly, the mappings must be readonly. If BO
is not readonly, then the mapping can be either readonly or
not.

But if Mesa doesn't have a use for this, then we can remove
this flag for now.

>
>> +      *
>> +      * I915_GEM_VM_BIND_CAPTURE:
>> +      * Capture this mapping in the dump upon GPU error.
>> +      */
>> +     __u64 flags;
>> +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>> +
>> +     /**
>> +      * @fence: Timeline fence for bind completion signaling.
>> +      *
>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> +      * is invalid, and an error will be returned.
>> +      */
>> +     struct drm_i915_gem_timeline_fence fence;
>> +
>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * For future extensions. See struct i915_user_extension.
>> +      */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> + *
>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> + * address (VA) range that should be unbound from the device page table of the
>> + * specified address space (VM). VM_UNBIND will force unbind the specified
>> + * range from device page table without waiting for any GPU job to complete.
>> + * It is UMDs responsibility to ensure the mapping is no longer in use before
>> + * calling VM_UNBIND.
>> + *
>> + * If the specified mapping is not found, the ioctl will simply return without
>> + * any error.
>> + *
>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> + * asynchronously, if valid @fence is specified.
>> + */
>> +struct drm_i915_gem_vm_unbind {
>> +     /** @vm_id: VM (address space) id to bind */
>> +     __u32 vm_id;
>> +
>> +     /** @rsvd: Reserved, MBZ */
>> +     __u32 rsvd;
>> +
>> +     /** @start: Virtual Address start to unbind */
>> +     __u64 start;
>> +
>> +     /** @length: Length of mapping to unbind */
>> +     __u64 length;
>> +
>> +     /** @flags: Currently reserved, MBZ */
>> +     __u64 flags;
>> +
>> +     /**
>> +      * @fence: Timeline fence for unbind completion signaling.
>> +      *
>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> +      * is invalid, and an error will be returned.
>> +      */
>> +     struct drm_i915_gem_timeline_fence fence;
>> +
>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * For future extensions. See struct i915_user_extension.
>> +      */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>> + * ioctl.
>> + *
>> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>> + * only works with this ioctl for submission.
>> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> + */
>> +struct drm_i915_gem_execbuffer3 {
>> +     /**
>> +      * @ctx_id: Context id
>> +      *
>> +      * Only contexts with user engine map are allowed.
>> +      */
>> +     __u32 ctx_id;
>> +
>> +     /**
>> +      * @engine_idx: Engine index
>> +      *
>> +      * An index in the user engine map of the context specified by @ctx_id.
>> +      */
>> +     __u32 engine_idx;
>> +
>> +     /**
>> +      * @batch_address: Batch gpu virtual address/es.
>> +      *
>> +      * For normal submission, it is the gpu virtual address of the batch
>> +      * buffer. For parallel submission, it is a pointer to an array of
>> +      * batch buffer gpu virtual addresses with array size equal to the
>> +      * number of (parallel) engines involved in that submission (See
>> +      * struct i915_context_engines_parallel_submit).
>> +      */
>> +     __u64 batch_address;
>> +
>> +     /** @flags: Currently reserved, MBZ */
>> +     __u64 flags;
>> +
>> +     /** @rsvd1: Reserved, MBZ */
>> +     __u32 rsvd1;
>> +
>> +     /** @fence_count: Number of fences in @timeline_fences array. */
>> +     __u32 fence_count;
>> +
>> +     /**
>> +      * @timeline_fences: Pointer to an array of timeline fences.
>> +      *
>> +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>> +      */
>> +     __u64 timeline_fences;
>> +
>> +     /** @rsvd2: Reserved, MBZ */
>> +     __u64 rsvd2;
>> +
>
>Just out of curiosity: if we can extend behavior with @extensions and
>even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>

True. I added it just in case some requests came up that would require
some additional fields. During this review process itself there were
some requests. Adding directly here should have a slight performance
edge over adding it as an extension (one less copy_from_user).

But if folks think this is an overkill, I will remove it.

Niranjana

>> +     /**
>> +      * @extensions: Zero-terminated chain of extensions.
>> +      *
>> +      * For future extensions. See struct i915_user_extension.
>> +      */
>> +     __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>> + * private to the specified VM.
>> + *
>> + * See struct drm_i915_gem_create_ext.
>> + */
>> +struct drm_i915_gem_create_ext_vm_private {
>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> +     /** @base: Extension link. See struct i915_user_extension. */
>> +     struct i915_user_extension base;
>> +
>> +     /** @vm_id: Id of the VM to which the object is private */
>> +     __u32 vm_id;
>> +};
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  5:11     ` [Intel-gfx] " Jason Ekstrand
@ 2022-06-30  6:15       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  6:15 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Matthew Brost, Paulo Zanoni, Maling list - DRI developers,
	Tvrtko Ursulin, Intel GFX, Chris Wilson, Thomas Hellstrom,
	oak.zeng, Lionel Landwerlin, Daniel Vetter, Christian König,
	Matthew Auld

On Thu, Jun 30, 2022 at 12:11:15AM -0500, Jason Ekstrand wrote:
>   On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>   <niranjana.vishwanathapura@intel.com> wrote:
>
>     VM_BIND and related uapi definitions
>
>     v2: Reduce the scope to simple Mesa use case.
>     v3: Expand VM_UNBIND documentation and add
>         I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>         and I915_GEM_VM_BIND_TLB_FLUSH flags.
>     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>         documentation for vm_bind/unbind.
>     v5: Remove TLB flush requirement on VM_UNBIND.
>         Add version support to stage implementation.
>     v6: Define and use drm_i915_gem_timeline_fence structure for
>         all timeline fences.
>     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>         Update documentation on async vm_bind/unbind and versioning.
>         Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>         batch_count field and I915_EXEC3_SECURE flag.
>
>     Signed-off-by: Niranjana Vishwanathapura
>     <niranjana.vishwanathapura@intel.com>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>     ---
>      Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>      1 file changed, 280 insertions(+)
>      create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
>     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>     b/Documentation/gpu/rfc/i915_vm_bind.h
>     new file mode 100644
>     index 000000000000..a93e08bceee6
>     --- /dev/null
>     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>     @@ -0,0 +1,280 @@
>     +/* SPDX-License-Identifier: MIT */
>     +/*
>     + * Copyright © 2022 Intel Corporation
>     + */
>     +
>     +/**
>     + * DOC: I915_PARAM_VM_BIND_VERSION
>     + *
>     + * VM_BIND feature version supported.
>     + * See typedef drm_i915_getparam_t param.
>     + *
>     + * Specifies the VM_BIND feature version supported.
>     + * The following versions of VM_BIND have been defined:
>     + *
>     + * 0: No VM_BIND support.
>     + *
>     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>     created
>     + *    previously with VM_BIND, the ioctl will not support unbinding
>     multiple
>     + *    mappings or splitting them. Similarly, VM_BIND calls will not
>     replace
>     + *    any existing mappings.
>     + *
>     + * 2: The restrictions on unbinding partial or multiple mappings is
>     + *    lifted, Similarly, binding will replace any mappings in the given
>     range.
>     + *
>     + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>     + */
>     +#define I915_PARAM_VM_BIND_VERSION     57
>     +
>     +/**
>     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>     + *
>     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>     + * See struct drm_i915_gem_vm_control flags.
>     + *
>     + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>     + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>     any
>     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>     + */
>     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>     +
>     +/* VM_BIND related ioctls */
>     +#define DRM_I915_GEM_VM_BIND           0x3d
>     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>     +
>     +#define DRM_IOCTL_I915_GEM_VM_BIND           
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3       
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>     drm_i915_gem_execbuffer3)
>     +
>     +/**
>     + * struct drm_i915_gem_timeline_fence - An input or output timeline
>     fence.
>     + *
>     + * The operation will wait for input fence to signal.
>     + *
>     + * The returned output fence will be signaled after the completion of
>     the
>     + * operation.
>     + */
>     +struct drm_i915_gem_timeline_fence {
>     +       /** @handle: User's handle for a drm_syncobj to wait on or
>     signal. */
>     +       __u32 handle;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_TIMELINE_FENCE_WAIT:
>     +        * Wait for the input fence before the operation.
>     +        *
>     +        * I915_TIMELINE_FENCE_SIGNAL:
>     +        * Return operation completion fence as output.
>     +        */
>     +       __u32 flags;
>     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>     +
>     +       /**
>     +        * @value: A point in the timeline.
>     +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>     into a
>     +        * binary one.
>     +        */
>     +       __u64 value;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>     + *
>     + * This structure is passed to VM_BIND ioctl and specifies the mapping
>     of GPU
>     + * virtual address (VA) range to the section of an object that should
>     be bound
>     + * in the device page table of the specified address space (VM).
>     + * The VA range specified must be unique (ie., not currently bound) and
>     can
>     + * be mapped to whole object or a section of the object (partial
>     binding).
>     + * Multiple VA mappings can be created to the same section of the
>     object
>     + * (aliasing).
>     + *
>     + * The @start, @offset and @length must be 4K page aligned. However the
>     DG2
>     + * and XEHPSDV has 64K page size for device local-memory and has
>     compact page
>     + * table. On those platforms, for binding device local-memory objects,
>     the
>     + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>
>   This is not acceptable.  We need 64K granularity.  This includes the
>   starting address, the BO offset, and the length.  Why?  The tl;dr is that
>   it's a requirement for about 50% of D3D12 apps if we want them to run on
>   Linux via D3D12.  A longer explanation follows.  I don't necessarily
>   expect kernel folks to get all the details but hopefully I'll have left
>   enough of a map that some of the Intel Mesa folks can help fill in
>   details.
>   Many modern D3D12 apps have a hard requirement on Tier2 tiled resources. 
>   This is a feature that Intel has supported in the D3D12 driver since
>   Skylake.  In order to implement this feature, VKD3D requires the various
>   sparseResidencyImage* and sparseResidency*Sampled Vulkan features.  If we
>   want those apps to work (there's getting to be quite a few of them), we
>   need to implement the Vulkan sparse residency features.
>   What is sparse residency?  I'm glad you asked!  The sparse residency
>   features allow a client to separately bind each miplevel or array slice of
>   an image to a chunk of device memory independently, without affecting any
>   other areas of the image.  Once you get to a high enough miplevel that
>   everything fits inside a single sparse image block (that's a technical
>   Vulkan term you can search for in the spec), you can enter a "miptail"
>   which contains all the remaining miplevels in a single sparse image block.
>   The term "sparse image block" is what the Vulkan spec uses.  On Intel
>   hardware and in the docs, it's what we call a "tile".  Specifically, the
>   image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This
>   is because Tile4 and legacy X and Y-tiling don't provide any guarantees
>   about page alignment for slices.  Yf, Ys, and Tile64, on the other hand,
>   align all slices of the image to a tile boundary, allowing us to map
>   memory to different slices independently, assuming we have 64K (or 4K for
>   Yf) VM_BIND granularity.  (4K isn't actually a requirement for SKL-TGL; we
>   can use Ys all the time which has 64K tiles but there's no reason to not
>   support 4K alignments on integrated.)
>   Someone may be tempted to ask, "Can't we wiggle the strides around or
>   something to make it work?"  I thought about that and no, you can't.  The
>   problem here is LOD2+.  Sure, you can have a stride such that the image is
>   a multiple of 2M worth of tiles across.  That'll work fine for LOD0 and
>   LOD1; both will be 2M aligned.  However, LOD2 won't be and there's no way
>   to control that.  The hardware will place it to the right of LOD1 by
>   ROUND_UP(width, tile_width) pixels and there's nothing you can do about
>   that.  If that position doesn't happen to hit a 2M boundary, you're out of
>   luck.
>   I hope that explanation provides enough detail.  Sadly, this is one of
>   those things which has a lot of moving pieces all over different bits of
>   the hardware and various APIs and they all have to work together just
>   right for it to all come out in the end.  But, yeah, we really need 64K
>   aligned binding if we want VKD3D to work.

Thanks Jason,

We currently had 64K alignment for VM_BIND, But currently for non-VM_BIND
scenario (for soft-pinning), there is a 2M alignment requirement and I just
kept the same for VM_BIND. This was discussed here.
https://lists.freedesktop.org/archives/intel-gfx/2022-June/299185.html

So Matt Auld,
If this 2M requirement is not going to cut it for Mesa sparse binding feature,
I think we will have to go with 64K alignment but ensuring UMD doesn't mix
and match 64K and 4K mappings in same 2M range. What do you think?

Niranjana

>   --Jason
>    
>
>     + * Also, for such mappings, i915 will reserve the whole 2M range for it
>     so as
>     + * to not allow multiple mappings in that 2M range (Compact page tables
>     do not
>     + * allow 64K page and 4K page bindings in the same 2M range).
>     + *
>     + * Error code -EINVAL will be returned if @start, @offset and @length
>     are not
>     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>     error code
>     + * -ENOSPC will be returned if the VA range specified can't be
>     reserved.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_BIND operation can be
>     done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_bind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @handle: Object handle */
>     +       __u32 handle;
>     +
>     +       /** @start: Virtual Address start to bind */
>     +       __u64 start;
>     +
>     +       /** @offset: Offset in object to bind */
>     +       __u64 offset;
>     +
>     +       /** @length: Length of mapping to bind */
>     +       __u64 length;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_GEM_VM_BIND_READONLY:
>     +        * Mapping is read-only.
>     +        *
>     +        * I915_GEM_VM_BIND_CAPTURE:
>     +        * Capture this mapping in the dump upon GPU error.
>     +        */
>     +       __u64 flags;
>     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>     +
>     +       /**
>     +        * @fence: Timeline fence for bind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>     + *
>     + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
>     virtual
>     + * address (VA) range that should be unbound from the device page table
>     of the
>     + * specified address space (VM). VM_UNBIND will force unbind the
>     specified
>     + * range from device page table without waiting for any GPU job to
>     complete.
>     + * It is UMDs responsibility to ensure the mapping is no longer in use
>     before
>     + * calling VM_UNBIND.
>     + *
>     + * If the specified mapping is not found, the ioctl will simply return
>     without
>     + * any error.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_UNBIND operation can
>     be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_unbind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @rsvd: Reserved, MBZ */
>     +       __u32 rsvd;
>     +
>     +       /** @start: Virtual Address start to unbind */
>     +       __u64 start;
>     +
>     +       /** @length: Length of mapping to unbind */
>     +       __u64 length;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /**
>     +        * @fence: Timeline fence for unbind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_execbuffer3 - Structure for
>     DRM_I915_GEM_EXECBUFFER3
>     + * ioctl.
>     + *
>     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>     VM_BIND mode
>     + * only works with this ioctl for submission.
>     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>     + */
>     +struct drm_i915_gem_execbuffer3 {
>     +       /**
>     +        * @ctx_id: Context id
>     +        *
>     +        * Only contexts with user engine map are allowed.
>     +        */
>     +       __u32 ctx_id;
>     +
>     +       /**
>     +        * @engine_idx: Engine index
>     +        *
>     +        * An index in the user engine map of the context specified by
>     @ctx_id.
>     +        */
>     +       __u32 engine_idx;
>     +
>     +       /**
>     +        * @batch_address: Batch gpu virtual address/es.
>     +        *
>     +        * For normal submission, it is the gpu virtual address of the
>     batch
>     +        * buffer. For parallel submission, it is a pointer to an array
>     of
>     +        * batch buffer gpu virtual addresses with array size equal to
>     the
>     +        * number of (parallel) engines involved in that submission (See
>     +        * struct i915_context_engines_parallel_submit).
>     +        */
>     +       __u64 batch_address;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /** @rsvd1: Reserved, MBZ */
>     +       __u32 rsvd1;
>     +
>     +       /** @fence_count: Number of fences in @timeline_fences array. */
>     +       __u32 fence_count;
>     +
>     +       /**
>     +        * @timeline_fences: Pointer to an array of timeline fences.
>     +        *
>     +        * Timeline fences are of format struct
>     drm_i915_gem_timeline_fence.
>     +        */
>     +       __u64 timeline_fences;
>     +
>     +       /** @rsvd2: Reserved, MBZ */
>     +       __u64 rsvd2;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>     object
>     + * private to the specified VM.
>     + *
>     + * See struct drm_i915_gem_create_ext.
>     + */
>     +struct drm_i915_gem_create_ext_vm_private {
>     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>     +       /** @base: Extension link. See struct i915_user_extension. */
>     +       struct i915_user_extension base;
>     +
>     +       /** @vm_id: Id of the VM to which the object is private */
>     +       __u32 vm_id;
>     +};
>     --
>     2.21.0.rc0.32.g243a4c7e27

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30  6:15       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  6:15 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Paulo Zanoni, Maling list - DRI developers, Intel GFX,
	Chris Wilson, Thomas Hellstrom, Daniel Vetter,
	Christian König, Matthew Auld

On Thu, Jun 30, 2022 at 12:11:15AM -0500, Jason Ekstrand wrote:
>   On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>   <niranjana.vishwanathapura@intel.com> wrote:
>
>     VM_BIND and related uapi definitions
>
>     v2: Reduce the scope to simple Mesa use case.
>     v3: Expand VM_UNBIND documentation and add
>         I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>         and I915_GEM_VM_BIND_TLB_FLUSH flags.
>     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>         documentation for vm_bind/unbind.
>     v5: Remove TLB flush requirement on VM_UNBIND.
>         Add version support to stage implementation.
>     v6: Define and use drm_i915_gem_timeline_fence structure for
>         all timeline fences.
>     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>         Update documentation on async vm_bind/unbind and versioning.
>         Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>         batch_count field and I915_EXEC3_SECURE flag.
>
>     Signed-off-by: Niranjana Vishwanathapura
>     <niranjana.vishwanathapura@intel.com>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>     ---
>      Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>      1 file changed, 280 insertions(+)
>      create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
>     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>     b/Documentation/gpu/rfc/i915_vm_bind.h
>     new file mode 100644
>     index 000000000000..a93e08bceee6
>     --- /dev/null
>     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>     @@ -0,0 +1,280 @@
>     +/* SPDX-License-Identifier: MIT */
>     +/*
>     + * Copyright © 2022 Intel Corporation
>     + */
>     +
>     +/**
>     + * DOC: I915_PARAM_VM_BIND_VERSION
>     + *
>     + * VM_BIND feature version supported.
>     + * See typedef drm_i915_getparam_t param.
>     + *
>     + * Specifies the VM_BIND feature version supported.
>     + * The following versions of VM_BIND have been defined:
>     + *
>     + * 0: No VM_BIND support.
>     + *
>     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>     created
>     + *    previously with VM_BIND, the ioctl will not support unbinding
>     multiple
>     + *    mappings or splitting them. Similarly, VM_BIND calls will not
>     replace
>     + *    any existing mappings.
>     + *
>     + * 2: The restrictions on unbinding partial or multiple mappings is
>     + *    lifted, Similarly, binding will replace any mappings in the given
>     range.
>     + *
>     + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>     + */
>     +#define I915_PARAM_VM_BIND_VERSION     57
>     +
>     +/**
>     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>     + *
>     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>     + * See struct drm_i915_gem_vm_control flags.
>     + *
>     + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>     + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>     any
>     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>     + */
>     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>     +
>     +/* VM_BIND related ioctls */
>     +#define DRM_I915_GEM_VM_BIND           0x3d
>     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>     +
>     +#define DRM_IOCTL_I915_GEM_VM_BIND           
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3       
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>     drm_i915_gem_execbuffer3)
>     +
>     +/**
>     + * struct drm_i915_gem_timeline_fence - An input or output timeline
>     fence.
>     + *
>     + * The operation will wait for input fence to signal.
>     + *
>     + * The returned output fence will be signaled after the completion of
>     the
>     + * operation.
>     + */
>     +struct drm_i915_gem_timeline_fence {
>     +       /** @handle: User's handle for a drm_syncobj to wait on or
>     signal. */
>     +       __u32 handle;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_TIMELINE_FENCE_WAIT:
>     +        * Wait for the input fence before the operation.
>     +        *
>     +        * I915_TIMELINE_FENCE_SIGNAL:
>     +        * Return operation completion fence as output.
>     +        */
>     +       __u32 flags;
>     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>     +
>     +       /**
>     +        * @value: A point in the timeline.
>     +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>     into a
>     +        * binary one.
>     +        */
>     +       __u64 value;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>     + *
>     + * This structure is passed to VM_BIND ioctl and specifies the mapping
>     of GPU
>     + * virtual address (VA) range to the section of an object that should
>     be bound
>     + * in the device page table of the specified address space (VM).
>     + * The VA range specified must be unique (ie., not currently bound) and
>     can
>     + * be mapped to whole object or a section of the object (partial
>     binding).
>     + * Multiple VA mappings can be created to the same section of the
>     object
>     + * (aliasing).
>     + *
>     + * The @start, @offset and @length must be 4K page aligned. However the
>     DG2
>     + * and XEHPSDV has 64K page size for device local-memory and has
>     compact page
>     + * table. On those platforms, for binding device local-memory objects,
>     the
>     + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>
>   This is not acceptable.  We need 64K granularity.  This includes the
>   starting address, the BO offset, and the length.  Why?  The tl;dr is that
>   it's a requirement for about 50% of D3D12 apps if we want them to run on
>   Linux via D3D12.  A longer explanation follows.  I don't necessarily
>   expect kernel folks to get all the details but hopefully I'll have left
>   enough of a map that some of the Intel Mesa folks can help fill in
>   details.
>   Many modern D3D12 apps have a hard requirement on Tier2 tiled resources. 
>   This is a feature that Intel has supported in the D3D12 driver since
>   Skylake.  In order to implement this feature, VKD3D requires the various
>   sparseResidencyImage* and sparseResidency*Sampled Vulkan features.  If we
>   want those apps to work (there's getting to be quite a few of them), we
>   need to implement the Vulkan sparse residency features.
>   What is sparse residency?  I'm glad you asked!  The sparse residency
>   features allow a client to separately bind each miplevel or array slice of
>   an image to a chunk of device memory independently, without affecting any
>   other areas of the image.  Once you get to a high enough miplevel that
>   everything fits inside a single sparse image block (that's a technical
>   Vulkan term you can search for in the spec), you can enter a "miptail"
>   which contains all the remaining miplevels in a single sparse image block.
>   The term "sparse image block" is what the Vulkan spec uses.  On Intel
>   hardware and in the docs, it's what we call a "tile".  Specifically, the
>   image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This
>   is because Tile4 and legacy X and Y-tiling don't provide any guarantees
>   about page alignment for slices.  Yf, Ys, and Tile64, on the other hand,
>   align all slices of the image to a tile boundary, allowing us to map
>   memory to different slices independently, assuming we have 64K (or 4K for
>   Yf) VM_BIND granularity.  (4K isn't actually a requirement for SKL-TGL; we
>   can use Ys all the time which has 64K tiles but there's no reason to not
>   support 4K alignments on integrated.)
>   Someone may be tempted to ask, "Can't we wiggle the strides around or
>   something to make it work?"  I thought about that and no, you can't.  The
>   problem here is LOD2+.  Sure, you can have a stride such that the image is
>   a multiple of 2M worth of tiles across.  That'll work fine for LOD0 and
>   LOD1; both will be 2M aligned.  However, LOD2 won't be and there's no way
>   to control that.  The hardware will place it to the right of LOD1 by
>   ROUND_UP(width, tile_width) pixels and there's nothing you can do about
>   that.  If that position doesn't happen to hit a 2M boundary, you're out of
>   luck.
>   I hope that explanation provides enough detail.  Sadly, this is one of
>   those things which has a lot of moving pieces all over different bits of
>   the hardware and various APIs and they all have to work together just
>   right for it to all come out in the end.  But, yeah, we really need 64K
>   aligned binding if we want VKD3D to work.

Thanks Jason,

We currently had 64K alignment for VM_BIND, But currently for non-VM_BIND
scenario (for soft-pinning), there is a 2M alignment requirement and I just
kept the same for VM_BIND. This was discussed here.
https://lists.freedesktop.org/archives/intel-gfx/2022-June/299185.html

So Matt Auld,
If this 2M requirement is not going to cut it for Mesa sparse binding feature,
I think we will have to go with 64K alignment but ensuring UMD doesn't mix
and match 64K and 4K mappings in same 2M range. What do you think?

Niranjana

>   --Jason
>    
>
>     + * Also, for such mappings, i915 will reserve the whole 2M range for it
>     so as
>     + * to not allow multiple mappings in that 2M range (Compact page tables
>     do not
>     + * allow 64K page and 4K page bindings in the same 2M range).
>     + *
>     + * Error code -EINVAL will be returned if @start, @offset and @length
>     are not
>     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>     error code
>     + * -ENOSPC will be returned if the VA range specified can't be
>     reserved.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_BIND operation can be
>     done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_bind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @handle: Object handle */
>     +       __u32 handle;
>     +
>     +       /** @start: Virtual Address start to bind */
>     +       __u64 start;
>     +
>     +       /** @offset: Offset in object to bind */
>     +       __u64 offset;
>     +
>     +       /** @length: Length of mapping to bind */
>     +       __u64 length;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_GEM_VM_BIND_READONLY:
>     +        * Mapping is read-only.
>     +        *
>     +        * I915_GEM_VM_BIND_CAPTURE:
>     +        * Capture this mapping in the dump upon GPU error.
>     +        */
>     +       __u64 flags;
>     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>     +
>     +       /**
>     +        * @fence: Timeline fence for bind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>     + *
>     + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
>     virtual
>     + * address (VA) range that should be unbound from the device page table
>     of the
>     + * specified address space (VM). VM_UNBIND will force unbind the
>     specified
>     + * range from device page table without waiting for any GPU job to
>     complete.
>     + * It is UMDs responsibility to ensure the mapping is no longer in use
>     before
>     + * calling VM_UNBIND.
>     + *
>     + * If the specified mapping is not found, the ioctl will simply return
>     without
>     + * any error.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_UNBIND operation can
>     be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_unbind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @rsvd: Reserved, MBZ */
>     +       __u32 rsvd;
>     +
>     +       /** @start: Virtual Address start to unbind */
>     +       __u64 start;
>     +
>     +       /** @length: Length of mapping to unbind */
>     +       __u64 length;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /**
>     +        * @fence: Timeline fence for unbind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_execbuffer3 - Structure for
>     DRM_I915_GEM_EXECBUFFER3
>     + * ioctl.
>     + *
>     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>     VM_BIND mode
>     + * only works with this ioctl for submission.
>     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>     + */
>     +struct drm_i915_gem_execbuffer3 {
>     +       /**
>     +        * @ctx_id: Context id
>     +        *
>     +        * Only contexts with user engine map are allowed.
>     +        */
>     +       __u32 ctx_id;
>     +
>     +       /**
>     +        * @engine_idx: Engine index
>     +        *
>     +        * An index in the user engine map of the context specified by
>     @ctx_id.
>     +        */
>     +       __u32 engine_idx;
>     +
>     +       /**
>     +        * @batch_address: Batch gpu virtual address/es.
>     +        *
>     +        * For normal submission, it is the gpu virtual address of the
>     batch
>     +        * buffer. For parallel submission, it is a pointer to an array
>     of
>     +        * batch buffer gpu virtual addresses with array size equal to
>     the
>     +        * number of (parallel) engines involved in that submission (See
>     +        * struct i915_context_engines_parallel_submit).
>     +        */
>     +       __u64 batch_address;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /** @rsvd1: Reserved, MBZ */
>     +       __u32 rsvd1;
>     +
>     +       /** @fence_count: Number of fences in @timeline_fences array. */
>     +       __u32 fence_count;
>     +
>     +       /**
>     +        * @timeline_fences: Pointer to an array of timeline fences.
>     +        *
>     +        * Timeline fences are of format struct
>     drm_i915_gem_timeline_fence.
>     +        */
>     +       __u64 timeline_fences;
>     +
>     +       /** @rsvd2: Reserved, MBZ */
>     +       __u64 rsvd2;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>     object
>     + * private to the specified VM.
>     + *
>     + * See struct drm_i915_gem_create_ext.
>     + */
>     +struct drm_i915_gem_create_ext_vm_private {
>     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>     +       /** @base: Extension link. See struct i915_user_extension. */
>     +       struct i915_user_extension base;
>     +
>     +       /** @vm_id: Id of the VM to which the object is private */
>     +       __u32 vm_id;
>     +};
>     --
>     2.21.0.rc0.32.g243a4c7e27

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  6:15       ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
@ 2022-06-30  6:24       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30  6:24 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Paulo Zanoni, Intel GFX, Chris Wilson, Thomas Hellstrom,
	Maling list - DRI developers, Daniel Vetter,
	Christian König, Matthew Auld

On Wed, Jun 29, 2022 at 11:15:31PM -0700, Niranjana Vishwanathapura wrote:
>On Thu, Jun 30, 2022 at 12:11:15AM -0500, Jason Ekstrand wrote:
>>  On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>>  <niranjana.vishwanathapura@intel.com> wrote:
>>
>>    VM_BIND and related uapi definitions
>>
>>    v2: Reduce the scope to simple Mesa use case.
>>    v3: Expand VM_UNBIND documentation and add
>>        I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>        and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>    v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>        documentation for vm_bind/unbind.
>>    v5: Remove TLB flush requirement on VM_UNBIND.
>>        Add version support to stage implementation.
>>    v6: Define and use drm_i915_gem_timeline_fence structure for
>>        all timeline fences.
>>    v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>>        Update documentation on async vm_bind/unbind and versioning.
>>        Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>>        batch_count field and I915_EXEC3_SECURE flag.
>>
>>    Signed-off-by: Niranjana Vishwanathapura
>>    <niranjana.vishwanathapura@intel.com>
>>    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>    ---
>>     Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>>     1 file changed, 280 insertions(+)
>>     create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>>    diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>>    b/Documentation/gpu/rfc/i915_vm_bind.h
>>    new file mode 100644
>>    index 000000000000..a93e08bceee6
>>    --- /dev/null
>>    +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>    @@ -0,0 +1,280 @@
>>    +/* SPDX-License-Identifier: MIT */
>>    +/*
>>    + * Copyright © 2022 Intel Corporation
>>    + */
>>    +
>>    +/**
>>    + * DOC: I915_PARAM_VM_BIND_VERSION
>>    + *
>>    + * VM_BIND feature version supported.
>>    + * See typedef drm_i915_getparam_t param.
>>    + *
>>    + * Specifies the VM_BIND feature version supported.
>>    + * The following versions of VM_BIND have been defined:
>>    + *
>>    + * 0: No VM_BIND support.
>>    + *
>>    + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>>    created
>>    + *    previously with VM_BIND, the ioctl will not support unbinding
>>    multiple
>>    + *    mappings or splitting them. Similarly, VM_BIND calls will not
>>    replace
>>    + *    any existing mappings.
>>    + *
>>    + * 2: The restrictions on unbinding partial or multiple mappings is
>>    + *    lifted, Similarly, binding will replace any mappings in the given
>>    range.
>>    + *
>>    + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>>    + */
>>    +#define I915_PARAM_VM_BIND_VERSION     57
>>    +
>>    +/**
>>    + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>    + *
>>    + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>    + * See struct drm_i915_gem_vm_control flags.
>>    + *
>>    + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>>    + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>>    any
>>    + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>>    + */
>>    +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>>    +
>>    +/* VM_BIND related ioctls */
>>    +#define DRM_I915_GEM_VM_BIND           0x3d
>>    +#define DRM_I915_GEM_VM_UNBIND         0x3e
>>    +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>>    +
>>    +#define DRM_IOCTL_I915_GEM_VM_BIND                
>>DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>>    drm_i915_gem_vm_bind)
>>    +#define DRM_IOCTL_I915_GEM_VM_UNBIND              
>>DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>>    drm_i915_gem_vm_bind)
>>    +#define DRM_IOCTL_I915_GEM_EXECBUFFER3            
>>DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>>    drm_i915_gem_execbuffer3)
>>    +
>>    +/**
>>    + * struct drm_i915_gem_timeline_fence - An input or output timeline
>>    fence.
>>    + *
>>    + * The operation will wait for input fence to signal.
>>    + *
>>    + * The returned output fence will be signaled after the completion of
>>    the
>>    + * operation.
>>    + */
>>    +struct drm_i915_gem_timeline_fence {
>>    +       /** @handle: User's handle for a drm_syncobj to wait on or
>>    signal. */
>>    +       __u32 handle;
>>    +
>>    +       /**
>>    +        * @flags: Supported flags are:
>>    +        *
>>    +        * I915_TIMELINE_FENCE_WAIT:
>>    +        * Wait for the input fence before the operation.
>>    +        *
>>    +        * I915_TIMELINE_FENCE_SIGNAL:
>>    +        * Return operation completion fence as output.
>>    +        */
>>    +       __u32 flags;
>>    +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>    +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>    +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>>    (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>    +
>>    +       /**
>>    +        * @value: A point in the timeline.
>>    +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>>    +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>>    into a
>>    +        * binary one.
>>    +        */
>>    +       __u64 value;
>>    +};
>>    +
>>    +/**
>>    + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>    + *
>>    + * This structure is passed to VM_BIND ioctl and specifies the mapping
>>    of GPU
>>    + * virtual address (VA) range to the section of an object that should
>>    be bound
>>    + * in the device page table of the specified address space (VM).
>>    + * The VA range specified must be unique (ie., not currently bound) and
>>    can
>>    + * be mapped to whole object or a section of the object (partial
>>    binding).
>>    + * Multiple VA mappings can be created to the same section of the
>>    object
>>    + * (aliasing).
>>    + *
>>    + * The @start, @offset and @length must be 4K page aligned. However the
>>    DG2
>>    + * and XEHPSDV has 64K page size for device local-memory and has
>>    compact page
>>    + * table. On those platforms, for binding device local-memory objects,
>>    the
>>    + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>>
>>  This is not acceptable.  We need 64K granularity.  This includes the
>>  starting address, the BO offset, and the length.  Why?  The tl;dr is that
>>  it's a requirement for about 50% of D3D12 apps if we want them to run on
>>  Linux via D3D12.  A longer explanation follows.  I don't necessarily
>>  expect kernel folks to get all the details but hopefully I'll have left
>>  enough of a map that some of the Intel Mesa folks can help fill in
>>  details.
>>  Many modern D3D12 apps have a hard requirement on Tier2 tiled 
>>resources.   This is a feature that Intel has supported in the D3D12 
>>driver since
>>  Skylake.  In order to implement this feature, VKD3D requires the various
>>  sparseResidencyImage* and sparseResidency*Sampled Vulkan features.  If we
>>  want those apps to work (there's getting to be quite a few of them), we
>>  need to implement the Vulkan sparse residency features.
>>  What is sparse residency?  I'm glad you asked!  The sparse residency
>>  features allow a client to separately bind each miplevel or array slice of
>>  an image to a chunk of device memory independently, without affecting any
>>  other areas of the image.  Once you get to a high enough miplevel that
>>  everything fits inside a single sparse image block (that's a technical
>>  Vulkan term you can search for in the spec), you can enter a "miptail"
>>  which contains all the remaining miplevels in a single sparse image block.
>>  The term "sparse image block" is what the Vulkan spec uses.  On Intel
>>  hardware and in the docs, it's what we call a "tile".  Specifically, the
>>  image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This
>>  is because Tile4 and legacy X and Y-tiling don't provide any guarantees
>>  about page alignment for slices.  Yf, Ys, and Tile64, on the other hand,
>>  align all slices of the image to a tile boundary, allowing us to map
>>  memory to different slices independently, assuming we have 64K (or 4K for
>>  Yf) VM_BIND granularity.  (4K isn't actually a requirement for SKL-TGL; we
>>  can use Ys all the time which has 64K tiles but there's no reason to not
>>  support 4K alignments on integrated.)
>>  Someone may be tempted to ask, "Can't we wiggle the strides around or
>>  something to make it work?"  I thought about that and no, you can't.  The
>>  problem here is LOD2+.  Sure, you can have a stride such that the image is
>>  a multiple of 2M worth of tiles across.  That'll work fine for LOD0 and
>>  LOD1; both will be 2M aligned.  However, LOD2 won't be and there's no way
>>  to control that.  The hardware will place it to the right of LOD1 by
>>  ROUND_UP(width, tile_width) pixels and there's nothing you can do about
>>  that.  If that position doesn't happen to hit a 2M boundary, you're out of
>>  luck.
>>  I hope that explanation provides enough detail.  Sadly, this is one of
>>  those things which has a lot of moving pieces all over different bits of
>>  the hardware and various APIs and they all have to work together just
>>  right for it to all come out in the end.  But, yeah, we really need 64K
>>  aligned binding if we want VKD3D to work.
>
>Thanks Jason,
>
>We currently had 64K alignment for VM_BIND, But currently for non-VM_BIND

s/We currently had/We previously had/

>scenario (for soft-pinning), there is a 2M alignment requirement and I just
>kept the same for VM_BIND. This was discussed here.
>https://lists.freedesktop.org/archives/intel-gfx/2022-June/299185.html
>
>So Matt Auld,
>If this 2M requirement is not going to cut it for Mesa sparse binding feature,
>I think we will have to go with 64K alignment but ensuring UMD doesn't mix
>and match 64K and 4K mappings in same 2M range. What do you think?
>
>Niranjana
>
>>  --Jason
>>
>>    + * Also, for such mappings, i915 will reserve the whole 2M range for it
>>    so as
>>    + * to not allow multiple mappings in that 2M range (Compact page tables
>>    do not
>>    + * allow 64K page and 4K page bindings in the same 2M range).
>>    + *
>>    + * Error code -EINVAL will be returned if @start, @offset and @length
>>    are not
>>    + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>>    error code
>>    + * -ENOSPC will be returned if the VA range specified can't be
>>    reserved.
>>    + *
>>    + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>>    concurrently
>>    + * are not ordered. Furthermore, parts of the VM_BIND operation can be
>>    done
>>    + * asynchronously, if valid @fence is specified.
>>    + */
>>    +struct drm_i915_gem_vm_bind {
>>    +       /** @vm_id: VM (address space) id to bind */
>>    +       __u32 vm_id;
>>    +
>>    +       /** @handle: Object handle */
>>    +       __u32 handle;
>>    +
>>    +       /** @start: Virtual Address start to bind */
>>    +       __u64 start;
>>    +
>>    +       /** @offset: Offset in object to bind */
>>    +       __u64 offset;
>>    +
>>    +       /** @length: Length of mapping to bind */
>>    +       __u64 length;
>>    +
>>    +       /**
>>    +        * @flags: Supported flags are:
>>    +        *
>>    +        * I915_GEM_VM_BIND_READONLY:
>>    +        * Mapping is read-only.
>>    +        *
>>    +        * I915_GEM_VM_BIND_CAPTURE:
>>    +        * Capture this mapping in the dump upon GPU error.
>>    +        */
>>    +       __u64 flags;
>>    +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>>    +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>>    +
>>    +       /**
>>    +        * @fence: Timeline fence for bind completion signaling.
>>    +        *
>>    +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>    +        * is invalid, and an error will be returned.
>>    +        */
>>    +       struct drm_i915_gem_timeline_fence fence;
>>    +
>>    +       /**
>>    +        * @extensions: Zero-terminated chain of extensions.
>>    +        *
>>    +        * For future extensions. See struct i915_user_extension.
>>    +        */
>>    +       __u64 extensions;
>>    +};
>>    +
>>    +/**
>>    + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>>    + *
>>    + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
>>    virtual
>>    + * address (VA) range that should be unbound from the device page table
>>    of the
>>    + * specified address space (VM). VM_UNBIND will force unbind the
>>    specified
>>    + * range from device page table without waiting for any GPU job to
>>    complete.
>>    + * It is UMDs responsibility to ensure the mapping is no longer in use
>>    before
>>    + * calling VM_UNBIND.
>>    + *
>>    + * If the specified mapping is not found, the ioctl will simply return
>>    without
>>    + * any error.
>>    + *
>>    + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>>    concurrently
>>    + * are not ordered. Furthermore, parts of the VM_UNBIND operation can
>>    be done
>>    + * asynchronously, if valid @fence is specified.
>>    + */
>>    +struct drm_i915_gem_vm_unbind {
>>    +       /** @vm_id: VM (address space) id to bind */
>>    +       __u32 vm_id;
>>    +
>>    +       /** @rsvd: Reserved, MBZ */
>>    +       __u32 rsvd;
>>    +
>>    +       /** @start: Virtual Address start to unbind */
>>    +       __u64 start;
>>    +
>>    +       /** @length: Length of mapping to unbind */
>>    +       __u64 length;
>>    +
>>    +       /** @flags: Currently reserved, MBZ */
>>    +       __u64 flags;
>>    +
>>    +       /**
>>    +        * @fence: Timeline fence for unbind completion signaling.
>>    +        *
>>    +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>    +        * is invalid, and an error will be returned.
>>    +        */
>>    +       struct drm_i915_gem_timeline_fence fence;
>>    +
>>    +       /**
>>    +        * @extensions: Zero-terminated chain of extensions.
>>    +        *
>>    +        * For future extensions. See struct i915_user_extension.
>>    +        */
>>    +       __u64 extensions;
>>    +};
>>    +
>>    +/**
>>    + * struct drm_i915_gem_execbuffer3 - Structure for
>>    DRM_I915_GEM_EXECBUFFER3
>>    + * ioctl.
>>    + *
>>    + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>>    VM_BIND mode
>>    + * only works with this ioctl for submission.
>>    + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>    + */
>>    +struct drm_i915_gem_execbuffer3 {
>>    +       /**
>>    +        * @ctx_id: Context id
>>    +        *
>>    +        * Only contexts with user engine map are allowed.
>>    +        */
>>    +       __u32 ctx_id;
>>    +
>>    +       /**
>>    +        * @engine_idx: Engine index
>>    +        *
>>    +        * An index in the user engine map of the context specified by
>>    @ctx_id.
>>    +        */
>>    +       __u32 engine_idx;
>>    +
>>    +       /**
>>    +        * @batch_address: Batch gpu virtual address/es.
>>    +        *
>>    +        * For normal submission, it is the gpu virtual address of the
>>    batch
>>    +        * buffer. For parallel submission, it is a pointer to an array
>>    of
>>    +        * batch buffer gpu virtual addresses with array size equal to
>>    the
>>    +        * number of (parallel) engines involved in that submission (See
>>    +        * struct i915_context_engines_parallel_submit).
>>    +        */
>>    +       __u64 batch_address;
>>    +
>>    +       /** @flags: Currently reserved, MBZ */
>>    +       __u64 flags;
>>    +
>>    +       /** @rsvd1: Reserved, MBZ */
>>    +       __u32 rsvd1;
>>    +
>>    +       /** @fence_count: Number of fences in @timeline_fences array. */
>>    +       __u32 fence_count;
>>    +
>>    +       /**
>>    +        * @timeline_fences: Pointer to an array of timeline fences.
>>    +        *
>>    +        * Timeline fences are of format struct
>>    drm_i915_gem_timeline_fence.
>>    +        */
>>    +       __u64 timeline_fences;
>>    +
>>    +       /** @rsvd2: Reserved, MBZ */
>>    +       __u64 rsvd2;
>>    +
>>    +       /**
>>    +        * @extensions: Zero-terminated chain of extensions.
>>    +        *
>>    +        * For future extensions. See struct i915_user_extension.
>>    +        */
>>    +       __u64 extensions;
>>    +};
>>    +
>>    +/**
>>    + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>>    object
>>    + * private to the specified VM.
>>    + *
>>    + * See struct drm_i915_gem_create_ext.
>>    + */
>>    +struct drm_i915_gem_create_ext_vm_private {
>>    +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>>    +       /** @base: Extension link. See struct i915_user_extension. */
>>    +       struct i915_user_extension base;
>>    +
>>    +       /** @vm_id: Id of the VM to which the object is private */
>>    +       __u32 vm_id;
>>    +};
>>    --
>>    2.21.0.rc0.32.g243a4c7e27

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  6:08       ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-30  6:39         ` Zanoni, Paulo R
  -1 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30  6:39 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Ursulin, Tvrtko, intel-gfx, dri-devel, Hellstrom,
	Thomas, Zeng, Oak, Wilson, Chris P, jason, Vetter, Daniel,
	Landwerlin, Lionel G, christian.koenig, Auld, Matthew

On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
> On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
> > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> > > VM_BIND and related uapi definitions
> > > 
> > > v2: Reduce the scope to simple Mesa use case.
> > > v3: Expand VM_UNBIND documentation and add
> > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> > >     documentation for vm_bind/unbind.
> > > v5: Remove TLB flush requirement on VM_UNBIND.
> > >     Add version support to stage implementation.
> > > v6: Define and use drm_i915_gem_timeline_fence structure for
> > >     all timeline fences.
> > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> > >     Update documentation on async vm_bind/unbind and versioning.
> > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> > >     batch_count field and I915_EXEC3_SECURE flag.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
> > >  1 file changed, 280 insertions(+)
> > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> > > 
> > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> > > new file mode 100644
> > > index 000000000000..a93e08bceee6
> > > --- /dev/null
> > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > > @@ -0,0 +1,280 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2022 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * DOC: I915_PARAM_VM_BIND_VERSION
> > > + *
> > > + * VM_BIND feature version supported.
> > > + * See typedef drm_i915_getparam_t param.
> > > + *
> > > + * Specifies the VM_BIND feature version supported.
> > > + * The following versions of VM_BIND have been defined:
> > > + *
> > > + * 0: No VM_BIND support.
> > > + *
> > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
> > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
> > > + *    any existing mappings.
> > > + *
> > > + * 2: The restrictions on unbinding partial or multiple mappings is
> > > + *    lifted, Similarly, binding will replace any mappings in the given range.
> > > + *
> > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> > > + */
> > > +#define I915_PARAM_VM_BIND_VERSION   57
> > > +
> > > +/**
> > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > + *
> > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > > + * See struct drm_i915_gem_vm_control flags.
> > > + *
> > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> > > + */
> > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
> > > +
> > > +/* VM_BIND related ioctls */
> > > +#define DRM_I915_GEM_VM_BIND         0x3d
> > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
> > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
> > > +
> > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> > > +
> > > +/**
> > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> > > + *
> > > + * The operation will wait for input fence to signal.
> > > + *
> > > + * The returned output fence will be signaled after the completion of the
> > > + * operation.
> > > + */
> > > +struct drm_i915_gem_timeline_fence {
> > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
> > > +     __u32 handle;
> > > +
> > > +     /**
> > > +      * @flags: Supported flags are:
> > > +      *
> > > +      * I915_TIMELINE_FENCE_WAIT:
> > > +      * Wait for the input fence before the operation.
> > > +      *
> > > +      * I915_TIMELINE_FENCE_SIGNAL:
> > > +      * Return operation completion fence as output.
> > > +      */
> > > +     __u32 flags;
> > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> > > +
> > > +     /**
> > > +      * @value: A point in the timeline.
> > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> > > +      * binary one.
> > > +      */
> > > +     __u64 value;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > > + *
> > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> > > + * virtual address (VA) range to the section of an object that should be bound
> > > + * in the device page table of the specified address space (VM).
> > > + * The VA range specified must be unique (ie., not currently bound) and can
> > > + * be mapped to whole object or a section of the object (partial binding).
> > > + * Multiple VA mappings can be created to the same section of the object
> > > + * (aliasing).
> > > + *
> > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
> > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
> > > + * table. On those platforms, for binding device local-memory objects, the
> > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
> > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
> > > + * allow 64K page and 4K page bindings in the same 2M range).
> > > + *
> > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
> > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
> > > + *
> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> > > + * asynchronously, if valid @fence is specified.
> > 
> > Does that mean that if I don't provide @fence, then this ioctl will be
> > synchronous (i.e., when it returns, the memory will be guaranteed to be
> > bound)? The text is kinda implying that, but from one of your earlier
> > replies to Tvrtko, that doesn't seem to be the case. I guess we could
> > change the text to make this more explicit.
> > 
> 
> Yes, I thought, if user doesn't specify the out fence, KMD better make
> the ioctl synchronous by waiting until the binding finishes before
> returning. Otherwise, UMD has no way to ensure binding is complete and
> UMD must pass in out fence for VM_BIND calls.
> 
> But latest comment form Daniel on other thread might suggest something else.
> Daniel, can you comment?

Whatever we decide, let's make sure it's documented.

> 
> > In addition, previously we had the guarantee that an execbuf ioctl
> > would wait for all the pending vm_bind operations to finish before
> > doing anything. Do we still have this guarantee or do we have to make
> > use of the fences now?
> > 
> 
> No, we don't have that anymore (execbuf is decoupled from VM_BIND).
> Execbuf3 submission will not wait for any previous VM_BIND to finish.
> UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
> that.

Got it, thanks.

> 
> > > + */
> > > +struct drm_i915_gem_vm_bind {
> > > +     /** @vm_id: VM (address space) id to bind */
> > > +     __u32 vm_id;
> > > +
> > > +     /** @handle: Object handle */
> > > +     __u32 handle;
> > > +
> > > +     /** @start: Virtual Address start to bind */
> > > +     __u64 start;
> > > +
> > > +     /** @offset: Offset in object to bind */
> > > +     __u64 offset;
> > > +
> > > +     /** @length: Length of mapping to bind */
> > > +     __u64 length;
> > > +
> > > +     /**
> > > +      * @flags: Supported flags are:
> > > +      *
> > > +      * I915_GEM_VM_BIND_READONLY:
> > > +      * Mapping is read-only.
> > 
> > Can you please explain what happens when we try to write to a range
> > that's bound as read-only?
> > 
> 
> It will be mapped as read-only in device page table. Hence any
> write access will fail. I would expect a CAT error reported.

What's a CAT error? Does this lead to machine freeze or a GPU hang?
Let's make sure we document this.

> 
> I am seeing that currently the page table R/W setting is based
> on whether BO is readonly or not (UMDs can request a userptr
> BO to readonly). We can make this READONLY here as a subset.
> ie., if BO is readonly, the mappings must be readonly. If BO
> is not readonly, then the mapping can be either readonly or
> not.
> 
> But if Mesa doesn't have a use for this, then we can remove
> this flag for now.
> 

I was considering using it for Vulkan's Sparse
residencyNonResidentStrict, so we map all unbound pages to a read-only
page. But for that to work, the required behavior would have to be:
reads all return zero, writes are ignored without any sort of error.

But maybe our hardware provides other ways to implement this, I haven't
checked yet.


> > 
> > > +      *
> > > +      * I915_GEM_VM_BIND_CAPTURE:
> > > +      * Capture this mapping in the dump upon GPU error.
> > > +      */
> > > +     __u64 flags;
> > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
> > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
> > > +
> > > +     /**
> > > +      * @fence: Timeline fence for bind completion signaling.
> > > +      *
> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > +      * is invalid, and an error will be returned.
> > > +      */
> > > +     struct drm_i915_gem_timeline_fence fence;
> > > +
> > > +     /**
> > > +      * @extensions: Zero-terminated chain of extensions.
> > > +      *
> > > +      * For future extensions. See struct i915_user_extension.
> > > +      */
> > > +     __u64 extensions;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> > > + *
> > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> > > + * address (VA) range that should be unbound from the device page table of the
> > > + * specified address space (VM). VM_UNBIND will force unbind the specified
> > > + * range from device page table without waiting for any GPU job to complete.
> > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
> > > + * calling VM_UNBIND.
> > > + *
> > > + * If the specified mapping is not found, the ioctl will simply return without
> > > + * any error.
> > > + *
> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> > > + * asynchronously, if valid @fence is specified.
> > > + */
> > > +struct drm_i915_gem_vm_unbind {
> > > +     /** @vm_id: VM (address space) id to bind */
> > > +     __u32 vm_id;
> > > +
> > > +     /** @rsvd: Reserved, MBZ */
> > > +     __u32 rsvd;
> > > +
> > > +     /** @start: Virtual Address start to unbind */
> > > +     __u64 start;
> > > +
> > > +     /** @length: Length of mapping to unbind */
> > > +     __u64 length;
> > > +
> > > +     /** @flags: Currently reserved, MBZ */
> > > +     __u64 flags;
> > > +
> > > +     /**
> > > +      * @fence: Timeline fence for unbind completion signaling.
> > > +      *
> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > +      * is invalid, and an error will be returned.
> > > +      */
> > > +     struct drm_i915_gem_timeline_fence fence;
> > > +
> > > +     /**
> > > +      * @extensions: Zero-terminated chain of extensions.
> > > +      *
> > > +      * For future extensions. See struct i915_user_extension.
> > > +      */
> > > +     __u64 extensions;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> > > + * ioctl.
> > > + *
> > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> > > + * only works with this ioctl for submission.
> > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> > > + */
> > > +struct drm_i915_gem_execbuffer3 {
> > > +     /**
> > > +      * @ctx_id: Context id
> > > +      *
> > > +      * Only contexts with user engine map are allowed.
> > > +      */
> > > +     __u32 ctx_id;
> > > +
> > > +     /**
> > > +      * @engine_idx: Engine index
> > > +      *
> > > +      * An index in the user engine map of the context specified by @ctx_id.
> > > +      */
> > > +     __u32 engine_idx;
> > > +
> > > +     /**
> > > +      * @batch_address: Batch gpu virtual address/es.
> > > +      *
> > > +      * For normal submission, it is the gpu virtual address of the batch
> > > +      * buffer. For parallel submission, it is a pointer to an array of
> > > +      * batch buffer gpu virtual addresses with array size equal to the
> > > +      * number of (parallel) engines involved in that submission (See
> > > +      * struct i915_context_engines_parallel_submit).
> > > +      */
> > > +     __u64 batch_address;
> > > +
> > > +     /** @flags: Currently reserved, MBZ */
> > > +     __u64 flags;
> > > +
> > > +     /** @rsvd1: Reserved, MBZ */
> > > +     __u32 rsvd1;
> > > +
> > > +     /** @fence_count: Number of fences in @timeline_fences array. */
> > > +     __u32 fence_count;
> > > +
> > > +     /**
> > > +      * @timeline_fences: Pointer to an array of timeline fences.
> > > +      *
> > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> > > +      */
> > > +     __u64 timeline_fences;
> > > +
> > > +     /** @rsvd2: Reserved, MBZ */
> > > +     __u64 rsvd2;
> > > +
> > 
> > Just out of curiosity: if we can extend behavior with @extensions and
> > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
> > 
> 
> True. I added it just in case some requests came up that would require
> some additional fields. During this review process itself there were
> some requests. Adding directly here should have a slight performance
> edge over adding it as an extension (one less copy_from_user).
> 
> But if folks think this is an overkill, I will remove it.

I do not have strong opinions here, I'm just curious.

Thanks,
Paulo

> 
> Niranjana
> 
> > > +     /**
> > > +      * @extensions: Zero-terminated chain of extensions.
> > > +      *
> > > +      * For future extensions. See struct i915_user_extension.
> > > +      */
> > > +     __u64 extensions;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> > > + * private to the specified VM.
> > > + *
> > > + * See struct drm_i915_gem_create_ext.
> > > + */
> > > +struct drm_i915_gem_create_ext_vm_private {
> > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
> > > +     /** @base: Extension link. See struct i915_user_extension. */
> > > +     struct i915_user_extension base;
> > > +
> > > +     /** @vm_id: Id of the VM to which the object is private */
> > > +     __u32 vm_id;
> > > +};
> > 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30  6:39         ` Zanoni, Paulo R
  0 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30  6:39 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: intel-gfx, dri-devel, Hellstrom, Thomas, Wilson, Chris P, Vetter,
	Daniel, christian.koenig, Auld, Matthew

On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
> On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
> > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> > > VM_BIND and related uapi definitions
> > > 
> > > v2: Reduce the scope to simple Mesa use case.
> > > v3: Expand VM_UNBIND documentation and add
> > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> > >     documentation for vm_bind/unbind.
> > > v5: Remove TLB flush requirement on VM_UNBIND.
> > >     Add version support to stage implementation.
> > > v6: Define and use drm_i915_gem_timeline_fence structure for
> > >     all timeline fences.
> > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> > >     Update documentation on async vm_bind/unbind and versioning.
> > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> > >     batch_count field and I915_EXEC3_SECURE flag.
> > > 
> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
> > >  1 file changed, 280 insertions(+)
> > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> > > 
> > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> > > new file mode 100644
> > > index 000000000000..a93e08bceee6
> > > --- /dev/null
> > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > > @@ -0,0 +1,280 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2022 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * DOC: I915_PARAM_VM_BIND_VERSION
> > > + *
> > > + * VM_BIND feature version supported.
> > > + * See typedef drm_i915_getparam_t param.
> > > + *
> > > + * Specifies the VM_BIND feature version supported.
> > > + * The following versions of VM_BIND have been defined:
> > > + *
> > > + * 0: No VM_BIND support.
> > > + *
> > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
> > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
> > > + *    any existing mappings.
> > > + *
> > > + * 2: The restrictions on unbinding partial or multiple mappings is
> > > + *    lifted, Similarly, binding will replace any mappings in the given range.
> > > + *
> > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> > > + */
> > > +#define I915_PARAM_VM_BIND_VERSION   57
> > > +
> > > +/**
> > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > + *
> > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > > + * See struct drm_i915_gem_vm_control flags.
> > > + *
> > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> > > + */
> > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
> > > +
> > > +/* VM_BIND related ioctls */
> > > +#define DRM_I915_GEM_VM_BIND         0x3d
> > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
> > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
> > > +
> > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> > > +
> > > +/**
> > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> > > + *
> > > + * The operation will wait for input fence to signal.
> > > + *
> > > + * The returned output fence will be signaled after the completion of the
> > > + * operation.
> > > + */
> > > +struct drm_i915_gem_timeline_fence {
> > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
> > > +     __u32 handle;
> > > +
> > > +     /**
> > > +      * @flags: Supported flags are:
> > > +      *
> > > +      * I915_TIMELINE_FENCE_WAIT:
> > > +      * Wait for the input fence before the operation.
> > > +      *
> > > +      * I915_TIMELINE_FENCE_SIGNAL:
> > > +      * Return operation completion fence as output.
> > > +      */
> > > +     __u32 flags;
> > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> > > +
> > > +     /**
> > > +      * @value: A point in the timeline.
> > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> > > +      * binary one.
> > > +      */
> > > +     __u64 value;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > > + *
> > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> > > + * virtual address (VA) range to the section of an object that should be bound
> > > + * in the device page table of the specified address space (VM).
> > > + * The VA range specified must be unique (ie., not currently bound) and can
> > > + * be mapped to whole object or a section of the object (partial binding).
> > > + * Multiple VA mappings can be created to the same section of the object
> > > + * (aliasing).
> > > + *
> > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
> > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
> > > + * table. On those platforms, for binding device local-memory objects, the
> > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
> > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
> > > + * allow 64K page and 4K page bindings in the same 2M range).
> > > + *
> > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
> > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
> > > + *
> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> > > + * asynchronously, if valid @fence is specified.
> > 
> > Does that mean that if I don't provide @fence, then this ioctl will be
> > synchronous (i.e., when it returns, the memory will be guaranteed to be
> > bound)? The text is kinda implying that, but from one of your earlier
> > replies to Tvrtko, that doesn't seem to be the case. I guess we could
> > change the text to make this more explicit.
> > 
> 
> Yes, I thought, if user doesn't specify the out fence, KMD better make
> the ioctl synchronous by waiting until the binding finishes before
> returning. Otherwise, UMD has no way to ensure binding is complete and
> UMD must pass in out fence for VM_BIND calls.
> 
> But latest comment form Daniel on other thread might suggest something else.
> Daniel, can you comment?

Whatever we decide, let's make sure it's documented.

> 
> > In addition, previously we had the guarantee that an execbuf ioctl
> > would wait for all the pending vm_bind operations to finish before
> > doing anything. Do we still have this guarantee or do we have to make
> > use of the fences now?
> > 
> 
> No, we don't have that anymore (execbuf is decoupled from VM_BIND).
> Execbuf3 submission will not wait for any previous VM_BIND to finish.
> UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
> that.

Got it, thanks.

> 
> > > + */
> > > +struct drm_i915_gem_vm_bind {
> > > +     /** @vm_id: VM (address space) id to bind */
> > > +     __u32 vm_id;
> > > +
> > > +     /** @handle: Object handle */
> > > +     __u32 handle;
> > > +
> > > +     /** @start: Virtual Address start to bind */
> > > +     __u64 start;
> > > +
> > > +     /** @offset: Offset in object to bind */
> > > +     __u64 offset;
> > > +
> > > +     /** @length: Length of mapping to bind */
> > > +     __u64 length;
> > > +
> > > +     /**
> > > +      * @flags: Supported flags are:
> > > +      *
> > > +      * I915_GEM_VM_BIND_READONLY:
> > > +      * Mapping is read-only.
> > 
> > Can you please explain what happens when we try to write to a range
> > that's bound as read-only?
> > 
> 
> It will be mapped as read-only in device page table. Hence any
> write access will fail. I would expect a CAT error reported.

What's a CAT error? Does this lead to machine freeze or a GPU hang?
Let's make sure we document this.

> 
> I am seeing that currently the page table R/W setting is based
> on whether BO is readonly or not (UMDs can request a userptr
> BO to readonly). We can make this READONLY here as a subset.
> ie., if BO is readonly, the mappings must be readonly. If BO
> is not readonly, then the mapping can be either readonly or
> not.
> 
> But if Mesa doesn't have a use for this, then we can remove
> this flag for now.
> 

I was considering using it for Vulkan's Sparse
residencyNonResidentStrict, so we map all unbound pages to a read-only
page. But for that to work, the required behavior would have to be:
reads all return zero, writes are ignored without any sort of error.

But maybe our hardware provides other ways to implement this, I haven't
checked yet.


> > 
> > > +      *
> > > +      * I915_GEM_VM_BIND_CAPTURE:
> > > +      * Capture this mapping in the dump upon GPU error.
> > > +      */
> > > +     __u64 flags;
> > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
> > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
> > > +
> > > +     /**
> > > +      * @fence: Timeline fence for bind completion signaling.
> > > +      *
> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > +      * is invalid, and an error will be returned.
> > > +      */
> > > +     struct drm_i915_gem_timeline_fence fence;
> > > +
> > > +     /**
> > > +      * @extensions: Zero-terminated chain of extensions.
> > > +      *
> > > +      * For future extensions. See struct i915_user_extension.
> > > +      */
> > > +     __u64 extensions;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> > > + *
> > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> > > + * address (VA) range that should be unbound from the device page table of the
> > > + * specified address space (VM). VM_UNBIND will force unbind the specified
> > > + * range from device page table without waiting for any GPU job to complete.
> > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
> > > + * calling VM_UNBIND.
> > > + *
> > > + * If the specified mapping is not found, the ioctl will simply return without
> > > + * any error.
> > > + *
> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> > > + * asynchronously, if valid @fence is specified.
> > > + */
> > > +struct drm_i915_gem_vm_unbind {
> > > +     /** @vm_id: VM (address space) id to bind */
> > > +     __u32 vm_id;
> > > +
> > > +     /** @rsvd: Reserved, MBZ */
> > > +     __u32 rsvd;
> > > +
> > > +     /** @start: Virtual Address start to unbind */
> > > +     __u64 start;
> > > +
> > > +     /** @length: Length of mapping to unbind */
> > > +     __u64 length;
> > > +
> > > +     /** @flags: Currently reserved, MBZ */
> > > +     __u64 flags;
> > > +
> > > +     /**
> > > +      * @fence: Timeline fence for unbind completion signaling.
> > > +      *
> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > +      * is invalid, and an error will be returned.
> > > +      */
> > > +     struct drm_i915_gem_timeline_fence fence;
> > > +
> > > +     /**
> > > +      * @extensions: Zero-terminated chain of extensions.
> > > +      *
> > > +      * For future extensions. See struct i915_user_extension.
> > > +      */
> > > +     __u64 extensions;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> > > + * ioctl.
> > > + *
> > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> > > + * only works with this ioctl for submission.
> > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> > > + */
> > > +struct drm_i915_gem_execbuffer3 {
> > > +     /**
> > > +      * @ctx_id: Context id
> > > +      *
> > > +      * Only contexts with user engine map are allowed.
> > > +      */
> > > +     __u32 ctx_id;
> > > +
> > > +     /**
> > > +      * @engine_idx: Engine index
> > > +      *
> > > +      * An index in the user engine map of the context specified by @ctx_id.
> > > +      */
> > > +     __u32 engine_idx;
> > > +
> > > +     /**
> > > +      * @batch_address: Batch gpu virtual address/es.
> > > +      *
> > > +      * For normal submission, it is the gpu virtual address of the batch
> > > +      * buffer. For parallel submission, it is a pointer to an array of
> > > +      * batch buffer gpu virtual addresses with array size equal to the
> > > +      * number of (parallel) engines involved in that submission (See
> > > +      * struct i915_context_engines_parallel_submit).
> > > +      */
> > > +     __u64 batch_address;
> > > +
> > > +     /** @flags: Currently reserved, MBZ */
> > > +     __u64 flags;
> > > +
> > > +     /** @rsvd1: Reserved, MBZ */
> > > +     __u32 rsvd1;
> > > +
> > > +     /** @fence_count: Number of fences in @timeline_fences array. */
> > > +     __u32 fence_count;
> > > +
> > > +     /**
> > > +      * @timeline_fences: Pointer to an array of timeline fences.
> > > +      *
> > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> > > +      */
> > > +     __u64 timeline_fences;
> > > +
> > > +     /** @rsvd2: Reserved, MBZ */
> > > +     __u64 rsvd2;
> > > +
> > 
> > Just out of curiosity: if we can extend behavior with @extensions and
> > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
> > 
> 
> True. I added it just in case some requests came up that would require
> some additional fields. During this review process itself there were
> some requests. Adding directly here should have a slight performance
> edge over adding it as an extension (one less copy_from_user).
> 
> But if folks think this is an overkill, I will remove it.

I do not have strong opinions here, I'm just curious.

Thanks,
Paulo

> 
> Niranjana
> 
> > > +     /**
> > > +      * @extensions: Zero-terminated chain of extensions.
> > > +      *
> > > +      * For future extensions. See struct i915_user_extension.
> > > +      */
> > > +     __u64 extensions;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> > > + * private to the specified VM.
> > > + *
> > > + * See struct drm_i915_gem_create_ext.
> > > + */
> > > +struct drm_i915_gem_create_ext_vm_private {
> > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
> > > +     /** @base: Extension link. See struct i915_user_extension. */
> > > +     struct i915_user_extension base;
> > > +
> > > +     /** @vm_id: Id of the VM to which the object is private */
> > > +     __u32 vm_id;
> > > +};
> > 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  6:08       ` [Intel-gfx] " Niranjana Vishwanathapura
  (?)
  (?)
@ 2022-06-30  7:59       ` Tvrtko Ursulin
  2022-06-30 16:22         ` Niranjana Vishwanathapura
  -1 siblings, 1 reply; 53+ messages in thread
From: Tvrtko Ursulin @ 2022-06-30  7:59 UTC (permalink / raw)
  To: Niranjana Vishwanathapura, Zanoni, Paulo R
  Cc: intel-gfx, dri-devel, Hellstrom, Thomas, Wilson, Chris P, Vetter,
	Daniel, christian.koenig, Auld, Matthew


On 30/06/2022 07:08, Niranjana Vishwanathapura wrote:
> On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>> On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>>> VM_BIND and related uapi definitions
>>>
>>> v2: Reduce the scope to simple Mesa use case.
>>> v3: Expand VM_UNBIND documentation and add
>>>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>>     documentation for vm_bind/unbind.
>>> v5: Remove TLB flush requirement on VM_UNBIND.
>>>     Add version support to stage implementation.
>>> v6: Define and use drm_i915_gem_timeline_fence structure for
>>>     all timeline fences.
>>> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>>>     Update documentation on async vm_bind/unbind and versioning.
>>>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>>>     batch_count field and I915_EXEC3_SECURE flag.
>>>
>>> Signed-off-by: Niranjana Vishwanathapura 
>>> <niranjana.vishwanathapura@intel.com>
>>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> ---
>>>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>>>  1 file changed, 280 insertions(+)
>>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>
>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
>>> b/Documentation/gpu/rfc/i915_vm_bind.h
>>> new file mode 100644
>>> index 000000000000..a93e08bceee6
>>> --- /dev/null
>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>> @@ -0,0 +1,280 @@
>>> +/* SPDX-License-Identifier: MIT */
>>> +/*
>>> + * Copyright © 2022 Intel Corporation
>>> + */
>>> +
>>> +/**
>>> + * DOC: I915_PARAM_VM_BIND_VERSION
>>> + *
>>> + * VM_BIND feature version supported.
>>> + * See typedef drm_i915_getparam_t param.
>>> + *
>>> + * Specifies the VM_BIND feature version supported.
>>> + * The following versions of VM_BIND have been defined:
>>> + *
>>> + * 0: No VM_BIND support.
>>> + *
>>> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings 
>>> created
>>> + *    previously with VM_BIND, the ioctl will not support unbinding 
>>> multiple
>>> + *    mappings or splitting them. Similarly, VM_BIND calls will not 
>>> replace
>>> + *    any existing mappings.
>>> + *
>>> + * 2: The restrictions on unbinding partial or multiple mappings is
>>> + *    lifted, Similarly, binding will replace any mappings in the 
>>> given range.
>>> + *
>>> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>>> + */
>>> +#define I915_PARAM_VM_BIND_VERSION   57
>>> +
>>> +/**
>>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>> + *
>>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>> + * See struct drm_i915_gem_vm_control flags.
>>> + *
>>> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not 
>>> accept any
>>> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>>> + */
>>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>>> +
>>> +/* VM_BIND related ioctls */
>>> +#define DRM_I915_GEM_VM_BIND         0x3d
>>> +#define DRM_I915_GEM_VM_UNBIND               0x3e
>>> +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>>> +
>>> +#define DRM_IOCTL_I915_GEM_VM_BIND           
>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct 
>>> drm_i915_gem_vm_bind)
>>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct 
>>> drm_i915_gem_vm_bind)
>>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               
>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct 
>>> drm_i915_gem_execbuffer3)
>>> +
>>> +/**
>>> + * struct drm_i915_gem_timeline_fence - An input or output timeline 
>>> fence.
>>> + *
>>> + * The operation will wait for input fence to signal.
>>> + *
>>> + * The returned output fence will be signaled after the completion 
>>> of the
>>> + * operation.
>>> + */
>>> +struct drm_i915_gem_timeline_fence {
>>> +     /** @handle: User's handle for a drm_syncobj to wait on or 
>>> signal. */
>>> +     __u32 handle;
>>> +
>>> +     /**
>>> +      * @flags: Supported flags are:
>>> +      *
>>> +      * I915_TIMELINE_FENCE_WAIT:
>>> +      * Wait for the input fence before the operation.
>>> +      *
>>> +      * I915_TIMELINE_FENCE_SIGNAL:
>>> +      * Return operation completion fence as output.
>>> +      */
>>> +     __u32 flags;
>>> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS 
>>> (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>> +
>>> +     /**
>>> +      * @value: A point in the timeline.
>>> +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>>> +      * timeline drm_syncobj is invalid as it turns a drm_syncobj 
>>> into a
>>> +      * binary one.
>>> +      */
>>> +     __u64 value;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>> + *
>>> + * This structure is passed to VM_BIND ioctl and specifies the 
>>> mapping of GPU
>>> + * virtual address (VA) range to the section of an object that 
>>> should be bound
>>> + * in the device page table of the specified address space (VM).
>>> + * The VA range specified must be unique (ie., not currently bound) 
>>> and can
>>> + * be mapped to whole object or a section of the object (partial 
>>> binding).
>>> + * Multiple VA mappings can be created to the same section of the 
>>> object
>>> + * (aliasing).
>>> + *
>>> + * The @start, @offset and @length must be 4K page aligned. However 
>>> the DG2
>>> + * and XEHPSDV has 64K page size for device local-memory and has 
>>> compact page
>>> + * table. On those platforms, for binding device local-memory 
>>> objects, the
>>> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>>> + * Also, for such mappings, i915 will reserve the whole 2M range for 
>>> it so as
>>> + * to not allow multiple mappings in that 2M range (Compact page 
>>> tables do not
>>> + * allow 64K page and 4K page bindings in the same 2M range).
>>> + *
>>> + * Error code -EINVAL will be returned if @start, @offset and 
>>> @length are not
>>> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), 
>>> error code
>>> + * -ENOSPC will be returned if the VA range specified can't be 
>>> reserved.
>>> + *
>>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>> concurrently
>>> + * are not ordered. Furthermore, parts of the VM_BIND operation can 
>>> be done
>>> + * asynchronously, if valid @fence is specified.
>>
>> Does that mean that if I don't provide @fence, then this ioctl will be
>> synchronous (i.e., when it returns, the memory will be guaranteed to be
>> bound)? The text is kinda implying that, but from one of your earlier
>> replies to Tvrtko, that doesn't seem to be the case. I guess we could
>> change the text to make this more explicit.
>>
> 
> Yes, I thought, if user doesn't specify the out fence, KMD better make
> the ioctl synchronous by waiting until the binding finishes before
> returning. Otherwise, UMD has no way to ensure binding is complete and
> UMD must pass in out fence for VM_BIND calls.

This problematic angle is exactly what I raised and I did not understand 
you were suggesting sync behaviour back then.

I suggested a possible execbuf3 extension which makes it wait for any 
pending (un)bind activity on a VM. Sounds better to me than making 
everything sync for the use case of N binds followed by 1 execbuf. *If* 
userspace wants an easy "fire and forget" mode for such use case, rather 
than having to use a fence on all.

Regards,

Tvrtko

> But latest comment form Daniel on other thread might suggest something 
> else.
> Daniel, can you comment?
> 
>> In addition, previously we had the guarantee that an execbuf ioctl
>> would wait for all the pending vm_bind operations to finish before
>> doing anything. Do we still have this guarantee or do we have to make
>> use of the fences now?
>>
> 
> No, we don't have that anymore (execbuf is decoupled from VM_BIND).
> Execbuf3 submission will not wait for any previous VM_BIND to finish.
> UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
> that.
> 
>>> + */
>>> +struct drm_i915_gem_vm_bind {
>>> +     /** @vm_id: VM (address space) id to bind */
>>> +     __u32 vm_id;
>>> +
>>> +     /** @handle: Object handle */
>>> +     __u32 handle;
>>> +
>>> +     /** @start: Virtual Address start to bind */
>>> +     __u64 start;
>>> +
>>> +     /** @offset: Offset in object to bind */
>>> +     __u64 offset;
>>> +
>>> +     /** @length: Length of mapping to bind */
>>> +     __u64 length;
>>> +
>>> +     /**
>>> +      * @flags: Supported flags are:
>>> +      *
>>> +      * I915_GEM_VM_BIND_READONLY:
>>> +      * Mapping is read-only.
>>
>> Can you please explain what happens when we try to write to a range
>> that's bound as read-only?
>>
> 
> It will be mapped as read-only in device page table. Hence any
> write access will fail. I would expect a CAT error reported.
> 
> I am seeing that currently the page table R/W setting is based
> on whether BO is readonly or not (UMDs can request a userptr
> BO to readonly). We can make this READONLY here as a subset.
> ie., if BO is readonly, the mappings must be readonly. If BO
> is not readonly, then the mapping can be either readonly or
> not.
> 
> But if Mesa doesn't have a use for this, then we can remove
> this flag for now.
> 
>>
>>> +      *
>>> +      * I915_GEM_VM_BIND_CAPTURE:
>>> +      * Capture this mapping in the dump upon GPU error.
>>> +      */
>>> +     __u64 flags;
>>> +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>>> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>>> +
>>> +     /**
>>> +      * @fence: Timeline fence for bind completion signaling.
>>> +      *
>>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>> +      * is invalid, and an error will be returned.
>>> +      */
>>> +     struct drm_i915_gem_timeline_fence fence;
>>> +
>>> +     /**
>>> +      * @extensions: Zero-terminated chain of extensions.
>>> +      *
>>> +      * For future extensions. See struct i915_user_extension.
>>> +      */
>>> +     __u64 extensions;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>>> + *
>>> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU 
>>> virtual
>>> + * address (VA) range that should be unbound from the device page 
>>> table of the
>>> + * specified address space (VM). VM_UNBIND will force unbind the 
>>> specified
>>> + * range from device page table without waiting for any GPU job to 
>>> complete.
>>> + * It is UMDs responsibility to ensure the mapping is no longer in 
>>> use before
>>> + * calling VM_UNBIND.
>>> + *
>>> + * If the specified mapping is not found, the ioctl will simply 
>>> return without
>>> + * any error.
>>> + *
>>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>> concurrently
>>> + * are not ordered. Furthermore, parts of the VM_UNBIND operation 
>>> can be done
>>> + * asynchronously, if valid @fence is specified.
>>> + */
>>> +struct drm_i915_gem_vm_unbind {
>>> +     /** @vm_id: VM (address space) id to bind */
>>> +     __u32 vm_id;
>>> +
>>> +     /** @rsvd: Reserved, MBZ */
>>> +     __u32 rsvd;
>>> +
>>> +     /** @start: Virtual Address start to unbind */
>>> +     __u64 start;
>>> +
>>> +     /** @length: Length of mapping to unbind */
>>> +     __u64 length;
>>> +
>>> +     /** @flags: Currently reserved, MBZ */
>>> +     __u64 flags;
>>> +
>>> +     /**
>>> +      * @fence: Timeline fence for unbind completion signaling.
>>> +      *
>>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>> +      * is invalid, and an error will be returned.
>>> +      */
>>> +     struct drm_i915_gem_timeline_fence fence;
>>> +
>>> +     /**
>>> +      * @extensions: Zero-terminated chain of extensions.
>>> +      *
>>> +      * For future extensions. See struct i915_user_extension.
>>> +      */
>>> +     __u64 extensions;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_i915_gem_execbuffer3 - Structure for 
>>> DRM_I915_GEM_EXECBUFFER3
>>> + * ioctl.
>>> + *
>>> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and 
>>> VM_BIND mode
>>> + * only works with this ioctl for submission.
>>> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>> + */
>>> +struct drm_i915_gem_execbuffer3 {
>>> +     /**
>>> +      * @ctx_id: Context id
>>> +      *
>>> +      * Only contexts with user engine map are allowed.
>>> +      */
>>> +     __u32 ctx_id;
>>> +
>>> +     /**
>>> +      * @engine_idx: Engine index
>>> +      *
>>> +      * An index in the user engine map of the context specified by 
>>> @ctx_id.
>>> +      */
>>> +     __u32 engine_idx;
>>> +
>>> +     /**
>>> +      * @batch_address: Batch gpu virtual address/es.
>>> +      *
>>> +      * For normal submission, it is the gpu virtual address of the 
>>> batch
>>> +      * buffer. For parallel submission, it is a pointer to an array of
>>> +      * batch buffer gpu virtual addresses with array size equal to the
>>> +      * number of (parallel) engines involved in that submission (See
>>> +      * struct i915_context_engines_parallel_submit).
>>> +      */
>>> +     __u64 batch_address;
>>> +
>>> +     /** @flags: Currently reserved, MBZ */
>>> +     __u64 flags;
>>> +
>>> +     /** @rsvd1: Reserved, MBZ */
>>> +     __u32 rsvd1;
>>> +
>>> +     /** @fence_count: Number of fences in @timeline_fences array. */
>>> +     __u32 fence_count;
>>> +
>>> +     /**
>>> +      * @timeline_fences: Pointer to an array of timeline fences.
>>> +      *
>>> +      * Timeline fences are of format struct 
>>> drm_i915_gem_timeline_fence.
>>> +      */
>>> +     __u64 timeline_fences;
>>> +
>>> +     /** @rsvd2: Reserved, MBZ */
>>> +     __u64 rsvd2;
>>> +
>>
>> Just out of curiosity: if we can extend behavior with @extensions and
>> even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>>
> 
> True. I added it just in case some requests came up that would require
> some additional fields. During this review process itself there were
> some requests. Adding directly here should have a slight performance
> edge over adding it as an extension (one less copy_from_user).
> 
> But if folks think this is an overkill, I will remove it.
> 
> Niranjana
> 
>>> +     /**
>>> +      * @extensions: Zero-terminated chain of extensions.
>>> +      *
>>> +      * For future extensions. See struct i915_user_extension.
>>> +      */
>>> +     __u64 extensions;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the 
>>> object
>>> + * private to the specified VM.
>>> + *
>>> + * See struct drm_i915_gem_create_ext.
>>> + */
>>> +struct drm_i915_gem_create_ext_vm_private {
>>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>>> +     /** @base: Extension link. See struct i915_user_extension. */
>>> +     struct i915_user_extension base;
>>> +
>>> +     /** @vm_id: Id of the VM to which the object is private */
>>> +     __u32 vm_id;
>>> +};
>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  5:11     ` [Intel-gfx] " Jason Ekstrand
@ 2022-06-30 15:14       ` Matthew Auld
  -1 siblings, 0 replies; 53+ messages in thread
From: Matthew Auld @ 2022-06-30 15:14 UTC (permalink / raw)
  To: Jason Ekstrand, Niranjana Vishwanathapura
  Cc: Matthew Brost, Paulo Zanoni, Lionel Landwerlin, Tvrtko Ursulin,
	Intel GFX, Chris Wilson, Thomas Hellstrom, oak.zeng,
	Maling list - DRI developers, Daniel Vetter,
	Christian König

On 30/06/2022 06:11, Jason Ekstrand wrote:
> On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura 
> <niranjana.vishwanathapura@intel.com 
> <mailto:niranjana.vishwanathapura@intel.com>> wrote:
> 
>     VM_BIND and related uapi definitions
> 
>     v2: Reduce the scope to simple Mesa use case.
>     v3: Expand VM_UNBIND documentation and add
>          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>          and I915_GEM_VM_BIND_TLB_FLUSH flags.
>     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>          documentation for vm_bind/unbind.
>     v5: Remove TLB flush requirement on VM_UNBIND.
>          Add version support to stage implementation.
>     v6: Define and use drm_i915_gem_timeline_fence structure for
>          all timeline fences.
>     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>          Update documentation on async vm_bind/unbind and versioning.
>          Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>          batch_count field and I915_EXEC3_SECURE flag.
> 
>     Signed-off-by: Niranjana Vishwanathapura
>     <niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
>     <mailto:daniel.vetter@ffwll.ch>>
>     ---
>       Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>       1 file changed, 280 insertions(+)
>       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> 
>     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>     b/Documentation/gpu/rfc/i915_vm_bind.h
>     new file mode 100644
>     index 000000000000..a93e08bceee6
>     --- /dev/null
>     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>     @@ -0,0 +1,280 @@
>     +/* SPDX-License-Identifier: MIT */
>     +/*
>     + * Copyright © 2022 Intel Corporation
>     + */
>     +
>     +/**
>     + * DOC: I915_PARAM_VM_BIND_VERSION
>     + *
>     + * VM_BIND feature version supported.
>     + * See typedef drm_i915_getparam_t param.
>     + *
>     + * Specifies the VM_BIND feature version supported.
>     + * The following versions of VM_BIND have been defined:
>     + *
>     + * 0: No VM_BIND support.
>     + *
>     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>     created
>     + *    previously with VM_BIND, the ioctl will not support unbinding
>     multiple
>     + *    mappings or splitting them. Similarly, VM_BIND calls will not
>     replace
>     + *    any existing mappings.
>     + *
>     + * 2: The restrictions on unbinding partial or multiple mappings is
>     + *    lifted, Similarly, binding will replace any mappings in the
>     given range.
>     + *
>     + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>     + */
>     +#define I915_PARAM_VM_BIND_VERSION     57
>     +
>     +/**
>     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>     + *
>     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>     + * See struct drm_i915_gem_vm_control flags.
>     + *
>     + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>     + * For VM_BIND mode, we have new execbuf3 ioctl which will not
>     accept any
>     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>     + */
>     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>     +
>     +/* VM_BIND related ioctls */
>     +#define DRM_I915_GEM_VM_BIND           0x3d
>     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>     +
>     +#define DRM_IOCTL_I915_GEM_VM_BIND           
>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3       
>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>     drm_i915_gem_execbuffer3)
>     +
>     +/**
>     + * struct drm_i915_gem_timeline_fence - An input or output timeline
>     fence.
>     + *
>     + * The operation will wait for input fence to signal.
>     + *
>     + * The returned output fence will be signaled after the completion
>     of the
>     + * operation.
>     + */
>     +struct drm_i915_gem_timeline_fence {
>     +       /** @handle: User's handle for a drm_syncobj to wait on or
>     signal. */
>     +       __u32 handle;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_TIMELINE_FENCE_WAIT:
>     +        * Wait for the input fence before the operation.
>     +        *
>     +        * I915_TIMELINE_FENCE_SIGNAL:
>     +        * Return operation completion fence as output.
>     +        */
>     +       __u32 flags;
>     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>     +
>     +       /**
>     +        * @value: A point in the timeline.
>     +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>     into a
>     +        * binary one.
>     +        */
>     +       __u64 value;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>     + *
>     + * This structure is passed to VM_BIND ioctl and specifies the
>     mapping of GPU
>     + * virtual address (VA) range to the section of an object that
>     should be bound
>     + * in the device page table of the specified address space (VM).
>     + * The VA range specified must be unique (ie., not currently bound)
>     and can
>     + * be mapped to whole object or a section of the object (partial
>     binding).
>     + * Multiple VA mappings can be created to the same section of the
>     object
>     + * (aliasing).
>     + *
>     + * The @start, @offset and @length must be 4K page aligned. However
>     the DG2
>     + * and XEHPSDV has 64K page size for device local-memory and has
>     compact page
>     + * table. On those platforms, for binding device local-memory
>     objects, the
>     + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> 
> 
> This is not acceptable.  We need 64K granularity.  This includes the 
> starting address, the BO offset, and the length.  Why?  The tl;dr is 
> that it's a requirement for about 50% of D3D12 apps if we want them to 
> run on Linux via D3D12.  A longer explanation follows.  I don't 
> necessarily expect kernel folks to get all the details but hopefully 
> I'll have left enough of a map that some of the Intel Mesa folks can 
> help fill in details.
> 
> Many modern D3D12 apps have a hard requirement on Tier2 tiled 
> resources.  This is a feature that Intel has supported in the D3D12 
> driver since Skylake.  In order to implement this feature, VKD3D 
> requires the various sparseResidencyImage* and sparseResidency*Sampled 
> Vulkan features.  If we want those apps to work (there's getting to be 
> quite a few of them), we need to implement the Vulkan sparse residency 
> features.
> |
> |
> What is sparse residency?  I'm glad you asked!  The sparse residency 
> features allow a client to separately bind each miplevel or array slice 
> of an image to a chunk of device memory independently, without affecting 
> any other areas of the image.  Once you get to a high enough miplevel 
> that everything fits inside a single sparse image block (that's a 
> technical Vulkan term you can search for in the spec), you can enter a 
> "miptail" which contains all the remaining miplevels in a single sparse 
> image block.
> 
> The term "sparse image block" is what the Vulkan spec uses.  On Intel 
> hardware and in the docs, it's what we call a "tile".  Specifically, the 
> image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This 
> is because Tile4 and legacy X and Y-tiling don't provide any guarantees 
> about page alignment for slices.  Yf, Ys, and Tile64, on the other hand, 
> align all slices of the image to a tile boundary, allowing us to map 
> memory to different slices independently, assuming we have 64K (or 4K 
> for Yf) VM_BIND granularity.  (4K isn't actually a requirement for 
> SKL-TGL; we can use Ys all the time which has 64K tiles but there's no 
> reason to not support 4K alignments on integrated.)
> 
> Someone may be tempted to ask, "Can't we wiggle the strides around or 
> something to make it work?"  I thought about that and no, you can't.  
> The problem here is LOD2+.  Sure, you can have a stride such that the 
> image is a multiple of 2M worth of tiles across.  That'll work fine for 
> LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be and 
> there's no way to control that.  The hardware will place it to the right 
> of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing you 
> can do about that.  If that position doesn't happen to hit a 2M 
> boundary, you're out of luck.
> 
> I hope that explanation provides enough detail.  Sadly, this is one of 
> those things which has a lot of moving pieces all over different bits of 
> the hardware and various APIs and they all have to work together just 
> right for it to all come out in the end.  But, yeah, we really need 64K 
> aligned binding if we want VKD3D to work.

Just to confirm, the new model would be to enforce 64K GTT alignment for 
lmem pages, and then for smem pages we would only require 4K alignment, 
but with the added restriction that userspace will never try to mix the 
two (lmem vs smem) within the same 2M va range (page-table). The kernel 
will verify this and throw an error if needed. This model should work 
with the above?

> 
> --Jason
> 
>     + * Also, for such mappings, i915 will reserve the whole 2M range
>     for it so as
>     + * to not allow multiple mappings in that 2M range (Compact page
>     tables do not
>     + * allow 64K page and 4K page bindings in the same 2M range).
>     + *
>     + * Error code -EINVAL will be returned if @start, @offset and
>     @length are not
>     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>     error code
>     + * -ENOSPC will be returned if the VA range specified can't be
>     reserved.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_BIND operation can
>     be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_bind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @handle: Object handle */
>     +       __u32 handle;
>     +
>     +       /** @start: Virtual Address start to bind */
>     +       __u64 start;
>     +
>     +       /** @offset: Offset in object to bind */
>     +       __u64 offset;
>     +
>     +       /** @length: Length of mapping to bind */
>     +       __u64 length;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_GEM_VM_BIND_READONLY:
>     +        * Mapping is read-only.
>     +        *
>     +        * I915_GEM_VM_BIND_CAPTURE:
>     +        * Capture this mapping in the dump upon GPU error.
>     +        */
>     +       __u64 flags;
>     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>     +
>     +       /**
>     +        * @fence: Timeline fence for bind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>     + *
>     + * This structure is passed to VM_UNBIND ioctl and specifies the
>     GPU virtual
>     + * address (VA) range that should be unbound from the device page
>     table of the
>     + * specified address space (VM). VM_UNBIND will force unbind the
>     specified
>     + * range from device page table without waiting for any GPU job to
>     complete.
>     + * It is UMDs responsibility to ensure the mapping is no longer in
>     use before
>     + * calling VM_UNBIND.
>     + *
>     + * If the specified mapping is not found, the ioctl will simply
>     return without
>     + * any error.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_UNBIND operation
>     can be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_unbind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @rsvd: Reserved, MBZ */
>     +       __u32 rsvd;
>     +
>     +       /** @start: Virtual Address start to unbind */
>     +       __u64 start;
>     +
>     +       /** @length: Length of mapping to unbind */
>     +       __u64 length;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /**
>     +        * @fence: Timeline fence for unbind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_execbuffer3 - Structure for
>     DRM_I915_GEM_EXECBUFFER3
>     + * ioctl.
>     + *
>     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>     VM_BIND mode
>     + * only works with this ioctl for submission.
>     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>     + */
>     +struct drm_i915_gem_execbuffer3 {
>     +       /**
>     +        * @ctx_id: Context id
>     +        *
>     +        * Only contexts with user engine map are allowed.
>     +        */
>     +       __u32 ctx_id;
>     +
>     +       /**
>     +        * @engine_idx: Engine index
>     +        *
>     +        * An index in the user engine map of the context specified
>     by @ctx_id.
>     +        */
>     +       __u32 engine_idx;
>     +
>     +       /**
>     +        * @batch_address: Batch gpu virtual address/es.
>     +        *
>     +        * For normal submission, it is the gpu virtual address of
>     the batch
>     +        * buffer. For parallel submission, it is a pointer to an
>     array of
>     +        * batch buffer gpu virtual addresses with array size equal
>     to the
>     +        * number of (parallel) engines involved in that submission (See
>     +        * struct i915_context_engines_parallel_submit).
>     +        */
>     +       __u64 batch_address;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /** @rsvd1: Reserved, MBZ */
>     +       __u32 rsvd1;
>     +
>     +       /** @fence_count: Number of fences in @timeline_fences array. */
>     +       __u32 fence_count;
>     +
>     +       /**
>     +        * @timeline_fences: Pointer to an array of timeline fences.
>     +        *
>     +        * Timeline fences are of format struct
>     drm_i915_gem_timeline_fence.
>     +        */
>     +       __u64 timeline_fences;
>     +
>     +       /** @rsvd2: Reserved, MBZ */
>     +       __u64 rsvd2;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_create_ext_vm_private - Extension to make
>     the object
>     + * private to the specified VM.
>     + *
>     + * See struct drm_i915_gem_create_ext.
>     + */
>     +struct drm_i915_gem_create_ext_vm_private {
>     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>     +       /** @base: Extension link. See struct i915_user_extension. */
>     +       struct i915_user_extension base;
>     +
>     +       /** @vm_id: Id of the VM to which the object is private */
>     +       __u32 vm_id;
>     +};
>     -- 
>     2.21.0.rc0.32.g243a4c7e27
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 15:14       ` Matthew Auld
  0 siblings, 0 replies; 53+ messages in thread
From: Matthew Auld @ 2022-06-30 15:14 UTC (permalink / raw)
  To: Jason Ekstrand, Niranjana Vishwanathapura
  Cc: Paulo Zanoni, Intel GFX, Chris Wilson, Thomas Hellstrom,
	Maling list - DRI developers, Daniel Vetter,
	Christian König

On 30/06/2022 06:11, Jason Ekstrand wrote:
> On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura 
> <niranjana.vishwanathapura@intel.com 
> <mailto:niranjana.vishwanathapura@intel.com>> wrote:
> 
>     VM_BIND and related uapi definitions
> 
>     v2: Reduce the scope to simple Mesa use case.
>     v3: Expand VM_UNBIND documentation and add
>          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>          and I915_GEM_VM_BIND_TLB_FLUSH flags.
>     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>          documentation for vm_bind/unbind.
>     v5: Remove TLB flush requirement on VM_UNBIND.
>          Add version support to stage implementation.
>     v6: Define and use drm_i915_gem_timeline_fence structure for
>          all timeline fences.
>     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>          Update documentation on async vm_bind/unbind and versioning.
>          Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>          batch_count field and I915_EXEC3_SECURE flag.
> 
>     Signed-off-by: Niranjana Vishwanathapura
>     <niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
>     <mailto:daniel.vetter@ffwll.ch>>
>     ---
>       Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>       1 file changed, 280 insertions(+)
>       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> 
>     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>     b/Documentation/gpu/rfc/i915_vm_bind.h
>     new file mode 100644
>     index 000000000000..a93e08bceee6
>     --- /dev/null
>     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>     @@ -0,0 +1,280 @@
>     +/* SPDX-License-Identifier: MIT */
>     +/*
>     + * Copyright © 2022 Intel Corporation
>     + */
>     +
>     +/**
>     + * DOC: I915_PARAM_VM_BIND_VERSION
>     + *
>     + * VM_BIND feature version supported.
>     + * See typedef drm_i915_getparam_t param.
>     + *
>     + * Specifies the VM_BIND feature version supported.
>     + * The following versions of VM_BIND have been defined:
>     + *
>     + * 0: No VM_BIND support.
>     + *
>     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>     created
>     + *    previously with VM_BIND, the ioctl will not support unbinding
>     multiple
>     + *    mappings or splitting them. Similarly, VM_BIND calls will not
>     replace
>     + *    any existing mappings.
>     + *
>     + * 2: The restrictions on unbinding partial or multiple mappings is
>     + *    lifted, Similarly, binding will replace any mappings in the
>     given range.
>     + *
>     + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>     + */
>     +#define I915_PARAM_VM_BIND_VERSION     57
>     +
>     +/**
>     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>     + *
>     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>     + * See struct drm_i915_gem_vm_control flags.
>     + *
>     + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>     + * For VM_BIND mode, we have new execbuf3 ioctl which will not
>     accept any
>     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>     + */
>     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>     +
>     +/* VM_BIND related ioctls */
>     +#define DRM_I915_GEM_VM_BIND           0x3d
>     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>     +
>     +#define DRM_IOCTL_I915_GEM_VM_BIND           
>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3       
>       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>     drm_i915_gem_execbuffer3)
>     +
>     +/**
>     + * struct drm_i915_gem_timeline_fence - An input or output timeline
>     fence.
>     + *
>     + * The operation will wait for input fence to signal.
>     + *
>     + * The returned output fence will be signaled after the completion
>     of the
>     + * operation.
>     + */
>     +struct drm_i915_gem_timeline_fence {
>     +       /** @handle: User's handle for a drm_syncobj to wait on or
>     signal. */
>     +       __u32 handle;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_TIMELINE_FENCE_WAIT:
>     +        * Wait for the input fence before the operation.
>     +        *
>     +        * I915_TIMELINE_FENCE_SIGNAL:
>     +        * Return operation completion fence as output.
>     +        */
>     +       __u32 flags;
>     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>     +
>     +       /**
>     +        * @value: A point in the timeline.
>     +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>     into a
>     +        * binary one.
>     +        */
>     +       __u64 value;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>     + *
>     + * This structure is passed to VM_BIND ioctl and specifies the
>     mapping of GPU
>     + * virtual address (VA) range to the section of an object that
>     should be bound
>     + * in the device page table of the specified address space (VM).
>     + * The VA range specified must be unique (ie., not currently bound)
>     and can
>     + * be mapped to whole object or a section of the object (partial
>     binding).
>     + * Multiple VA mappings can be created to the same section of the
>     object
>     + * (aliasing).
>     + *
>     + * The @start, @offset and @length must be 4K page aligned. However
>     the DG2
>     + * and XEHPSDV has 64K page size for device local-memory and has
>     compact page
>     + * table. On those platforms, for binding device local-memory
>     objects, the
>     + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> 
> 
> This is not acceptable.  We need 64K granularity.  This includes the 
> starting address, the BO offset, and the length.  Why?  The tl;dr is 
> that it's a requirement for about 50% of D3D12 apps if we want them to 
> run on Linux via D3D12.  A longer explanation follows.  I don't 
> necessarily expect kernel folks to get all the details but hopefully 
> I'll have left enough of a map that some of the Intel Mesa folks can 
> help fill in details.
> 
> Many modern D3D12 apps have a hard requirement on Tier2 tiled 
> resources.  This is a feature that Intel has supported in the D3D12 
> driver since Skylake.  In order to implement this feature, VKD3D 
> requires the various sparseResidencyImage* and sparseResidency*Sampled 
> Vulkan features.  If we want those apps to work (there's getting to be 
> quite a few of them), we need to implement the Vulkan sparse residency 
> features.
> |
> |
> What is sparse residency?  I'm glad you asked!  The sparse residency 
> features allow a client to separately bind each miplevel or array slice 
> of an image to a chunk of device memory independently, without affecting 
> any other areas of the image.  Once you get to a high enough miplevel 
> that everything fits inside a single sparse image block (that's a 
> technical Vulkan term you can search for in the spec), you can enter a 
> "miptail" which contains all the remaining miplevels in a single sparse 
> image block.
> 
> The term "sparse image block" is what the Vulkan spec uses.  On Intel 
> hardware and in the docs, it's what we call a "tile".  Specifically, the 
> image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This 
> is because Tile4 and legacy X and Y-tiling don't provide any guarantees 
> about page alignment for slices.  Yf, Ys, and Tile64, on the other hand, 
> align all slices of the image to a tile boundary, allowing us to map 
> memory to different slices independently, assuming we have 64K (or 4K 
> for Yf) VM_BIND granularity.  (4K isn't actually a requirement for 
> SKL-TGL; we can use Ys all the time which has 64K tiles but there's no 
> reason to not support 4K alignments on integrated.)
> 
> Someone may be tempted to ask, "Can't we wiggle the strides around or 
> something to make it work?"  I thought about that and no, you can't.  
> The problem here is LOD2+.  Sure, you can have a stride such that the 
> image is a multiple of 2M worth of tiles across.  That'll work fine for 
> LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be and 
> there's no way to control that.  The hardware will place it to the right 
> of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing you 
> can do about that.  If that position doesn't happen to hit a 2M 
> boundary, you're out of luck.
> 
> I hope that explanation provides enough detail.  Sadly, this is one of 
> those things which has a lot of moving pieces all over different bits of 
> the hardware and various APIs and they all have to work together just 
> right for it to all come out in the end.  But, yeah, we really need 64K 
> aligned binding if we want VKD3D to work.

Just to confirm, the new model would be to enforce 64K GTT alignment for 
lmem pages, and then for smem pages we would only require 4K alignment, 
but with the added restriction that userspace will never try to mix the 
two (lmem vs smem) within the same 2M va range (page-table). The kernel 
will verify this and throw an error if needed. This model should work 
with the above?

> 
> --Jason
> 
>     + * Also, for such mappings, i915 will reserve the whole 2M range
>     for it so as
>     + * to not allow multiple mappings in that 2M range (Compact page
>     tables do not
>     + * allow 64K page and 4K page bindings in the same 2M range).
>     + *
>     + * Error code -EINVAL will be returned if @start, @offset and
>     @length are not
>     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>     error code
>     + * -ENOSPC will be returned if the VA range specified can't be
>     reserved.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_BIND operation can
>     be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_bind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @handle: Object handle */
>     +       __u32 handle;
>     +
>     +       /** @start: Virtual Address start to bind */
>     +       __u64 start;
>     +
>     +       /** @offset: Offset in object to bind */
>     +       __u64 offset;
>     +
>     +       /** @length: Length of mapping to bind */
>     +       __u64 length;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_GEM_VM_BIND_READONLY:
>     +        * Mapping is read-only.
>     +        *
>     +        * I915_GEM_VM_BIND_CAPTURE:
>     +        * Capture this mapping in the dump upon GPU error.
>     +        */
>     +       __u64 flags;
>     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>     +
>     +       /**
>     +        * @fence: Timeline fence for bind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>     + *
>     + * This structure is passed to VM_UNBIND ioctl and specifies the
>     GPU virtual
>     + * address (VA) range that should be unbound from the device page
>     table of the
>     + * specified address space (VM). VM_UNBIND will force unbind the
>     specified
>     + * range from device page table without waiting for any GPU job to
>     complete.
>     + * It is UMDs responsibility to ensure the mapping is no longer in
>     use before
>     + * calling VM_UNBIND.
>     + *
>     + * If the specified mapping is not found, the ioctl will simply
>     return without
>     + * any error.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_UNBIND operation
>     can be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_unbind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @rsvd: Reserved, MBZ */
>     +       __u32 rsvd;
>     +
>     +       /** @start: Virtual Address start to unbind */
>     +       __u64 start;
>     +
>     +       /** @length: Length of mapping to unbind */
>     +       __u64 length;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /**
>     +        * @fence: Timeline fence for unbind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_execbuffer3 - Structure for
>     DRM_I915_GEM_EXECBUFFER3
>     + * ioctl.
>     + *
>     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>     VM_BIND mode
>     + * only works with this ioctl for submission.
>     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>     + */
>     +struct drm_i915_gem_execbuffer3 {
>     +       /**
>     +        * @ctx_id: Context id
>     +        *
>     +        * Only contexts with user engine map are allowed.
>     +        */
>     +       __u32 ctx_id;
>     +
>     +       /**
>     +        * @engine_idx: Engine index
>     +        *
>     +        * An index in the user engine map of the context specified
>     by @ctx_id.
>     +        */
>     +       __u32 engine_idx;
>     +
>     +       /**
>     +        * @batch_address: Batch gpu virtual address/es.
>     +        *
>     +        * For normal submission, it is the gpu virtual address of
>     the batch
>     +        * buffer. For parallel submission, it is a pointer to an
>     array of
>     +        * batch buffer gpu virtual addresses with array size equal
>     to the
>     +        * number of (parallel) engines involved in that submission (See
>     +        * struct i915_context_engines_parallel_submit).
>     +        */
>     +       __u64 batch_address;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /** @rsvd1: Reserved, MBZ */
>     +       __u32 rsvd1;
>     +
>     +       /** @fence_count: Number of fences in @timeline_fences array. */
>     +       __u32 fence_count;
>     +
>     +       /**
>     +        * @timeline_fences: Pointer to an array of timeline fences.
>     +        *
>     +        * Timeline fences are of format struct
>     drm_i915_gem_timeline_fence.
>     +        */
>     +       __u64 timeline_fences;
>     +
>     +       /** @rsvd2: Reserved, MBZ */
>     +       __u64 rsvd2;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_create_ext_vm_private - Extension to make
>     the object
>     + * private to the specified VM.
>     + *
>     + * See struct drm_i915_gem_create_ext.
>     + */
>     +struct drm_i915_gem_create_ext_vm_private {
>     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>     +       /** @base: Extension link. See struct i915_user_extension. */
>     +       struct i915_user_extension base;
>     +
>     +       /** @vm_id: Id of the VM to which the object is private */
>     +       __u32 vm_id;
>     +};
>     -- 
>     2.21.0.rc0.32.g243a4c7e27
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 15:14       ` [Intel-gfx] " Matthew Auld
@ 2022-06-30 15:34         ` Jason Ekstrand
  -1 siblings, 0 replies; 53+ messages in thread
From: Jason Ekstrand @ 2022-06-30 15:34 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Matthew Brost, Paulo Zanoni, Lionel Landwerlin, Tvrtko Ursulin,
	Intel GFX, Maling list - DRI developers, Thomas Hellstrom,
	oak.zeng, Chris Wilson, Daniel Vetter, Niranjana Vishwanathapura,
	Christian König

[-- Attachment #1: Type: text/plain, Size: 18722 bytes --]

On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@intel.com>
wrote:

> On 30/06/2022 06:11, Jason Ekstrand wrote:
> > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com
> > <mailto:niranjana.vishwanathapura@intel.com>> wrote:
> >
> >     VM_BIND and related uapi definitions
> >
> >     v2: Reduce the scope to simple Mesa use case.
> >     v3: Expand VM_UNBIND documentation and add
> >          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> >          and I915_GEM_VM_BIND_TLB_FLUSH flags.
> >     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> >          documentation for vm_bind/unbind.
> >     v5: Remove TLB flush requirement on VM_UNBIND.
> >          Add version support to stage implementation.
> >     v6: Define and use drm_i915_gem_timeline_fence structure for
> >          all timeline fences.
> >     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> >          Update documentation on async vm_bind/unbind and versioning.
> >          Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> >          batch_count field and I915_EXEC3_SECURE flag.
> >
> >     Signed-off-by: Niranjana Vishwanathapura
> >     <niranjana.vishwanathapura@intel.com
> >     <mailto:niranjana.vishwanathapura@intel.com>>
> >     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
> >     <mailto:daniel.vetter@ffwll.ch>>
> >     ---
> >       Documentation/gpu/rfc/i915_vm_bind.h | 280
> +++++++++++++++++++++++++++
> >       1 file changed, 280 insertions(+)
> >       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> >
> >     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> >     b/Documentation/gpu/rfc/i915_vm_bind.h
> >     new file mode 100644
> >     index 000000000000..a93e08bceee6
> >     --- /dev/null
> >     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> >     @@ -0,0 +1,280 @@
> >     +/* SPDX-License-Identifier: MIT */
> >     +/*
> >     + * Copyright © 2022 Intel Corporation
> >     + */
> >     +
> >     +/**
> >     + * DOC: I915_PARAM_VM_BIND_VERSION
> >     + *
> >     + * VM_BIND feature version supported.
> >     + * See typedef drm_i915_getparam_t param.
> >     + *
> >     + * Specifies the VM_BIND feature version supported.
> >     + * The following versions of VM_BIND have been defined:
> >     + *
> >     + * 0: No VM_BIND support.
> >     + *
> >     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> >     created
> >     + *    previously with VM_BIND, the ioctl will not support unbinding
> >     multiple
> >     + *    mappings or splitting them. Similarly, VM_BIND calls will not
> >     replace
> >     + *    any existing mappings.
> >     + *
> >     + * 2: The restrictions on unbinding partial or multiple mappings is
> >     + *    lifted, Similarly, binding will replace any mappings in the
> >     given range.
> >     + *
> >     + * See struct drm_i915_gem_vm_bind and struct
> drm_i915_gem_vm_unbind.
> >     + */
> >     +#define I915_PARAM_VM_BIND_VERSION     57
> >     +
> >     +/**
> >     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> >     + *
> >     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> >     + * See struct drm_i915_gem_vm_control flags.
> >     + *
> >     + * The older execbuf2 ioctl will not support VM_BIND mode of
> operation.
> >     + * For VM_BIND mode, we have new execbuf3 ioctl which will not
> >     accept any
> >     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> >     + */
> >     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
> >     +
> >     +/* VM_BIND related ioctls */
> >     +#define DRM_I915_GEM_VM_BIND           0x3d
> >     +#define DRM_I915_GEM_VM_UNBIND         0x3e
> >     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
> >     +
> >     +#define DRM_IOCTL_I915_GEM_VM_BIND
> >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
> >     drm_i915_gem_vm_bind)
> >     +#define DRM_IOCTL_I915_GEM_VM_UNBIND
> >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
> >     drm_i915_gem_vm_bind)
> >     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
> >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
> >     drm_i915_gem_execbuffer3)
> >     +
> >     +/**
> >     + * struct drm_i915_gem_timeline_fence - An input or output timeline
> >     fence.
> >     + *
> >     + * The operation will wait for input fence to signal.
> >     + *
> >     + * The returned output fence will be signaled after the completion
> >     of the
> >     + * operation.
> >     + */
> >     +struct drm_i915_gem_timeline_fence {
> >     +       /** @handle: User's handle for a drm_syncobj to wait on or
> >     signal. */
> >     +       __u32 handle;
> >     +
> >     +       /**
> >     +        * @flags: Supported flags are:
> >     +        *
> >     +        * I915_TIMELINE_FENCE_WAIT:
> >     +        * Wait for the input fence before the operation.
> >     +        *
> >     +        * I915_TIMELINE_FENCE_SIGNAL:
> >     +        * Return operation completion fence as output.
> >     +        */
> >     +       __u32 flags;
> >     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> >     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> >     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
> >     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> >     +
> >     +       /**
> >     +        * @value: A point in the timeline.
> >     +        * Value must be 0 for a binary drm_syncobj. A Value of 0
> for a
> >     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
> >     into a
> >     +        * binary one.
> >     +        */
> >     +       __u64 value;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> >     + *
> >     + * This structure is passed to VM_BIND ioctl and specifies the
> >     mapping of GPU
> >     + * virtual address (VA) range to the section of an object that
> >     should be bound
> >     + * in the device page table of the specified address space (VM).
> >     + * The VA range specified must be unique (ie., not currently bound)
> >     and can
> >     + * be mapped to whole object or a section of the object (partial
> >     binding).
> >     + * Multiple VA mappings can be created to the same section of the
> >     object
> >     + * (aliasing).
> >     + *
> >     + * The @start, @offset and @length must be 4K page aligned. However
> >     the DG2
> >     + * and XEHPSDV has 64K page size for device local-memory and has
> >     compact page
> >     + * table. On those platforms, for binding device local-memory
> >     objects, the
> >     + * @start must be 2M aligned, @offset and @length must be 64K
> aligned.
> >
> >
> > This is not acceptable.  We need 64K granularity.  This includes the
> > starting address, the BO offset, and the length.  Why?  The tl;dr is
> > that it's a requirement for about 50% of D3D12 apps if we want them to
> > run on Linux via D3D12.  A longer explanation follows.  I don't
> > necessarily expect kernel folks to get all the details but hopefully
> > I'll have left enough of a map that some of the Intel Mesa folks can
> > help fill in details.
> >
> > Many modern D3D12 apps have a hard requirement on Tier2 tiled
> > resources.  This is a feature that Intel has supported in the D3D12
> > driver since Skylake.  In order to implement this feature, VKD3D
> > requires the various sparseResidencyImage* and sparseResidency*Sampled
> > Vulkan features.  If we want those apps to work (there's getting to be
> > quite a few of them), we need to implement the Vulkan sparse residency
> > features.
> > |
> > |
> > What is sparse residency?  I'm glad you asked!  The sparse residency
> > features allow a client to separately bind each miplevel or array slice
> > of an image to a chunk of device memory independently, without affecting
> > any other areas of the image.  Once you get to a high enough miplevel
> > that everything fits inside a single sparse image block (that's a
> > technical Vulkan term you can search for in the spec), you can enter a
> > "miptail" which contains all the remaining miplevels in a single sparse
> > image block.
> >
> > The term "sparse image block" is what the Vulkan spec uses.  On Intel
> > hardware and in the docs, it's what we call a "tile".  Specifically, the
> > image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This
> > is because Tile4 and legacy X and Y-tiling don't provide any guarantees
> > about page alignment for slices.  Yf, Ys, and Tile64, on the other hand,
> > align all slices of the image to a tile boundary, allowing us to map
> > memory to different slices independently, assuming we have 64K (or 4K
> > for Yf) VM_BIND granularity.  (4K isn't actually a requirement for
> > SKL-TGL; we can use Ys all the time which has 64K tiles but there's no
> > reason to not support 4K alignments on integrated.)
> >
> > Someone may be tempted to ask, "Can't we wiggle the strides around or
> > something to make it work?"  I thought about that and no, you can't.
> > The problem here is LOD2+.  Sure, you can have a stride such that the
> > image is a multiple of 2M worth of tiles across.  That'll work fine for
> > LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be and
> > there's no way to control that.  The hardware will place it to the right
> > of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing you
> > can do about that.  If that position doesn't happen to hit a 2M
> > boundary, you're out of luck.
> >
> > I hope that explanation provides enough detail.  Sadly, this is one of
> > those things which has a lot of moving pieces all over different bits of
> > the hardware and various APIs and they all have to work together just
> > right for it to all come out in the end.  But, yeah, we really need 64K
> > aligned binding if we want VKD3D to work.
>
> Just to confirm, the new model would be to enforce 64K GTT alignment for
> lmem pages, and then for smem pages we would only require 4K alignment,
> but with the added restriction that userspace will never try to mix the
> two (lmem vs smem) within the same 2M va range (page-table). The kernel
> will verify this and throw an error if needed. This model should work
> with the above?
>

Mesa doesn't have full control over BO placement so I don't think we can
guarantee quite as much as you want there.  We can guarantee, I think, that
we never place LMEM-only and SMEM-only in the same 2M block.  However, most
BOs will be LMEM+SMEM (with a preference for LMEM) and then it'll be up to
the kernel to sort out any issues.  Is that reasonable?

--Jason



> >
> > --Jason
> >
> >     + * Also, for such mappings, i915 will reserve the whole 2M range
> >     for it so as
> >     + * to not allow multiple mappings in that 2M range (Compact page
> >     tables do not
> >     + * allow 64K page and 4K page bindings in the same 2M range).
> >     + *
> >     + * Error code -EINVAL will be returned if @start, @offset and
> >     @length are not
> >     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
> >     error code
> >     + * -ENOSPC will be returned if the VA range specified can't be
> >     reserved.
> >     + *
> >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> >     concurrently
> >     + * are not ordered. Furthermore, parts of the VM_BIND operation can
> >     be done
> >     + * asynchronously, if valid @fence is specified.
> >     + */
> >     +struct drm_i915_gem_vm_bind {
> >     +       /** @vm_id: VM (address space) id to bind */
> >     +       __u32 vm_id;
> >     +
> >     +       /** @handle: Object handle */
> >     +       __u32 handle;
> >     +
> >     +       /** @start: Virtual Address start to bind */
> >     +       __u64 start;
> >     +
> >     +       /** @offset: Offset in object to bind */
> >     +       __u64 offset;
> >     +
> >     +       /** @length: Length of mapping to bind */
> >     +       __u64 length;
> >     +
> >     +       /**
> >     +        * @flags: Supported flags are:
> >     +        *
> >     +        * I915_GEM_VM_BIND_READONLY:
> >     +        * Mapping is read-only.
> >     +        *
> >     +        * I915_GEM_VM_BIND_CAPTURE:
> >     +        * Capture this mapping in the dump upon GPU error.
> >     +        */
> >     +       __u64 flags;
> >     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
> >     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
> >     +
> >     +       /**
> >     +        * @fence: Timeline fence for bind completion signaling.
> >     +        *
> >     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> >     +        * is invalid, and an error will be returned.
> >     +        */
> >     +       struct drm_i915_gem_timeline_fence fence;
> >     +
> >     +       /**
> >     +        * @extensions: Zero-terminated chain of extensions.
> >     +        *
> >     +        * For future extensions. See struct i915_user_extension.
> >     +        */
> >     +       __u64 extensions;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> >     + *
> >     + * This structure is passed to VM_UNBIND ioctl and specifies the
> >     GPU virtual
> >     + * address (VA) range that should be unbound from the device page
> >     table of the
> >     + * specified address space (VM). VM_UNBIND will force unbind the
> >     specified
> >     + * range from device page table without waiting for any GPU job to
> >     complete.
> >     + * It is UMDs responsibility to ensure the mapping is no longer in
> >     use before
> >     + * calling VM_UNBIND.
> >     + *
> >     + * If the specified mapping is not found, the ioctl will simply
> >     return without
> >     + * any error.
> >     + *
> >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> >     concurrently
> >     + * are not ordered. Furthermore, parts of the VM_UNBIND operation
> >     can be done
> >     + * asynchronously, if valid @fence is specified.
> >     + */
> >     +struct drm_i915_gem_vm_unbind {
> >     +       /** @vm_id: VM (address space) id to bind */
> >     +       __u32 vm_id;
> >     +
> >     +       /** @rsvd: Reserved, MBZ */
> >     +       __u32 rsvd;
> >     +
> >     +       /** @start: Virtual Address start to unbind */
> >     +       __u64 start;
> >     +
> >     +       /** @length: Length of mapping to unbind */
> >     +       __u64 length;
> >     +
> >     +       /** @flags: Currently reserved, MBZ */
> >     +       __u64 flags;
> >     +
> >     +       /**
> >     +        * @fence: Timeline fence for unbind completion signaling.
> >     +        *
> >     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> >     +        * is invalid, and an error will be returned.
> >     +        */
> >     +       struct drm_i915_gem_timeline_fence fence;
> >     +
> >     +       /**
> >     +        * @extensions: Zero-terminated chain of extensions.
> >     +        *
> >     +        * For future extensions. See struct i915_user_extension.
> >     +        */
> >     +       __u64 extensions;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_execbuffer3 - Structure for
> >     DRM_I915_GEM_EXECBUFFER3
> >     + * ioctl.
> >     + *
> >     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
> >     VM_BIND mode
> >     + * only works with this ioctl for submission.
> >     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> >     + */
> >     +struct drm_i915_gem_execbuffer3 {
> >     +       /**
> >     +        * @ctx_id: Context id
> >     +        *
> >     +        * Only contexts with user engine map are allowed.
> >     +        */
> >     +       __u32 ctx_id;
> >     +
> >     +       /**
> >     +        * @engine_idx: Engine index
> >     +        *
> >     +        * An index in the user engine map of the context specified
> >     by @ctx_id.
> >     +        */
> >     +       __u32 engine_idx;
> >     +
> >     +       /**
> >     +        * @batch_address: Batch gpu virtual address/es.
> >     +        *
> >     +        * For normal submission, it is the gpu virtual address of
> >     the batch
> >     +        * buffer. For parallel submission, it is a pointer to an
> >     array of
> >     +        * batch buffer gpu virtual addresses with array size equal
> >     to the
> >     +        * number of (parallel) engines involved in that submission
> (See
> >     +        * struct i915_context_engines_parallel_submit).
> >     +        */
> >     +       __u64 batch_address;
> >     +
> >     +       /** @flags: Currently reserved, MBZ */
> >     +       __u64 flags;
> >     +
> >     +       /** @rsvd1: Reserved, MBZ */
> >     +       __u32 rsvd1;
> >     +
> >     +       /** @fence_count: Number of fences in @timeline_fences
> array. */
> >     +       __u32 fence_count;
> >     +
> >     +       /**
> >     +        * @timeline_fences: Pointer to an array of timeline fences.
> >     +        *
> >     +        * Timeline fences are of format struct
> >     drm_i915_gem_timeline_fence.
> >     +        */
> >     +       __u64 timeline_fences;
> >     +
> >     +       /** @rsvd2: Reserved, MBZ */
> >     +       __u64 rsvd2;
> >     +
> >     +       /**
> >     +        * @extensions: Zero-terminated chain of extensions.
> >     +        *
> >     +        * For future extensions. See struct i915_user_extension.
> >     +        */
> >     +       __u64 extensions;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_create_ext_vm_private - Extension to make
> >     the object
> >     + * private to the specified VM.
> >     + *
> >     + * See struct drm_i915_gem_create_ext.
> >     + */
> >     +struct drm_i915_gem_create_ext_vm_private {
> >     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
> >     +       /** @base: Extension link. See struct i915_user_extension. */
> >     +       struct i915_user_extension base;
> >     +
> >     +       /** @vm_id: Id of the VM to which the object is private */
> >     +       __u32 vm_id;
> >     +};
> >     --
> >     2.21.0.rc0.32.g243a4c7e27
> >
>

[-- Attachment #2: Type: text/html, Size: 23692 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 15:34         ` Jason Ekstrand
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Ekstrand @ 2022-06-30 15:34 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Paulo Zanoni, Intel GFX, Maling list - DRI developers,
	Thomas Hellstrom, Chris Wilson, Daniel Vetter,
	Christian König

[-- Attachment #1: Type: text/plain, Size: 18722 bytes --]

On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@intel.com>
wrote:

> On 30/06/2022 06:11, Jason Ekstrand wrote:
> > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com
> > <mailto:niranjana.vishwanathapura@intel.com>> wrote:
> >
> >     VM_BIND and related uapi definitions
> >
> >     v2: Reduce the scope to simple Mesa use case.
> >     v3: Expand VM_UNBIND documentation and add
> >          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> >          and I915_GEM_VM_BIND_TLB_FLUSH flags.
> >     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> >          documentation for vm_bind/unbind.
> >     v5: Remove TLB flush requirement on VM_UNBIND.
> >          Add version support to stage implementation.
> >     v6: Define and use drm_i915_gem_timeline_fence structure for
> >          all timeline fences.
> >     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> >          Update documentation on async vm_bind/unbind and versioning.
> >          Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> >          batch_count field and I915_EXEC3_SECURE flag.
> >
> >     Signed-off-by: Niranjana Vishwanathapura
> >     <niranjana.vishwanathapura@intel.com
> >     <mailto:niranjana.vishwanathapura@intel.com>>
> >     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
> >     <mailto:daniel.vetter@ffwll.ch>>
> >     ---
> >       Documentation/gpu/rfc/i915_vm_bind.h | 280
> +++++++++++++++++++++++++++
> >       1 file changed, 280 insertions(+)
> >       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> >
> >     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> >     b/Documentation/gpu/rfc/i915_vm_bind.h
> >     new file mode 100644
> >     index 000000000000..a93e08bceee6
> >     --- /dev/null
> >     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> >     @@ -0,0 +1,280 @@
> >     +/* SPDX-License-Identifier: MIT */
> >     +/*
> >     + * Copyright © 2022 Intel Corporation
> >     + */
> >     +
> >     +/**
> >     + * DOC: I915_PARAM_VM_BIND_VERSION
> >     + *
> >     + * VM_BIND feature version supported.
> >     + * See typedef drm_i915_getparam_t param.
> >     + *
> >     + * Specifies the VM_BIND feature version supported.
> >     + * The following versions of VM_BIND have been defined:
> >     + *
> >     + * 0: No VM_BIND support.
> >     + *
> >     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> >     created
> >     + *    previously with VM_BIND, the ioctl will not support unbinding
> >     multiple
> >     + *    mappings or splitting them. Similarly, VM_BIND calls will not
> >     replace
> >     + *    any existing mappings.
> >     + *
> >     + * 2: The restrictions on unbinding partial or multiple mappings is
> >     + *    lifted, Similarly, binding will replace any mappings in the
> >     given range.
> >     + *
> >     + * See struct drm_i915_gem_vm_bind and struct
> drm_i915_gem_vm_unbind.
> >     + */
> >     +#define I915_PARAM_VM_BIND_VERSION     57
> >     +
> >     +/**
> >     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> >     + *
> >     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> >     + * See struct drm_i915_gem_vm_control flags.
> >     + *
> >     + * The older execbuf2 ioctl will not support VM_BIND mode of
> operation.
> >     + * For VM_BIND mode, we have new execbuf3 ioctl which will not
> >     accept any
> >     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> >     + */
> >     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
> >     +
> >     +/* VM_BIND related ioctls */
> >     +#define DRM_I915_GEM_VM_BIND           0x3d
> >     +#define DRM_I915_GEM_VM_UNBIND         0x3e
> >     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
> >     +
> >     +#define DRM_IOCTL_I915_GEM_VM_BIND
> >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
> >     drm_i915_gem_vm_bind)
> >     +#define DRM_IOCTL_I915_GEM_VM_UNBIND
> >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
> >     drm_i915_gem_vm_bind)
> >     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
> >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
> >     drm_i915_gem_execbuffer3)
> >     +
> >     +/**
> >     + * struct drm_i915_gem_timeline_fence - An input or output timeline
> >     fence.
> >     + *
> >     + * The operation will wait for input fence to signal.
> >     + *
> >     + * The returned output fence will be signaled after the completion
> >     of the
> >     + * operation.
> >     + */
> >     +struct drm_i915_gem_timeline_fence {
> >     +       /** @handle: User's handle for a drm_syncobj to wait on or
> >     signal. */
> >     +       __u32 handle;
> >     +
> >     +       /**
> >     +        * @flags: Supported flags are:
> >     +        *
> >     +        * I915_TIMELINE_FENCE_WAIT:
> >     +        * Wait for the input fence before the operation.
> >     +        *
> >     +        * I915_TIMELINE_FENCE_SIGNAL:
> >     +        * Return operation completion fence as output.
> >     +        */
> >     +       __u32 flags;
> >     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> >     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> >     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
> >     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> >     +
> >     +       /**
> >     +        * @value: A point in the timeline.
> >     +        * Value must be 0 for a binary drm_syncobj. A Value of 0
> for a
> >     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
> >     into a
> >     +        * binary one.
> >     +        */
> >     +       __u64 value;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> >     + *
> >     + * This structure is passed to VM_BIND ioctl and specifies the
> >     mapping of GPU
> >     + * virtual address (VA) range to the section of an object that
> >     should be bound
> >     + * in the device page table of the specified address space (VM).
> >     + * The VA range specified must be unique (ie., not currently bound)
> >     and can
> >     + * be mapped to whole object or a section of the object (partial
> >     binding).
> >     + * Multiple VA mappings can be created to the same section of the
> >     object
> >     + * (aliasing).
> >     + *
> >     + * The @start, @offset and @length must be 4K page aligned. However
> >     the DG2
> >     + * and XEHPSDV has 64K page size for device local-memory and has
> >     compact page
> >     + * table. On those platforms, for binding device local-memory
> >     objects, the
> >     + * @start must be 2M aligned, @offset and @length must be 64K
> aligned.
> >
> >
> > This is not acceptable.  We need 64K granularity.  This includes the
> > starting address, the BO offset, and the length.  Why?  The tl;dr is
> > that it's a requirement for about 50% of D3D12 apps if we want them to
> > run on Linux via D3D12.  A longer explanation follows.  I don't
> > necessarily expect kernel folks to get all the details but hopefully
> > I'll have left enough of a map that some of the Intel Mesa folks can
> > help fill in details.
> >
> > Many modern D3D12 apps have a hard requirement on Tier2 tiled
> > resources.  This is a feature that Intel has supported in the D3D12
> > driver since Skylake.  In order to implement this feature, VKD3D
> > requires the various sparseResidencyImage* and sparseResidency*Sampled
> > Vulkan features.  If we want those apps to work (there's getting to be
> > quite a few of them), we need to implement the Vulkan sparse residency
> > features.
> > |
> > |
> > What is sparse residency?  I'm glad you asked!  The sparse residency
> > features allow a client to separately bind each miplevel or array slice
> > of an image to a chunk of device memory independently, without affecting
> > any other areas of the image.  Once you get to a high enough miplevel
> > that everything fits inside a single sparse image block (that's a
> > technical Vulkan term you can search for in the spec), you can enter a
> > "miptail" which contains all the remaining miplevels in a single sparse
> > image block.
> >
> > The term "sparse image block" is what the Vulkan spec uses.  On Intel
> > hardware and in the docs, it's what we call a "tile".  Specifically, the
> > image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on DG2+.  This
> > is because Tile4 and legacy X and Y-tiling don't provide any guarantees
> > about page alignment for slices.  Yf, Ys, and Tile64, on the other hand,
> > align all slices of the image to a tile boundary, allowing us to map
> > memory to different slices independently, assuming we have 64K (or 4K
> > for Yf) VM_BIND granularity.  (4K isn't actually a requirement for
> > SKL-TGL; we can use Ys all the time which has 64K tiles but there's no
> > reason to not support 4K alignments on integrated.)
> >
> > Someone may be tempted to ask, "Can't we wiggle the strides around or
> > something to make it work?"  I thought about that and no, you can't.
> > The problem here is LOD2+.  Sure, you can have a stride such that the
> > image is a multiple of 2M worth of tiles across.  That'll work fine for
> > LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be and
> > there's no way to control that.  The hardware will place it to the right
> > of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing you
> > can do about that.  If that position doesn't happen to hit a 2M
> > boundary, you're out of luck.
> >
> > I hope that explanation provides enough detail.  Sadly, this is one of
> > those things which has a lot of moving pieces all over different bits of
> > the hardware and various APIs and they all have to work together just
> > right for it to all come out in the end.  But, yeah, we really need 64K
> > aligned binding if we want VKD3D to work.
>
> Just to confirm, the new model would be to enforce 64K GTT alignment for
> lmem pages, and then for smem pages we would only require 4K alignment,
> but with the added restriction that userspace will never try to mix the
> two (lmem vs smem) within the same 2M va range (page-table). The kernel
> will verify this and throw an error if needed. This model should work
> with the above?
>

Mesa doesn't have full control over BO placement so I don't think we can
guarantee quite as much as you want there.  We can guarantee, I think, that
we never place LMEM-only and SMEM-only in the same 2M block.  However, most
BOs will be LMEM+SMEM (with a preference for LMEM) and then it'll be up to
the kernel to sort out any issues.  Is that reasonable?

--Jason



> >
> > --Jason
> >
> >     + * Also, for such mappings, i915 will reserve the whole 2M range
> >     for it so as
> >     + * to not allow multiple mappings in that 2M range (Compact page
> >     tables do not
> >     + * allow 64K page and 4K page bindings in the same 2M range).
> >     + *
> >     + * Error code -EINVAL will be returned if @start, @offset and
> >     @length are not
> >     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
> >     error code
> >     + * -ENOSPC will be returned if the VA range specified can't be
> >     reserved.
> >     + *
> >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> >     concurrently
> >     + * are not ordered. Furthermore, parts of the VM_BIND operation can
> >     be done
> >     + * asynchronously, if valid @fence is specified.
> >     + */
> >     +struct drm_i915_gem_vm_bind {
> >     +       /** @vm_id: VM (address space) id to bind */
> >     +       __u32 vm_id;
> >     +
> >     +       /** @handle: Object handle */
> >     +       __u32 handle;
> >     +
> >     +       /** @start: Virtual Address start to bind */
> >     +       __u64 start;
> >     +
> >     +       /** @offset: Offset in object to bind */
> >     +       __u64 offset;
> >     +
> >     +       /** @length: Length of mapping to bind */
> >     +       __u64 length;
> >     +
> >     +       /**
> >     +        * @flags: Supported flags are:
> >     +        *
> >     +        * I915_GEM_VM_BIND_READONLY:
> >     +        * Mapping is read-only.
> >     +        *
> >     +        * I915_GEM_VM_BIND_CAPTURE:
> >     +        * Capture this mapping in the dump upon GPU error.
> >     +        */
> >     +       __u64 flags;
> >     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
> >     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
> >     +
> >     +       /**
> >     +        * @fence: Timeline fence for bind completion signaling.
> >     +        *
> >     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> >     +        * is invalid, and an error will be returned.
> >     +        */
> >     +       struct drm_i915_gem_timeline_fence fence;
> >     +
> >     +       /**
> >     +        * @extensions: Zero-terminated chain of extensions.
> >     +        *
> >     +        * For future extensions. See struct i915_user_extension.
> >     +        */
> >     +       __u64 extensions;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> >     + *
> >     + * This structure is passed to VM_UNBIND ioctl and specifies the
> >     GPU virtual
> >     + * address (VA) range that should be unbound from the device page
> >     table of the
> >     + * specified address space (VM). VM_UNBIND will force unbind the
> >     specified
> >     + * range from device page table without waiting for any GPU job to
> >     complete.
> >     + * It is UMDs responsibility to ensure the mapping is no longer in
> >     use before
> >     + * calling VM_UNBIND.
> >     + *
> >     + * If the specified mapping is not found, the ioctl will simply
> >     return without
> >     + * any error.
> >     + *
> >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> >     concurrently
> >     + * are not ordered. Furthermore, parts of the VM_UNBIND operation
> >     can be done
> >     + * asynchronously, if valid @fence is specified.
> >     + */
> >     +struct drm_i915_gem_vm_unbind {
> >     +       /** @vm_id: VM (address space) id to bind */
> >     +       __u32 vm_id;
> >     +
> >     +       /** @rsvd: Reserved, MBZ */
> >     +       __u32 rsvd;
> >     +
> >     +       /** @start: Virtual Address start to unbind */
> >     +       __u64 start;
> >     +
> >     +       /** @length: Length of mapping to unbind */
> >     +       __u64 length;
> >     +
> >     +       /** @flags: Currently reserved, MBZ */
> >     +       __u64 flags;
> >     +
> >     +       /**
> >     +        * @fence: Timeline fence for unbind completion signaling.
> >     +        *
> >     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT
> flag
> >     +        * is invalid, and an error will be returned.
> >     +        */
> >     +       struct drm_i915_gem_timeline_fence fence;
> >     +
> >     +       /**
> >     +        * @extensions: Zero-terminated chain of extensions.
> >     +        *
> >     +        * For future extensions. See struct i915_user_extension.
> >     +        */
> >     +       __u64 extensions;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_execbuffer3 - Structure for
> >     DRM_I915_GEM_EXECBUFFER3
> >     + * ioctl.
> >     + *
> >     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
> >     VM_BIND mode
> >     + * only works with this ioctl for submission.
> >     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> >     + */
> >     +struct drm_i915_gem_execbuffer3 {
> >     +       /**
> >     +        * @ctx_id: Context id
> >     +        *
> >     +        * Only contexts with user engine map are allowed.
> >     +        */
> >     +       __u32 ctx_id;
> >     +
> >     +       /**
> >     +        * @engine_idx: Engine index
> >     +        *
> >     +        * An index in the user engine map of the context specified
> >     by @ctx_id.
> >     +        */
> >     +       __u32 engine_idx;
> >     +
> >     +       /**
> >     +        * @batch_address: Batch gpu virtual address/es.
> >     +        *
> >     +        * For normal submission, it is the gpu virtual address of
> >     the batch
> >     +        * buffer. For parallel submission, it is a pointer to an
> >     array of
> >     +        * batch buffer gpu virtual addresses with array size equal
> >     to the
> >     +        * number of (parallel) engines involved in that submission
> (See
> >     +        * struct i915_context_engines_parallel_submit).
> >     +        */
> >     +       __u64 batch_address;
> >     +
> >     +       /** @flags: Currently reserved, MBZ */
> >     +       __u64 flags;
> >     +
> >     +       /** @rsvd1: Reserved, MBZ */
> >     +       __u32 rsvd1;
> >     +
> >     +       /** @fence_count: Number of fences in @timeline_fences
> array. */
> >     +       __u32 fence_count;
> >     +
> >     +       /**
> >     +        * @timeline_fences: Pointer to an array of timeline fences.
> >     +        *
> >     +        * Timeline fences are of format struct
> >     drm_i915_gem_timeline_fence.
> >     +        */
> >     +       __u64 timeline_fences;
> >     +
> >     +       /** @rsvd2: Reserved, MBZ */
> >     +       __u64 rsvd2;
> >     +
> >     +       /**
> >     +        * @extensions: Zero-terminated chain of extensions.
> >     +        *
> >     +        * For future extensions. See struct i915_user_extension.
> >     +        */
> >     +       __u64 extensions;
> >     +};
> >     +
> >     +/**
> >     + * struct drm_i915_gem_create_ext_vm_private - Extension to make
> >     the object
> >     + * private to the specified VM.
> >     + *
> >     + * See struct drm_i915_gem_create_ext.
> >     + */
> >     +struct drm_i915_gem_create_ext_vm_private {
> >     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
> >     +       /** @base: Extension link. See struct i915_user_extension. */
> >     +       struct i915_user_extension base;
> >     +
> >     +       /** @vm_id: Id of the VM to which the object is private */
> >     +       __u32 vm_id;
> >     +};
> >     --
> >     2.21.0.rc0.32.g243a4c7e27
> >
>

[-- Attachment #2: Type: text/html, Size: 23692 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-30 15:45     ` Jason Ekstrand
  -1 siblings, 0 replies; 53+ messages in thread
From: Jason Ekstrand @ 2022-06-30 15:45 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Matthew Brost, Paulo Zanoni, Lionel Landwerlin, Tvrtko Ursulin,
	Intel GFX, Chris Wilson, Thomas Hellstrom, oak.zeng,
	Maling list - DRI developers, Daniel Vetter,
	Christian König, Matthew Auld

[-- Attachment #1: Type: text/plain, Size: 12430 bytes --]

On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathapura@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>     documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
>     Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
>     all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>     Update documentation on async vm_bind/unbind and versioning.
>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>     batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathapura@intel.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *    previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION     57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND           0x3d
> +#define DRM_I915_GEM_VM_UNBIND         0x3e
> +#define DRM_I915_GEM_EXECBUFFER3       0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND             DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND           DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3         DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +       /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +       __u32 handle;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_TIMELINE_FENCE_WAIT:
> +        * Wait for the input fence before the operation.
> +        *
> +        * I915_TIMELINE_FENCE_SIGNAL:
> +        * Return operation completion fence as output.
> +        */
> +       __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +       /**
> +        * @value: A point in the timeline.
> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +        * binary one.
> +        */
> +       __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and
> can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the
> DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact
> page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> + * Also, for such mappings, i915 will reserve the whole 2M range for it
> so as
> + * to not allow multiple mappings in that 2M range (Compact page tables
> do not
> + * allow 64K page and 4K page bindings in the same 2M range).
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are
> not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error
> code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @handle: Object handle */
> +       __u32 handle;
> +
> +       /** @start: Virtual Address start to bind */
> +       __u64 start;
> +
> +       /** @offset: Offset in object to bind */
> +       __u64 offset;
> +
> +       /** @length: Length of mapping to bind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_GEM_VM_BIND_READONLY:
> +        * Mapping is read-only.
> +        *
> +        * I915_GEM_VM_BIND_CAPTURE:
> +        * Capture this mapping in the dump upon GPU error.
> +        */
> +       __u64 flags;
> +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
> +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
> +
> +       /**
> +        * @fence: Timeline fence for bind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
>

Why a single fence and not an array of fences?  If Mesa wants to use the
out fences for signalling VkSemaphores on the sparse binding queue, we need
N of them.  We can still have the "zero fences means block" behavior.

--Jason


> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
> virtual
> + * address (VA) range that should be unbound from the device page table
> of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to
> complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in use
> before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return
> without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @rsvd: Reserved, MBZ */
> +       __u32 rsvd;
> +
> +       /** @start: Virtual Address start to unbind */
> +       __u64 start;
> +
> +       /** @length: Length of mapping to unbind */
> +       __u64 length;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for unbind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for
> DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND
> mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +       /**
> +        * @ctx_id: Context id
> +        *
> +        * Only contexts with user engine map are allowed.
> +        */
> +       __u32 ctx_id;
> +
> +       /**
> +        * @engine_idx: Engine index
> +        *
> +        * An index in the user engine map of the context specified by
> @ctx_id.
> +        */
> +       __u32 engine_idx;
> +
> +       /**
> +        * @batch_address: Batch gpu virtual address/es.
> +        *
> +        * For normal submission, it is the gpu virtual address of the
> batch
> +        * buffer. For parallel submission, it is a pointer to an array of
> +        * batch buffer gpu virtual addresses with array size equal to the
> +        * number of (parallel) engines involved in that submission (See
> +        * struct i915_context_engines_parallel_submit).
> +        */
> +       __u64 batch_address;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /** @rsvd1: Reserved, MBZ */
> +       __u32 rsvd1;
> +
> +       /** @fence_count: Number of fences in @timeline_fences array. */
> +       __u32 fence_count;
> +
> +       /**
> +        * @timeline_fences: Pointer to an array of timeline fences.
> +        *
> +        * Timeline fences are of format struct
> drm_i915_gem_timeline_fence.
> +        */
> +       __u64 timeline_fences;
> +
> +       /** @rsvd2: Reserved, MBZ */
> +       __u64 rsvd2;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @vm_id: Id of the VM to which the object is private */
> +       __u32 vm_id;
> +};
> --
> 2.21.0.rc0.32.g243a4c7e27
>
>

[-- Attachment #2: Type: text/html, Size: 14205 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 15:45     ` Jason Ekstrand
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Ekstrand @ 2022-06-30 15:45 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Paulo Zanoni, Intel GFX, Chris Wilson, Thomas Hellstrom,
	Maling list - DRI developers, Daniel Vetter,
	Christian König, Matthew Auld

[-- Attachment #1: Type: text/plain, Size: 12430 bytes --]

On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathapura@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>     documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
>     Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
>     all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>     Update documentation on async vm_bind/unbind and versioning.
>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>     batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathapura@intel.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *    previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *    mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *    any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *    lifted, Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION     57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND           0x3d
> +#define DRM_I915_GEM_VM_UNBIND         0x3e
> +#define DRM_I915_GEM_EXECBUFFER3       0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND             DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND           DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3         DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +       /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +       __u32 handle;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_TIMELINE_FENCE_WAIT:
> +        * Wait for the input fence before the operation.
> +        *
> +        * I915_TIMELINE_FENCE_SIGNAL:
> +        * Return operation completion fence as output.
> +        */
> +       __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +       /**
> +        * @value: A point in the timeline.
> +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +        * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +        * binary one.
> +        */
> +       __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and
> can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However the
> DG2
> + * and XEHPSDV has 64K page size for device local-memory and has compact
> page
> + * table. On those platforms, for binding device local-memory objects, the
> + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> + * Also, for such mappings, i915 will reserve the whole 2M range for it
> so as
> + * to not allow multiple mappings in that 2M range (Compact page tables
> do not
> + * allow 64K page and 4K page bindings in the same 2M range).
> + *
> + * Error code -EINVAL will be returned if @start, @offset and @length are
> not
> + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error
> code
> + * -ENOSPC will be returned if the VA range specified can't be reserved.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_BIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_bind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @handle: Object handle */
> +       __u32 handle;
> +
> +       /** @start: Virtual Address start to bind */
> +       __u64 start;
> +
> +       /** @offset: Offset in object to bind */
> +       __u64 offset;
> +
> +       /** @length: Length of mapping to bind */
> +       __u64 length;
> +
> +       /**
> +        * @flags: Supported flags are:
> +        *
> +        * I915_GEM_VM_BIND_READONLY:
> +        * Mapping is read-only.
> +        *
> +        * I915_GEM_VM_BIND_CAPTURE:
> +        * Capture this mapping in the dump upon GPU error.
> +        */
> +       __u64 flags;
> +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
> +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
> +
> +       /**
> +        * @fence: Timeline fence for bind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
>

Why a single fence and not an array of fences?  If Mesa wants to use the
out fences for signalling VkSemaphores on the sparse binding queue, we need
N of them.  We can still have the "zero fences means block" behavior.

--Jason


> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> + *
> + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
> virtual
> + * address (VA) range that should be unbound from the device page table
> of the
> + * specified address space (VM). VM_UNBIND will force unbind the specified
> + * range from device page table without waiting for any GPU job to
> complete.
> + * It is UMDs responsibility to ensure the mapping is no longer in use
> before
> + * calling VM_UNBIND.
> + *
> + * If the specified mapping is not found, the ioctl will simply return
> without
> + * any error.
> + *
> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
> concurrently
> + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be
> done
> + * asynchronously, if valid @fence is specified.
> + */
> +struct drm_i915_gem_vm_unbind {
> +       /** @vm_id: VM (address space) id to bind */
> +       __u32 vm_id;
> +
> +       /** @rsvd: Reserved, MBZ */
> +       __u32 rsvd;
> +
> +       /** @start: Virtual Address start to unbind */
> +       __u64 start;
> +
> +       /** @length: Length of mapping to unbind */
> +       __u64 length;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /**
> +        * @fence: Timeline fence for unbind completion signaling.
> +        *
> +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> +        * is invalid, and an error will be returned.
> +        */
> +       struct drm_i915_gem_timeline_fence fence;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_execbuffer3 - Structure for
> DRM_I915_GEM_EXECBUFFER3
> + * ioctl.
> + *
> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND
> mode
> + * only works with this ioctl for submission.
> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> + */
> +struct drm_i915_gem_execbuffer3 {
> +       /**
> +        * @ctx_id: Context id
> +        *
> +        * Only contexts with user engine map are allowed.
> +        */
> +       __u32 ctx_id;
> +
> +       /**
> +        * @engine_idx: Engine index
> +        *
> +        * An index in the user engine map of the context specified by
> @ctx_id.
> +        */
> +       __u32 engine_idx;
> +
> +       /**
> +        * @batch_address: Batch gpu virtual address/es.
> +        *
> +        * For normal submission, it is the gpu virtual address of the
> batch
> +        * buffer. For parallel submission, it is a pointer to an array of
> +        * batch buffer gpu virtual addresses with array size equal to the
> +        * number of (parallel) engines involved in that submission (See
> +        * struct i915_context_engines_parallel_submit).
> +        */
> +       __u64 batch_address;
> +
> +       /** @flags: Currently reserved, MBZ */
> +       __u64 flags;
> +
> +       /** @rsvd1: Reserved, MBZ */
> +       __u32 rsvd1;
> +
> +       /** @fence_count: Number of fences in @timeline_fences array. */
> +       __u32 fence_count;
> +
> +       /**
> +        * @timeline_fences: Pointer to an array of timeline fences.
> +        *
> +        * Timeline fences are of format struct
> drm_i915_gem_timeline_fence.
> +        */
> +       __u64 timeline_fences;
> +
> +       /** @rsvd2: Reserved, MBZ */
> +       __u64 rsvd2;
> +
> +       /**
> +        * @extensions: Zero-terminated chain of extensions.
> +        *
> +        * For future extensions. See struct i915_user_extension.
> +        */
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
> object
> + * private to the specified VM.
> + *
> + * See struct drm_i915_gem_create_ext.
> + */
> +struct drm_i915_gem_create_ext_vm_private {
> +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @vm_id: Id of the VM to which the object is private */
> +       __u32 vm_id;
> +};
> --
> 2.21.0.rc0.32.g243a4c7e27
>
>

[-- Attachment #2: Type: text/html, Size: 14205 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  6:39         ` [Intel-gfx] " Zanoni, Paulo R
@ 2022-06-30 16:18           ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 16:18 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: Brost, Matthew, Ursulin, Tvrtko, intel-gfx, dri-devel, Hellstrom,
	Thomas, Zeng, Oak, Wilson, Chris P, jason, Vetter, Daniel,
	Landwerlin, Lionel G, christian.koenig, Auld, Matthew

On Wed, Jun 29, 2022 at 11:39:52PM -0700, Zanoni, Paulo R wrote:
>On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
>> On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>> > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> > > VM_BIND and related uapi definitions
>> > >
>> > > v2: Reduce the scope to simple Mesa use case.
>> > > v3: Expand VM_UNBIND documentation and add
>> > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>> > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>> > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>> > >     documentation for vm_bind/unbind.
>> > > v5: Remove TLB flush requirement on VM_UNBIND.
>> > >     Add version support to stage implementation.
>> > > v6: Define and use drm_i915_gem_timeline_fence structure for
>> > >     all timeline fences.
>> > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>> > >     Update documentation on async vm_bind/unbind and versioning.
>> > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>> > >     batch_count field and I915_EXEC3_SECURE flag.
>> > >
>> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > > ---
>> > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>> > >  1 file changed, 280 insertions(+)
>> > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>> > >
>> > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > new file mode 100644
>> > > index 000000000000..a93e08bceee6
>> > > --- /dev/null
>> > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > @@ -0,0 +1,280 @@
>> > > +/* SPDX-License-Identifier: MIT */
>> > > +/*
>> > > + * Copyright © 2022 Intel Corporation
>> > > + */
>> > > +
>> > > +/**
>> > > + * DOC: I915_PARAM_VM_BIND_VERSION
>> > > + *
>> > > + * VM_BIND feature version supported.
>> > > + * See typedef drm_i915_getparam_t param.
>> > > + *
>> > > + * Specifies the VM_BIND feature version supported.
>> > > + * The following versions of VM_BIND have been defined:
>> > > + *
>> > > + * 0: No VM_BIND support.
>> > > + *
>> > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
>> > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
>> > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
>> > > + *    any existing mappings.
>> > > + *
>> > > + * 2: The restrictions on unbinding partial or multiple mappings is
>> > > + *    lifted, Similarly, binding will replace any mappings in the given range.
>> > > + *
>> > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>> > > + */
>> > > +#define I915_PARAM_VM_BIND_VERSION   57
>> > > +
>> > > +/**
>> > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> > > + *
>> > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> > > + * See struct drm_i915_gem_vm_control flags.
>> > > + *
>> > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
>> > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> > > + */
>> > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> > > +
>> > > +/* VM_BIND related ioctls */
>> > > +#define DRM_I915_GEM_VM_BIND         0x3d
>> > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> > > +
>> > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> > > + *
>> > > + * The operation will wait for input fence to signal.
>> > > + *
>> > > + * The returned output fence will be signaled after the completion of the
>> > > + * operation.
>> > > + */
>> > > +struct drm_i915_gem_timeline_fence {
>> > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> > > +     __u32 handle;
>> > > +
>> > > +     /**
>> > > +      * @flags: Supported flags are:
>> > > +      *
>> > > +      * I915_TIMELINE_FENCE_WAIT:
>> > > +      * Wait for the input fence before the operation.
>> > > +      *
>> > > +      * I915_TIMELINE_FENCE_SIGNAL:
>> > > +      * Return operation completion fence as output.
>> > > +      */
>> > > +     __u32 flags;
>> > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> > > +
>> > > +     /**
>> > > +      * @value: A point in the timeline.
>> > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> > > +      * binary one.
>> > > +      */
>> > > +     __u64 value;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> > > + *
>> > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> > > + * virtual address (VA) range to the section of an object that should be bound
>> > > + * in the device page table of the specified address space (VM).
>> > > + * The VA range specified must be unique (ie., not currently bound) and can
>> > > + * be mapped to whole object or a section of the object (partial binding).
>> > > + * Multiple VA mappings can be created to the same section of the object
>> > > + * (aliasing).
>> > > + *
>> > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
>> > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
>> > > + * table. On those platforms, for binding device local-memory objects, the
>> > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>> > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
>> > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
>> > > + * allow 64K page and 4K page bindings in the same 2M range).
>> > > + *
>> > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> > > + *
>> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> > > + * asynchronously, if valid @fence is specified.
>> >
>> > Does that mean that if I don't provide @fence, then this ioctl will be
>> > synchronous (i.e., when it returns, the memory will be guaranteed to be
>> > bound)? The text is kinda implying that, but from one of your earlier
>> > replies to Tvrtko, that doesn't seem to be the case. I guess we could
>> > change the text to make this more explicit.
>> >
>>
>> Yes, I thought, if user doesn't specify the out fence, KMD better make
>> the ioctl synchronous by waiting until the binding finishes before
>> returning. Otherwise, UMD has no way to ensure binding is complete and
>> UMD must pass in out fence for VM_BIND calls.
>>
>> But latest comment form Daniel on other thread might suggest something else.
>> Daniel, can you comment?
>
>Whatever we decide, let's make sure it's documented.
>
>>
>> > In addition, previously we had the guarantee that an execbuf ioctl
>> > would wait for all the pending vm_bind operations to finish before
>> > doing anything. Do we still have this guarantee or do we have to make
>> > use of the fences now?
>> >
>>
>> No, we don't have that anymore (execbuf is decoupled from VM_BIND).
>> Execbuf3 submission will not wait for any previous VM_BIND to finish.
>> UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
>> that.
>
>Got it, thanks.
>
>>
>> > > + */
>> > > +struct drm_i915_gem_vm_bind {
>> > > +     /** @vm_id: VM (address space) id to bind */
>> > > +     __u32 vm_id;
>> > > +
>> > > +     /** @handle: Object handle */
>> > > +     __u32 handle;
>> > > +
>> > > +     /** @start: Virtual Address start to bind */
>> > > +     __u64 start;
>> > > +
>> > > +     /** @offset: Offset in object to bind */
>> > > +     __u64 offset;
>> > > +
>> > > +     /** @length: Length of mapping to bind */
>> > > +     __u64 length;
>> > > +
>> > > +     /**
>> > > +      * @flags: Supported flags are:
>> > > +      *
>> > > +      * I915_GEM_VM_BIND_READONLY:
>> > > +      * Mapping is read-only.
>> >
>> > Can you please explain what happens when we try to write to a range
>> > that's bound as read-only?
>> >
>>
>> It will be mapped as read-only in device page table. Hence any
>> write access will fail. I would expect a CAT error reported.
>
>What's a CAT error? Does this lead to machine freeze or a GPU hang?
>Let's make sure we document this.
>

Catastrophic error.

>>
>> I am seeing that currently the page table R/W setting is based
>> on whether BO is readonly or not (UMDs can request a userptr
>> BO to readonly). We can make this READONLY here as a subset.
>> ie., if BO is readonly, the mappings must be readonly. If BO
>> is not readonly, then the mapping can be either readonly or
>> not.
>>
>> But if Mesa doesn't have a use for this, then we can remove
>> this flag for now.
>>
>
>I was considering using it for Vulkan's Sparse
>residencyNonResidentStrict, so we map all unbound pages to a read-only
>page. But for that to work, the required behavior would have to be:
>reads all return zero, writes are ignored without any sort of error.
>
>But maybe our hardware provides other ways to implement this, I haven't
>checked yet.
>

I am not sure what the behavior is. Probably writes are not simply ignored,
will check.
Looks like we can remove this flag for now. We can always add it back
later if we need it. Is that Ok with you?

Niranjana

>
>> >
>> > > +      *
>> > > +      * I915_GEM_VM_BIND_CAPTURE:
>> > > +      * Capture this mapping in the dump upon GPU error.
>> > > +      */
>> > > +     __u64 flags;
>> > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>> > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>> > > +
>> > > +     /**
>> > > +      * @fence: Timeline fence for bind completion signaling.
>> > > +      *
>> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > +      * is invalid, and an error will be returned.
>> > > +      */
>> > > +     struct drm_i915_gem_timeline_fence fence;
>> > > +
>> > > +     /**
>> > > +      * @extensions: Zero-terminated chain of extensions.
>> > > +      *
>> > > +      * For future extensions. See struct i915_user_extension.
>> > > +      */
>> > > +     __u64 extensions;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> > > + *
>> > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> > > + * address (VA) range that should be unbound from the device page table of the
>> > > + * specified address space (VM). VM_UNBIND will force unbind the specified
>> > > + * range from device page table without waiting for any GPU job to complete.
>> > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
>> > > + * calling VM_UNBIND.
>> > > + *
>> > > + * If the specified mapping is not found, the ioctl will simply return without
>> > > + * any error.
>> > > + *
>> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> > > + * asynchronously, if valid @fence is specified.
>> > > + */
>> > > +struct drm_i915_gem_vm_unbind {
>> > > +     /** @vm_id: VM (address space) id to bind */
>> > > +     __u32 vm_id;
>> > > +
>> > > +     /** @rsvd: Reserved, MBZ */
>> > > +     __u32 rsvd;
>> > > +
>> > > +     /** @start: Virtual Address start to unbind */
>> > > +     __u64 start;
>> > > +
>> > > +     /** @length: Length of mapping to unbind */
>> > > +     __u64 length;
>> > > +
>> > > +     /** @flags: Currently reserved, MBZ */
>> > > +     __u64 flags;
>> > > +
>> > > +     /**
>> > > +      * @fence: Timeline fence for unbind completion signaling.
>> > > +      *
>> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > +      * is invalid, and an error will be returned.
>> > > +      */
>> > > +     struct drm_i915_gem_timeline_fence fence;
>> > > +
>> > > +     /**
>> > > +      * @extensions: Zero-terminated chain of extensions.
>> > > +      *
>> > > +      * For future extensions. See struct i915_user_extension.
>> > > +      */
>> > > +     __u64 extensions;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>> > > + * ioctl.
>> > > + *
>> > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>> > > + * only works with this ioctl for submission.
>> > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> > > + */
>> > > +struct drm_i915_gem_execbuffer3 {
>> > > +     /**
>> > > +      * @ctx_id: Context id
>> > > +      *
>> > > +      * Only contexts with user engine map are allowed.
>> > > +      */
>> > > +     __u32 ctx_id;
>> > > +
>> > > +     /**
>> > > +      * @engine_idx: Engine index
>> > > +      *
>> > > +      * An index in the user engine map of the context specified by @ctx_id.
>> > > +      */
>> > > +     __u32 engine_idx;
>> > > +
>> > > +     /**
>> > > +      * @batch_address: Batch gpu virtual address/es.
>> > > +      *
>> > > +      * For normal submission, it is the gpu virtual address of the batch
>> > > +      * buffer. For parallel submission, it is a pointer to an array of
>> > > +      * batch buffer gpu virtual addresses with array size equal to the
>> > > +      * number of (parallel) engines involved in that submission (See
>> > > +      * struct i915_context_engines_parallel_submit).
>> > > +      */
>> > > +     __u64 batch_address;
>> > > +
>> > > +     /** @flags: Currently reserved, MBZ */
>> > > +     __u64 flags;
>> > > +
>> > > +     /** @rsvd1: Reserved, MBZ */
>> > > +     __u32 rsvd1;
>> > > +
>> > > +     /** @fence_count: Number of fences in @timeline_fences array. */
>> > > +     __u32 fence_count;
>> > > +
>> > > +     /**
>> > > +      * @timeline_fences: Pointer to an array of timeline fences.
>> > > +      *
>> > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>> > > +      */
>> > > +     __u64 timeline_fences;
>> > > +
>> > > +     /** @rsvd2: Reserved, MBZ */
>> > > +     __u64 rsvd2;
>> > > +
>> >
>> > Just out of curiosity: if we can extend behavior with @extensions and
>> > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>> >
>>
>> True. I added it just in case some requests came up that would require
>> some additional fields. During this review process itself there were
>> some requests. Adding directly here should have a slight performance
>> edge over adding it as an extension (one less copy_from_user).
>>
>> But if folks think this is an overkill, I will remove it.
>
>I do not have strong opinions here, I'm just curious.
>
>Thanks,
>Paulo
>
>>
>> Niranjana
>>
>> > > +     /**
>> > > +      * @extensions: Zero-terminated chain of extensions.
>> > > +      *
>> > > +      * For future extensions. See struct i915_user_extension.
>> > > +      */
>> > > +     __u64 extensions;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>> > > + * private to the specified VM.
>> > > + *
>> > > + * See struct drm_i915_gem_create_ext.
>> > > + */
>> > > +struct drm_i915_gem_create_ext_vm_private {
>> > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> > > +     /** @base: Extension link. See struct i915_user_extension. */
>> > > +     struct i915_user_extension base;
>> > > +
>> > > +     /** @vm_id: Id of the VM to which the object is private */
>> > > +     __u32 vm_id;
>> > > +};
>> >
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 16:18           ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 16:18 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: intel-gfx, dri-devel, Hellstrom, Thomas, Wilson, Chris P, Vetter,
	Daniel, christian.koenig, Auld, Matthew

On Wed, Jun 29, 2022 at 11:39:52PM -0700, Zanoni, Paulo R wrote:
>On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
>> On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>> > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> > > VM_BIND and related uapi definitions
>> > >
>> > > v2: Reduce the scope to simple Mesa use case.
>> > > v3: Expand VM_UNBIND documentation and add
>> > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>> > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>> > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>> > >     documentation for vm_bind/unbind.
>> > > v5: Remove TLB flush requirement on VM_UNBIND.
>> > >     Add version support to stage implementation.
>> > > v6: Define and use drm_i915_gem_timeline_fence structure for
>> > >     all timeline fences.
>> > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>> > >     Update documentation on async vm_bind/unbind and versioning.
>> > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>> > >     batch_count field and I915_EXEC3_SECURE flag.
>> > >
>> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > > ---
>> > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>> > >  1 file changed, 280 insertions(+)
>> > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>> > >
>> > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > new file mode 100644
>> > > index 000000000000..a93e08bceee6
>> > > --- /dev/null
>> > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > @@ -0,0 +1,280 @@
>> > > +/* SPDX-License-Identifier: MIT */
>> > > +/*
>> > > + * Copyright © 2022 Intel Corporation
>> > > + */
>> > > +
>> > > +/**
>> > > + * DOC: I915_PARAM_VM_BIND_VERSION
>> > > + *
>> > > + * VM_BIND feature version supported.
>> > > + * See typedef drm_i915_getparam_t param.
>> > > + *
>> > > + * Specifies the VM_BIND feature version supported.
>> > > + * The following versions of VM_BIND have been defined:
>> > > + *
>> > > + * 0: No VM_BIND support.
>> > > + *
>> > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
>> > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
>> > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
>> > > + *    any existing mappings.
>> > > + *
>> > > + * 2: The restrictions on unbinding partial or multiple mappings is
>> > > + *    lifted, Similarly, binding will replace any mappings in the given range.
>> > > + *
>> > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>> > > + */
>> > > +#define I915_PARAM_VM_BIND_VERSION   57
>> > > +
>> > > +/**
>> > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> > > + *
>> > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> > > + * See struct drm_i915_gem_vm_control flags.
>> > > + *
>> > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
>> > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> > > + */
>> > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> > > +
>> > > +/* VM_BIND related ioctls */
>> > > +#define DRM_I915_GEM_VM_BIND         0x3d
>> > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> > > +
>> > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> > > + *
>> > > + * The operation will wait for input fence to signal.
>> > > + *
>> > > + * The returned output fence will be signaled after the completion of the
>> > > + * operation.
>> > > + */
>> > > +struct drm_i915_gem_timeline_fence {
>> > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> > > +     __u32 handle;
>> > > +
>> > > +     /**
>> > > +      * @flags: Supported flags are:
>> > > +      *
>> > > +      * I915_TIMELINE_FENCE_WAIT:
>> > > +      * Wait for the input fence before the operation.
>> > > +      *
>> > > +      * I915_TIMELINE_FENCE_SIGNAL:
>> > > +      * Return operation completion fence as output.
>> > > +      */
>> > > +     __u32 flags;
>> > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> > > +
>> > > +     /**
>> > > +      * @value: A point in the timeline.
>> > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> > > +      * binary one.
>> > > +      */
>> > > +     __u64 value;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> > > + *
>> > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> > > + * virtual address (VA) range to the section of an object that should be bound
>> > > + * in the device page table of the specified address space (VM).
>> > > + * The VA range specified must be unique (ie., not currently bound) and can
>> > > + * be mapped to whole object or a section of the object (partial binding).
>> > > + * Multiple VA mappings can be created to the same section of the object
>> > > + * (aliasing).
>> > > + *
>> > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
>> > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
>> > > + * table. On those platforms, for binding device local-memory objects, the
>> > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>> > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
>> > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
>> > > + * allow 64K page and 4K page bindings in the same 2M range).
>> > > + *
>> > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> > > + *
>> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> > > + * asynchronously, if valid @fence is specified.
>> >
>> > Does that mean that if I don't provide @fence, then this ioctl will be
>> > synchronous (i.e., when it returns, the memory will be guaranteed to be
>> > bound)? The text is kinda implying that, but from one of your earlier
>> > replies to Tvrtko, that doesn't seem to be the case. I guess we could
>> > change the text to make this more explicit.
>> >
>>
>> Yes, I thought, if user doesn't specify the out fence, KMD better make
>> the ioctl synchronous by waiting until the binding finishes before
>> returning. Otherwise, UMD has no way to ensure binding is complete and
>> UMD must pass in out fence for VM_BIND calls.
>>
>> But latest comment form Daniel on other thread might suggest something else.
>> Daniel, can you comment?
>
>Whatever we decide, let's make sure it's documented.
>
>>
>> > In addition, previously we had the guarantee that an execbuf ioctl
>> > would wait for all the pending vm_bind operations to finish before
>> > doing anything. Do we still have this guarantee or do we have to make
>> > use of the fences now?
>> >
>>
>> No, we don't have that anymore (execbuf is decoupled from VM_BIND).
>> Execbuf3 submission will not wait for any previous VM_BIND to finish.
>> UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
>> that.
>
>Got it, thanks.
>
>>
>> > > + */
>> > > +struct drm_i915_gem_vm_bind {
>> > > +     /** @vm_id: VM (address space) id to bind */
>> > > +     __u32 vm_id;
>> > > +
>> > > +     /** @handle: Object handle */
>> > > +     __u32 handle;
>> > > +
>> > > +     /** @start: Virtual Address start to bind */
>> > > +     __u64 start;
>> > > +
>> > > +     /** @offset: Offset in object to bind */
>> > > +     __u64 offset;
>> > > +
>> > > +     /** @length: Length of mapping to bind */
>> > > +     __u64 length;
>> > > +
>> > > +     /**
>> > > +      * @flags: Supported flags are:
>> > > +      *
>> > > +      * I915_GEM_VM_BIND_READONLY:
>> > > +      * Mapping is read-only.
>> >
>> > Can you please explain what happens when we try to write to a range
>> > that's bound as read-only?
>> >
>>
>> It will be mapped as read-only in device page table. Hence any
>> write access will fail. I would expect a CAT error reported.
>
>What's a CAT error? Does this lead to machine freeze or a GPU hang?
>Let's make sure we document this.
>

Catastrophic error.

>>
>> I am seeing that currently the page table R/W setting is based
>> on whether BO is readonly or not (UMDs can request a userptr
>> BO to readonly). We can make this READONLY here as a subset.
>> ie., if BO is readonly, the mappings must be readonly. If BO
>> is not readonly, then the mapping can be either readonly or
>> not.
>>
>> But if Mesa doesn't have a use for this, then we can remove
>> this flag for now.
>>
>
>I was considering using it for Vulkan's Sparse
>residencyNonResidentStrict, so we map all unbound pages to a read-only
>page. But for that to work, the required behavior would have to be:
>reads all return zero, writes are ignored without any sort of error.
>
>But maybe our hardware provides other ways to implement this, I haven't
>checked yet.
>

I am not sure what the behavior is. Probably writes are not simply ignored,
will check.
Looks like we can remove this flag for now. We can always add it back
later if we need it. Is that Ok with you?

Niranjana

>
>> >
>> > > +      *
>> > > +      * I915_GEM_VM_BIND_CAPTURE:
>> > > +      * Capture this mapping in the dump upon GPU error.
>> > > +      */
>> > > +     __u64 flags;
>> > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>> > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>> > > +
>> > > +     /**
>> > > +      * @fence: Timeline fence for bind completion signaling.
>> > > +      *
>> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > +      * is invalid, and an error will be returned.
>> > > +      */
>> > > +     struct drm_i915_gem_timeline_fence fence;
>> > > +
>> > > +     /**
>> > > +      * @extensions: Zero-terminated chain of extensions.
>> > > +      *
>> > > +      * For future extensions. See struct i915_user_extension.
>> > > +      */
>> > > +     __u64 extensions;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> > > + *
>> > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> > > + * address (VA) range that should be unbound from the device page table of the
>> > > + * specified address space (VM). VM_UNBIND will force unbind the specified
>> > > + * range from device page table without waiting for any GPU job to complete.
>> > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
>> > > + * calling VM_UNBIND.
>> > > + *
>> > > + * If the specified mapping is not found, the ioctl will simply return without
>> > > + * any error.
>> > > + *
>> > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> > > + * asynchronously, if valid @fence is specified.
>> > > + */
>> > > +struct drm_i915_gem_vm_unbind {
>> > > +     /** @vm_id: VM (address space) id to bind */
>> > > +     __u32 vm_id;
>> > > +
>> > > +     /** @rsvd: Reserved, MBZ */
>> > > +     __u32 rsvd;
>> > > +
>> > > +     /** @start: Virtual Address start to unbind */
>> > > +     __u64 start;
>> > > +
>> > > +     /** @length: Length of mapping to unbind */
>> > > +     __u64 length;
>> > > +
>> > > +     /** @flags: Currently reserved, MBZ */
>> > > +     __u64 flags;
>> > > +
>> > > +     /**
>> > > +      * @fence: Timeline fence for unbind completion signaling.
>> > > +      *
>> > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > +      * is invalid, and an error will be returned.
>> > > +      */
>> > > +     struct drm_i915_gem_timeline_fence fence;
>> > > +
>> > > +     /**
>> > > +      * @extensions: Zero-terminated chain of extensions.
>> > > +      *
>> > > +      * For future extensions. See struct i915_user_extension.
>> > > +      */
>> > > +     __u64 extensions;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>> > > + * ioctl.
>> > > + *
>> > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>> > > + * only works with this ioctl for submission.
>> > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> > > + */
>> > > +struct drm_i915_gem_execbuffer3 {
>> > > +     /**
>> > > +      * @ctx_id: Context id
>> > > +      *
>> > > +      * Only contexts with user engine map are allowed.
>> > > +      */
>> > > +     __u32 ctx_id;
>> > > +
>> > > +     /**
>> > > +      * @engine_idx: Engine index
>> > > +      *
>> > > +      * An index in the user engine map of the context specified by @ctx_id.
>> > > +      */
>> > > +     __u32 engine_idx;
>> > > +
>> > > +     /**
>> > > +      * @batch_address: Batch gpu virtual address/es.
>> > > +      *
>> > > +      * For normal submission, it is the gpu virtual address of the batch
>> > > +      * buffer. For parallel submission, it is a pointer to an array of
>> > > +      * batch buffer gpu virtual addresses with array size equal to the
>> > > +      * number of (parallel) engines involved in that submission (See
>> > > +      * struct i915_context_engines_parallel_submit).
>> > > +      */
>> > > +     __u64 batch_address;
>> > > +
>> > > +     /** @flags: Currently reserved, MBZ */
>> > > +     __u64 flags;
>> > > +
>> > > +     /** @rsvd1: Reserved, MBZ */
>> > > +     __u32 rsvd1;
>> > > +
>> > > +     /** @fence_count: Number of fences in @timeline_fences array. */
>> > > +     __u32 fence_count;
>> > > +
>> > > +     /**
>> > > +      * @timeline_fences: Pointer to an array of timeline fences.
>> > > +      *
>> > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>> > > +      */
>> > > +     __u64 timeline_fences;
>> > > +
>> > > +     /** @rsvd2: Reserved, MBZ */
>> > > +     __u64 rsvd2;
>> > > +
>> >
>> > Just out of curiosity: if we can extend behavior with @extensions and
>> > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>> >
>>
>> True. I added it just in case some requests came up that would require
>> some additional fields. During this review process itself there were
>> some requests. Adding directly here should have a slight performance
>> edge over adding it as an extension (one less copy_from_user).
>>
>> But if folks think this is an overkill, I will remove it.
>
>I do not have strong opinions here, I'm just curious.
>
>Thanks,
>Paulo
>
>>
>> Niranjana
>>
>> > > +     /**
>> > > +      * @extensions: Zero-terminated chain of extensions.
>> > > +      *
>> > > +      * For future extensions. See struct i915_user_extension.
>> > > +      */
>> > > +     __u64 extensions;
>> > > +};
>> > > +
>> > > +/**
>> > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>> > > + * private to the specified VM.
>> > > + *
>> > > + * See struct drm_i915_gem_create_ext.
>> > > + */
>> > > +struct drm_i915_gem_create_ext_vm_private {
>> > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> > > +     /** @base: Extension link. See struct i915_user_extension. */
>> > > +     struct i915_user_extension base;
>> > > +
>> > > +     /** @vm_id: Id of the VM to which the object is private */
>> > > +     __u32 vm_id;
>> > > +};
>> >
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30  7:59       ` Tvrtko Ursulin
@ 2022-06-30 16:22         ` Niranjana Vishwanathapura
  2022-07-01  8:11           ` Tvrtko Ursulin
  0 siblings, 1 reply; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 16:22 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Wilson,
	Chris P, Vetter, Daniel, christian.koenig, Auld, Matthew

On Thu, Jun 30, 2022 at 08:59:09AM +0100, Tvrtko Ursulin wrote:
>
>On 30/06/2022 07:08, Niranjana Vishwanathapura wrote:
>>On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>>>On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>>>>VM_BIND and related uapi definitions
>>>>
>>>>v2: Reduce the scope to simple Mesa use case.
>>>>v3: Expand VM_UNBIND documentation and add
>>>>    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>>>    and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>>>v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>>>    documentation for vm_bind/unbind.
>>>>v5: Remove TLB flush requirement on VM_UNBIND.
>>>>    Add version support to stage implementation.
>>>>v6: Define and use drm_i915_gem_timeline_fence structure for
>>>>    all timeline fences.
>>>>v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>>>>    Update documentation on async vm_bind/unbind and versioning.
>>>>    Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>>>>    batch_count field and I915_EXEC3_SECURE flag.
>>>>
>>>>Signed-off-by: Niranjana Vishwanathapura 
>>>><niranjana.vishwanathapura@intel.com>
>>>>Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>---
>>>> Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>>>> 1 file changed, 280 insertions(+)
>>>> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>>
>>>>diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
>>>>b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>new file mode 100644
>>>>index 000000000000..a93e08bceee6
>>>>--- /dev/null
>>>>+++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>@@ -0,0 +1,280 @@
>>>>+/* SPDX-License-Identifier: MIT */
>>>>+/*
>>>>+ * Copyright © 2022 Intel Corporation
>>>>+ */
>>>>+
>>>>+/**
>>>>+ * DOC: I915_PARAM_VM_BIND_VERSION
>>>>+ *
>>>>+ * VM_BIND feature version supported.
>>>>+ * See typedef drm_i915_getparam_t param.
>>>>+ *
>>>>+ * Specifies the VM_BIND feature version supported.
>>>>+ * The following versions of VM_BIND have been defined:
>>>>+ *
>>>>+ * 0: No VM_BIND support.
>>>>+ *
>>>>+ * 1: In VM_UNBIND calls, the UMD must specify the exact 
>>>>mappings created
>>>>+ *    previously with VM_BIND, the ioctl will not support 
>>>>unbinding multiple
>>>>+ *    mappings or splitting them. Similarly, VM_BIND calls will 
>>>>not replace
>>>>+ *    any existing mappings.
>>>>+ *
>>>>+ * 2: The restrictions on unbinding partial or multiple mappings is
>>>>+ *    lifted, Similarly, binding will replace any mappings in 
>>>>the given range.
>>>>+ *
>>>>+ * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>>>>+ */
>>>>+#define I915_PARAM_VM_BIND_VERSION   57
>>>>+
>>>>+/**
>>>>+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>>>+ *
>>>>+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>>>+ * See struct drm_i915_gem_vm_control flags.
>>>>+ *
>>>>+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>>>>+ * For VM_BIND mode, we have new execbuf3 ioctl which will not 
>>>>accept any
>>>>+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>>>>+ */
>>>>+#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>>>>+
>>>>+/* VM_BIND related ioctls */
>>>>+#define DRM_I915_GEM_VM_BIND         0x3d
>>>>+#define DRM_I915_GEM_VM_UNBIND               0x3e
>>>>+#define DRM_I915_GEM_EXECBUFFER3     0x3f
>>>>+
>>>>+#define DRM_IOCTL_I915_GEM_VM_BIND           
>>>>DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct 
>>>>drm_i915_gem_vm_bind)
>>>>+#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>>>>DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct 
>>>>drm_i915_gem_vm_bind)
>>>>+#define DRM_IOCTL_I915_GEM_EXECBUFFER3               
>>>>DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct 
>>>>drm_i915_gem_execbuffer3)
>>>>+
>>>>+/**
>>>>+ * struct drm_i915_gem_timeline_fence - An input or output 
>>>>timeline fence.
>>>>+ *
>>>>+ * The operation will wait for input fence to signal.
>>>>+ *
>>>>+ * The returned output fence will be signaled after the 
>>>>completion of the
>>>>+ * operation.
>>>>+ */
>>>>+struct drm_i915_gem_timeline_fence {
>>>>+     /** @handle: User's handle for a drm_syncobj to wait on or 
>>>>signal. */
>>>>+     __u32 handle;
>>>>+
>>>>+     /**
>>>>+      * @flags: Supported flags are:
>>>>+      *
>>>>+      * I915_TIMELINE_FENCE_WAIT:
>>>>+      * Wait for the input fence before the operation.
>>>>+      *
>>>>+      * I915_TIMELINE_FENCE_SIGNAL:
>>>>+      * Return operation completion fence as output.
>>>>+      */
>>>>+     __u32 flags;
>>>>+#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>>>+#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>>>+#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS 
>>>>(-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>>>+
>>>>+     /**
>>>>+      * @value: A point in the timeline.
>>>>+      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>>>>+      * timeline drm_syncobj is invalid as it turns a 
>>>>drm_syncobj into a
>>>>+      * binary one.
>>>>+      */
>>>>+     __u64 value;
>>>>+};
>>>>+
>>>>+/**
>>>>+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>>+ *
>>>>+ * This structure is passed to VM_BIND ioctl and specifies the 
>>>>mapping of GPU
>>>>+ * virtual address (VA) range to the section of an object that 
>>>>should be bound
>>>>+ * in the device page table of the specified address space (VM).
>>>>+ * The VA range specified must be unique (ie., not currently 
>>>>bound) and can
>>>>+ * be mapped to whole object or a section of the object 
>>>>(partial binding).
>>>>+ * Multiple VA mappings can be created to the same section of 
>>>>the object
>>>>+ * (aliasing).
>>>>+ *
>>>>+ * The @start, @offset and @length must be 4K page aligned. 
>>>>However the DG2
>>>>+ * and XEHPSDV has 64K page size for device local-memory and 
>>>>has compact page
>>>>+ * table. On those platforms, for binding device local-memory 
>>>>objects, the
>>>>+ * @start must be 2M aligned, @offset and @length must be 64K aligned.
>>>>+ * Also, for such mappings, i915 will reserve the whole 2M 
>>>>range for it so as
>>>>+ * to not allow multiple mappings in that 2M range (Compact 
>>>>page tables do not
>>>>+ * allow 64K page and 4K page bindings in the same 2M range).
>>>>+ *
>>>>+ * Error code -EINVAL will be returned if @start, @offset and 
>>>>@length are not
>>>>+ * properly aligned. In version 1 (See 
>>>>I915_PARAM_VM_BIND_VERSION), error code
>>>>+ * -ENOSPC will be returned if the VA range specified can't be 
>>>>reserved.
>>>>+ *
>>>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>>>concurrently
>>>>+ * are not ordered. Furthermore, parts of the VM_BIND operation 
>>>>can be done
>>>>+ * asynchronously, if valid @fence is specified.
>>>
>>>Does that mean that if I don't provide @fence, then this ioctl will be
>>>synchronous (i.e., when it returns, the memory will be guaranteed to be
>>>bound)? The text is kinda implying that, but from one of your earlier
>>>replies to Tvrtko, that doesn't seem to be the case. I guess we could
>>>change the text to make this more explicit.
>>>
>>
>>Yes, I thought, if user doesn't specify the out fence, KMD better make
>>the ioctl synchronous by waiting until the binding finishes before
>>returning. Otherwise, UMD has no way to ensure binding is complete and
>>UMD must pass in out fence for VM_BIND calls.
>
>This problematic angle is exactly what I raised and I did not 
>understand you were suggesting sync behaviour back then.
>
>I suggested a possible execbuf3 extension which makes it wait for any 
>pending (un)bind activity on a VM. Sounds better to me than making 
>everything sync for the use case of N binds followed by 1 execbuf. 
>*If* userspace wants an easy "fire and forget" mode for such use case, 
>rather than having to use a fence on all.
>

This is a good optimization. But it creates some synchronization between
VM_BIND and execbuf3. Based on discussion in IRC, looks like folks are
Ok with waiting in VM_BIND call if out fence is not specified by UMD.
So, we can go with that for now.

Niranjana

>Regards,
>
>Tvrtko
>
>>But latest comment form Daniel on other thread might suggest 
>>something else.
>>Daniel, can you comment?
>>
>>>In addition, previously we had the guarantee that an execbuf ioctl
>>>would wait for all the pending vm_bind operations to finish before
>>>doing anything. Do we still have this guarantee or do we have to make
>>>use of the fences now?
>>>
>>
>>No, we don't have that anymore (execbuf is decoupled from VM_BIND).
>>Execbuf3 submission will not wait for any previous VM_BIND to finish.
>>UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
>>that.
>>
>>>>+ */
>>>>+struct drm_i915_gem_vm_bind {
>>>>+     /** @vm_id: VM (address space) id to bind */
>>>>+     __u32 vm_id;
>>>>+
>>>>+     /** @handle: Object handle */
>>>>+     __u32 handle;
>>>>+
>>>>+     /** @start: Virtual Address start to bind */
>>>>+     __u64 start;
>>>>+
>>>>+     /** @offset: Offset in object to bind */
>>>>+     __u64 offset;
>>>>+
>>>>+     /** @length: Length of mapping to bind */
>>>>+     __u64 length;
>>>>+
>>>>+     /**
>>>>+      * @flags: Supported flags are:
>>>>+      *
>>>>+      * I915_GEM_VM_BIND_READONLY:
>>>>+      * Mapping is read-only.
>>>
>>>Can you please explain what happens when we try to write to a range
>>>that's bound as read-only?
>>>
>>
>>It will be mapped as read-only in device page table. Hence any
>>write access will fail. I would expect a CAT error reported.
>>
>>I am seeing that currently the page table R/W setting is based
>>on whether BO is readonly or not (UMDs can request a userptr
>>BO to readonly). We can make this READONLY here as a subset.
>>ie., if BO is readonly, the mappings must be readonly. If BO
>>is not readonly, then the mapping can be either readonly or
>>not.
>>
>>But if Mesa doesn't have a use for this, then we can remove
>>this flag for now.
>>
>>>
>>>>+      *
>>>>+      * I915_GEM_VM_BIND_CAPTURE:
>>>>+      * Capture this mapping in the dump upon GPU error.
>>>>+      */
>>>>+     __u64 flags;
>>>>+#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>>>>+#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>>>>+
>>>>+     /**
>>>>+      * @fence: Timeline fence for bind completion signaling.
>>>>+      *
>>>>+      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>>>+      * is invalid, and an error will be returned.
>>>>+      */
>>>>+     struct drm_i915_gem_timeline_fence fence;
>>>>+
>>>>+     /**
>>>>+      * @extensions: Zero-terminated chain of extensions.
>>>>+      *
>>>>+      * For future extensions. See struct i915_user_extension.
>>>>+      */
>>>>+     __u64 extensions;
>>>>+};
>>>>+
>>>>+/**
>>>>+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>>>>+ *
>>>>+ * This structure is passed to VM_UNBIND ioctl and specifies 
>>>>the GPU virtual
>>>>+ * address (VA) range that should be unbound from the device 
>>>>page table of the
>>>>+ * specified address space (VM). VM_UNBIND will force unbind 
>>>>the specified
>>>>+ * range from device page table without waiting for any GPU job 
>>>>to complete.
>>>>+ * It is UMDs responsibility to ensure the mapping is no longer 
>>>>in use before
>>>>+ * calling VM_UNBIND.
>>>>+ *
>>>>+ * If the specified mapping is not found, the ioctl will simply 
>>>>return without
>>>>+ * any error.
>>>>+ *
>>>>+ * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>>>concurrently
>>>>+ * are not ordered. Furthermore, parts of the VM_UNBIND 
>>>>operation can be done
>>>>+ * asynchronously, if valid @fence is specified.
>>>>+ */
>>>>+struct drm_i915_gem_vm_unbind {
>>>>+     /** @vm_id: VM (address space) id to bind */
>>>>+     __u32 vm_id;
>>>>+
>>>>+     /** @rsvd: Reserved, MBZ */
>>>>+     __u32 rsvd;
>>>>+
>>>>+     /** @start: Virtual Address start to unbind */
>>>>+     __u64 start;
>>>>+
>>>>+     /** @length: Length of mapping to unbind */
>>>>+     __u64 length;
>>>>+
>>>>+     /** @flags: Currently reserved, MBZ */
>>>>+     __u64 flags;
>>>>+
>>>>+     /**
>>>>+      * @fence: Timeline fence for unbind completion signaling.
>>>>+      *
>>>>+      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>>>+      * is invalid, and an error will be returned.
>>>>+      */
>>>>+     struct drm_i915_gem_timeline_fence fence;
>>>>+
>>>>+     /**
>>>>+      * @extensions: Zero-terminated chain of extensions.
>>>>+      *
>>>>+      * For future extensions. See struct i915_user_extension.
>>>>+      */
>>>>+     __u64 extensions;
>>>>+};
>>>>+
>>>>+/**
>>>>+ * struct drm_i915_gem_execbuffer3 - Structure for 
>>>>DRM_I915_GEM_EXECBUFFER3
>>>>+ * ioctl.
>>>>+ *
>>>>+ * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode 
>>>>and VM_BIND mode
>>>>+ * only works with this ioctl for submission.
>>>>+ * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>>>+ */
>>>>+struct drm_i915_gem_execbuffer3 {
>>>>+     /**
>>>>+      * @ctx_id: Context id
>>>>+      *
>>>>+      * Only contexts with user engine map are allowed.
>>>>+      */
>>>>+     __u32 ctx_id;
>>>>+
>>>>+     /**
>>>>+      * @engine_idx: Engine index
>>>>+      *
>>>>+      * An index in the user engine map of the context 
>>>>specified by @ctx_id.
>>>>+      */
>>>>+     __u32 engine_idx;
>>>>+
>>>>+     /**
>>>>+      * @batch_address: Batch gpu virtual address/es.
>>>>+      *
>>>>+      * For normal submission, it is the gpu virtual address of 
>>>>the batch
>>>>+      * buffer. For parallel submission, it is a pointer to an array of
>>>>+      * batch buffer gpu virtual addresses with array size equal to the
>>>>+      * number of (parallel) engines involved in that submission (See
>>>>+      * struct i915_context_engines_parallel_submit).
>>>>+      */
>>>>+     __u64 batch_address;
>>>>+
>>>>+     /** @flags: Currently reserved, MBZ */
>>>>+     __u64 flags;
>>>>+
>>>>+     /** @rsvd1: Reserved, MBZ */
>>>>+     __u32 rsvd1;
>>>>+
>>>>+     /** @fence_count: Number of fences in @timeline_fences array. */
>>>>+     __u32 fence_count;
>>>>+
>>>>+     /**
>>>>+      * @timeline_fences: Pointer to an array of timeline fences.
>>>>+      *
>>>>+      * Timeline fences are of format struct 
>>>>drm_i915_gem_timeline_fence.
>>>>+      */
>>>>+     __u64 timeline_fences;
>>>>+
>>>>+     /** @rsvd2: Reserved, MBZ */
>>>>+     __u64 rsvd2;
>>>>+
>>>
>>>Just out of curiosity: if we can extend behavior with @extensions and
>>>even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>>>
>>
>>True. I added it just in case some requests came up that would require
>>some additional fields. During this review process itself there were
>>some requests. Adding directly here should have a slight performance
>>edge over adding it as an extension (one less copy_from_user).
>>
>>But if folks think this is an overkill, I will remove it.
>>
>>Niranjana
>>
>>>>+     /**
>>>>+      * @extensions: Zero-terminated chain of extensions.
>>>>+      *
>>>>+      * For future extensions. See struct i915_user_extension.
>>>>+      */
>>>>+     __u64 extensions;
>>>>+};
>>>>+
>>>>+/**
>>>>+ * struct drm_i915_gem_create_ext_vm_private - Extension to 
>>>>make the object
>>>>+ * private to the specified VM.
>>>>+ *
>>>>+ * See struct drm_i915_gem_create_ext.
>>>>+ */
>>>>+struct drm_i915_gem_create_ext_vm_private {
>>>>+#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>>>>+     /** @base: Extension link. See struct i915_user_extension. */
>>>>+     struct i915_user_extension base;
>>>>+
>>>>+     /** @vm_id: Id of the VM to which the object is private */
>>>>+     __u32 vm_id;
>>>>+};
>>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 15:34         ` [Intel-gfx] " Jason Ekstrand
@ 2022-06-30 16:23           ` Matthew Auld
  -1 siblings, 0 replies; 53+ messages in thread
From: Matthew Auld @ 2022-06-30 16:23 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Matthew Brost, Paulo Zanoni, Lionel Landwerlin, Tvrtko Ursulin,
	Intel GFX, Maling list - DRI developers, Thomas Hellstrom,
	oak.zeng, Chris Wilson, Daniel Vetter, Niranjana Vishwanathapura,
	Christian König

On 30/06/2022 16:34, Jason Ekstrand wrote:
> On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@intel.com 
> <mailto:matthew.auld@intel.com>> wrote:
> 
>     On 30/06/2022 06:11, Jason Ekstrand wrote:
>      > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>      > <niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>
>      > <mailto:niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>>> wrote:
>      >
>      >     VM_BIND and related uapi definitions
>      >
>      >     v2: Reduce the scope to simple Mesa use case.
>      >     v3: Expand VM_UNBIND documentation and add
>      >          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>      >          and I915_GEM_VM_BIND_TLB_FLUSH flags.
>      >     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>      >          documentation for vm_bind/unbind.
>      >     v5: Remove TLB flush requirement on VM_UNBIND.
>      >          Add version support to stage implementation.
>      >     v6: Define and use drm_i915_gem_timeline_fence structure for
>      >          all timeline fences.
>      >     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>      >          Update documentation on async vm_bind/unbind and versioning.
>      >          Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>      >          batch_count field and I915_EXEC3_SECURE flag.
>      >
>      >     Signed-off-by: Niranjana Vishwanathapura
>      >     <niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>
>      >     <mailto:niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>>>
>      >     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
>     <mailto:daniel.vetter@ffwll.ch>
>      >     <mailto:daniel.vetter@ffwll.ch <mailto:daniel.vetter@ffwll.ch>>>
>      >     ---
>      >       Documentation/gpu/rfc/i915_vm_bind.h | 280
>     +++++++++++++++++++++++++++
>      >       1 file changed, 280 insertions(+)
>      >       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>      >
>      >     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>      >     b/Documentation/gpu/rfc/i915_vm_bind.h
>      >     new file mode 100644
>      >     index 000000000000..a93e08bceee6
>      >     --- /dev/null
>      >     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>      >     @@ -0,0 +1,280 @@
>      >     +/* SPDX-License-Identifier: MIT */
>      >     +/*
>      >     + * Copyright © 2022 Intel Corporation
>      >     + */
>      >     +
>      >     +/**
>      >     + * DOC: I915_PARAM_VM_BIND_VERSION
>      >     + *
>      >     + * VM_BIND feature version supported.
>      >     + * See typedef drm_i915_getparam_t param.
>      >     + *
>      >     + * Specifies the VM_BIND feature version supported.
>      >     + * The following versions of VM_BIND have been defined:
>      >     + *
>      >     + * 0: No VM_BIND support.
>      >     + *
>      >     + * 1: In VM_UNBIND calls, the UMD must specify the exact
>     mappings
>      >     created
>      >     + *    previously with VM_BIND, the ioctl will not support
>     unbinding
>      >     multiple
>      >     + *    mappings or splitting them. Similarly, VM_BIND calls
>     will not
>      >     replace
>      >     + *    any existing mappings.
>      >     + *
>      >     + * 2: The restrictions on unbinding partial or multiple
>     mappings is
>      >     + *    lifted, Similarly, binding will replace any mappings
>     in the
>      >     given range.
>      >     + *
>      >     + * See struct drm_i915_gem_vm_bind and struct
>     drm_i915_gem_vm_unbind.
>      >     + */
>      >     +#define I915_PARAM_VM_BIND_VERSION     57
>      >     +
>      >     +/**
>      >     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>      >     + *
>      >     + * Flag to opt-in for VM_BIND mode of binding during VM
>     creation.
>      >     + * See struct drm_i915_gem_vm_control flags.
>      >     + *
>      >     + * The older execbuf2 ioctl will not support VM_BIND mode of
>     operation.
>      >     + * For VM_BIND mode, we have new execbuf3 ioctl which will not
>      >     accept any
>      >     + * execlist (See struct drm_i915_gem_execbuffer3 for more
>     details).
>      >     + */
>      >     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>      >     +
>      >     +/* VM_BIND related ioctls */
>      >     +#define DRM_I915_GEM_VM_BIND           0x3d
>      >     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>      >     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>      >     +
>      >     +#define DRM_IOCTL_I915_GEM_VM_BIND
>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>      >     drm_i915_gem_vm_bind)
>      >     +#define DRM_IOCTL_I915_GEM_VM_UNBIND
>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>      >     drm_i915_gem_vm_bind)
>      >     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>      >     drm_i915_gem_execbuffer3)
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_timeline_fence - An input or output
>     timeline
>      >     fence.
>      >     + *
>      >     + * The operation will wait for input fence to signal.
>      >     + *
>      >     + * The returned output fence will be signaled after the
>     completion
>      >     of the
>      >     + * operation.
>      >     + */
>      >     +struct drm_i915_gem_timeline_fence {
>      >     +       /** @handle: User's handle for a drm_syncobj to wait
>     on or
>      >     signal. */
>      >     +       __u32 handle;
>      >     +
>      >     +       /**
>      >     +        * @flags: Supported flags are:
>      >     +        *
>      >     +        * I915_TIMELINE_FENCE_WAIT:
>      >     +        * Wait for the input fence before the operation.
>      >     +        *
>      >     +        * I915_TIMELINE_FENCE_SIGNAL:
>      >     +        * Return operation completion fence as output.
>      >     +        */
>      >     +       __u32 flags;
>      >     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>      >     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>      >     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>      >     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>      >     +
>      >     +       /**
>      >     +        * @value: A point in the timeline.
>      >     +        * Value must be 0 for a binary drm_syncobj. A Value
>     of 0 for a
>      >     +        * timeline drm_syncobj is invalid as it turns a
>     drm_syncobj
>      >     into a
>      >     +        * binary one.
>      >     +        */
>      >     +       __u64 value;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>      >     + *
>      >     + * This structure is passed to VM_BIND ioctl and specifies the
>      >     mapping of GPU
>      >     + * virtual address (VA) range to the section of an object that
>      >     should be bound
>      >     + * in the device page table of the specified address space (VM).
>      >     + * The VA range specified must be unique (ie., not currently
>     bound)
>      >     and can
>      >     + * be mapped to whole object or a section of the object (partial
>      >     binding).
>      >     + * Multiple VA mappings can be created to the same section
>     of the
>      >     object
>      >     + * (aliasing).
>      >     + *
>      >     + * The @start, @offset and @length must be 4K page aligned.
>     However
>      >     the DG2
>      >     + * and XEHPSDV has 64K page size for device local-memory and has
>      >     compact page
>      >     + * table. On those platforms, for binding device local-memory
>      >     objects, the
>      >     + * @start must be 2M aligned, @offset and @length must be
>     64K aligned.
>      >
>      >
>      > This is not acceptable.  We need 64K granularity.  This includes the
>      > starting address, the BO offset, and the length.  Why?  The tl;dr is
>      > that it's a requirement for about 50% of D3D12 apps if we want
>     them to
>      > run on Linux via D3D12.  A longer explanation follows.  I don't
>      > necessarily expect kernel folks to get all the details but hopefully
>      > I'll have left enough of a map that some of the Intel Mesa folks can
>      > help fill in details.
>      >
>      > Many modern D3D12 apps have a hard requirement on Tier2 tiled
>      > resources.  This is a feature that Intel has supported in the D3D12
>      > driver since Skylake.  In order to implement this feature, VKD3D
>      > requires the various sparseResidencyImage* and
>     sparseResidency*Sampled
>      > Vulkan features.  If we want those apps to work (there's getting
>     to be
>      > quite a few of them), we need to implement the Vulkan sparse
>     residency
>      > features.
>      > |
>      > |
>      > What is sparse residency?  I'm glad you asked!  The sparse residency
>      > features allow a client to separately bind each miplevel or array
>     slice
>      > of an image to a chunk of device memory independently, without
>     affecting
>      > any other areas of the image.  Once you get to a high enough
>     miplevel
>      > that everything fits inside a single sparse image block (that's a
>      > technical Vulkan term you can search for in the spec), you can
>     enter a
>      > "miptail" which contains all the remaining miplevels in a single
>     sparse
>      > image block.
>      >
>      > The term "sparse image block" is what the Vulkan spec uses.  On
>     Intel
>      > hardware and in the docs, it's what we call a "tile". 
>     Specifically, the
>      > image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on
>     DG2+.  This
>      > is because Tile4 and legacy X and Y-tiling don't provide any
>     guarantees
>      > about page alignment for slices.  Yf, Ys, and Tile64, on the
>     other hand,
>      > align all slices of the image to a tile boundary, allowing us to map
>      > memory to different slices independently, assuming we have 64K
>     (or 4K
>      > for Yf) VM_BIND granularity.  (4K isn't actually a requirement for
>      > SKL-TGL; we can use Ys all the time which has 64K tiles but
>     there's no
>      > reason to not support 4K alignments on integrated.)
>      >
>      > Someone may be tempted to ask, "Can't we wiggle the strides
>     around or
>      > something to make it work?"  I thought about that and no, you can't.
>      > The problem here is LOD2+.  Sure, you can have a stride such that
>     the
>      > image is a multiple of 2M worth of tiles across.  That'll work
>     fine for
>      > LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be and
>      > there's no way to control that.  The hardware will place it to
>     the right
>      > of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing
>     you
>      > can do about that.  If that position doesn't happen to hit a 2M
>      > boundary, you're out of luck.
>      >
>      > I hope that explanation provides enough detail.  Sadly, this is
>     one of
>      > those things which has a lot of moving pieces all over different
>     bits of
>      > the hardware and various APIs and they all have to work together
>     just
>      > right for it to all come out in the end.  But, yeah, we really
>     need 64K
>      > aligned binding if we want VKD3D to work.
> 
>     Just to confirm, the new model would be to enforce 64K GTT alignment
>     for
>     lmem pages, and then for smem pages we would only require 4K alignment,
>     but with the added restriction that userspace will never try to mix the
>     two (lmem vs smem) within the same 2M va range (page-table). The kernel
>     will verify this and throw an error if needed. This model should work
>     with the above?
> 
> 
> Mesa doesn't have full control over BO placement so I don't think we can 
> guarantee quite as much as you want there.  We can guarantee, I think, 
> that we never place LMEM-only and SMEM-only in the same 2M block.  
> However, most BOs will be LMEM+SMEM (with a preference for LMEM) and 
> then it'll be up to the kernel to sort out any issues.  Is that reasonable?

That seems tricky for the lmem + smem case. On DG2 the hw design is such 
that you can't have 64K and 4K GTT pages within the same page-table, 
since the entire page-table is either operating in 64K or 4K GTT page 
mode (there is some special bit on the PDE that we need to toggle to 
turn on/off the 64K mode).

> 
> --Jason
> 
>      >
>      > --Jason
>      >
>      >     + * Also, for such mappings, i915 will reserve the whole 2M range
>      >     for it so as
>      >     + * to not allow multiple mappings in that 2M range (Compact page
>      >     tables do not
>      >     + * allow 64K page and 4K page bindings in the same 2M range).
>      >     + *
>      >     + * Error code -EINVAL will be returned if @start, @offset and
>      >     @length are not
>      >     + * properly aligned. In version 1 (See
>     I915_PARAM_VM_BIND_VERSION),
>      >     error code
>      >     + * -ENOSPC will be returned if the VA range specified can't be
>      >     reserved.
>      >     + *
>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>      >     concurrently
>      >     + * are not ordered. Furthermore, parts of the VM_BIND
>     operation can
>      >     be done
>      >     + * asynchronously, if valid @fence is specified.
>      >     + */
>      >     +struct drm_i915_gem_vm_bind {
>      >     +       /** @vm_id: VM (address space) id to bind */
>      >     +       __u32 vm_id;
>      >     +
>      >     +       /** @handle: Object handle */
>      >     +       __u32 handle;
>      >     +
>      >     +       /** @start: Virtual Address start to bind */
>      >     +       __u64 start;
>      >     +
>      >     +       /** @offset: Offset in object to bind */
>      >     +       __u64 offset;
>      >     +
>      >     +       /** @length: Length of mapping to bind */
>      >     +       __u64 length;
>      >     +
>      >     +       /**
>      >     +        * @flags: Supported flags are:
>      >     +        *
>      >     +        * I915_GEM_VM_BIND_READONLY:
>      >     +        * Mapping is read-only.
>      >     +        *
>      >     +        * I915_GEM_VM_BIND_CAPTURE:
>      >     +        * Capture this mapping in the dump upon GPU error.
>      >     +        */
>      >     +       __u64 flags;
>      >     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>      >     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>      >     +
>      >     +       /**
>      >     +        * @fence: Timeline fence for bind completion signaling.
>      >     +        *
>      >     +        * It is an out fence, hence using
>     I915_TIMELINE_FENCE_WAIT flag
>      >     +        * is invalid, and an error will be returned.
>      >     +        */
>      >     +       struct drm_i915_gem_timeline_fence fence;
>      >     +
>      >     +       /**
>      >     +        * @extensions: Zero-terminated chain of extensions.
>      >     +        *
>      >     +        * For future extensions. See struct i915_user_extension.
>      >     +        */
>      >     +       __u64 extensions;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_vm_unbind - VA to object mapping to
>     unbind.
>      >     + *
>      >     + * This structure is passed to VM_UNBIND ioctl and specifies the
>      >     GPU virtual
>      >     + * address (VA) range that should be unbound from the device
>     page
>      >     table of the
>      >     + * specified address space (VM). VM_UNBIND will force unbind the
>      >     specified
>      >     + * range from device page table without waiting for any GPU
>     job to
>      >     complete.
>      >     + * It is UMDs responsibility to ensure the mapping is no
>     longer in
>      >     use before
>      >     + * calling VM_UNBIND.
>      >     + *
>      >     + * If the specified mapping is not found, the ioctl will simply
>      >     return without
>      >     + * any error.
>      >     + *
>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>      >     concurrently
>      >     + * are not ordered. Furthermore, parts of the VM_UNBIND
>     operation
>      >     can be done
>      >     + * asynchronously, if valid @fence is specified.
>      >     + */
>      >     +struct drm_i915_gem_vm_unbind {
>      >     +       /** @vm_id: VM (address space) id to bind */
>      >     +       __u32 vm_id;
>      >     +
>      >     +       /** @rsvd: Reserved, MBZ */
>      >     +       __u32 rsvd;
>      >     +
>      >     +       /** @start: Virtual Address start to unbind */
>      >     +       __u64 start;
>      >     +
>      >     +       /** @length: Length of mapping to unbind */
>      >     +       __u64 length;
>      >     +
>      >     +       /** @flags: Currently reserved, MBZ */
>      >     +       __u64 flags;
>      >     +
>      >     +       /**
>      >     +        * @fence: Timeline fence for unbind completion
>     signaling.
>      >     +        *
>      >     +        * It is an out fence, hence using
>     I915_TIMELINE_FENCE_WAIT flag
>      >     +        * is invalid, and an error will be returned.
>      >     +        */
>      >     +       struct drm_i915_gem_timeline_fence fence;
>      >     +
>      >     +       /**
>      >     +        * @extensions: Zero-terminated chain of extensions.
>      >     +        *
>      >     +        * For future extensions. See struct i915_user_extension.
>      >     +        */
>      >     +       __u64 extensions;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_execbuffer3 - Structure for
>      >     DRM_I915_GEM_EXECBUFFER3
>      >     + * ioctl.
>      >     + *
>      >     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>      >     VM_BIND mode
>      >     + * only works with this ioctl for submission.
>      >     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>      >     + */
>      >     +struct drm_i915_gem_execbuffer3 {
>      >     +       /**
>      >     +        * @ctx_id: Context id
>      >     +        *
>      >     +        * Only contexts with user engine map are allowed.
>      >     +        */
>      >     +       __u32 ctx_id;
>      >     +
>      >     +       /**
>      >     +        * @engine_idx: Engine index
>      >     +        *
>      >     +        * An index in the user engine map of the context
>     specified
>      >     by @ctx_id.
>      >     +        */
>      >     +       __u32 engine_idx;
>      >     +
>      >     +       /**
>      >     +        * @batch_address: Batch gpu virtual address/es.
>      >     +        *
>      >     +        * For normal submission, it is the gpu virtual
>     address of
>      >     the batch
>      >     +        * buffer. For parallel submission, it is a pointer to an
>      >     array of
>      >     +        * batch buffer gpu virtual addresses with array size
>     equal
>      >     to the
>      >     +        * number of (parallel) engines involved in that
>     submission (See
>      >     +        * struct i915_context_engines_parallel_submit).
>      >     +        */
>      >     +       __u64 batch_address;
>      >     +
>      >     +       /** @flags: Currently reserved, MBZ */
>      >     +       __u64 flags;
>      >     +
>      >     +       /** @rsvd1: Reserved, MBZ */
>      >     +       __u32 rsvd1;
>      >     +
>      >     +       /** @fence_count: Number of fences in
>     @timeline_fences array. */
>      >     +       __u32 fence_count;
>      >     +
>      >     +       /**
>      >     +        * @timeline_fences: Pointer to an array of timeline
>     fences.
>      >     +        *
>      >     +        * Timeline fences are of format struct
>      >     drm_i915_gem_timeline_fence.
>      >     +        */
>      >     +       __u64 timeline_fences;
>      >     +
>      >     +       /** @rsvd2: Reserved, MBZ */
>      >     +       __u64 rsvd2;
>      >     +
>      >     +       /**
>      >     +        * @extensions: Zero-terminated chain of extensions.
>      >     +        *
>      >     +        * For future extensions. See struct i915_user_extension.
>      >     +        */
>      >     +       __u64 extensions;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_create_ext_vm_private - Extension to make
>      >     the object
>      >     + * private to the specified VM.
>      >     + *
>      >     + * See struct drm_i915_gem_create_ext.
>      >     + */
>      >     +struct drm_i915_gem_create_ext_vm_private {
>      >     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>      >     +       /** @base: Extension link. See struct
>     i915_user_extension. */
>      >     +       struct i915_user_extension base;
>      >     +
>      >     +       /** @vm_id: Id of the VM to which the object is
>     private */
>      >     +       __u32 vm_id;
>      >     +};
>      >     --
>      >     2.21.0.rc0.32.g243a4c7e27
>      >
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 16:23           ` Matthew Auld
  0 siblings, 0 replies; 53+ messages in thread
From: Matthew Auld @ 2022-06-30 16:23 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Paulo Zanoni, Intel GFX, Maling list - DRI developers,
	Thomas Hellstrom, Chris Wilson, Daniel Vetter,
	Christian König

On 30/06/2022 16:34, Jason Ekstrand wrote:
> On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@intel.com 
> <mailto:matthew.auld@intel.com>> wrote:
> 
>     On 30/06/2022 06:11, Jason Ekstrand wrote:
>      > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>      > <niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>
>      > <mailto:niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>>> wrote:
>      >
>      >     VM_BIND and related uapi definitions
>      >
>      >     v2: Reduce the scope to simple Mesa use case.
>      >     v3: Expand VM_UNBIND documentation and add
>      >          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>      >          and I915_GEM_VM_BIND_TLB_FLUSH flags.
>      >     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>      >          documentation for vm_bind/unbind.
>      >     v5: Remove TLB flush requirement on VM_UNBIND.
>      >          Add version support to stage implementation.
>      >     v6: Define and use drm_i915_gem_timeline_fence structure for
>      >          all timeline fences.
>      >     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>      >          Update documentation on async vm_bind/unbind and versioning.
>      >          Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>      >          batch_count field and I915_EXEC3_SECURE flag.
>      >
>      >     Signed-off-by: Niranjana Vishwanathapura
>      >     <niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>
>      >     <mailto:niranjana.vishwanathapura@intel.com
>     <mailto:niranjana.vishwanathapura@intel.com>>>
>      >     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
>     <mailto:daniel.vetter@ffwll.ch>
>      >     <mailto:daniel.vetter@ffwll.ch <mailto:daniel.vetter@ffwll.ch>>>
>      >     ---
>      >       Documentation/gpu/rfc/i915_vm_bind.h | 280
>     +++++++++++++++++++++++++++
>      >       1 file changed, 280 insertions(+)
>      >       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>      >
>      >     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>      >     b/Documentation/gpu/rfc/i915_vm_bind.h
>      >     new file mode 100644
>      >     index 000000000000..a93e08bceee6
>      >     --- /dev/null
>      >     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>      >     @@ -0,0 +1,280 @@
>      >     +/* SPDX-License-Identifier: MIT */
>      >     +/*
>      >     + * Copyright © 2022 Intel Corporation
>      >     + */
>      >     +
>      >     +/**
>      >     + * DOC: I915_PARAM_VM_BIND_VERSION
>      >     + *
>      >     + * VM_BIND feature version supported.
>      >     + * See typedef drm_i915_getparam_t param.
>      >     + *
>      >     + * Specifies the VM_BIND feature version supported.
>      >     + * The following versions of VM_BIND have been defined:
>      >     + *
>      >     + * 0: No VM_BIND support.
>      >     + *
>      >     + * 1: In VM_UNBIND calls, the UMD must specify the exact
>     mappings
>      >     created
>      >     + *    previously with VM_BIND, the ioctl will not support
>     unbinding
>      >     multiple
>      >     + *    mappings or splitting them. Similarly, VM_BIND calls
>     will not
>      >     replace
>      >     + *    any existing mappings.
>      >     + *
>      >     + * 2: The restrictions on unbinding partial or multiple
>     mappings is
>      >     + *    lifted, Similarly, binding will replace any mappings
>     in the
>      >     given range.
>      >     + *
>      >     + * See struct drm_i915_gem_vm_bind and struct
>     drm_i915_gem_vm_unbind.
>      >     + */
>      >     +#define I915_PARAM_VM_BIND_VERSION     57
>      >     +
>      >     +/**
>      >     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>      >     + *
>      >     + * Flag to opt-in for VM_BIND mode of binding during VM
>     creation.
>      >     + * See struct drm_i915_gem_vm_control flags.
>      >     + *
>      >     + * The older execbuf2 ioctl will not support VM_BIND mode of
>     operation.
>      >     + * For VM_BIND mode, we have new execbuf3 ioctl which will not
>      >     accept any
>      >     + * execlist (See struct drm_i915_gem_execbuffer3 for more
>     details).
>      >     + */
>      >     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>      >     +
>      >     +/* VM_BIND related ioctls */
>      >     +#define DRM_I915_GEM_VM_BIND           0x3d
>      >     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>      >     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>      >     +
>      >     +#define DRM_IOCTL_I915_GEM_VM_BIND
>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>      >     drm_i915_gem_vm_bind)
>      >     +#define DRM_IOCTL_I915_GEM_VM_UNBIND
>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>      >     drm_i915_gem_vm_bind)
>      >     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>      >     drm_i915_gem_execbuffer3)
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_timeline_fence - An input or output
>     timeline
>      >     fence.
>      >     + *
>      >     + * The operation will wait for input fence to signal.
>      >     + *
>      >     + * The returned output fence will be signaled after the
>     completion
>      >     of the
>      >     + * operation.
>      >     + */
>      >     +struct drm_i915_gem_timeline_fence {
>      >     +       /** @handle: User's handle for a drm_syncobj to wait
>     on or
>      >     signal. */
>      >     +       __u32 handle;
>      >     +
>      >     +       /**
>      >     +        * @flags: Supported flags are:
>      >     +        *
>      >     +        * I915_TIMELINE_FENCE_WAIT:
>      >     +        * Wait for the input fence before the operation.
>      >     +        *
>      >     +        * I915_TIMELINE_FENCE_SIGNAL:
>      >     +        * Return operation completion fence as output.
>      >     +        */
>      >     +       __u32 flags;
>      >     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>      >     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>      >     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>      >     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>      >     +
>      >     +       /**
>      >     +        * @value: A point in the timeline.
>      >     +        * Value must be 0 for a binary drm_syncobj. A Value
>     of 0 for a
>      >     +        * timeline drm_syncobj is invalid as it turns a
>     drm_syncobj
>      >     into a
>      >     +        * binary one.
>      >     +        */
>      >     +       __u64 value;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>      >     + *
>      >     + * This structure is passed to VM_BIND ioctl and specifies the
>      >     mapping of GPU
>      >     + * virtual address (VA) range to the section of an object that
>      >     should be bound
>      >     + * in the device page table of the specified address space (VM).
>      >     + * The VA range specified must be unique (ie., not currently
>     bound)
>      >     and can
>      >     + * be mapped to whole object or a section of the object (partial
>      >     binding).
>      >     + * Multiple VA mappings can be created to the same section
>     of the
>      >     object
>      >     + * (aliasing).
>      >     + *
>      >     + * The @start, @offset and @length must be 4K page aligned.
>     However
>      >     the DG2
>      >     + * and XEHPSDV has 64K page size for device local-memory and has
>      >     compact page
>      >     + * table. On those platforms, for binding device local-memory
>      >     objects, the
>      >     + * @start must be 2M aligned, @offset and @length must be
>     64K aligned.
>      >
>      >
>      > This is not acceptable.  We need 64K granularity.  This includes the
>      > starting address, the BO offset, and the length.  Why?  The tl;dr is
>      > that it's a requirement for about 50% of D3D12 apps if we want
>     them to
>      > run on Linux via D3D12.  A longer explanation follows.  I don't
>      > necessarily expect kernel folks to get all the details but hopefully
>      > I'll have left enough of a map that some of the Intel Mesa folks can
>      > help fill in details.
>      >
>      > Many modern D3D12 apps have a hard requirement on Tier2 tiled
>      > resources.  This is a feature that Intel has supported in the D3D12
>      > driver since Skylake.  In order to implement this feature, VKD3D
>      > requires the various sparseResidencyImage* and
>     sparseResidency*Sampled
>      > Vulkan features.  If we want those apps to work (there's getting
>     to be
>      > quite a few of them), we need to implement the Vulkan sparse
>     residency
>      > features.
>      > |
>      > |
>      > What is sparse residency?  I'm glad you asked!  The sparse residency
>      > features allow a client to separately bind each miplevel or array
>     slice
>      > of an image to a chunk of device memory independently, without
>     affecting
>      > any other areas of the image.  Once you get to a high enough
>     miplevel
>      > that everything fits inside a single sparse image block (that's a
>      > technical Vulkan term you can search for in the spec), you can
>     enter a
>      > "miptail" which contains all the remaining miplevels in a single
>     sparse
>      > image block.
>      >
>      > The term "sparse image block" is what the Vulkan spec uses.  On
>     Intel
>      > hardware and in the docs, it's what we call a "tile". 
>     Specifically, the
>      > image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on
>     DG2+.  This
>      > is because Tile4 and legacy X and Y-tiling don't provide any
>     guarantees
>      > about page alignment for slices.  Yf, Ys, and Tile64, on the
>     other hand,
>      > align all slices of the image to a tile boundary, allowing us to map
>      > memory to different slices independently, assuming we have 64K
>     (or 4K
>      > for Yf) VM_BIND granularity.  (4K isn't actually a requirement for
>      > SKL-TGL; we can use Ys all the time which has 64K tiles but
>     there's no
>      > reason to not support 4K alignments on integrated.)
>      >
>      > Someone may be tempted to ask, "Can't we wiggle the strides
>     around or
>      > something to make it work?"  I thought about that and no, you can't.
>      > The problem here is LOD2+.  Sure, you can have a stride such that
>     the
>      > image is a multiple of 2M worth of tiles across.  That'll work
>     fine for
>      > LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be and
>      > there's no way to control that.  The hardware will place it to
>     the right
>      > of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing
>     you
>      > can do about that.  If that position doesn't happen to hit a 2M
>      > boundary, you're out of luck.
>      >
>      > I hope that explanation provides enough detail.  Sadly, this is
>     one of
>      > those things which has a lot of moving pieces all over different
>     bits of
>      > the hardware and various APIs and they all have to work together
>     just
>      > right for it to all come out in the end.  But, yeah, we really
>     need 64K
>      > aligned binding if we want VKD3D to work.
> 
>     Just to confirm, the new model would be to enforce 64K GTT alignment
>     for
>     lmem pages, and then for smem pages we would only require 4K alignment,
>     but with the added restriction that userspace will never try to mix the
>     two (lmem vs smem) within the same 2M va range (page-table). The kernel
>     will verify this and throw an error if needed. This model should work
>     with the above?
> 
> 
> Mesa doesn't have full control over BO placement so I don't think we can 
> guarantee quite as much as you want there.  We can guarantee, I think, 
> that we never place LMEM-only and SMEM-only in the same 2M block.  
> However, most BOs will be LMEM+SMEM (with a preference for LMEM) and 
> then it'll be up to the kernel to sort out any issues.  Is that reasonable?

That seems tricky for the lmem + smem case. On DG2 the hw design is such 
that you can't have 64K and 4K GTT pages within the same page-table, 
since the entire page-table is either operating in 64K or 4K GTT page 
mode (there is some special bit on the PDE that we need to toggle to 
turn on/off the 64K mode).

> 
> --Jason
> 
>      >
>      > --Jason
>      >
>      >     + * Also, for such mappings, i915 will reserve the whole 2M range
>      >     for it so as
>      >     + * to not allow multiple mappings in that 2M range (Compact page
>      >     tables do not
>      >     + * allow 64K page and 4K page bindings in the same 2M range).
>      >     + *
>      >     + * Error code -EINVAL will be returned if @start, @offset and
>      >     @length are not
>      >     + * properly aligned. In version 1 (See
>     I915_PARAM_VM_BIND_VERSION),
>      >     error code
>      >     + * -ENOSPC will be returned if the VA range specified can't be
>      >     reserved.
>      >     + *
>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>      >     concurrently
>      >     + * are not ordered. Furthermore, parts of the VM_BIND
>     operation can
>      >     be done
>      >     + * asynchronously, if valid @fence is specified.
>      >     + */
>      >     +struct drm_i915_gem_vm_bind {
>      >     +       /** @vm_id: VM (address space) id to bind */
>      >     +       __u32 vm_id;
>      >     +
>      >     +       /** @handle: Object handle */
>      >     +       __u32 handle;
>      >     +
>      >     +       /** @start: Virtual Address start to bind */
>      >     +       __u64 start;
>      >     +
>      >     +       /** @offset: Offset in object to bind */
>      >     +       __u64 offset;
>      >     +
>      >     +       /** @length: Length of mapping to bind */
>      >     +       __u64 length;
>      >     +
>      >     +       /**
>      >     +        * @flags: Supported flags are:
>      >     +        *
>      >     +        * I915_GEM_VM_BIND_READONLY:
>      >     +        * Mapping is read-only.
>      >     +        *
>      >     +        * I915_GEM_VM_BIND_CAPTURE:
>      >     +        * Capture this mapping in the dump upon GPU error.
>      >     +        */
>      >     +       __u64 flags;
>      >     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>      >     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>      >     +
>      >     +       /**
>      >     +        * @fence: Timeline fence for bind completion signaling.
>      >     +        *
>      >     +        * It is an out fence, hence using
>     I915_TIMELINE_FENCE_WAIT flag
>      >     +        * is invalid, and an error will be returned.
>      >     +        */
>      >     +       struct drm_i915_gem_timeline_fence fence;
>      >     +
>      >     +       /**
>      >     +        * @extensions: Zero-terminated chain of extensions.
>      >     +        *
>      >     +        * For future extensions. See struct i915_user_extension.
>      >     +        */
>      >     +       __u64 extensions;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_vm_unbind - VA to object mapping to
>     unbind.
>      >     + *
>      >     + * This structure is passed to VM_UNBIND ioctl and specifies the
>      >     GPU virtual
>      >     + * address (VA) range that should be unbound from the device
>     page
>      >     table of the
>      >     + * specified address space (VM). VM_UNBIND will force unbind the
>      >     specified
>      >     + * range from device page table without waiting for any GPU
>     job to
>      >     complete.
>      >     + * It is UMDs responsibility to ensure the mapping is no
>     longer in
>      >     use before
>      >     + * calling VM_UNBIND.
>      >     + *
>      >     + * If the specified mapping is not found, the ioctl will simply
>      >     return without
>      >     + * any error.
>      >     + *
>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>      >     concurrently
>      >     + * are not ordered. Furthermore, parts of the VM_UNBIND
>     operation
>      >     can be done
>      >     + * asynchronously, if valid @fence is specified.
>      >     + */
>      >     +struct drm_i915_gem_vm_unbind {
>      >     +       /** @vm_id: VM (address space) id to bind */
>      >     +       __u32 vm_id;
>      >     +
>      >     +       /** @rsvd: Reserved, MBZ */
>      >     +       __u32 rsvd;
>      >     +
>      >     +       /** @start: Virtual Address start to unbind */
>      >     +       __u64 start;
>      >     +
>      >     +       /** @length: Length of mapping to unbind */
>      >     +       __u64 length;
>      >     +
>      >     +       /** @flags: Currently reserved, MBZ */
>      >     +       __u64 flags;
>      >     +
>      >     +       /**
>      >     +        * @fence: Timeline fence for unbind completion
>     signaling.
>      >     +        *
>      >     +        * It is an out fence, hence using
>     I915_TIMELINE_FENCE_WAIT flag
>      >     +        * is invalid, and an error will be returned.
>      >     +        */
>      >     +       struct drm_i915_gem_timeline_fence fence;
>      >     +
>      >     +       /**
>      >     +        * @extensions: Zero-terminated chain of extensions.
>      >     +        *
>      >     +        * For future extensions. See struct i915_user_extension.
>      >     +        */
>      >     +       __u64 extensions;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_execbuffer3 - Structure for
>      >     DRM_I915_GEM_EXECBUFFER3
>      >     + * ioctl.
>      >     + *
>      >     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>      >     VM_BIND mode
>      >     + * only works with this ioctl for submission.
>      >     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>      >     + */
>      >     +struct drm_i915_gem_execbuffer3 {
>      >     +       /**
>      >     +        * @ctx_id: Context id
>      >     +        *
>      >     +        * Only contexts with user engine map are allowed.
>      >     +        */
>      >     +       __u32 ctx_id;
>      >     +
>      >     +       /**
>      >     +        * @engine_idx: Engine index
>      >     +        *
>      >     +        * An index in the user engine map of the context
>     specified
>      >     by @ctx_id.
>      >     +        */
>      >     +       __u32 engine_idx;
>      >     +
>      >     +       /**
>      >     +        * @batch_address: Batch gpu virtual address/es.
>      >     +        *
>      >     +        * For normal submission, it is the gpu virtual
>     address of
>      >     the batch
>      >     +        * buffer. For parallel submission, it is a pointer to an
>      >     array of
>      >     +        * batch buffer gpu virtual addresses with array size
>     equal
>      >     to the
>      >     +        * number of (parallel) engines involved in that
>     submission (See
>      >     +        * struct i915_context_engines_parallel_submit).
>      >     +        */
>      >     +       __u64 batch_address;
>      >     +
>      >     +       /** @flags: Currently reserved, MBZ */
>      >     +       __u64 flags;
>      >     +
>      >     +       /** @rsvd1: Reserved, MBZ */
>      >     +       __u32 rsvd1;
>      >     +
>      >     +       /** @fence_count: Number of fences in
>     @timeline_fences array. */
>      >     +       __u32 fence_count;
>      >     +
>      >     +       /**
>      >     +        * @timeline_fences: Pointer to an array of timeline
>     fences.
>      >     +        *
>      >     +        * Timeline fences are of format struct
>      >     drm_i915_gem_timeline_fence.
>      >     +        */
>      >     +       __u64 timeline_fences;
>      >     +
>      >     +       /** @rsvd2: Reserved, MBZ */
>      >     +       __u64 rsvd2;
>      >     +
>      >     +       /**
>      >     +        * @extensions: Zero-terminated chain of extensions.
>      >     +        *
>      >     +        * For future extensions. See struct i915_user_extension.
>      >     +        */
>      >     +       __u64 extensions;
>      >     +};
>      >     +
>      >     +/**
>      >     + * struct drm_i915_gem_create_ext_vm_private - Extension to make
>      >     the object
>      >     + * private to the specified VM.
>      >     + *
>      >     + * See struct drm_i915_gem_create_ext.
>      >     + */
>      >     +struct drm_i915_gem_create_ext_vm_private {
>      >     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>      >     +       /** @base: Extension link. See struct
>     i915_user_extension. */
>      >     +       struct i915_user_extension base;
>      >     +
>      >     +       /** @vm_id: Id of the VM to which the object is
>     private */
>      >     +       __u32 vm_id;
>      >     +};
>      >     --
>      >     2.21.0.rc0.32.g243a4c7e27
>      >
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 16:18           ` [Intel-gfx] " Niranjana Vishwanathapura
@ 2022-06-30 17:12             ` Zanoni, Paulo R
  -1 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30 17:12 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Wilson, Chris P, Landwerlin,  Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Hellstrom, Thomas, Zeng, Oak, Auld,
	Matthew, jason, Vetter, Daniel, christian.koenig

On Thu, 2022-06-30 at 09:18 -0700, Niranjana Vishwanathapura wrote:
> On Wed, Jun 29, 2022 at 11:39:52PM -0700, Zanoni, Paulo R wrote:
> > On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
> > > On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
> > > > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> > > > > VM_BIND and related uapi definitions
> > > > > 
> > > > > v2: Reduce the scope to simple Mesa use case.
> > > > > v3: Expand VM_UNBIND documentation and add
> > > > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> > > > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> > > > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> > > > >     documentation for vm_bind/unbind.
> > > > > v5: Remove TLB flush requirement on VM_UNBIND.
> > > > >     Add version support to stage implementation.
> > > > > v6: Define and use drm_i915_gem_timeline_fence structure for
> > > > >     all timeline fences.
> > > > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> > > > >     Update documentation on async vm_bind/unbind and versioning.
> > > > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> > > > >     batch_count field and I915_EXEC3_SECURE flag.
> > > > > 
> > > > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > ---
> > > > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
> > > > >  1 file changed, 280 insertions(+)
> > > > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> > > > > 
> > > > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > new file mode 100644
> > > > > index 000000000000..a93e08bceee6
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > @@ -0,0 +1,280 @@
> > > > > +/* SPDX-License-Identifier: MIT */
> > > > > +/*
> > > > > + * Copyright © 2022 Intel Corporation
> > > > > + */
> > > > > +
> > > > > +/**
> > > > > + * DOC: I915_PARAM_VM_BIND_VERSION
> > > > > + *
> > > > > + * VM_BIND feature version supported.
> > > > > + * See typedef drm_i915_getparam_t param.
> > > > > + *
> > > > > + * Specifies the VM_BIND feature version supported.
> > > > > + * The following versions of VM_BIND have been defined:
> > > > > + *
> > > > > + * 0: No VM_BIND support.
> > > > > + *
> > > > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> > > > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
> > > > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
> > > > > + *    any existing mappings.
> > > > > + *
> > > > > + * 2: The restrictions on unbinding partial or multiple mappings is
> > > > > + *    lifted, Similarly, binding will replace any mappings in the given range.
> > > > > + *
> > > > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> > > > > + */
> > > > > +#define I915_PARAM_VM_BIND_VERSION   57
> > > > > +
> > > > > +/**
> > > > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > > > + *
> > > > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > > > > + * See struct drm_i915_gem_vm_control flags.
> > > > > + *
> > > > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> > > > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> > > > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> > > > > + */
> > > > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
> > > > > +
> > > > > +/* VM_BIND related ioctls */
> > > > > +#define DRM_I915_GEM_VM_BIND         0x3d
> > > > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
> > > > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
> > > > > +
> > > > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> > > > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> > > > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> > > > > + *
> > > > > + * The operation will wait for input fence to signal.
> > > > > + *
> > > > > + * The returned output fence will be signaled after the completion of the
> > > > > + * operation.
> > > > > + */
> > > > > +struct drm_i915_gem_timeline_fence {
> > > > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
> > > > > +     __u32 handle;
> > > > > +
> > > > > +     /**
> > > > > +      * @flags: Supported flags are:
> > > > > +      *
> > > > > +      * I915_TIMELINE_FENCE_WAIT:
> > > > > +      * Wait for the input fence before the operation.
> > > > > +      *
> > > > > +      * I915_TIMELINE_FENCE_SIGNAL:
> > > > > +      * Return operation completion fence as output.
> > > > > +      */
> > > > > +     __u32 flags;
> > > > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> > > > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> > > > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> > > > > +
> > > > > +     /**
> > > > > +      * @value: A point in the timeline.
> > > > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> > > > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> > > > > +      * binary one.
> > > > > +      */
> > > > > +     __u64 value;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > > > > + *
> > > > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> > > > > + * virtual address (VA) range to the section of an object that should be bound
> > > > > + * in the device page table of the specified address space (VM).
> > > > > + * The VA range specified must be unique (ie., not currently bound) and can
> > > > > + * be mapped to whole object or a section of the object (partial binding).
> > > > > + * Multiple VA mappings can be created to the same section of the object
> > > > > + * (aliasing).
> > > > > + *
> > > > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
> > > > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
> > > > > + * table. On those platforms, for binding device local-memory objects, the
> > > > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> > > > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
> > > > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
> > > > > + * allow 64K page and 4K page bindings in the same 2M range).
> > > > > + *
> > > > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
> > > > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> > > > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
> > > > > + *
> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> > > > > + * asynchronously, if valid @fence is specified.
> > > > 
> > > > Does that mean that if I don't provide @fence, then this ioctl will be
> > > > synchronous (i.e., when it returns, the memory will be guaranteed to be
> > > > bound)? The text is kinda implying that, but from one of your earlier
> > > > replies to Tvrtko, that doesn't seem to be the case. I guess we could
> > > > change the text to make this more explicit.
> > > > 
> > > 
> > > Yes, I thought, if user doesn't specify the out fence, KMD better make
> > > the ioctl synchronous by waiting until the binding finishes before
> > > returning. Otherwise, UMD has no way to ensure binding is complete and
> > > UMD must pass in out fence for VM_BIND calls.
> > > 
> > > But latest comment form Daniel on other thread might suggest something else.
> > > Daniel, can you comment?
> > 
> > Whatever we decide, let's make sure it's documented.
> > 
> > > 
> > > > In addition, previously we had the guarantee that an execbuf ioctl
> > > > would wait for all the pending vm_bind operations to finish before
> > > > doing anything. Do we still have this guarantee or do we have to make
> > > > use of the fences now?
> > > > 
> > > 
> > > No, we don't have that anymore (execbuf is decoupled from VM_BIND).
> > > Execbuf3 submission will not wait for any previous VM_BIND to finish.
> > > UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
> > > that.
> > 
> > Got it, thanks.
> > 
> > > 
> > > > > + */
> > > > > +struct drm_i915_gem_vm_bind {
> > > > > +     /** @vm_id: VM (address space) id to bind */
> > > > > +     __u32 vm_id;
> > > > > +
> > > > > +     /** @handle: Object handle */
> > > > > +     __u32 handle;
> > > > > +
> > > > > +     /** @start: Virtual Address start to bind */
> > > > > +     __u64 start;
> > > > > +
> > > > > +     /** @offset: Offset in object to bind */
> > > > > +     __u64 offset;
> > > > > +
> > > > > +     /** @length: Length of mapping to bind */
> > > > > +     __u64 length;
> > > > > +
> > > > > +     /**
> > > > > +      * @flags: Supported flags are:
> > > > > +      *
> > > > > +      * I915_GEM_VM_BIND_READONLY:
> > > > > +      * Mapping is read-only.
> > > > 
> > > > Can you please explain what happens when we try to write to a range
> > > > that's bound as read-only?
> > > > 
> > > 
> > > It will be mapped as read-only in device page table. Hence any
> > > write access will fail. I would expect a CAT error reported.
> > 
> > What's a CAT error? Does this lead to machine freeze or a GPU hang?
> > Let's make sure we document this.
> > 
> 
> Catastrophic error.
> 
> > > 
> > > I am seeing that currently the page table R/W setting is based
> > > on whether BO is readonly or not (UMDs can request a userptr
> > > BO to readonly). We can make this READONLY here as a subset.
> > > ie., if BO is readonly, the mappings must be readonly. If BO
> > > is not readonly, then the mapping can be either readonly or
> > > not.
> > > 
> > > But if Mesa doesn't have a use for this, then we can remove
> > > this flag for now.
> > > 
> > 
> > I was considering using it for Vulkan's Sparse
> > residencyNonResidentStrict, so we map all unbound pages to a read-only
> > page. But for that to work, the required behavior would have to be:
> > reads all return zero, writes are ignored without any sort of error.
> > 
> > But maybe our hardware provides other ways to implement this, I haven't
> > checked yet.
> > 
> 
> I am not sure what the behavior is. Probably writes are not simply ignored,
> will check.
> Looks like we can remove this flag for now. We can always add it back
> later if we need it. Is that Ok with you?

I would prefer we keep it if that means writes will just be ignored
without anything exploding, because it would be very useful.

> 
> Niranjana
> 
> > 
> > > > 
> > > > > +      *
> > > > > +      * I915_GEM_VM_BIND_CAPTURE:
> > > > > +      * Capture this mapping in the dump upon GPU error.
> > > > > +      */
> > > > > +     __u64 flags;
> > > > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
> > > > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
> > > > > +
> > > > > +     /**
> > > > > +      * @fence: Timeline fence for bind completion signaling.
> > > > > +      *
> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > > > +      * is invalid, and an error will be returned.
> > > > > +      */
> > > > > +     struct drm_i915_gem_timeline_fence fence;
> > > > > +
> > > > > +     /**
> > > > > +      * @extensions: Zero-terminated chain of extensions.
> > > > > +      *
> > > > > +      * For future extensions. See struct i915_user_extension.
> > > > > +      */
> > > > > +     __u64 extensions;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> > > > > + *
> > > > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> > > > > + * address (VA) range that should be unbound from the device page table of the
> > > > > + * specified address space (VM). VM_UNBIND will force unbind the specified
> > > > > + * range from device page table without waiting for any GPU job to complete.
> > > > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
> > > > > + * calling VM_UNBIND.
> > > > > + *
> > > > > + * If the specified mapping is not found, the ioctl will simply return without
> > > > > + * any error.
> > > > > + *
> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> > > > > + * asynchronously, if valid @fence is specified.
> > > > > + */
> > > > > +struct drm_i915_gem_vm_unbind {
> > > > > +     /** @vm_id: VM (address space) id to bind */
> > > > > +     __u32 vm_id;
> > > > > +
> > > > > +     /** @rsvd: Reserved, MBZ */
> > > > > +     __u32 rsvd;
> > > > > +
> > > > > +     /** @start: Virtual Address start to unbind */
> > > > > +     __u64 start;
> > > > > +
> > > > > +     /** @length: Length of mapping to unbind */
> > > > > +     __u64 length;
> > > > > +
> > > > > +     /** @flags: Currently reserved, MBZ */
> > > > > +     __u64 flags;
> > > > > +
> > > > > +     /**
> > > > > +      * @fence: Timeline fence for unbind completion signaling.
> > > > > +      *
> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > > > +      * is invalid, and an error will be returned.
> > > > > +      */
> > > > > +     struct drm_i915_gem_timeline_fence fence;
> > > > > +
> > > > > +     /**
> > > > > +      * @extensions: Zero-terminated chain of extensions.
> > > > > +      *
> > > > > +      * For future extensions. See struct i915_user_extension.
> > > > > +      */
> > > > > +     __u64 extensions;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> > > > > + * ioctl.
> > > > > + *
> > > > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> > > > > + * only works with this ioctl for submission.
> > > > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> > > > > + */
> > > > > +struct drm_i915_gem_execbuffer3 {
> > > > > +     /**
> > > > > +      * @ctx_id: Context id
> > > > > +      *
> > > > > +      * Only contexts with user engine map are allowed.
> > > > > +      */
> > > > > +     __u32 ctx_id;
> > > > > +
> > > > > +     /**
> > > > > +      * @engine_idx: Engine index
> > > > > +      *
> > > > > +      * An index in the user engine map of the context specified by @ctx_id.
> > > > > +      */
> > > > > +     __u32 engine_idx;
> > > > > +
> > > > > +     /**
> > > > > +      * @batch_address: Batch gpu virtual address/es.
> > > > > +      *
> > > > > +      * For normal submission, it is the gpu virtual address of the batch
> > > > > +      * buffer. For parallel submission, it is a pointer to an array of
> > > > > +      * batch buffer gpu virtual addresses with array size equal to the
> > > > > +      * number of (parallel) engines involved in that submission (See
> > > > > +      * struct i915_context_engines_parallel_submit).
> > > > > +      */
> > > > > +     __u64 batch_address;
> > > > > +
> > > > > +     /** @flags: Currently reserved, MBZ */
> > > > > +     __u64 flags;
> > > > > +
> > > > > +     /** @rsvd1: Reserved, MBZ */
> > > > > +     __u32 rsvd1;
> > > > > +
> > > > > +     /** @fence_count: Number of fences in @timeline_fences array. */
> > > > > +     __u32 fence_count;
> > > > > +
> > > > > +     /**
> > > > > +      * @timeline_fences: Pointer to an array of timeline fences.
> > > > > +      *
> > > > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> > > > > +      */
> > > > > +     __u64 timeline_fences;
> > > > > +
> > > > > +     /** @rsvd2: Reserved, MBZ */
> > > > > +     __u64 rsvd2;
> > > > > +
> > > > 
> > > > Just out of curiosity: if we can extend behavior with @extensions and
> > > > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
> > > > 
> > > 
> > > True. I added it just in case some requests came up that would require
> > > some additional fields. During this review process itself there were
> > > some requests. Adding directly here should have a slight performance
> > > edge over adding it as an extension (one less copy_from_user).
> > > 
> > > But if folks think this is an overkill, I will remove it.
> > 
> > I do not have strong opinions here, I'm just curious.
> > 
> > Thanks,
> > Paulo
> > 
> > > 
> > > Niranjana
> > > 
> > > > > +     /**
> > > > > +      * @extensions: Zero-terminated chain of extensions.
> > > > > +      *
> > > > > +      * For future extensions. See struct i915_user_extension.
> > > > > +      */
> > > > > +     __u64 extensions;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> > > > > + * private to the specified VM.
> > > > > + *
> > > > > + * See struct drm_i915_gem_create_ext.
> > > > > + */
> > > > > +struct drm_i915_gem_create_ext_vm_private {
> > > > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
> > > > > +     /** @base: Extension link. See struct i915_user_extension. */
> > > > > +     struct i915_user_extension base;
> > > > > +
> > > > > +     /** @vm_id: Id of the VM to which the object is private */
> > > > > +     __u32 vm_id;
> > > > > +};
> > > > 
> > 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 17:12             ` Zanoni, Paulo R
  0 siblings, 0 replies; 53+ messages in thread
From: Zanoni, Paulo R @ 2022-06-30 17:12 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Wilson, Chris P, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, christian.koenig

On Thu, 2022-06-30 at 09:18 -0700, Niranjana Vishwanathapura wrote:
> On Wed, Jun 29, 2022 at 11:39:52PM -0700, Zanoni, Paulo R wrote:
> > On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
> > > On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
> > > > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
> > > > > VM_BIND and related uapi definitions
> > > > > 
> > > > > v2: Reduce the scope to simple Mesa use case.
> > > > > v3: Expand VM_UNBIND documentation and add
> > > > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> > > > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
> > > > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> > > > >     documentation for vm_bind/unbind.
> > > > > v5: Remove TLB flush requirement on VM_UNBIND.
> > > > >     Add version support to stage implementation.
> > > > > v6: Define and use drm_i915_gem_timeline_fence structure for
> > > > >     all timeline fences.
> > > > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> > > > >     Update documentation on async vm_bind/unbind and versioning.
> > > > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> > > > >     batch_count field and I915_EXEC3_SECURE flag.
> > > > > 
> > > > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > ---
> > > > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
> > > > >  1 file changed, 280 insertions(+)
> > > > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> > > > > 
> > > > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > new file mode 100644
> > > > > index 000000000000..a93e08bceee6
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > @@ -0,0 +1,280 @@
> > > > > +/* SPDX-License-Identifier: MIT */
> > > > > +/*
> > > > > + * Copyright © 2022 Intel Corporation
> > > > > + */
> > > > > +
> > > > > +/**
> > > > > + * DOC: I915_PARAM_VM_BIND_VERSION
> > > > > + *
> > > > > + * VM_BIND feature version supported.
> > > > > + * See typedef drm_i915_getparam_t param.
> > > > > + *
> > > > > + * Specifies the VM_BIND feature version supported.
> > > > > + * The following versions of VM_BIND have been defined:
> > > > > + *
> > > > > + * 0: No VM_BIND support.
> > > > > + *
> > > > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> > > > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
> > > > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
> > > > > + *    any existing mappings.
> > > > > + *
> > > > > + * 2: The restrictions on unbinding partial or multiple mappings is
> > > > > + *    lifted, Similarly, binding will replace any mappings in the given range.
> > > > > + *
> > > > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> > > > > + */
> > > > > +#define I915_PARAM_VM_BIND_VERSION   57
> > > > > +
> > > > > +/**
> > > > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > > > + *
> > > > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > > > > + * See struct drm_i915_gem_vm_control flags.
> > > > > + *
> > > > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> > > > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> > > > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> > > > > + */
> > > > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
> > > > > +
> > > > > +/* VM_BIND related ioctls */
> > > > > +#define DRM_I915_GEM_VM_BIND         0x3d
> > > > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
> > > > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
> > > > > +
> > > > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> > > > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> > > > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> > > > > + *
> > > > > + * The operation will wait for input fence to signal.
> > > > > + *
> > > > > + * The returned output fence will be signaled after the completion of the
> > > > > + * operation.
> > > > > + */
> > > > > +struct drm_i915_gem_timeline_fence {
> > > > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
> > > > > +     __u32 handle;
> > > > > +
> > > > > +     /**
> > > > > +      * @flags: Supported flags are:
> > > > > +      *
> > > > > +      * I915_TIMELINE_FENCE_WAIT:
> > > > > +      * Wait for the input fence before the operation.
> > > > > +      *
> > > > > +      * I915_TIMELINE_FENCE_SIGNAL:
> > > > > +      * Return operation completion fence as output.
> > > > > +      */
> > > > > +     __u32 flags;
> > > > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
> > > > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
> > > > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> > > > > +
> > > > > +     /**
> > > > > +      * @value: A point in the timeline.
> > > > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> > > > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> > > > > +      * binary one.
> > > > > +      */
> > > > > +     __u64 value;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > > > > + *
> > > > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> > > > > + * virtual address (VA) range to the section of an object that should be bound
> > > > > + * in the device page table of the specified address space (VM).
> > > > > + * The VA range specified must be unique (ie., not currently bound) and can
> > > > > + * be mapped to whole object or a section of the object (partial binding).
> > > > > + * Multiple VA mappings can be created to the same section of the object
> > > > > + * (aliasing).
> > > > > + *
> > > > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
> > > > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
> > > > > + * table. On those platforms, for binding device local-memory objects, the
> > > > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
> > > > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
> > > > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
> > > > > + * allow 64K page and 4K page bindings in the same 2M range).
> > > > > + *
> > > > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
> > > > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
> > > > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
> > > > > + *
> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
> > > > > + * asynchronously, if valid @fence is specified.
> > > > 
> > > > Does that mean that if I don't provide @fence, then this ioctl will be
> > > > synchronous (i.e., when it returns, the memory will be guaranteed to be
> > > > bound)? The text is kinda implying that, but from one of your earlier
> > > > replies to Tvrtko, that doesn't seem to be the case. I guess we could
> > > > change the text to make this more explicit.
> > > > 
> > > 
> > > Yes, I thought, if user doesn't specify the out fence, KMD better make
> > > the ioctl synchronous by waiting until the binding finishes before
> > > returning. Otherwise, UMD has no way to ensure binding is complete and
> > > UMD must pass in out fence for VM_BIND calls.
> > > 
> > > But latest comment form Daniel on other thread might suggest something else.
> > > Daniel, can you comment?
> > 
> > Whatever we decide, let's make sure it's documented.
> > 
> > > 
> > > > In addition, previously we had the guarantee that an execbuf ioctl
> > > > would wait for all the pending vm_bind operations to finish before
> > > > doing anything. Do we still have this guarantee or do we have to make
> > > > use of the fences now?
> > > > 
> > > 
> > > No, we don't have that anymore (execbuf is decoupled from VM_BIND).
> > > Execbuf3 submission will not wait for any previous VM_BIND to finish.
> > > UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
> > > that.
> > 
> > Got it, thanks.
> > 
> > > 
> > > > > + */
> > > > > +struct drm_i915_gem_vm_bind {
> > > > > +     /** @vm_id: VM (address space) id to bind */
> > > > > +     __u32 vm_id;
> > > > > +
> > > > > +     /** @handle: Object handle */
> > > > > +     __u32 handle;
> > > > > +
> > > > > +     /** @start: Virtual Address start to bind */
> > > > > +     __u64 start;
> > > > > +
> > > > > +     /** @offset: Offset in object to bind */
> > > > > +     __u64 offset;
> > > > > +
> > > > > +     /** @length: Length of mapping to bind */
> > > > > +     __u64 length;
> > > > > +
> > > > > +     /**
> > > > > +      * @flags: Supported flags are:
> > > > > +      *
> > > > > +      * I915_GEM_VM_BIND_READONLY:
> > > > > +      * Mapping is read-only.
> > > > 
> > > > Can you please explain what happens when we try to write to a range
> > > > that's bound as read-only?
> > > > 
> > > 
> > > It will be mapped as read-only in device page table. Hence any
> > > write access will fail. I would expect a CAT error reported.
> > 
> > What's a CAT error? Does this lead to machine freeze or a GPU hang?
> > Let's make sure we document this.
> > 
> 
> Catastrophic error.
> 
> > > 
> > > I am seeing that currently the page table R/W setting is based
> > > on whether BO is readonly or not (UMDs can request a userptr
> > > BO to readonly). We can make this READONLY here as a subset.
> > > ie., if BO is readonly, the mappings must be readonly. If BO
> > > is not readonly, then the mapping can be either readonly or
> > > not.
> > > 
> > > But if Mesa doesn't have a use for this, then we can remove
> > > this flag for now.
> > > 
> > 
> > I was considering using it for Vulkan's Sparse
> > residencyNonResidentStrict, so we map all unbound pages to a read-only
> > page. But for that to work, the required behavior would have to be:
> > reads all return zero, writes are ignored without any sort of error.
> > 
> > But maybe our hardware provides other ways to implement this, I haven't
> > checked yet.
> > 
> 
> I am not sure what the behavior is. Probably writes are not simply ignored,
> will check.
> Looks like we can remove this flag for now. We can always add it back
> later if we need it. Is that Ok with you?

I would prefer we keep it if that means writes will just be ignored
without anything exploding, because it would be very useful.

> 
> Niranjana
> 
> > 
> > > > 
> > > > > +      *
> > > > > +      * I915_GEM_VM_BIND_CAPTURE:
> > > > > +      * Capture this mapping in the dump upon GPU error.
> > > > > +      */
> > > > > +     __u64 flags;
> > > > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
> > > > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
> > > > > +
> > > > > +     /**
> > > > > +      * @fence: Timeline fence for bind completion signaling.
> > > > > +      *
> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > > > +      * is invalid, and an error will be returned.
> > > > > +      */
> > > > > +     struct drm_i915_gem_timeline_fence fence;
> > > > > +
> > > > > +     /**
> > > > > +      * @extensions: Zero-terminated chain of extensions.
> > > > > +      *
> > > > > +      * For future extensions. See struct i915_user_extension.
> > > > > +      */
> > > > > +     __u64 extensions;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
> > > > > + *
> > > > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
> > > > > + * address (VA) range that should be unbound from the device page table of the
> > > > > + * specified address space (VM). VM_UNBIND will force unbind the specified
> > > > > + * range from device page table without waiting for any GPU job to complete.
> > > > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
> > > > > + * calling VM_UNBIND.
> > > > > + *
> > > > > + * If the specified mapping is not found, the ioctl will simply return without
> > > > > + * any error.
> > > > > + *
> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
> > > > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
> > > > > + * asynchronously, if valid @fence is specified.
> > > > > + */
> > > > > +struct drm_i915_gem_vm_unbind {
> > > > > +     /** @vm_id: VM (address space) id to bind */
> > > > > +     __u32 vm_id;
> > > > > +
> > > > > +     /** @rsvd: Reserved, MBZ */
> > > > > +     __u32 rsvd;
> > > > > +
> > > > > +     /** @start: Virtual Address start to unbind */
> > > > > +     __u64 start;
> > > > > +
> > > > > +     /** @length: Length of mapping to unbind */
> > > > > +     __u64 length;
> > > > > +
> > > > > +     /** @flags: Currently reserved, MBZ */
> > > > > +     __u64 flags;
> > > > > +
> > > > > +     /**
> > > > > +      * @fence: Timeline fence for unbind completion signaling.
> > > > > +      *
> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
> > > > > +      * is invalid, and an error will be returned.
> > > > > +      */
> > > > > +     struct drm_i915_gem_timeline_fence fence;
> > > > > +
> > > > > +     /**
> > > > > +      * @extensions: Zero-terminated chain of extensions.
> > > > > +      *
> > > > > +      * For future extensions. See struct i915_user_extension.
> > > > > +      */
> > > > > +     __u64 extensions;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
> > > > > + * ioctl.
> > > > > + *
> > > > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
> > > > > + * only works with this ioctl for submission.
> > > > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
> > > > > + */
> > > > > +struct drm_i915_gem_execbuffer3 {
> > > > > +     /**
> > > > > +      * @ctx_id: Context id
> > > > > +      *
> > > > > +      * Only contexts with user engine map are allowed.
> > > > > +      */
> > > > > +     __u32 ctx_id;
> > > > > +
> > > > > +     /**
> > > > > +      * @engine_idx: Engine index
> > > > > +      *
> > > > > +      * An index in the user engine map of the context specified by @ctx_id.
> > > > > +      */
> > > > > +     __u32 engine_idx;
> > > > > +
> > > > > +     /**
> > > > > +      * @batch_address: Batch gpu virtual address/es.
> > > > > +      *
> > > > > +      * For normal submission, it is the gpu virtual address of the batch
> > > > > +      * buffer. For parallel submission, it is a pointer to an array of
> > > > > +      * batch buffer gpu virtual addresses with array size equal to the
> > > > > +      * number of (parallel) engines involved in that submission (See
> > > > > +      * struct i915_context_engines_parallel_submit).
> > > > > +      */
> > > > > +     __u64 batch_address;
> > > > > +
> > > > > +     /** @flags: Currently reserved, MBZ */
> > > > > +     __u64 flags;
> > > > > +
> > > > > +     /** @rsvd1: Reserved, MBZ */
> > > > > +     __u32 rsvd1;
> > > > > +
> > > > > +     /** @fence_count: Number of fences in @timeline_fences array. */
> > > > > +     __u32 fence_count;
> > > > > +
> > > > > +     /**
> > > > > +      * @timeline_fences: Pointer to an array of timeline fences.
> > > > > +      *
> > > > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
> > > > > +      */
> > > > > +     __u64 timeline_fences;
> > > > > +
> > > > > +     /** @rsvd2: Reserved, MBZ */
> > > > > +     __u64 rsvd2;
> > > > > +
> > > > 
> > > > Just out of curiosity: if we can extend behavior with @extensions and
> > > > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
> > > > 
> > > 
> > > True. I added it just in case some requests came up that would require
> > > some additional fields. During this review process itself there were
> > > some requests. Adding directly here should have a slight performance
> > > edge over adding it as an extension (one less copy_from_user).
> > > 
> > > But if folks think this is an overkill, I will remove it.
> > 
> > I do not have strong opinions here, I'm just curious.
> > 
> > Thanks,
> > Paulo
> > 
> > > 
> > > Niranjana
> > > 
> > > > > +     /**
> > > > > +      * @extensions: Zero-terminated chain of extensions.
> > > > > +      *
> > > > > +      * For future extensions. See struct i915_user_extension.
> > > > > +      */
> > > > > +     __u64 extensions;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
> > > > > + * private to the specified VM.
> > > > > + *
> > > > > + * See struct drm_i915_gem_create_ext.
> > > > > + */
> > > > > +struct drm_i915_gem_create_ext_vm_private {
> > > > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
> > > > > +     /** @base: Extension link. See struct i915_user_extension. */
> > > > > +     struct i915_user_extension base;
> > > > > +
> > > > > +     /** @vm_id: Id of the VM to which the object is private */
> > > > > +     __u32 vm_id;
> > > > > +};
> > > > 
> > 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 17:12             ` [Intel-gfx] " Zanoni, Paulo R
@ 2022-06-30 18:30               ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 18:30 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: Brost, Matthew, Wilson, Chris P, Landwerlin, Lionel G, Ursulin,
	Tvrtko, intel-gfx, dri-devel, Hellstrom, Thomas, Zeng, Oak, Auld,
	Matthew, jason, Vetter, Daniel, christian.koenig

On Thu, Jun 30, 2022 at 10:12:47AM -0700, Zanoni, Paulo R wrote:
>On Thu, 2022-06-30 at 09:18 -0700, Niranjana Vishwanathapura wrote:
>> On Wed, Jun 29, 2022 at 11:39:52PM -0700, Zanoni, Paulo R wrote:
>> > On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
>> > > On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>> > > > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> > > > > VM_BIND and related uapi definitions
>> > > > >
>> > > > > v2: Reduce the scope to simple Mesa use case.
>> > > > > v3: Expand VM_UNBIND documentation and add
>> > > > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>> > > > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>> > > > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>> > > > >     documentation for vm_bind/unbind.
>> > > > > v5: Remove TLB flush requirement on VM_UNBIND.
>> > > > >     Add version support to stage implementation.
>> > > > > v6: Define and use drm_i915_gem_timeline_fence structure for
>> > > > >     all timeline fences.
>> > > > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>> > > > >     Update documentation on async vm_bind/unbind and versioning.
>> > > > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>> > > > >     batch_count field and I915_EXEC3_SECURE flag.
>> > > > >
>> > > > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > > > > ---
>> > > > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>> > > > >  1 file changed, 280 insertions(+)
>> > > > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>> > > > >
>> > > > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > > > new file mode 100644
>> > > > > index 000000000000..a93e08bceee6
>> > > > > --- /dev/null
>> > > > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > > > @@ -0,0 +1,280 @@
>> > > > > +/* SPDX-License-Identifier: MIT */
>> > > > > +/*
>> > > > > + * Copyright © 2022 Intel Corporation
>> > > > > + */
>> > > > > +
>> > > > > +/**
>> > > > > + * DOC: I915_PARAM_VM_BIND_VERSION
>> > > > > + *
>> > > > > + * VM_BIND feature version supported.
>> > > > > + * See typedef drm_i915_getparam_t param.
>> > > > > + *
>> > > > > + * Specifies the VM_BIND feature version supported.
>> > > > > + * The following versions of VM_BIND have been defined:
>> > > > > + *
>> > > > > + * 0: No VM_BIND support.
>> > > > > + *
>> > > > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
>> > > > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
>> > > > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
>> > > > > + *    any existing mappings.
>> > > > > + *
>> > > > > + * 2: The restrictions on unbinding partial or multiple mappings is
>> > > > > + *    lifted, Similarly, binding will replace any mappings in the given range.
>> > > > > + *
>> > > > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>> > > > > + */
>> > > > > +#define I915_PARAM_VM_BIND_VERSION   57
>> > > > > +
>> > > > > +/**
>> > > > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> > > > > + *
>> > > > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> > > > > + * See struct drm_i915_gem_vm_control flags.
>> > > > > + *
>> > > > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> > > > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
>> > > > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> > > > > + */
>> > > > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> > > > > +
>> > > > > +/* VM_BIND related ioctls */
>> > > > > +#define DRM_I915_GEM_VM_BIND         0x3d
>> > > > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> > > > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> > > > > +
>> > > > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> > > > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> > > > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> > > > > + *
>> > > > > + * The operation will wait for input fence to signal.
>> > > > > + *
>> > > > > + * The returned output fence will be signaled after the completion of the
>> > > > > + * operation.
>> > > > > + */
>> > > > > +struct drm_i915_gem_timeline_fence {
>> > > > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> > > > > +     __u32 handle;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @flags: Supported flags are:
>> > > > > +      *
>> > > > > +      * I915_TIMELINE_FENCE_WAIT:
>> > > > > +      * Wait for the input fence before the operation.
>> > > > > +      *
>> > > > > +      * I915_TIMELINE_FENCE_SIGNAL:
>> > > > > +      * Return operation completion fence as output.
>> > > > > +      */
>> > > > > +     __u32 flags;
>> > > > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> > > > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> > > > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @value: A point in the timeline.
>> > > > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> > > > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> > > > > +      * binary one.
>> > > > > +      */
>> > > > > +     __u64 value;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> > > > > + *
>> > > > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> > > > > + * virtual address (VA) range to the section of an object that should be bound
>> > > > > + * in the device page table of the specified address space (VM).
>> > > > > + * The VA range specified must be unique (ie., not currently bound) and can
>> > > > > + * be mapped to whole object or a section of the object (partial binding).
>> > > > > + * Multiple VA mappings can be created to the same section of the object
>> > > > > + * (aliasing).
>> > > > > + *
>> > > > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
>> > > > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
>> > > > > + * table. On those platforms, for binding device local-memory objects, the
>> > > > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>> > > > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
>> > > > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
>> > > > > + * allow 64K page and 4K page bindings in the same 2M range).
>> > > > > + *
>> > > > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> > > > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> > > > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> > > > > + *
>> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> > > > > + * asynchronously, if valid @fence is specified.
>> > > >
>> > > > Does that mean that if I don't provide @fence, then this ioctl will be
>> > > > synchronous (i.e., when it returns, the memory will be guaranteed to be
>> > > > bound)? The text is kinda implying that, but from one of your earlier
>> > > > replies to Tvrtko, that doesn't seem to be the case. I guess we could
>> > > > change the text to make this more explicit.
>> > > >
>> > >
>> > > Yes, I thought, if user doesn't specify the out fence, KMD better make
>> > > the ioctl synchronous by waiting until the binding finishes before
>> > > returning. Otherwise, UMD has no way to ensure binding is complete and
>> > > UMD must pass in out fence for VM_BIND calls.
>> > >
>> > > But latest comment form Daniel on other thread might suggest something else.
>> > > Daniel, can you comment?
>> >
>> > Whatever we decide, let's make sure it's documented.
>> >
>> > >
>> > > > In addition, previously we had the guarantee that an execbuf ioctl
>> > > > would wait for all the pending vm_bind operations to finish before
>> > > > doing anything. Do we still have this guarantee or do we have to make
>> > > > use of the fences now?
>> > > >
>> > >
>> > > No, we don't have that anymore (execbuf is decoupled from VM_BIND).
>> > > Execbuf3 submission will not wait for any previous VM_BIND to finish.
>> > > UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
>> > > that.
>> >
>> > Got it, thanks.
>> >
>> > >
>> > > > > + */
>> > > > > +struct drm_i915_gem_vm_bind {
>> > > > > +     /** @vm_id: VM (address space) id to bind */
>> > > > > +     __u32 vm_id;
>> > > > > +
>> > > > > +     /** @handle: Object handle */
>> > > > > +     __u32 handle;
>> > > > > +
>> > > > > +     /** @start: Virtual Address start to bind */
>> > > > > +     __u64 start;
>> > > > > +
>> > > > > +     /** @offset: Offset in object to bind */
>> > > > > +     __u64 offset;
>> > > > > +
>> > > > > +     /** @length: Length of mapping to bind */
>> > > > > +     __u64 length;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @flags: Supported flags are:
>> > > > > +      *
>> > > > > +      * I915_GEM_VM_BIND_READONLY:
>> > > > > +      * Mapping is read-only.
>> > > >
>> > > > Can you please explain what happens when we try to write to a range
>> > > > that's bound as read-only?
>> > > >
>> > >
>> > > It will be mapped as read-only in device page table. Hence any
>> > > write access will fail. I would expect a CAT error reported.
>> >
>> > What's a CAT error? Does this lead to machine freeze or a GPU hang?
>> > Let's make sure we document this.
>> >
>>
>> Catastrophic error.
>>
>> > >
>> > > I am seeing that currently the page table R/W setting is based
>> > > on whether BO is readonly or not (UMDs can request a userptr
>> > > BO to readonly). We can make this READONLY here as a subset.
>> > > ie., if BO is readonly, the mappings must be readonly. If BO
>> > > is not readonly, then the mapping can be either readonly or
>> > > not.
>> > >
>> > > But if Mesa doesn't have a use for this, then we can remove
>> > > this flag for now.
>> > >
>> >
>> > I was considering using it for Vulkan's Sparse
>> > residencyNonResidentStrict, so we map all unbound pages to a read-only
>> > page. But for that to work, the required behavior would have to be:
>> > reads all return zero, writes are ignored without any sort of error.
>> >
>> > But maybe our hardware provides other ways to implement this, I haven't
>> > checked yet.
>> >
>>
>> I am not sure what the behavior is. Probably writes are not simply ignored,
>> will check.
>> Looks like we can remove this flag for now. We can always add it back
>> later if we need it. Is that Ok with you?
>
>I would prefer we keep it if that means writes will just be ignored
>without anything exploding, because it would be very useful.
>

I tried it and writes are not simply ignored and guc reported errors.
So, will remove it for now.

Niranjana

>>
>> Niranjana
>>
>> >
>> > > >
>> > > > > +      *
>> > > > > +      * I915_GEM_VM_BIND_CAPTURE:
>> > > > > +      * Capture this mapping in the dump upon GPU error.
>> > > > > +      */
>> > > > > +     __u64 flags;
>> > > > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>> > > > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @fence: Timeline fence for bind completion signaling.
>> > > > > +      *
>> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > > > +      * is invalid, and an error will be returned.
>> > > > > +      */
>> > > > > +     struct drm_i915_gem_timeline_fence fence;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @extensions: Zero-terminated chain of extensions.
>> > > > > +      *
>> > > > > +      * For future extensions. See struct i915_user_extension.
>> > > > > +      */
>> > > > > +     __u64 extensions;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> > > > > + *
>> > > > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> > > > > + * address (VA) range that should be unbound from the device page table of the
>> > > > > + * specified address space (VM). VM_UNBIND will force unbind the specified
>> > > > > + * range from device page table without waiting for any GPU job to complete.
>> > > > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
>> > > > > + * calling VM_UNBIND.
>> > > > > + *
>> > > > > + * If the specified mapping is not found, the ioctl will simply return without
>> > > > > + * any error.
>> > > > > + *
>> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> > > > > + * asynchronously, if valid @fence is specified.
>> > > > > + */
>> > > > > +struct drm_i915_gem_vm_unbind {
>> > > > > +     /** @vm_id: VM (address space) id to bind */
>> > > > > +     __u32 vm_id;
>> > > > > +
>> > > > > +     /** @rsvd: Reserved, MBZ */
>> > > > > +     __u32 rsvd;
>> > > > > +
>> > > > > +     /** @start: Virtual Address start to unbind */
>> > > > > +     __u64 start;
>> > > > > +
>> > > > > +     /** @length: Length of mapping to unbind */
>> > > > > +     __u64 length;
>> > > > > +
>> > > > > +     /** @flags: Currently reserved, MBZ */
>> > > > > +     __u64 flags;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @fence: Timeline fence for unbind completion signaling.
>> > > > > +      *
>> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > > > +      * is invalid, and an error will be returned.
>> > > > > +      */
>> > > > > +     struct drm_i915_gem_timeline_fence fence;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @extensions: Zero-terminated chain of extensions.
>> > > > > +      *
>> > > > > +      * For future extensions. See struct i915_user_extension.
>> > > > > +      */
>> > > > > +     __u64 extensions;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>> > > > > + * ioctl.
>> > > > > + *
>> > > > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>> > > > > + * only works with this ioctl for submission.
>> > > > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> > > > > + */
>> > > > > +struct drm_i915_gem_execbuffer3 {
>> > > > > +     /**
>> > > > > +      * @ctx_id: Context id
>> > > > > +      *
>> > > > > +      * Only contexts with user engine map are allowed.
>> > > > > +      */
>> > > > > +     __u32 ctx_id;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @engine_idx: Engine index
>> > > > > +      *
>> > > > > +      * An index in the user engine map of the context specified by @ctx_id.
>> > > > > +      */
>> > > > > +     __u32 engine_idx;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @batch_address: Batch gpu virtual address/es.
>> > > > > +      *
>> > > > > +      * For normal submission, it is the gpu virtual address of the batch
>> > > > > +      * buffer. For parallel submission, it is a pointer to an array of
>> > > > > +      * batch buffer gpu virtual addresses with array size equal to the
>> > > > > +      * number of (parallel) engines involved in that submission (See
>> > > > > +      * struct i915_context_engines_parallel_submit).
>> > > > > +      */
>> > > > > +     __u64 batch_address;
>> > > > > +
>> > > > > +     /** @flags: Currently reserved, MBZ */
>> > > > > +     __u64 flags;
>> > > > > +
>> > > > > +     /** @rsvd1: Reserved, MBZ */
>> > > > > +     __u32 rsvd1;
>> > > > > +
>> > > > > +     /** @fence_count: Number of fences in @timeline_fences array. */
>> > > > > +     __u32 fence_count;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @timeline_fences: Pointer to an array of timeline fences.
>> > > > > +      *
>> > > > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>> > > > > +      */
>> > > > > +     __u64 timeline_fences;
>> > > > > +
>> > > > > +     /** @rsvd2: Reserved, MBZ */
>> > > > > +     __u64 rsvd2;
>> > > > > +
>> > > >
>> > > > Just out of curiosity: if we can extend behavior with @extensions and
>> > > > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>> > > >
>> > >
>> > > True. I added it just in case some requests came up that would require
>> > > some additional fields. During this review process itself there were
>> > > some requests. Adding directly here should have a slight performance
>> > > edge over adding it as an extension (one less copy_from_user).
>> > >
>> > > But if folks think this is an overkill, I will remove it.
>> >
>> > I do not have strong opinions here, I'm just curious.
>> >
>> > Thanks,
>> > Paulo
>> >
>> > >
>> > > Niranjana
>> > >
>> > > > > +     /**
>> > > > > +      * @extensions: Zero-terminated chain of extensions.
>> > > > > +      *
>> > > > > +      * For future extensions. See struct i915_user_extension.
>> > > > > +      */
>> > > > > +     __u64 extensions;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>> > > > > + * private to the specified VM.
>> > > > > + *
>> > > > > + * See struct drm_i915_gem_create_ext.
>> > > > > + */
>> > > > > +struct drm_i915_gem_create_ext_vm_private {
>> > > > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> > > > > +     /** @base: Extension link. See struct i915_user_extension. */
>> > > > > +     struct i915_user_extension base;
>> > > > > +
>> > > > > +     /** @vm_id: Id of the VM to which the object is private */
>> > > > > +     __u32 vm_id;
>> > > > > +};
>> > > >
>> >
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 18:30               ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 18:30 UTC (permalink / raw)
  To: Zanoni, Paulo R
  Cc: Wilson, Chris P, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, christian.koenig

On Thu, Jun 30, 2022 at 10:12:47AM -0700, Zanoni, Paulo R wrote:
>On Thu, 2022-06-30 at 09:18 -0700, Niranjana Vishwanathapura wrote:
>> On Wed, Jun 29, 2022 at 11:39:52PM -0700, Zanoni, Paulo R wrote:
>> > On Wed, 2022-06-29 at 23:08 -0700, Niranjana Vishwanathapura wrote:
>> > > On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>> > > > On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>> > > > > VM_BIND and related uapi definitions
>> > > > >
>> > > > > v2: Reduce the scope to simple Mesa use case.
>> > > > > v3: Expand VM_UNBIND documentation and add
>> > > > >     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>> > > > >     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>> > > > > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>> > > > >     documentation for vm_bind/unbind.
>> > > > > v5: Remove TLB flush requirement on VM_UNBIND.
>> > > > >     Add version support to stage implementation.
>> > > > > v6: Define and use drm_i915_gem_timeline_fence structure for
>> > > > >     all timeline fences.
>> > > > > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>> > > > >     Update documentation on async vm_bind/unbind and versioning.
>> > > > >     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>> > > > >     batch_count field and I915_EXEC3_SECURE flag.
>> > > > >
>> > > > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > > > > ---
>> > > > >  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>> > > > >  1 file changed, 280 insertions(+)
>> > > > >  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>> > > > >
>> > > > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > > > new file mode 100644
>> > > > > index 000000000000..a93e08bceee6
>> > > > > --- /dev/null
>> > > > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> > > > > @@ -0,0 +1,280 @@
>> > > > > +/* SPDX-License-Identifier: MIT */
>> > > > > +/*
>> > > > > + * Copyright © 2022 Intel Corporation
>> > > > > + */
>> > > > > +
>> > > > > +/**
>> > > > > + * DOC: I915_PARAM_VM_BIND_VERSION
>> > > > > + *
>> > > > > + * VM_BIND feature version supported.
>> > > > > + * See typedef drm_i915_getparam_t param.
>> > > > > + *
>> > > > > + * Specifies the VM_BIND feature version supported.
>> > > > > + * The following versions of VM_BIND have been defined:
>> > > > > + *
>> > > > > + * 0: No VM_BIND support.
>> > > > > + *
>> > > > > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
>> > > > > + *    previously with VM_BIND, the ioctl will not support unbinding multiple
>> > > > > + *    mappings or splitting them. Similarly, VM_BIND calls will not replace
>> > > > > + *    any existing mappings.
>> > > > > + *
>> > > > > + * 2: The restrictions on unbinding partial or multiple mappings is
>> > > > > + *    lifted, Similarly, binding will replace any mappings in the given range.
>> > > > > + *
>> > > > > + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>> > > > > + */
>> > > > > +#define I915_PARAM_VM_BIND_VERSION   57
>> > > > > +
>> > > > > +/**
>> > > > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> > > > > + *
>> > > > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> > > > > + * See struct drm_i915_gem_vm_control flags.
>> > > > > + *
>> > > > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>> > > > > + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
>> > > > > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>> > > > > + */
>> > > > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>> > > > > +
>> > > > > +/* VM_BIND related ioctls */
>> > > > > +#define DRM_I915_GEM_VM_BIND         0x3d
>> > > > > +#define DRM_I915_GEM_VM_UNBIND               0x3e
>> > > > > +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>> > > > > +
>> > > > > +#define DRM_IOCTL_I915_GEM_VM_BIND           DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>> > > > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>> > > > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3               DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
>> > > > > + *
>> > > > > + * The operation will wait for input fence to signal.
>> > > > > + *
>> > > > > + * The returned output fence will be signaled after the completion of the
>> > > > > + * operation.
>> > > > > + */
>> > > > > +struct drm_i915_gem_timeline_fence {
>> > > > > +     /** @handle: User's handle for a drm_syncobj to wait on or signal. */
>> > > > > +     __u32 handle;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @flags: Supported flags are:
>> > > > > +      *
>> > > > > +      * I915_TIMELINE_FENCE_WAIT:
>> > > > > +      * Wait for the input fence before the operation.
>> > > > > +      *
>> > > > > +      * I915_TIMELINE_FENCE_SIGNAL:
>> > > > > +      * Return operation completion fence as output.
>> > > > > +      */
>> > > > > +     __u32 flags;
>> > > > > +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>> > > > > +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>> > > > > +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @value: A point in the timeline.
>> > > > > +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>> > > > > +      * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
>> > > > > +      * binary one.
>> > > > > +      */
>> > > > > +     __u64 value;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>> > > > > + *
>> > > > > + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
>> > > > > + * virtual address (VA) range to the section of an object that should be bound
>> > > > > + * in the device page table of the specified address space (VM).
>> > > > > + * The VA range specified must be unique (ie., not currently bound) and can
>> > > > > + * be mapped to whole object or a section of the object (partial binding).
>> > > > > + * Multiple VA mappings can be created to the same section of the object
>> > > > > + * (aliasing).
>> > > > > + *
>> > > > > + * The @start, @offset and @length must be 4K page aligned. However the DG2
>> > > > > + * and XEHPSDV has 64K page size for device local-memory and has compact page
>> > > > > + * table. On those platforms, for binding device local-memory objects, the
>> > > > > + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>> > > > > + * Also, for such mappings, i915 will reserve the whole 2M range for it so as
>> > > > > + * to not allow multiple mappings in that 2M range (Compact page tables do not
>> > > > > + * allow 64K page and 4K page bindings in the same 2M range).
>> > > > > + *
>> > > > > + * Error code -EINVAL will be returned if @start, @offset and @length are not
>> > > > > + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION), error code
>> > > > > + * -ENOSPC will be returned if the VA range specified can't be reserved.
>> > > > > + *
>> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > > > + * are not ordered. Furthermore, parts of the VM_BIND operation can be done
>> > > > > + * asynchronously, if valid @fence is specified.
>> > > >
>> > > > Does that mean that if I don't provide @fence, then this ioctl will be
>> > > > synchronous (i.e., when it returns, the memory will be guaranteed to be
>> > > > bound)? The text is kinda implying that, but from one of your earlier
>> > > > replies to Tvrtko, that doesn't seem to be the case. I guess we could
>> > > > change the text to make this more explicit.
>> > > >
>> > >
>> > > Yes, I thought, if user doesn't specify the out fence, KMD better make
>> > > the ioctl synchronous by waiting until the binding finishes before
>> > > returning. Otherwise, UMD has no way to ensure binding is complete and
>> > > UMD must pass in out fence for VM_BIND calls.
>> > >
>> > > But latest comment form Daniel on other thread might suggest something else.
>> > > Daniel, can you comment?
>> >
>> > Whatever we decide, let's make sure it's documented.
>> >
>> > >
>> > > > In addition, previously we had the guarantee that an execbuf ioctl
>> > > > would wait for all the pending vm_bind operations to finish before
>> > > > doing anything. Do we still have this guarantee or do we have to make
>> > > > use of the fences now?
>> > > >
>> > >
>> > > No, we don't have that anymore (execbuf is decoupled from VM_BIND).
>> > > Execbuf3 submission will not wait for any previous VM_BIND to finish.
>> > > UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
>> > > that.
>> >
>> > Got it, thanks.
>> >
>> > >
>> > > > > + */
>> > > > > +struct drm_i915_gem_vm_bind {
>> > > > > +     /** @vm_id: VM (address space) id to bind */
>> > > > > +     __u32 vm_id;
>> > > > > +
>> > > > > +     /** @handle: Object handle */
>> > > > > +     __u32 handle;
>> > > > > +
>> > > > > +     /** @start: Virtual Address start to bind */
>> > > > > +     __u64 start;
>> > > > > +
>> > > > > +     /** @offset: Offset in object to bind */
>> > > > > +     __u64 offset;
>> > > > > +
>> > > > > +     /** @length: Length of mapping to bind */
>> > > > > +     __u64 length;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @flags: Supported flags are:
>> > > > > +      *
>> > > > > +      * I915_GEM_VM_BIND_READONLY:
>> > > > > +      * Mapping is read-only.
>> > > >
>> > > > Can you please explain what happens when we try to write to a range
>> > > > that's bound as read-only?
>> > > >
>> > >
>> > > It will be mapped as read-only in device page table. Hence any
>> > > write access will fail. I would expect a CAT error reported.
>> >
>> > What's a CAT error? Does this lead to machine freeze or a GPU hang?
>> > Let's make sure we document this.
>> >
>>
>> Catastrophic error.
>>
>> > >
>> > > I am seeing that currently the page table R/W setting is based
>> > > on whether BO is readonly or not (UMDs can request a userptr
>> > > BO to readonly). We can make this READONLY here as a subset.
>> > > ie., if BO is readonly, the mappings must be readonly. If BO
>> > > is not readonly, then the mapping can be either readonly or
>> > > not.
>> > >
>> > > But if Mesa doesn't have a use for this, then we can remove
>> > > this flag for now.
>> > >
>> >
>> > I was considering using it for Vulkan's Sparse
>> > residencyNonResidentStrict, so we map all unbound pages to a read-only
>> > page. But for that to work, the required behavior would have to be:
>> > reads all return zero, writes are ignored without any sort of error.
>> >
>> > But maybe our hardware provides other ways to implement this, I haven't
>> > checked yet.
>> >
>>
>> I am not sure what the behavior is. Probably writes are not simply ignored,
>> will check.
>> Looks like we can remove this flag for now. We can always add it back
>> later if we need it. Is that Ok with you?
>
>I would prefer we keep it if that means writes will just be ignored
>without anything exploding, because it would be very useful.
>

I tried it and writes are not simply ignored and guc reported errors.
So, will remove it for now.

Niranjana

>>
>> Niranjana
>>
>> >
>> > > >
>> > > > > +      *
>> > > > > +      * I915_GEM_VM_BIND_CAPTURE:
>> > > > > +      * Capture this mapping in the dump upon GPU error.
>> > > > > +      */
>> > > > > +     __u64 flags;
>> > > > > +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>> > > > > +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @fence: Timeline fence for bind completion signaling.
>> > > > > +      *
>> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > > > +      * is invalid, and an error will be returned.
>> > > > > +      */
>> > > > > +     struct drm_i915_gem_timeline_fence fence;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @extensions: Zero-terminated chain of extensions.
>> > > > > +      *
>> > > > > +      * For future extensions. See struct i915_user_extension.
>> > > > > +      */
>> > > > > +     __u64 extensions;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>> > > > > + *
>> > > > > + * This structure is passed to VM_UNBIND ioctl and specifies the GPU virtual
>> > > > > + * address (VA) range that should be unbound from the device page table of the
>> > > > > + * specified address space (VM). VM_UNBIND will force unbind the specified
>> > > > > + * range from device page table without waiting for any GPU job to complete.
>> > > > > + * It is UMDs responsibility to ensure the mapping is no longer in use before
>> > > > > + * calling VM_UNBIND.
>> > > > > + *
>> > > > > + * If the specified mapping is not found, the ioctl will simply return without
>> > > > > + * any error.
>> > > > > + *
>> > > > > + * VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently
>> > > > > + * are not ordered. Furthermore, parts of the VM_UNBIND operation can be done
>> > > > > + * asynchronously, if valid @fence is specified.
>> > > > > + */
>> > > > > +struct drm_i915_gem_vm_unbind {
>> > > > > +     /** @vm_id: VM (address space) id to bind */
>> > > > > +     __u32 vm_id;
>> > > > > +
>> > > > > +     /** @rsvd: Reserved, MBZ */
>> > > > > +     __u32 rsvd;
>> > > > > +
>> > > > > +     /** @start: Virtual Address start to unbind */
>> > > > > +     __u64 start;
>> > > > > +
>> > > > > +     /** @length: Length of mapping to unbind */
>> > > > > +     __u64 length;
>> > > > > +
>> > > > > +     /** @flags: Currently reserved, MBZ */
>> > > > > +     __u64 flags;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @fence: Timeline fence for unbind completion signaling.
>> > > > > +      *
>> > > > > +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>> > > > > +      * is invalid, and an error will be returned.
>> > > > > +      */
>> > > > > +     struct drm_i915_gem_timeline_fence fence;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @extensions: Zero-terminated chain of extensions.
>> > > > > +      *
>> > > > > +      * For future extensions. See struct i915_user_extension.
>> > > > > +      */
>> > > > > +     __u64 extensions;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_execbuffer3 - Structure for DRM_I915_GEM_EXECBUFFER3
>> > > > > + * ioctl.
>> > > > > + *
>> > > > > + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and VM_BIND mode
>> > > > > + * only works with this ioctl for submission.
>> > > > > + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>> > > > > + */
>> > > > > +struct drm_i915_gem_execbuffer3 {
>> > > > > +     /**
>> > > > > +      * @ctx_id: Context id
>> > > > > +      *
>> > > > > +      * Only contexts with user engine map are allowed.
>> > > > > +      */
>> > > > > +     __u32 ctx_id;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @engine_idx: Engine index
>> > > > > +      *
>> > > > > +      * An index in the user engine map of the context specified by @ctx_id.
>> > > > > +      */
>> > > > > +     __u32 engine_idx;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @batch_address: Batch gpu virtual address/es.
>> > > > > +      *
>> > > > > +      * For normal submission, it is the gpu virtual address of the batch
>> > > > > +      * buffer. For parallel submission, it is a pointer to an array of
>> > > > > +      * batch buffer gpu virtual addresses with array size equal to the
>> > > > > +      * number of (parallel) engines involved in that submission (See
>> > > > > +      * struct i915_context_engines_parallel_submit).
>> > > > > +      */
>> > > > > +     __u64 batch_address;
>> > > > > +
>> > > > > +     /** @flags: Currently reserved, MBZ */
>> > > > > +     __u64 flags;
>> > > > > +
>> > > > > +     /** @rsvd1: Reserved, MBZ */
>> > > > > +     __u32 rsvd1;
>> > > > > +
>> > > > > +     /** @fence_count: Number of fences in @timeline_fences array. */
>> > > > > +     __u32 fence_count;
>> > > > > +
>> > > > > +     /**
>> > > > > +      * @timeline_fences: Pointer to an array of timeline fences.
>> > > > > +      *
>> > > > > +      * Timeline fences are of format struct drm_i915_gem_timeline_fence.
>> > > > > +      */
>> > > > > +     __u64 timeline_fences;
>> > > > > +
>> > > > > +     /** @rsvd2: Reserved, MBZ */
>> > > > > +     __u64 rsvd2;
>> > > > > +
>> > > >
>> > > > Just out of curiosity: if we can extend behavior with @extensions and
>> > > > even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>> > > >
>> > >
>> > > True. I added it just in case some requests came up that would require
>> > > some additional fields. During this review process itself there were
>> > > some requests. Adding directly here should have a slight performance
>> > > edge over adding it as an extension (one less copy_from_user).
>> > >
>> > > But if folks think this is an overkill, I will remove it.
>> >
>> > I do not have strong opinions here, I'm just curious.
>> >
>> > Thanks,
>> > Paulo
>> >
>> > >
>> > > Niranjana
>> > >
>> > > > > +     /**
>> > > > > +      * @extensions: Zero-terminated chain of extensions.
>> > > > > +      *
>> > > > > +      * For future extensions. See struct i915_user_extension.
>> > > > > +      */
>> > > > > +     __u64 extensions;
>> > > > > +};
>> > > > > +
>> > > > > +/**
>> > > > > + * struct drm_i915_gem_create_ext_vm_private - Extension to make the object
>> > > > > + * private to the specified VM.
>> > > > > + *
>> > > > > + * See struct drm_i915_gem_create_ext.
>> > > > > + */
>> > > > > +struct drm_i915_gem_create_ext_vm_private {
>> > > > > +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>> > > > > +     /** @base: Extension link. See struct i915_user_extension. */
>> > > > > +     struct i915_user_extension base;
>> > > > > +
>> > > > > +     /** @vm_id: Id of the VM to which the object is private */
>> > > > > +     __u32 vm_id;
>> > > > > +};
>> > > >
>> >
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 15:45     ` [Intel-gfx] " Jason Ekstrand
@ 2022-06-30 18:32       ` Niranjana Vishwanathapura
  -1 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 18:32 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Matthew Brost, Paulo Zanoni, Maling list - DRI developers,
	Tvrtko Ursulin, Intel GFX, Chris Wilson, Thomas Hellstrom,
	oak.zeng, Lionel Landwerlin, Daniel Vetter, Christian König,
	Matthew Auld

On Thu, Jun 30, 2022 at 10:45:12AM -0500, Jason Ekstrand wrote:
>   On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>   <niranjana.vishwanathapura@intel.com> wrote:
>
>     VM_BIND and related uapi definitions
>
>     v2: Reduce the scope to simple Mesa use case.
>     v3: Expand VM_UNBIND documentation and add
>         I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>         and I915_GEM_VM_BIND_TLB_FLUSH flags.
>     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>         documentation for vm_bind/unbind.
>     v5: Remove TLB flush requirement on VM_UNBIND.
>         Add version support to stage implementation.
>     v6: Define and use drm_i915_gem_timeline_fence structure for
>         all timeline fences.
>     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>         Update documentation on async vm_bind/unbind and versioning.
>         Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>         batch_count field and I915_EXEC3_SECURE flag.
>
>     Signed-off-by: Niranjana Vishwanathapura
>     <niranjana.vishwanathapura@intel.com>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>     ---
>      Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>      1 file changed, 280 insertions(+)
>      create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
>     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>     b/Documentation/gpu/rfc/i915_vm_bind.h
>     new file mode 100644
>     index 000000000000..a93e08bceee6
>     --- /dev/null
>     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>     @@ -0,0 +1,280 @@
>     +/* SPDX-License-Identifier: MIT */
>     +/*
>     + * Copyright © 2022 Intel Corporation
>     + */
>     +
>     +/**
>     + * DOC: I915_PARAM_VM_BIND_VERSION
>     + *
>     + * VM_BIND feature version supported.
>     + * See typedef drm_i915_getparam_t param.
>     + *
>     + * Specifies the VM_BIND feature version supported.
>     + * The following versions of VM_BIND have been defined:
>     + *
>     + * 0: No VM_BIND support.
>     + *
>     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>     created
>     + *    previously with VM_BIND, the ioctl will not support unbinding
>     multiple
>     + *    mappings or splitting them. Similarly, VM_BIND calls will not
>     replace
>     + *    any existing mappings.
>     + *
>     + * 2: The restrictions on unbinding partial or multiple mappings is
>     + *    lifted, Similarly, binding will replace any mappings in the given
>     range.
>     + *
>     + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>     + */
>     +#define I915_PARAM_VM_BIND_VERSION     57
>     +
>     +/**
>     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>     + *
>     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>     + * See struct drm_i915_gem_vm_control flags.
>     + *
>     + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>     + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>     any
>     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>     + */
>     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>     +
>     +/* VM_BIND related ioctls */
>     +#define DRM_I915_GEM_VM_BIND           0x3d
>     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>     +
>     +#define DRM_IOCTL_I915_GEM_VM_BIND           
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3       
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>     drm_i915_gem_execbuffer3)
>     +
>     +/**
>     + * struct drm_i915_gem_timeline_fence - An input or output timeline
>     fence.
>     + *
>     + * The operation will wait for input fence to signal.
>     + *
>     + * The returned output fence will be signaled after the completion of
>     the
>     + * operation.
>     + */
>     +struct drm_i915_gem_timeline_fence {
>     +       /** @handle: User's handle for a drm_syncobj to wait on or
>     signal. */
>     +       __u32 handle;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_TIMELINE_FENCE_WAIT:
>     +        * Wait for the input fence before the operation.
>     +        *
>     +        * I915_TIMELINE_FENCE_SIGNAL:
>     +        * Return operation completion fence as output.
>     +        */
>     +       __u32 flags;
>     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>     +
>     +       /**
>     +        * @value: A point in the timeline.
>     +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>     into a
>     +        * binary one.
>     +        */
>     +       __u64 value;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>     + *
>     + * This structure is passed to VM_BIND ioctl and specifies the mapping
>     of GPU
>     + * virtual address (VA) range to the section of an object that should
>     be bound
>     + * in the device page table of the specified address space (VM).
>     + * The VA range specified must be unique (ie., not currently bound) and
>     can
>     + * be mapped to whole object or a section of the object (partial
>     binding).
>     + * Multiple VA mappings can be created to the same section of the
>     object
>     + * (aliasing).
>     + *
>     + * The @start, @offset and @length must be 4K page aligned. However the
>     DG2
>     + * and XEHPSDV has 64K page size for device local-memory and has
>     compact page
>     + * table. On those platforms, for binding device local-memory objects,
>     the
>     + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>     + * Also, for such mappings, i915 will reserve the whole 2M range for it
>     so as
>     + * to not allow multiple mappings in that 2M range (Compact page tables
>     do not
>     + * allow 64K page and 4K page bindings in the same 2M range).
>     + *
>     + * Error code -EINVAL will be returned if @start, @offset and @length
>     are not
>     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>     error code
>     + * -ENOSPC will be returned if the VA range specified can't be
>     reserved.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_BIND operation can be
>     done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_bind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @handle: Object handle */
>     +       __u32 handle;
>     +
>     +       /** @start: Virtual Address start to bind */
>     +       __u64 start;
>     +
>     +       /** @offset: Offset in object to bind */
>     +       __u64 offset;
>     +
>     +       /** @length: Length of mapping to bind */
>     +       __u64 length;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_GEM_VM_BIND_READONLY:
>     +        * Mapping is read-only.
>     +        *
>     +        * I915_GEM_VM_BIND_CAPTURE:
>     +        * Capture this mapping in the dump upon GPU error.
>     +        */
>     +       __u64 flags;
>     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>     +
>     +       /**
>     +        * @fence: Timeline fence for bind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>
>   Why a single fence and not an array of fences?  If Mesa wants to use the
>   out fences for signalling VkSemaphores on the sparse binding queue, we
>   need N of them.  We can still have the "zero fences means block" behavior.

It was discussed and decided to keep it simple with single out fence.
We can always add an extension later for an array of fences.

Niranjana

>   --Jason
>    
>
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>     + *
>     + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
>     virtual
>     + * address (VA) range that should be unbound from the device page table
>     of the
>     + * specified address space (VM). VM_UNBIND will force unbind the
>     specified
>     + * range from device page table without waiting for any GPU job to
>     complete.
>     + * It is UMDs responsibility to ensure the mapping is no longer in use
>     before
>     + * calling VM_UNBIND.
>     + *
>     + * If the specified mapping is not found, the ioctl will simply return
>     without
>     + * any error.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_UNBIND operation can
>     be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_unbind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @rsvd: Reserved, MBZ */
>     +       __u32 rsvd;
>     +
>     +       /** @start: Virtual Address start to unbind */
>     +       __u64 start;
>     +
>     +       /** @length: Length of mapping to unbind */
>     +       __u64 length;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /**
>     +        * @fence: Timeline fence for unbind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_execbuffer3 - Structure for
>     DRM_I915_GEM_EXECBUFFER3
>     + * ioctl.
>     + *
>     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>     VM_BIND mode
>     + * only works with this ioctl for submission.
>     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>     + */
>     +struct drm_i915_gem_execbuffer3 {
>     +       /**
>     +        * @ctx_id: Context id
>     +        *
>     +        * Only contexts with user engine map are allowed.
>     +        */
>     +       __u32 ctx_id;
>     +
>     +       /**
>     +        * @engine_idx: Engine index
>     +        *
>     +        * An index in the user engine map of the context specified by
>     @ctx_id.
>     +        */
>     +       __u32 engine_idx;
>     +
>     +       /**
>     +        * @batch_address: Batch gpu virtual address/es.
>     +        *
>     +        * For normal submission, it is the gpu virtual address of the
>     batch
>     +        * buffer. For parallel submission, it is a pointer to an array
>     of
>     +        * batch buffer gpu virtual addresses with array size equal to
>     the
>     +        * number of (parallel) engines involved in that submission (See
>     +        * struct i915_context_engines_parallel_submit).
>     +        */
>     +       __u64 batch_address;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /** @rsvd1: Reserved, MBZ */
>     +       __u32 rsvd1;
>     +
>     +       /** @fence_count: Number of fences in @timeline_fences array. */
>     +       __u32 fence_count;
>     +
>     +       /**
>     +        * @timeline_fences: Pointer to an array of timeline fences.
>     +        *
>     +        * Timeline fences are of format struct
>     drm_i915_gem_timeline_fence.
>     +        */
>     +       __u64 timeline_fences;
>     +
>     +       /** @rsvd2: Reserved, MBZ */
>     +       __u64 rsvd2;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>     object
>     + * private to the specified VM.
>     + *
>     + * See struct drm_i915_gem_create_ext.
>     + */
>     +struct drm_i915_gem_create_ext_vm_private {
>     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>     +       /** @base: Extension link. See struct i915_user_extension. */
>     +       struct i915_user_extension base;
>     +
>     +       /** @vm_id: Id of the VM to which the object is private */
>     +       __u32 vm_id;
>     +};
>     --
>     2.21.0.rc0.32.g243a4c7e27

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-06-30 18:32       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 53+ messages in thread
From: Niranjana Vishwanathapura @ 2022-06-30 18:32 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Paulo Zanoni, Maling list - DRI developers, Intel GFX,
	Chris Wilson, Thomas Hellstrom, Daniel Vetter,
	Christian König, Matthew Auld

On Thu, Jun 30, 2022 at 10:45:12AM -0500, Jason Ekstrand wrote:
>   On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>   <niranjana.vishwanathapura@intel.com> wrote:
>
>     VM_BIND and related uapi definitions
>
>     v2: Reduce the scope to simple Mesa use case.
>     v3: Expand VM_UNBIND documentation and add
>         I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>         and I915_GEM_VM_BIND_TLB_FLUSH flags.
>     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>         documentation for vm_bind/unbind.
>     v5: Remove TLB flush requirement on VM_UNBIND.
>         Add version support to stage implementation.
>     v6: Define and use drm_i915_gem_timeline_fence structure for
>         all timeline fences.
>     v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>         Update documentation on async vm_bind/unbind and versioning.
>         Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>         batch_count field and I915_EXEC3_SECURE flag.
>
>     Signed-off-by: Niranjana Vishwanathapura
>     <niranjana.vishwanathapura@intel.com>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>     ---
>      Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
>      1 file changed, 280 insertions(+)
>      create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
>     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>     b/Documentation/gpu/rfc/i915_vm_bind.h
>     new file mode 100644
>     index 000000000000..a93e08bceee6
>     --- /dev/null
>     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>     @@ -0,0 +1,280 @@
>     +/* SPDX-License-Identifier: MIT */
>     +/*
>     + * Copyright © 2022 Intel Corporation
>     + */
>     +
>     +/**
>     + * DOC: I915_PARAM_VM_BIND_VERSION
>     + *
>     + * VM_BIND feature version supported.
>     + * See typedef drm_i915_getparam_t param.
>     + *
>     + * Specifies the VM_BIND feature version supported.
>     + * The following versions of VM_BIND have been defined:
>     + *
>     + * 0: No VM_BIND support.
>     + *
>     + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
>     created
>     + *    previously with VM_BIND, the ioctl will not support unbinding
>     multiple
>     + *    mappings or splitting them. Similarly, VM_BIND calls will not
>     replace
>     + *    any existing mappings.
>     + *
>     + * 2: The restrictions on unbinding partial or multiple mappings is
>     + *    lifted, Similarly, binding will replace any mappings in the given
>     range.
>     + *
>     + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>     + */
>     +#define I915_PARAM_VM_BIND_VERSION     57
>     +
>     +/**
>     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>     + *
>     + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>     + * See struct drm_i915_gem_vm_control flags.
>     + *
>     + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
>     + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept
>     any
>     + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>     + */
>     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>     +
>     +/* VM_BIND related ioctls */
>     +#define DRM_I915_GEM_VM_BIND           0x3d
>     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>     +
>     +#define DRM_IOCTL_I915_GEM_VM_BIND           
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_VM_UNBIND         
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>     drm_i915_gem_vm_bind)
>     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3       
>      DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>     drm_i915_gem_execbuffer3)
>     +
>     +/**
>     + * struct drm_i915_gem_timeline_fence - An input or output timeline
>     fence.
>     + *
>     + * The operation will wait for input fence to signal.
>     + *
>     + * The returned output fence will be signaled after the completion of
>     the
>     + * operation.
>     + */
>     +struct drm_i915_gem_timeline_fence {
>     +       /** @handle: User's handle for a drm_syncobj to wait on or
>     signal. */
>     +       __u32 handle;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_TIMELINE_FENCE_WAIT:
>     +        * Wait for the input fence before the operation.
>     +        *
>     +        * I915_TIMELINE_FENCE_SIGNAL:
>     +        * Return operation completion fence as output.
>     +        */
>     +       __u32 flags;
>     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>     +
>     +       /**
>     +        * @value: A point in the timeline.
>     +        * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>     +        * timeline drm_syncobj is invalid as it turns a drm_syncobj
>     into a
>     +        * binary one.
>     +        */
>     +       __u64 value;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>     + *
>     + * This structure is passed to VM_BIND ioctl and specifies the mapping
>     of GPU
>     + * virtual address (VA) range to the section of an object that should
>     be bound
>     + * in the device page table of the specified address space (VM).
>     + * The VA range specified must be unique (ie., not currently bound) and
>     can
>     + * be mapped to whole object or a section of the object (partial
>     binding).
>     + * Multiple VA mappings can be created to the same section of the
>     object
>     + * (aliasing).
>     + *
>     + * The @start, @offset and @length must be 4K page aligned. However the
>     DG2
>     + * and XEHPSDV has 64K page size for device local-memory and has
>     compact page
>     + * table. On those platforms, for binding device local-memory objects,
>     the
>     + * @start must be 2M aligned, @offset and @length must be 64K aligned.
>     + * Also, for such mappings, i915 will reserve the whole 2M range for it
>     so as
>     + * to not allow multiple mappings in that 2M range (Compact page tables
>     do not
>     + * allow 64K page and 4K page bindings in the same 2M range).
>     + *
>     + * Error code -EINVAL will be returned if @start, @offset and @length
>     are not
>     + * properly aligned. In version 1 (See I915_PARAM_VM_BIND_VERSION),
>     error code
>     + * -ENOSPC will be returned if the VA range specified can't be
>     reserved.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_BIND operation can be
>     done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_bind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @handle: Object handle */
>     +       __u32 handle;
>     +
>     +       /** @start: Virtual Address start to bind */
>     +       __u64 start;
>     +
>     +       /** @offset: Offset in object to bind */
>     +       __u64 offset;
>     +
>     +       /** @length: Length of mapping to bind */
>     +       __u64 length;
>     +
>     +       /**
>     +        * @flags: Supported flags are:
>     +        *
>     +        * I915_GEM_VM_BIND_READONLY:
>     +        * Mapping is read-only.
>     +        *
>     +        * I915_GEM_VM_BIND_CAPTURE:
>     +        * Capture this mapping in the dump upon GPU error.
>     +        */
>     +       __u64 flags;
>     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>     +
>     +       /**
>     +        * @fence: Timeline fence for bind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>
>   Why a single fence and not an array of fences?  If Mesa wants to use the
>   out fences for signalling VkSemaphores on the sparse binding queue, we
>   need N of them.  We can still have the "zero fences means block" behavior.

It was discussed and decided to keep it simple with single out fence.
We can always add an extension later for an array of fences.

Niranjana

>   --Jason
>    
>
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>     + *
>     + * This structure is passed to VM_UNBIND ioctl and specifies the GPU
>     virtual
>     + * address (VA) range that should be unbound from the device page table
>     of the
>     + * specified address space (VM). VM_UNBIND will force unbind the
>     specified
>     + * range from device page table without waiting for any GPU job to
>     complete.
>     + * It is UMDs responsibility to ensure the mapping is no longer in use
>     before
>     + * calling VM_UNBIND.
>     + *
>     + * If the specified mapping is not found, the ioctl will simply return
>     without
>     + * any error.
>     + *
>     + * VM_BIND/UNBIND ioctl calls executed on different CPU threads
>     concurrently
>     + * are not ordered. Furthermore, parts of the VM_UNBIND operation can
>     be done
>     + * asynchronously, if valid @fence is specified.
>     + */
>     +struct drm_i915_gem_vm_unbind {
>     +       /** @vm_id: VM (address space) id to bind */
>     +       __u32 vm_id;
>     +
>     +       /** @rsvd: Reserved, MBZ */
>     +       __u32 rsvd;
>     +
>     +       /** @start: Virtual Address start to unbind */
>     +       __u64 start;
>     +
>     +       /** @length: Length of mapping to unbind */
>     +       __u64 length;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /**
>     +        * @fence: Timeline fence for unbind completion signaling.
>     +        *
>     +        * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>     +        * is invalid, and an error will be returned.
>     +        */
>     +       struct drm_i915_gem_timeline_fence fence;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_execbuffer3 - Structure for
>     DRM_I915_GEM_EXECBUFFER3
>     + * ioctl.
>     + *
>     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and
>     VM_BIND mode
>     + * only works with this ioctl for submission.
>     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>     + */
>     +struct drm_i915_gem_execbuffer3 {
>     +       /**
>     +        * @ctx_id: Context id
>     +        *
>     +        * Only contexts with user engine map are allowed.
>     +        */
>     +       __u32 ctx_id;
>     +
>     +       /**
>     +        * @engine_idx: Engine index
>     +        *
>     +        * An index in the user engine map of the context specified by
>     @ctx_id.
>     +        */
>     +       __u32 engine_idx;
>     +
>     +       /**
>     +        * @batch_address: Batch gpu virtual address/es.
>     +        *
>     +        * For normal submission, it is the gpu virtual address of the
>     batch
>     +        * buffer. For parallel submission, it is a pointer to an array
>     of
>     +        * batch buffer gpu virtual addresses with array size equal to
>     the
>     +        * number of (parallel) engines involved in that submission (See
>     +        * struct i915_context_engines_parallel_submit).
>     +        */
>     +       __u64 batch_address;
>     +
>     +       /** @flags: Currently reserved, MBZ */
>     +       __u64 flags;
>     +
>     +       /** @rsvd1: Reserved, MBZ */
>     +       __u32 rsvd1;
>     +
>     +       /** @fence_count: Number of fences in @timeline_fences array. */
>     +       __u32 fence_count;
>     +
>     +       /**
>     +        * @timeline_fences: Pointer to an array of timeline fences.
>     +        *
>     +        * Timeline fences are of format struct
>     drm_i915_gem_timeline_fence.
>     +        */
>     +       __u64 timeline_fences;
>     +
>     +       /** @rsvd2: Reserved, MBZ */
>     +       __u64 rsvd2;
>     +
>     +       /**
>     +        * @extensions: Zero-terminated chain of extensions.
>     +        *
>     +        * For future extensions. See struct i915_user_extension.
>     +        */
>     +       __u64 extensions;
>     +};
>     +
>     +/**
>     + * struct drm_i915_gem_create_ext_vm_private - Extension to make the
>     object
>     + * private to the specified VM.
>     + *
>     + * See struct drm_i915_gem_create_ext.
>     + */
>     +struct drm_i915_gem_create_ext_vm_private {
>     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>     +       /** @base: Extension link. See struct i915_user_extension. */
>     +       struct i915_user_extension base;
>     +
>     +       /** @vm_id: Id of the VM to which the object is private */
>     +       __u32 vm_id;
>     +};
>     --
>     2.21.0.rc0.32.g243a4c7e27

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 16:22         ` Niranjana Vishwanathapura
@ 2022-07-01  8:11           ` Tvrtko Ursulin
  0 siblings, 0 replies; 53+ messages in thread
From: Tvrtko Ursulin @ 2022-07-01  8:11 UTC (permalink / raw)
  To: Niranjana Vishwanathapura
  Cc: Zanoni, Paulo R, intel-gfx, dri-devel, Hellstrom, Thomas, Wilson,
	Chris P, Vetter, Daniel, christian.koenig, Auld, Matthew


On 30/06/2022 17:22, Niranjana Vishwanathapura wrote:
> On Thu, Jun 30, 2022 at 08:59:09AM +0100, Tvrtko Ursulin wrote:
>>
>> On 30/06/2022 07:08, Niranjana Vishwanathapura wrote:
>>> On Wed, Jun 29, 2022 at 05:33:49PM -0700, Zanoni, Paulo R wrote:
>>>> On Sat, 2022-06-25 at 18:49 -0700, Niranjana Vishwanathapura wrote:
>>>>> VM_BIND and related uapi definitions
>>>>>
>>>>> v2: Reduce the scope to simple Mesa use case.
>>>>> v3: Expand VM_UNBIND documentation and add
>>>>>     I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>>>>     and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>>>> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>>>>     documentation for vm_bind/unbind.
>>>>> v5: Remove TLB flush requirement on VM_UNBIND.
>>>>>     Add version support to stage implementation.
>>>>> v6: Define and use drm_i915_gem_timeline_fence structure for
>>>>>     all timeline fences.
>>>>> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
>>>>>     Update documentation on async vm_bind/unbind and versioning.
>>>>>     Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
>>>>>     batch_count field and I915_EXEC3_SECURE flag.
>>>>>
>>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>>> <niranjana.vishwanathapura@intel.com>
>>>>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> ---
>>>>>  Documentation/gpu/rfc/i915_vm_bind.h | 280 
>>>>> +++++++++++++++++++++++++++
>>>>>  1 file changed, 280 insertions(+)
>>>>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>>>
>>>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
>>>>> b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>> new file mode 100644
>>>>> index 000000000000..a93e08bceee6
>>>>> --- /dev/null
>>>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>> @@ -0,0 +1,280 @@
>>>>> +/* SPDX-License-Identifier: MIT */
>>>>> +/*
>>>>> + * Copyright © 2022 Intel Corporation
>>>>> + */
>>>>> +
>>>>> +/**
>>>>> + * DOC: I915_PARAM_VM_BIND_VERSION
>>>>> + *
>>>>> + * VM_BIND feature version supported.
>>>>> + * See typedef drm_i915_getparam_t param.
>>>>> + *
>>>>> + * Specifies the VM_BIND feature version supported.
>>>>> + * The following versions of VM_BIND have been defined:
>>>>> + *
>>>>> + * 0: No VM_BIND support.
>>>>> + *
>>>>> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings 
>>>>> created
>>>>> + *    previously with VM_BIND, the ioctl will not support 
>>>>> unbinding multiple
>>>>> + *    mappings or splitting them. Similarly, VM_BIND calls will 
>>>>> not replace
>>>>> + *    any existing mappings.
>>>>> + *
>>>>> + * 2: The restrictions on unbinding partial or multiple mappings is
>>>>> + *    lifted, Similarly, binding will replace any mappings in the 
>>>>> given range.
>>>>> + *
>>>>> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
>>>>> + */
>>>>> +#define I915_PARAM_VM_BIND_VERSION   57
>>>>> +
>>>>> +/**
>>>>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>>>> + *
>>>>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>>>> + * See struct drm_i915_gem_vm_control flags.
>>>>> + *
>>>>> + * The older execbuf2 ioctl will not support VM_BIND mode of 
>>>>> operation.
>>>>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not 
>>>>> accept any
>>>>> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
>>>>> + */
>>>>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND     (1 << 0)
>>>>> +
>>>>> +/* VM_BIND related ioctls */
>>>>> +#define DRM_I915_GEM_VM_BIND         0x3d
>>>>> +#define DRM_I915_GEM_VM_UNBIND               0x3e
>>>>> +#define DRM_I915_GEM_EXECBUFFER3     0x3f
>>>>> +
>>>>> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>>>>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>>>>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>>>>> +
>>>>> +/**
>>>>> + * struct drm_i915_gem_timeline_fence - An input or output 
>>>>> timeline fence.
>>>>> + *
>>>>> + * The operation will wait for input fence to signal.
>>>>> + *
>>>>> + * The returned output fence will be signaled after the completion 
>>>>> of the
>>>>> + * operation.
>>>>> + */
>>>>> +struct drm_i915_gem_timeline_fence {
>>>>> +     /** @handle: User's handle for a drm_syncobj to wait on or 
>>>>> signal. */
>>>>> +     __u32 handle;
>>>>> +
>>>>> +     /**
>>>>> +      * @flags: Supported flags are:
>>>>> +      *
>>>>> +      * I915_TIMELINE_FENCE_WAIT:
>>>>> +      * Wait for the input fence before the operation.
>>>>> +      *
>>>>> +      * I915_TIMELINE_FENCE_SIGNAL:
>>>>> +      * Return operation completion fence as output.
>>>>> +      */
>>>>> +     __u32 flags;
>>>>> +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>>>> +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>>>> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS 
>>>>> (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>>>> +
>>>>> +     /**
>>>>> +      * @value: A point in the timeline.
>>>>> +      * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
>>>>> +      * timeline drm_syncobj is invalid as it turns a drm_syncobj 
>>>>> into a
>>>>> +      * binary one.
>>>>> +      */
>>>>> +     __u64 value;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>>> + *
>>>>> + * This structure is passed to VM_BIND ioctl and specifies the 
>>>>> mapping of GPU
>>>>> + * virtual address (VA) range to the section of an object that 
>>>>> should be bound
>>>>> + * in the device page table of the specified address space (VM).
>>>>> + * The VA range specified must be unique (ie., not currently 
>>>>> bound) and can
>>>>> + * be mapped to whole object or a section of the object (partial 
>>>>> binding).
>>>>> + * Multiple VA mappings can be created to the same section of the 
>>>>> object
>>>>> + * (aliasing).
>>>>> + *
>>>>> + * The @start, @offset and @length must be 4K page aligned. 
>>>>> However the DG2
>>>>> + * and XEHPSDV has 64K page size for device local-memory and has 
>>>>> compact page
>>>>> + * table. On those platforms, for binding device local-memory 
>>>>> objects, the
>>>>> + * @start must be 2M aligned, @offset and @length must be 64K 
>>>>> aligned.
>>>>> + * Also, for such mappings, i915 will reserve the whole 2M range 
>>>>> for it so as
>>>>> + * to not allow multiple mappings in that 2M range (Compact page 
>>>>> tables do not
>>>>> + * allow 64K page and 4K page bindings in the same 2M range).
>>>>> + *
>>>>> + * Error code -EINVAL will be returned if @start, @offset and 
>>>>> @length are not
>>>>> + * properly aligned. In version 1 (See 
>>>>> I915_PARAM_VM_BIND_VERSION), error code
>>>>> + * -ENOSPC will be returned if the VA range specified can't be 
>>>>> reserved.
>>>>> + *
>>>>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>>>> concurrently
>>>>> + * are not ordered. Furthermore, parts of the VM_BIND operation 
>>>>> can be done
>>>>> + * asynchronously, if valid @fence is specified.
>>>>
>>>> Does that mean that if I don't provide @fence, then this ioctl will be
>>>> synchronous (i.e., when it returns, the memory will be guaranteed to be
>>>> bound)? The text is kinda implying that, but from one of your earlier
>>>> replies to Tvrtko, that doesn't seem to be the case. I guess we could
>>>> change the text to make this more explicit.
>>>>
>>>
>>> Yes, I thought, if user doesn't specify the out fence, KMD better make
>>> the ioctl synchronous by waiting until the binding finishes before
>>> returning. Otherwise, UMD has no way to ensure binding is complete and
>>> UMD must pass in out fence for VM_BIND calls.
>>
>> This problematic angle is exactly what I raised and I did not 
>> understand you were suggesting sync behaviour back then.
>>
>> I suggested a possible execbuf3 extension which makes it wait for any 
>> pending (un)bind activity on a VM. Sounds better to me than making 
>> everything sync for the use case of N binds followed by 1 execbuf. 
>> *If* userspace wants an easy "fire and forget" mode for such use case, 
>> rather than having to use a fence on all.
>>
> 
> This is a good optimization. But it creates some synchronization between
> VM_BIND and execbuf3. Based on discussion in IRC, looks like folks are

"Some synchronisation".. what does that mean? It creates the same 
synchronisation as if userspace used the out-in fencing between _every_ 
vm bind and execbuf. Only difference being it is simpler / less overhead 
to use.

> Ok with waiting in VM_BIND call if out fence is not specified by UMD.
> So, we can go with that for now.

If people actually plan to use this implied synchronous mode then it 
will suck. It will be worse than execbuf2. There at least kernel had the 
freedom to do things asynchronously while batch is waiting for execution 
time on the GPU. While in this proposal every bind is userspace-kernel 
roundtrip.

Or if people do not plan to use it, then the question is why are we 
adding it and fixing the contract in the uapi forever.

So what is the usecase?

Regards,

Tvrtko

> 
> Niranjana
> 
>> Regards,
>>
>> Tvrtko
>>
>>> But latest comment form Daniel on other thread might suggest 
>>> something else.
>>> Daniel, can you comment?
>>>
>>>> In addition, previously we had the guarantee that an execbuf ioctl
>>>> would wait for all the pending vm_bind operations to finish before
>>>> doing anything. Do we still have this guarantee or do we have to make
>>>> use of the fences now?
>>>>
>>>
>>> No, we don't have that anymore (execbuf is decoupled from VM_BIND).
>>> Execbuf3 submission will not wait for any previous VM_BIND to finish.
>>> UMD must pass in VM_BIND out fence as in fence for execbuf3 to ensure
>>> that.
>>>
>>>>> + */
>>>>> +struct drm_i915_gem_vm_bind {
>>>>> +     /** @vm_id: VM (address space) id to bind */
>>>>> +     __u32 vm_id;
>>>>> +
>>>>> +     /** @handle: Object handle */
>>>>> +     __u32 handle;
>>>>> +
>>>>> +     /** @start: Virtual Address start to bind */
>>>>> +     __u64 start;
>>>>> +
>>>>> +     /** @offset: Offset in object to bind */
>>>>> +     __u64 offset;
>>>>> +
>>>>> +     /** @length: Length of mapping to bind */
>>>>> +     __u64 length;
>>>>> +
>>>>> +     /**
>>>>> +      * @flags: Supported flags are:
>>>>> +      *
>>>>> +      * I915_GEM_VM_BIND_READONLY:
>>>>> +      * Mapping is read-only.
>>>>
>>>> Can you please explain what happens when we try to write to a range
>>>> that's bound as read-only?
>>>>
>>>
>>> It will be mapped as read-only in device page table. Hence any
>>> write access will fail. I would expect a CAT error reported.
>>>
>>> I am seeing that currently the page table R/W setting is based
>>> on whether BO is readonly or not (UMDs can request a userptr
>>> BO to readonly). We can make this READONLY here as a subset.
>>> ie., if BO is readonly, the mappings must be readonly. If BO
>>> is not readonly, then the mapping can be either readonly or
>>> not.
>>>
>>> But if Mesa doesn't have a use for this, then we can remove
>>> this flag for now.
>>>
>>>>
>>>>> +      *
>>>>> +      * I915_GEM_VM_BIND_CAPTURE:
>>>>> +      * Capture this mapping in the dump upon GPU error.
>>>>> +      */
>>>>> +     __u64 flags;
>>>>> +#define I915_GEM_VM_BIND_READONLY    (1 << 1)
>>>>> +#define I915_GEM_VM_BIND_CAPTURE     (1 << 2)
>>>>> +
>>>>> +     /**
>>>>> +      * @fence: Timeline fence for bind completion signaling.
>>>>> +      *
>>>>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>>>> +      * is invalid, and an error will be returned.
>>>>> +      */
>>>>> +     struct drm_i915_gem_timeline_fence fence;
>>>>> +
>>>>> +     /**
>>>>> +      * @extensions: Zero-terminated chain of extensions.
>>>>> +      *
>>>>> +      * For future extensions. See struct i915_user_extension.
>>>>> +      */
>>>>> +     __u64 extensions;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>>>>> + *
>>>>> + * This structure is passed to VM_UNBIND ioctl and specifies the 
>>>>> GPU virtual
>>>>> + * address (VA) range that should be unbound from the device page 
>>>>> table of the
>>>>> + * specified address space (VM). VM_UNBIND will force unbind the 
>>>>> specified
>>>>> + * range from device page table without waiting for any GPU job to 
>>>>> complete.
>>>>> + * It is UMDs responsibility to ensure the mapping is no longer in 
>>>>> use before
>>>>> + * calling VM_UNBIND.
>>>>> + *
>>>>> + * If the specified mapping is not found, the ioctl will simply 
>>>>> return without
>>>>> + * any error.
>>>>> + *
>>>>> + * VM_BIND/UNBIND ioctl calls executed on different CPU threads 
>>>>> concurrently
>>>>> + * are not ordered. Furthermore, parts of the VM_UNBIND operation 
>>>>> can be done
>>>>> + * asynchronously, if valid @fence is specified.
>>>>> + */
>>>>> +struct drm_i915_gem_vm_unbind {
>>>>> +     /** @vm_id: VM (address space) id to bind */
>>>>> +     __u32 vm_id;
>>>>> +
>>>>> +     /** @rsvd: Reserved, MBZ */
>>>>> +     __u32 rsvd;
>>>>> +
>>>>> +     /** @start: Virtual Address start to unbind */
>>>>> +     __u64 start;
>>>>> +
>>>>> +     /** @length: Length of mapping to unbind */
>>>>> +     __u64 length;
>>>>> +
>>>>> +     /** @flags: Currently reserved, MBZ */
>>>>> +     __u64 flags;
>>>>> +
>>>>> +     /**
>>>>> +      * @fence: Timeline fence for unbind completion signaling.
>>>>> +      *
>>>>> +      * It is an out fence, hence using I915_TIMELINE_FENCE_WAIT flag
>>>>> +      * is invalid, and an error will be returned.
>>>>> +      */
>>>>> +     struct drm_i915_gem_timeline_fence fence;
>>>>> +
>>>>> +     /**
>>>>> +      * @extensions: Zero-terminated chain of extensions.
>>>>> +      *
>>>>> +      * For future extensions. See struct i915_user_extension.
>>>>> +      */
>>>>> +     __u64 extensions;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_i915_gem_execbuffer3 - Structure for 
>>>>> DRM_I915_GEM_EXECBUFFER3
>>>>> + * ioctl.
>>>>> + *
>>>>> + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND mode and 
>>>>> VM_BIND mode
>>>>> + * only works with this ioctl for submission.
>>>>> + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>>>> + */
>>>>> +struct drm_i915_gem_execbuffer3 {
>>>>> +     /**
>>>>> +      * @ctx_id: Context id
>>>>> +      *
>>>>> +      * Only contexts with user engine map are allowed.
>>>>> +      */
>>>>> +     __u32 ctx_id;
>>>>> +
>>>>> +     /**
>>>>> +      * @engine_idx: Engine index
>>>>> +      *
>>>>> +      * An index in the user engine map of the context specified 
>>>>> by @ctx_id.
>>>>> +      */
>>>>> +     __u32 engine_idx;
>>>>> +
>>>>> +     /**
>>>>> +      * @batch_address: Batch gpu virtual address/es.
>>>>> +      *
>>>>> +      * For normal submission, it is the gpu virtual address of 
>>>>> the batch
>>>>> +      * buffer. For parallel submission, it is a pointer to an 
>>>>> array of
>>>>> +      * batch buffer gpu virtual addresses with array size equal 
>>>>> to the
>>>>> +      * number of (parallel) engines involved in that submission (See
>>>>> +      * struct i915_context_engines_parallel_submit).
>>>>> +      */
>>>>> +     __u64 batch_address;
>>>>> +
>>>>> +     /** @flags: Currently reserved, MBZ */
>>>>> +     __u64 flags;
>>>>> +
>>>>> +     /** @rsvd1: Reserved, MBZ */
>>>>> +     __u32 rsvd1;
>>>>> +
>>>>> +     /** @fence_count: Number of fences in @timeline_fences array. */
>>>>> +     __u32 fence_count;
>>>>> +
>>>>> +     /**
>>>>> +      * @timeline_fences: Pointer to an array of timeline fences.
>>>>> +      *
>>>>> +      * Timeline fences are of format struct 
>>>>> drm_i915_gem_timeline_fence.
>>>>> +      */
>>>>> +     __u64 timeline_fences;
>>>>> +
>>>>> +     /** @rsvd2: Reserved, MBZ */
>>>>> +     __u64 rsvd2;
>>>>> +
>>>>
>>>> Just out of curiosity: if we can extend behavior with @extensions and
>>>> even @flags, why would we need a rsvd2? Perhaps we could kill rsvd2?
>>>>
>>>
>>> True. I added it just in case some requests came up that would require
>>> some additional fields. During this review process itself there were
>>> some requests. Adding directly here should have a slight performance
>>> edge over adding it as an extension (one less copy_from_user).
>>>
>>> But if folks think this is an overkill, I will remove it.
>>>
>>> Niranjana
>>>
>>>>> +     /**
>>>>> +      * @extensions: Zero-terminated chain of extensions.
>>>>> +      *
>>>>> +      * For future extensions. See struct i915_user_extension.
>>>>> +      */
>>>>> +     __u64 extensions;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_i915_gem_create_ext_vm_private - Extension to make 
>>>>> the object
>>>>> + * private to the specified VM.
>>>>> + *
>>>>> + * See struct drm_i915_gem_create_ext.
>>>>> + */
>>>>> +struct drm_i915_gem_create_ext_vm_private {
>>>>> +#define I915_GEM_CREATE_EXT_VM_PRIVATE               2
>>>>> +     /** @base: Extension link. See struct i915_user_extension. */
>>>>> +     struct i915_user_extension base;
>>>>> +
>>>>> +     /** @vm_id: Id of the VM to which the object is private */
>>>>> +     __u32 vm_id;
>>>>> +};
>>>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 16:23           ` [Intel-gfx] " Matthew Auld
@ 2022-07-01  8:45             ` Matthew Auld
  -1 siblings, 0 replies; 53+ messages in thread
From: Matthew Auld @ 2022-07-01  8:45 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Matthew Brost, Paulo Zanoni, Lionel Landwerlin, Tvrtko Ursulin,
	Intel GFX, Maling list - DRI developers, Thomas Hellstrom,
	oak.zeng, Chris Wilson, Daniel Vetter, Niranjana Vishwanathapura,
	Christian König

On 30/06/2022 17:23, Matthew Auld wrote:
> On 30/06/2022 16:34, Jason Ekstrand wrote:
>> On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@intel.com 
>> <mailto:matthew.auld@intel.com>> wrote:
>>
>>     On 30/06/2022 06:11, Jason Ekstrand wrote:
>>      > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>>      > <niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>
>>      > <mailto:niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>>> wrote:
>>      >
>>      >     VM_BIND and related uapi definitions
>>      >
>>      >     v2: Reduce the scope to simple Mesa use case.
>>      >     v3: Expand VM_UNBIND documentation and add
>>      >          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>      >          and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>      >     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>      >          documentation for vm_bind/unbind.
>>      >     v5: Remove TLB flush requirement on VM_UNBIND.
>>      >          Add version support to stage implementation.
>>      >     v6: Define and use drm_i915_gem_timeline_fence structure for
>>      >          all timeline fences.
>>      >     v7: Rename I915_PARAM_HAS_VM_BIND to 
>> I915_PARAM_VM_BIND_VERSION.
>>      >          Update documentation on async vm_bind/unbind and 
>> versioning.
>>      >          Remove redundant vm_bind/unbind FENCE_VALID flag, 
>> execbuf3
>>      >          batch_count field and I915_EXEC3_SECURE flag.
>>      >
>>      >     Signed-off-by: Niranjana Vishwanathapura
>>      >     <niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>
>>      >     <mailto:niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>>>
>>      >     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
>>     <mailto:daniel.vetter@ffwll.ch>
>>      >     <mailto:daniel.vetter@ffwll.ch 
>> <mailto:daniel.vetter@ffwll.ch>>>
>>      >     ---
>>      >       Documentation/gpu/rfc/i915_vm_bind.h | 280
>>     +++++++++++++++++++++++++++
>>      >       1 file changed, 280 insertions(+)
>>      >       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>      >
>>      >     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>>      >     b/Documentation/gpu/rfc/i915_vm_bind.h
>>      >     new file mode 100644
>>      >     index 000000000000..a93e08bceee6
>>      >     --- /dev/null
>>      >     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>      >     @@ -0,0 +1,280 @@
>>      >     +/* SPDX-License-Identifier: MIT */
>>      >     +/*
>>      >     + * Copyright © 2022 Intel Corporation
>>      >     + */
>>      >     +
>>      >     +/**
>>      >     + * DOC: I915_PARAM_VM_BIND_VERSION
>>      >     + *
>>      >     + * VM_BIND feature version supported.
>>      >     + * See typedef drm_i915_getparam_t param.
>>      >     + *
>>      >     + * Specifies the VM_BIND feature version supported.
>>      >     + * The following versions of VM_BIND have been defined:
>>      >     + *
>>      >     + * 0: No VM_BIND support.
>>      >     + *
>>      >     + * 1: In VM_UNBIND calls, the UMD must specify the exact
>>     mappings
>>      >     created
>>      >     + *    previously with VM_BIND, the ioctl will not support
>>     unbinding
>>      >     multiple
>>      >     + *    mappings or splitting them. Similarly, VM_BIND calls
>>     will not
>>      >     replace
>>      >     + *    any existing mappings.
>>      >     + *
>>      >     + * 2: The restrictions on unbinding partial or multiple
>>     mappings is
>>      >     + *    lifted, Similarly, binding will replace any mappings
>>     in the
>>      >     given range.
>>      >     + *
>>      >     + * See struct drm_i915_gem_vm_bind and struct
>>     drm_i915_gem_vm_unbind.
>>      >     + */
>>      >     +#define I915_PARAM_VM_BIND_VERSION     57
>>      >     +
>>      >     +/**
>>      >     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>      >     + *
>>      >     + * Flag to opt-in for VM_BIND mode of binding during VM
>>     creation.
>>      >     + * See struct drm_i915_gem_vm_control flags.
>>      >     + *
>>      >     + * The older execbuf2 ioctl will not support VM_BIND mode of
>>     operation.
>>      >     + * For VM_BIND mode, we have new execbuf3 ioctl which will 
>> not
>>      >     accept any
>>      >     + * execlist (See struct drm_i915_gem_execbuffer3 for more
>>     details).
>>      >     + */
>>      >     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>>      >     +
>>      >     +/* VM_BIND related ioctls */
>>      >     +#define DRM_I915_GEM_VM_BIND           0x3d
>>      >     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>>      >     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>>      >     +
>>      >     +#define DRM_IOCTL_I915_GEM_VM_BIND
>>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>>      >     drm_i915_gem_vm_bind)
>>      >     +#define DRM_IOCTL_I915_GEM_VM_UNBIND
>>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>>      >     drm_i915_gem_vm_bind)
>>      >     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
>>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>>      >     drm_i915_gem_execbuffer3)
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_timeline_fence - An input or output
>>     timeline
>>      >     fence.
>>      >     + *
>>      >     + * The operation will wait for input fence to signal.
>>      >     + *
>>      >     + * The returned output fence will be signaled after the
>>     completion
>>      >     of the
>>      >     + * operation.
>>      >     + */
>>      >     +struct drm_i915_gem_timeline_fence {
>>      >     +       /** @handle: User's handle for a drm_syncobj to wait
>>     on or
>>      >     signal. */
>>      >     +       __u32 handle;
>>      >     +
>>      >     +       /**
>>      >     +        * @flags: Supported flags are:
>>      >     +        *
>>      >     +        * I915_TIMELINE_FENCE_WAIT:
>>      >     +        * Wait for the input fence before the operation.
>>      >     +        *
>>      >     +        * I915_TIMELINE_FENCE_SIGNAL:
>>      >     +        * Return operation completion fence as output.
>>      >     +        */
>>      >     +       __u32 flags;
>>      >     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>      >     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>      >     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>>      >     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>      >     +
>>      >     +       /**
>>      >     +        * @value: A point in the timeline.
>>      >     +        * Value must be 0 for a binary drm_syncobj. A Value
>>     of 0 for a
>>      >     +        * timeline drm_syncobj is invalid as it turns a
>>     drm_syncobj
>>      >     into a
>>      >     +        * binary one.
>>      >     +        */
>>      >     +       __u64 value;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_vm_bind - VA to object mapping to 
>> bind.
>>      >     + *
>>      >     + * This structure is passed to VM_BIND ioctl and specifies 
>> the
>>      >     mapping of GPU
>>      >     + * virtual address (VA) range to the section of an object 
>> that
>>      >     should be bound
>>      >     + * in the device page table of the specified address space 
>> (VM).
>>      >     + * The VA range specified must be unique (ie., not currently
>>     bound)
>>      >     and can
>>      >     + * be mapped to whole object or a section of the object 
>> (partial
>>      >     binding).
>>      >     + * Multiple VA mappings can be created to the same section
>>     of the
>>      >     object
>>      >     + * (aliasing).
>>      >     + *
>>      >     + * The @start, @offset and @length must be 4K page aligned.
>>     However
>>      >     the DG2
>>      >     + * and XEHPSDV has 64K page size for device local-memory 
>> and has
>>      >     compact page
>>      >     + * table. On those platforms, for binding device local-memory
>>      >     objects, the
>>      >     + * @start must be 2M aligned, @offset and @length must be
>>     64K aligned.
>>      >
>>      >
>>      > This is not acceptable.  We need 64K granularity.  This 
>> includes the
>>      > starting address, the BO offset, and the length.  Why?  The 
>> tl;dr is
>>      > that it's a requirement for about 50% of D3D12 apps if we want
>>     them to
>>      > run on Linux via D3D12.  A longer explanation follows.  I don't
>>      > necessarily expect kernel folks to get all the details but 
>> hopefully
>>      > I'll have left enough of a map that some of the Intel Mesa 
>> folks can
>>      > help fill in details.
>>      >
>>      > Many modern D3D12 apps have a hard requirement on Tier2 tiled
>>      > resources.  This is a feature that Intel has supported in the 
>> D3D12
>>      > driver since Skylake.  In order to implement this feature, VKD3D
>>      > requires the various sparseResidencyImage* and
>>     sparseResidency*Sampled
>>      > Vulkan features.  If we want those apps to work (there's getting
>>     to be
>>      > quite a few of them), we need to implement the Vulkan sparse
>>     residency
>>      > features.
>>      > |
>>      > |
>>      > What is sparse residency?  I'm glad you asked!  The sparse 
>> residency
>>      > features allow a client to separately bind each miplevel or array
>>     slice
>>      > of an image to a chunk of device memory independently, without
>>     affecting
>>      > any other areas of the image.  Once you get to a high enough
>>     miplevel
>>      > that everything fits inside a single sparse image block (that's a
>>      > technical Vulkan term you can search for in the spec), you can
>>     enter a
>>      > "miptail" which contains all the remaining miplevels in a single
>>     sparse
>>      > image block.
>>      >
>>      > The term "sparse image block" is what the Vulkan spec uses.  On
>>     Intel
>>      > hardware and in the docs, it's what we call a "tile".     
>> Specifically, the
>>      > image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on
>>     DG2+.  This
>>      > is because Tile4 and legacy X and Y-tiling don't provide any
>>     guarantees
>>      > about page alignment for slices.  Yf, Ys, and Tile64, on the
>>     other hand,
>>      > align all slices of the image to a tile boundary, allowing us 
>> to map
>>      > memory to different slices independently, assuming we have 64K
>>     (or 4K
>>      > for Yf) VM_BIND granularity.  (4K isn't actually a requirement for
>>      > SKL-TGL; we can use Ys all the time which has 64K tiles but
>>     there's no
>>      > reason to not support 4K alignments on integrated.)
>>      >
>>      > Someone may be tempted to ask, "Can't we wiggle the strides
>>     around or
>>      > something to make it work?"  I thought about that and no, you 
>> can't.
>>      > The problem here is LOD2+.  Sure, you can have a stride such that
>>     the
>>      > image is a multiple of 2M worth of tiles across.  That'll work
>>     fine for
>>      > LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be 
>> and
>>      > there's no way to control that.  The hardware will place it to
>>     the right
>>      > of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing
>>     you
>>      > can do about that.  If that position doesn't happen to hit a 2M
>>      > boundary, you're out of luck.
>>      >
>>      > I hope that explanation provides enough detail.  Sadly, this is
>>     one of
>>      > those things which has a lot of moving pieces all over different
>>     bits of
>>      > the hardware and various APIs and they all have to work together
>>     just
>>      > right for it to all come out in the end.  But, yeah, we really
>>     need 64K
>>      > aligned binding if we want VKD3D to work.
>>
>>     Just to confirm, the new model would be to enforce 64K GTT alignment
>>     for
>>     lmem pages, and then for smem pages we would only require 4K 
>> alignment,
>>     but with the added restriction that userspace will never try to 
>> mix the
>>     two (lmem vs smem) within the same 2M va range (page-table). The 
>> kernel
>>     will verify this and throw an error if needed. This model should work
>>     with the above?
>>
>>
>> Mesa doesn't have full control over BO placement so I don't think we 
>> can guarantee quite as much as you want there.  We can guarantee, I 
>> think, that we never place LMEM-only and SMEM-only in the same 2M 
>> block. However, most BOs will be LMEM+SMEM (with a preference for 
>> LMEM) and then it'll be up to the kernel to sort out any issues.  Is 
>> that reasonable?
> 
> That seems tricky for the lmem + smem case. On DG2 the hw design is such 
> that you can't have 64K and 4K GTT pages within the same page-table, 
> since the entire page-table is either operating in 64K or 4K GTT page 
> mode (there is some special bit on the PDE that we need to toggle to 
> turn on/off the 64K mode).

Just to be clear here, "64K GTT" is not just the GTT alignment but also 
the requirement that the device/physical address of the underlying 
memory also has 64K alignment, and ofc is also 64K of contiguous memory. 
We can easily guarantee that with lmem, but with smem it might be more 
tricky (like shmem). Also I'm not sure if there is already a mechanism 
to request some kind of min alignment/page-size with the dma-api, but 
perhaps something like that could work here?

> 
>>
>> --Jason
>>
>>      >
>>      > --Jason
>>      >
>>      >     + * Also, for such mappings, i915 will reserve the whole 2M 
>> range
>>      >     for it so as
>>      >     + * to not allow multiple mappings in that 2M range 
>> (Compact page
>>      >     tables do not
>>      >     + * allow 64K page and 4K page bindings in the same 2M range).
>>      >     + *
>>      >     + * Error code -EINVAL will be returned if @start, @offset and
>>      >     @length are not
>>      >     + * properly aligned. In version 1 (See
>>     I915_PARAM_VM_BIND_VERSION),
>>      >     error code
>>      >     + * -ENOSPC will be returned if the VA range specified 
>> can't be
>>      >     reserved.
>>      >     + *
>>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU 
>> threads
>>      >     concurrently
>>      >     + * are not ordered. Furthermore, parts of the VM_BIND
>>     operation can
>>      >     be done
>>      >     + * asynchronously, if valid @fence is specified.
>>      >     + */
>>      >     +struct drm_i915_gem_vm_bind {
>>      >     +       /** @vm_id: VM (address space) id to bind */
>>      >     +       __u32 vm_id;
>>      >     +
>>      >     +       /** @handle: Object handle */
>>      >     +       __u32 handle;
>>      >     +
>>      >     +       /** @start: Virtual Address start to bind */
>>      >     +       __u64 start;
>>      >     +
>>      >     +       /** @offset: Offset in object to bind */
>>      >     +       __u64 offset;
>>      >     +
>>      >     +       /** @length: Length of mapping to bind */
>>      >     +       __u64 length;
>>      >     +
>>      >     +       /**
>>      >     +        * @flags: Supported flags are:
>>      >     +        *
>>      >     +        * I915_GEM_VM_BIND_READONLY:
>>      >     +        * Mapping is read-only.
>>      >     +        *
>>      >     +        * I915_GEM_VM_BIND_CAPTURE:
>>      >     +        * Capture this mapping in the dump upon GPU error.
>>      >     +        */
>>      >     +       __u64 flags;
>>      >     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>>      >     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>>      >     +
>>      >     +       /**
>>      >     +        * @fence: Timeline fence for bind completion 
>> signaling.
>>      >     +        *
>>      >     +        * It is an out fence, hence using
>>     I915_TIMELINE_FENCE_WAIT flag
>>      >     +        * is invalid, and an error will be returned.
>>      >     +        */
>>      >     +       struct drm_i915_gem_timeline_fence fence;
>>      >     +
>>      >     +       /**
>>      >     +        * @extensions: Zero-terminated chain of extensions.
>>      >     +        *
>>      >     +        * For future extensions. See struct 
>> i915_user_extension.
>>      >     +        */
>>      >     +       __u64 extensions;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_vm_unbind - VA to object mapping to
>>     unbind.
>>      >     + *
>>      >     + * This structure is passed to VM_UNBIND ioctl and 
>> specifies the
>>      >     GPU virtual
>>      >     + * address (VA) range that should be unbound from the device
>>     page
>>      >     table of the
>>      >     + * specified address space (VM). VM_UNBIND will force 
>> unbind the
>>      >     specified
>>      >     + * range from device page table without waiting for any GPU
>>     job to
>>      >     complete.
>>      >     + * It is UMDs responsibility to ensure the mapping is no
>>     longer in
>>      >     use before
>>      >     + * calling VM_UNBIND.
>>      >     + *
>>      >     + * If the specified mapping is not found, the ioctl will 
>> simply
>>      >     return without
>>      >     + * any error.
>>      >     + *
>>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU 
>> threads
>>      >     concurrently
>>      >     + * are not ordered. Furthermore, parts of the VM_UNBIND
>>     operation
>>      >     can be done
>>      >     + * asynchronously, if valid @fence is specified.
>>      >     + */
>>      >     +struct drm_i915_gem_vm_unbind {
>>      >     +       /** @vm_id: VM (address space) id to bind */
>>      >     +       __u32 vm_id;
>>      >     +
>>      >     +       /** @rsvd: Reserved, MBZ */
>>      >     +       __u32 rsvd;
>>      >     +
>>      >     +       /** @start: Virtual Address start to unbind */
>>      >     +       __u64 start;
>>      >     +
>>      >     +       /** @length: Length of mapping to unbind */
>>      >     +       __u64 length;
>>      >     +
>>      >     +       /** @flags: Currently reserved, MBZ */
>>      >     +       __u64 flags;
>>      >     +
>>      >     +       /**
>>      >     +        * @fence: Timeline fence for unbind completion
>>     signaling.
>>      >     +        *
>>      >     +        * It is an out fence, hence using
>>     I915_TIMELINE_FENCE_WAIT flag
>>      >     +        * is invalid, and an error will be returned.
>>      >     +        */
>>      >     +       struct drm_i915_gem_timeline_fence fence;
>>      >     +
>>      >     +       /**
>>      >     +        * @extensions: Zero-terminated chain of extensions.
>>      >     +        *
>>      >     +        * For future extensions. See struct 
>> i915_user_extension.
>>      >     +        */
>>      >     +       __u64 extensions;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_execbuffer3 - Structure for
>>      >     DRM_I915_GEM_EXECBUFFER3
>>      >     + * ioctl.
>>      >     + *
>>      >     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND 
>> mode and
>>      >     VM_BIND mode
>>      >     + * only works with this ioctl for submission.
>>      >     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>      >     + */
>>      >     +struct drm_i915_gem_execbuffer3 {
>>      >     +       /**
>>      >     +        * @ctx_id: Context id
>>      >     +        *
>>      >     +        * Only contexts with user engine map are allowed.
>>      >     +        */
>>      >     +       __u32 ctx_id;
>>      >     +
>>      >     +       /**
>>      >     +        * @engine_idx: Engine index
>>      >     +        *
>>      >     +        * An index in the user engine map of the context
>>     specified
>>      >     by @ctx_id.
>>      >     +        */
>>      >     +       __u32 engine_idx;
>>      >     +
>>      >     +       /**
>>      >     +        * @batch_address: Batch gpu virtual address/es.
>>      >     +        *
>>      >     +        * For normal submission, it is the gpu virtual
>>     address of
>>      >     the batch
>>      >     +        * buffer. For parallel submission, it is a pointer 
>> to an
>>      >     array of
>>      >     +        * batch buffer gpu virtual addresses with array size
>>     equal
>>      >     to the
>>      >     +        * number of (parallel) engines involved in that
>>     submission (See
>>      >     +        * struct i915_context_engines_parallel_submit).
>>      >     +        */
>>      >     +       __u64 batch_address;
>>      >     +
>>      >     +       /** @flags: Currently reserved, MBZ */
>>      >     +       __u64 flags;
>>      >     +
>>      >     +       /** @rsvd1: Reserved, MBZ */
>>      >     +       __u32 rsvd1;
>>      >     +
>>      >     +       /** @fence_count: Number of fences in
>>     @timeline_fences array. */
>>      >     +       __u32 fence_count;
>>      >     +
>>      >     +       /**
>>      >     +        * @timeline_fences: Pointer to an array of timeline
>>     fences.
>>      >     +        *
>>      >     +        * Timeline fences are of format struct
>>      >     drm_i915_gem_timeline_fence.
>>      >     +        */
>>      >     +       __u64 timeline_fences;
>>      >     +
>>      >     +       /** @rsvd2: Reserved, MBZ */
>>      >     +       __u64 rsvd2;
>>      >     +
>>      >     +       /**
>>      >     +        * @extensions: Zero-terminated chain of extensions.
>>      >     +        *
>>      >     +        * For future extensions. See struct 
>> i915_user_extension.
>>      >     +        */
>>      >     +       __u64 extensions;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_create_ext_vm_private - Extension 
>> to make
>>      >     the object
>>      >     + * private to the specified VM.
>>      >     + *
>>      >     + * See struct drm_i915_gem_create_ext.
>>      >     + */
>>      >     +struct drm_i915_gem_create_ext_vm_private {
>>      >     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>>      >     +       /** @base: Extension link. See struct
>>     i915_user_extension. */
>>      >     +       struct i915_user_extension base;
>>      >     +
>>      >     +       /** @vm_id: Id of the VM to which the object is
>>     private */
>>      >     +       __u32 vm_id;
>>      >     +};
>>      >     --
>>      >     2.21.0.rc0.32.g243a4c7e27
>>      >
>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-07-01  8:45             ` Matthew Auld
  0 siblings, 0 replies; 53+ messages in thread
From: Matthew Auld @ 2022-07-01  8:45 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Paulo Zanoni, Intel GFX, Maling list - DRI developers,
	Thomas Hellstrom, Chris Wilson, Daniel Vetter,
	Christian König

On 30/06/2022 17:23, Matthew Auld wrote:
> On 30/06/2022 16:34, Jason Ekstrand wrote:
>> On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@intel.com 
>> <mailto:matthew.auld@intel.com>> wrote:
>>
>>     On 30/06/2022 06:11, Jason Ekstrand wrote:
>>      > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
>>      > <niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>
>>      > <mailto:niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>>> wrote:
>>      >
>>      >     VM_BIND and related uapi definitions
>>      >
>>      >     v2: Reduce the scope to simple Mesa use case.
>>      >     v3: Expand VM_UNBIND documentation and add
>>      >          I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>      >          and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>      >     v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
>>      >          documentation for vm_bind/unbind.
>>      >     v5: Remove TLB flush requirement on VM_UNBIND.
>>      >          Add version support to stage implementation.
>>      >     v6: Define and use drm_i915_gem_timeline_fence structure for
>>      >          all timeline fences.
>>      >     v7: Rename I915_PARAM_HAS_VM_BIND to 
>> I915_PARAM_VM_BIND_VERSION.
>>      >          Update documentation on async vm_bind/unbind and 
>> versioning.
>>      >          Remove redundant vm_bind/unbind FENCE_VALID flag, 
>> execbuf3
>>      >          batch_count field and I915_EXEC3_SECURE flag.
>>      >
>>      >     Signed-off-by: Niranjana Vishwanathapura
>>      >     <niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>
>>      >     <mailto:niranjana.vishwanathapura@intel.com
>>     <mailto:niranjana.vishwanathapura@intel.com>>>
>>      >     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch
>>     <mailto:daniel.vetter@ffwll.ch>
>>      >     <mailto:daniel.vetter@ffwll.ch 
>> <mailto:daniel.vetter@ffwll.ch>>>
>>      >     ---
>>      >       Documentation/gpu/rfc/i915_vm_bind.h | 280
>>     +++++++++++++++++++++++++++
>>      >       1 file changed, 280 insertions(+)
>>      >       create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>      >
>>      >     diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
>>      >     b/Documentation/gpu/rfc/i915_vm_bind.h
>>      >     new file mode 100644
>>      >     index 000000000000..a93e08bceee6
>>      >     --- /dev/null
>>      >     +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>      >     @@ -0,0 +1,280 @@
>>      >     +/* SPDX-License-Identifier: MIT */
>>      >     +/*
>>      >     + * Copyright © 2022 Intel Corporation
>>      >     + */
>>      >     +
>>      >     +/**
>>      >     + * DOC: I915_PARAM_VM_BIND_VERSION
>>      >     + *
>>      >     + * VM_BIND feature version supported.
>>      >     + * See typedef drm_i915_getparam_t param.
>>      >     + *
>>      >     + * Specifies the VM_BIND feature version supported.
>>      >     + * The following versions of VM_BIND have been defined:
>>      >     + *
>>      >     + * 0: No VM_BIND support.
>>      >     + *
>>      >     + * 1: In VM_UNBIND calls, the UMD must specify the exact
>>     mappings
>>      >     created
>>      >     + *    previously with VM_BIND, the ioctl will not support
>>     unbinding
>>      >     multiple
>>      >     + *    mappings or splitting them. Similarly, VM_BIND calls
>>     will not
>>      >     replace
>>      >     + *    any existing mappings.
>>      >     + *
>>      >     + * 2: The restrictions on unbinding partial or multiple
>>     mappings is
>>      >     + *    lifted, Similarly, binding will replace any mappings
>>     in the
>>      >     given range.
>>      >     + *
>>      >     + * See struct drm_i915_gem_vm_bind and struct
>>     drm_i915_gem_vm_unbind.
>>      >     + */
>>      >     +#define I915_PARAM_VM_BIND_VERSION     57
>>      >     +
>>      >     +/**
>>      >     + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>      >     + *
>>      >     + * Flag to opt-in for VM_BIND mode of binding during VM
>>     creation.
>>      >     + * See struct drm_i915_gem_vm_control flags.
>>      >     + *
>>      >     + * The older execbuf2 ioctl will not support VM_BIND mode of
>>     operation.
>>      >     + * For VM_BIND mode, we have new execbuf3 ioctl which will 
>> not
>>      >     accept any
>>      >     + * execlist (See struct drm_i915_gem_execbuffer3 for more
>>     details).
>>      >     + */
>>      >     +#define I915_VM_CREATE_FLAGS_USE_VM_BIND       (1 << 0)
>>      >     +
>>      >     +/* VM_BIND related ioctls */
>>      >     +#define DRM_I915_GEM_VM_BIND           0x3d
>>      >     +#define DRM_I915_GEM_VM_UNBIND         0x3e
>>      >     +#define DRM_I915_GEM_EXECBUFFER3       0x3f
>>      >     +
>>      >     +#define DRM_IOCTL_I915_GEM_VM_BIND
>>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
>>      >     drm_i915_gem_vm_bind)
>>      >     +#define DRM_IOCTL_I915_GEM_VM_UNBIND
>>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
>>      >     drm_i915_gem_vm_bind)
>>      >     +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
>>      >       DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
>>      >     drm_i915_gem_execbuffer3)
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_timeline_fence - An input or output
>>     timeline
>>      >     fence.
>>      >     + *
>>      >     + * The operation will wait for input fence to signal.
>>      >     + *
>>      >     + * The returned output fence will be signaled after the
>>     completion
>>      >     of the
>>      >     + * operation.
>>      >     + */
>>      >     +struct drm_i915_gem_timeline_fence {
>>      >     +       /** @handle: User's handle for a drm_syncobj to wait
>>     on or
>>      >     signal. */
>>      >     +       __u32 handle;
>>      >     +
>>      >     +       /**
>>      >     +        * @flags: Supported flags are:
>>      >     +        *
>>      >     +        * I915_TIMELINE_FENCE_WAIT:
>>      >     +        * Wait for the input fence before the operation.
>>      >     +        *
>>      >     +        * I915_TIMELINE_FENCE_SIGNAL:
>>      >     +        * Return operation completion fence as output.
>>      >     +        */
>>      >     +       __u32 flags;
>>      >     +#define I915_TIMELINE_FENCE_WAIT            (1 << 0)
>>      >     +#define I915_TIMELINE_FENCE_SIGNAL          (1 << 1)
>>      >     +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS
>>      >     (-(I915_TIMELINE_FENCE_SIGNAL << 1))
>>      >     +
>>      >     +       /**
>>      >     +        * @value: A point in the timeline.
>>      >     +        * Value must be 0 for a binary drm_syncobj. A Value
>>     of 0 for a
>>      >     +        * timeline drm_syncobj is invalid as it turns a
>>     drm_syncobj
>>      >     into a
>>      >     +        * binary one.
>>      >     +        */
>>      >     +       __u64 value;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_vm_bind - VA to object mapping to 
>> bind.
>>      >     + *
>>      >     + * This structure is passed to VM_BIND ioctl and specifies 
>> the
>>      >     mapping of GPU
>>      >     + * virtual address (VA) range to the section of an object 
>> that
>>      >     should be bound
>>      >     + * in the device page table of the specified address space 
>> (VM).
>>      >     + * The VA range specified must be unique (ie., not currently
>>     bound)
>>      >     and can
>>      >     + * be mapped to whole object or a section of the object 
>> (partial
>>      >     binding).
>>      >     + * Multiple VA mappings can be created to the same section
>>     of the
>>      >     object
>>      >     + * (aliasing).
>>      >     + *
>>      >     + * The @start, @offset and @length must be 4K page aligned.
>>     However
>>      >     the DG2
>>      >     + * and XEHPSDV has 64K page size for device local-memory 
>> and has
>>      >     compact page
>>      >     + * table. On those platforms, for binding device local-memory
>>      >     objects, the
>>      >     + * @start must be 2M aligned, @offset and @length must be
>>     64K aligned.
>>      >
>>      >
>>      > This is not acceptable.  We need 64K granularity.  This 
>> includes the
>>      > starting address, the BO offset, and the length.  Why?  The 
>> tl;dr is
>>      > that it's a requirement for about 50% of D3D12 apps if we want
>>     them to
>>      > run on Linux via D3D12.  A longer explanation follows.  I don't
>>      > necessarily expect kernel folks to get all the details but 
>> hopefully
>>      > I'll have left enough of a map that some of the Intel Mesa 
>> folks can
>>      > help fill in details.
>>      >
>>      > Many modern D3D12 apps have a hard requirement on Tier2 tiled
>>      > resources.  This is a feature that Intel has supported in the 
>> D3D12
>>      > driver since Skylake.  In order to implement this feature, VKD3D
>>      > requires the various sparseResidencyImage* and
>>     sparseResidency*Sampled
>>      > Vulkan features.  If we want those apps to work (there's getting
>>     to be
>>      > quite a few of them), we need to implement the Vulkan sparse
>>     residency
>>      > features.
>>      > |
>>      > |
>>      > What is sparse residency?  I'm glad you asked!  The sparse 
>> residency
>>      > features allow a client to separately bind each miplevel or array
>>     slice
>>      > of an image to a chunk of device memory independently, without
>>     affecting
>>      > any other areas of the image.  Once you get to a high enough
>>     miplevel
>>      > that everything fits inside a single sparse image block (that's a
>>      > technical Vulkan term you can search for in the spec), you can
>>     enter a
>>      > "miptail" which contains all the remaining miplevels in a single
>>     sparse
>>      > image block.
>>      >
>>      > The term "sparse image block" is what the Vulkan spec uses.  On
>>     Intel
>>      > hardware and in the docs, it's what we call a "tile".     
>> Specifically, the
>>      > image needs to use Yf or Ys tiling on SKL-TGL or a Tile64 on
>>     DG2+.  This
>>      > is because Tile4 and legacy X and Y-tiling don't provide any
>>     guarantees
>>      > about page alignment for slices.  Yf, Ys, and Tile64, on the
>>     other hand,
>>      > align all slices of the image to a tile boundary, allowing us 
>> to map
>>      > memory to different slices independently, assuming we have 64K
>>     (or 4K
>>      > for Yf) VM_BIND granularity.  (4K isn't actually a requirement for
>>      > SKL-TGL; we can use Ys all the time which has 64K tiles but
>>     there's no
>>      > reason to not support 4K alignments on integrated.)
>>      >
>>      > Someone may be tempted to ask, "Can't we wiggle the strides
>>     around or
>>      > something to make it work?"  I thought about that and no, you 
>> can't.
>>      > The problem here is LOD2+.  Sure, you can have a stride such that
>>     the
>>      > image is a multiple of 2M worth of tiles across.  That'll work
>>     fine for
>>      > LOD0 and LOD1; both will be 2M aligned.  However, LOD2 won't be 
>> and
>>      > there's no way to control that.  The hardware will place it to
>>     the right
>>      > of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing
>>     you
>>      > can do about that.  If that position doesn't happen to hit a 2M
>>      > boundary, you're out of luck.
>>      >
>>      > I hope that explanation provides enough detail.  Sadly, this is
>>     one of
>>      > those things which has a lot of moving pieces all over different
>>     bits of
>>      > the hardware and various APIs and they all have to work together
>>     just
>>      > right for it to all come out in the end.  But, yeah, we really
>>     need 64K
>>      > aligned binding if we want VKD3D to work.
>>
>>     Just to confirm, the new model would be to enforce 64K GTT alignment
>>     for
>>     lmem pages, and then for smem pages we would only require 4K 
>> alignment,
>>     but with the added restriction that userspace will never try to 
>> mix the
>>     two (lmem vs smem) within the same 2M va range (page-table). The 
>> kernel
>>     will verify this and throw an error if needed. This model should work
>>     with the above?
>>
>>
>> Mesa doesn't have full control over BO placement so I don't think we 
>> can guarantee quite as much as you want there.  We can guarantee, I 
>> think, that we never place LMEM-only and SMEM-only in the same 2M 
>> block. However, most BOs will be LMEM+SMEM (with a preference for 
>> LMEM) and then it'll be up to the kernel to sort out any issues.  Is 
>> that reasonable?
> 
> That seems tricky for the lmem + smem case. On DG2 the hw design is such 
> that you can't have 64K and 4K GTT pages within the same page-table, 
> since the entire page-table is either operating in 64K or 4K GTT page 
> mode (there is some special bit on the PDE that we need to toggle to 
> turn on/off the 64K mode).

Just to be clear here, "64K GTT" is not just the GTT alignment but also 
the requirement that the device/physical address of the underlying 
memory also has 64K alignment, and ofc is also 64K of contiguous memory. 
We can easily guarantee that with lmem, but with smem it might be more 
tricky (like shmem). Also I'm not sure if there is already a mechanism 
to request some kind of min alignment/page-size with the dma-api, but 
perhaps something like that could work here?

> 
>>
>> --Jason
>>
>>      >
>>      > --Jason
>>      >
>>      >     + * Also, for such mappings, i915 will reserve the whole 2M 
>> range
>>      >     for it so as
>>      >     + * to not allow multiple mappings in that 2M range 
>> (Compact page
>>      >     tables do not
>>      >     + * allow 64K page and 4K page bindings in the same 2M range).
>>      >     + *
>>      >     + * Error code -EINVAL will be returned if @start, @offset and
>>      >     @length are not
>>      >     + * properly aligned. In version 1 (See
>>     I915_PARAM_VM_BIND_VERSION),
>>      >     error code
>>      >     + * -ENOSPC will be returned if the VA range specified 
>> can't be
>>      >     reserved.
>>      >     + *
>>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU 
>> threads
>>      >     concurrently
>>      >     + * are not ordered. Furthermore, parts of the VM_BIND
>>     operation can
>>      >     be done
>>      >     + * asynchronously, if valid @fence is specified.
>>      >     + */
>>      >     +struct drm_i915_gem_vm_bind {
>>      >     +       /** @vm_id: VM (address space) id to bind */
>>      >     +       __u32 vm_id;
>>      >     +
>>      >     +       /** @handle: Object handle */
>>      >     +       __u32 handle;
>>      >     +
>>      >     +       /** @start: Virtual Address start to bind */
>>      >     +       __u64 start;
>>      >     +
>>      >     +       /** @offset: Offset in object to bind */
>>      >     +       __u64 offset;
>>      >     +
>>      >     +       /** @length: Length of mapping to bind */
>>      >     +       __u64 length;
>>      >     +
>>      >     +       /**
>>      >     +        * @flags: Supported flags are:
>>      >     +        *
>>      >     +        * I915_GEM_VM_BIND_READONLY:
>>      >     +        * Mapping is read-only.
>>      >     +        *
>>      >     +        * I915_GEM_VM_BIND_CAPTURE:
>>      >     +        * Capture this mapping in the dump upon GPU error.
>>      >     +        */
>>      >     +       __u64 flags;
>>      >     +#define I915_GEM_VM_BIND_READONLY      (1 << 1)
>>      >     +#define I915_GEM_VM_BIND_CAPTURE       (1 << 2)
>>      >     +
>>      >     +       /**
>>      >     +        * @fence: Timeline fence for bind completion 
>> signaling.
>>      >     +        *
>>      >     +        * It is an out fence, hence using
>>     I915_TIMELINE_FENCE_WAIT flag
>>      >     +        * is invalid, and an error will be returned.
>>      >     +        */
>>      >     +       struct drm_i915_gem_timeline_fence fence;
>>      >     +
>>      >     +       /**
>>      >     +        * @extensions: Zero-terminated chain of extensions.
>>      >     +        *
>>      >     +        * For future extensions. See struct 
>> i915_user_extension.
>>      >     +        */
>>      >     +       __u64 extensions;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_vm_unbind - VA to object mapping to
>>     unbind.
>>      >     + *
>>      >     + * This structure is passed to VM_UNBIND ioctl and 
>> specifies the
>>      >     GPU virtual
>>      >     + * address (VA) range that should be unbound from the device
>>     page
>>      >     table of the
>>      >     + * specified address space (VM). VM_UNBIND will force 
>> unbind the
>>      >     specified
>>      >     + * range from device page table without waiting for any GPU
>>     job to
>>      >     complete.
>>      >     + * It is UMDs responsibility to ensure the mapping is no
>>     longer in
>>      >     use before
>>      >     + * calling VM_UNBIND.
>>      >     + *
>>      >     + * If the specified mapping is not found, the ioctl will 
>> simply
>>      >     return without
>>      >     + * any error.
>>      >     + *
>>      >     + * VM_BIND/UNBIND ioctl calls executed on different CPU 
>> threads
>>      >     concurrently
>>      >     + * are not ordered. Furthermore, parts of the VM_UNBIND
>>     operation
>>      >     can be done
>>      >     + * asynchronously, if valid @fence is specified.
>>      >     + */
>>      >     +struct drm_i915_gem_vm_unbind {
>>      >     +       /** @vm_id: VM (address space) id to bind */
>>      >     +       __u32 vm_id;
>>      >     +
>>      >     +       /** @rsvd: Reserved, MBZ */
>>      >     +       __u32 rsvd;
>>      >     +
>>      >     +       /** @start: Virtual Address start to unbind */
>>      >     +       __u64 start;
>>      >     +
>>      >     +       /** @length: Length of mapping to unbind */
>>      >     +       __u64 length;
>>      >     +
>>      >     +       /** @flags: Currently reserved, MBZ */
>>      >     +       __u64 flags;
>>      >     +
>>      >     +       /**
>>      >     +        * @fence: Timeline fence for unbind completion
>>     signaling.
>>      >     +        *
>>      >     +        * It is an out fence, hence using
>>     I915_TIMELINE_FENCE_WAIT flag
>>      >     +        * is invalid, and an error will be returned.
>>      >     +        */
>>      >     +       struct drm_i915_gem_timeline_fence fence;
>>      >     +
>>      >     +       /**
>>      >     +        * @extensions: Zero-terminated chain of extensions.
>>      >     +        *
>>      >     +        * For future extensions. See struct 
>> i915_user_extension.
>>      >     +        */
>>      >     +       __u64 extensions;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_execbuffer3 - Structure for
>>      >     DRM_I915_GEM_EXECBUFFER3
>>      >     + * ioctl.
>>      >     + *
>>      >     + * DRM_I915_GEM_EXECBUFFER3 ioctl only works in VM_BIND 
>> mode and
>>      >     VM_BIND mode
>>      >     + * only works with this ioctl for submission.
>>      >     + * See I915_VM_CREATE_FLAGS_USE_VM_BIND.
>>      >     + */
>>      >     +struct drm_i915_gem_execbuffer3 {
>>      >     +       /**
>>      >     +        * @ctx_id: Context id
>>      >     +        *
>>      >     +        * Only contexts with user engine map are allowed.
>>      >     +        */
>>      >     +       __u32 ctx_id;
>>      >     +
>>      >     +       /**
>>      >     +        * @engine_idx: Engine index
>>      >     +        *
>>      >     +        * An index in the user engine map of the context
>>     specified
>>      >     by @ctx_id.
>>      >     +        */
>>      >     +       __u32 engine_idx;
>>      >     +
>>      >     +       /**
>>      >     +        * @batch_address: Batch gpu virtual address/es.
>>      >     +        *
>>      >     +        * For normal submission, it is the gpu virtual
>>     address of
>>      >     the batch
>>      >     +        * buffer. For parallel submission, it is a pointer 
>> to an
>>      >     array of
>>      >     +        * batch buffer gpu virtual addresses with array size
>>     equal
>>      >     to the
>>      >     +        * number of (parallel) engines involved in that
>>     submission (See
>>      >     +        * struct i915_context_engines_parallel_submit).
>>      >     +        */
>>      >     +       __u64 batch_address;
>>      >     +
>>      >     +       /** @flags: Currently reserved, MBZ */
>>      >     +       __u64 flags;
>>      >     +
>>      >     +       /** @rsvd1: Reserved, MBZ */
>>      >     +       __u32 rsvd1;
>>      >     +
>>      >     +       /** @fence_count: Number of fences in
>>     @timeline_fences array. */
>>      >     +       __u32 fence_count;
>>      >     +
>>      >     +       /**
>>      >     +        * @timeline_fences: Pointer to an array of timeline
>>     fences.
>>      >     +        *
>>      >     +        * Timeline fences are of format struct
>>      >     drm_i915_gem_timeline_fence.
>>      >     +        */
>>      >     +       __u64 timeline_fences;
>>      >     +
>>      >     +       /** @rsvd2: Reserved, MBZ */
>>      >     +       __u64 rsvd2;
>>      >     +
>>      >     +       /**
>>      >     +        * @extensions: Zero-terminated chain of extensions.
>>      >     +        *
>>      >     +        * For future extensions. See struct 
>> i915_user_extension.
>>      >     +        */
>>      >     +       __u64 extensions;
>>      >     +};
>>      >     +
>>      >     +/**
>>      >     + * struct drm_i915_gem_create_ext_vm_private - Extension 
>> to make
>>      >     the object
>>      >     + * private to the specified VM.
>>      >     + *
>>      >     + * See struct drm_i915_gem_create_ext.
>>      >     + */
>>      >     +struct drm_i915_gem_create_ext_vm_private {
>>      >     +#define I915_GEM_CREATE_EXT_VM_PRIVATE         2
>>      >     +       /** @base: Extension link. See struct
>>     i915_user_extension. */
>>      >     +       struct i915_user_extension base;
>>      >     +
>>      >     +       /** @vm_id: Id of the VM to which the object is
>>     private */
>>      >     +       __u32 vm_id;
>>      >     +};
>>      >     --
>>      >     2.21.0.rc0.32.g243a4c7e27
>>      >
>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
  2022-06-30 17:12             ` [Intel-gfx] " Zanoni, Paulo R
@ 2022-07-04 19:35               ` Lionel Landwerlin
  -1 siblings, 0 replies; 53+ messages in thread
From: Lionel Landwerlin @ 2022-07-04 19:35 UTC (permalink / raw)
  To: Zanoni, Paulo R, Vishwanathapura, Niranjana
  Cc: Brost, Matthew, Wilson, Chris P, Ursulin, Tvrtko, intel-gfx,
	dri-devel, Hellstrom, Thomas, Zeng, Oak, Auld, Matthew, jason,
	Vetter, Daniel, christian.koenig

[-- Attachment #1: Type: text/plain, Size: 774 bytes --]

On 30/06/2022 20:12, Zanoni, Paulo R wrote:
>>>> Can you please explain what happens when we try to write to a range
>>>> that's bound as read-only?
>>>>
>>> It will be mapped as read-only in device page table. Hence any
>>> write access will fail. I would expect a CAT error reported.
>> What's a CAT error? Does this lead to machine freeze or a GPU hang?
>> Let's make sure we document this.
>>
> Catastrophic error.
>
Reading the documentation, it seems the behavior depends on the context 
type.

With the Legacy 64bit context type, writes are ignored (BSpec 531) :

     - "For legacy context, the access rights are not applicable and 
should not be considered during page walk."

For Advanced 64bit context type, I think the HW will generate a pagefault.


-Lionel

[-- Attachment #2: Type: text/html, Size: 2378 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
@ 2022-07-04 19:35               ` Lionel Landwerlin
  0 siblings, 0 replies; 53+ messages in thread
From: Lionel Landwerlin @ 2022-07-04 19:35 UTC (permalink / raw)
  To: Zanoni, Paulo R, Vishwanathapura, Niranjana
  Cc: Wilson, Chris P, intel-gfx, dri-devel, Hellstrom, Thomas, Auld,
	Matthew, Vetter, Daniel, christian.koenig

[-- Attachment #1: Type: text/plain, Size: 774 bytes --]

On 30/06/2022 20:12, Zanoni, Paulo R wrote:
>>>> Can you please explain what happens when we try to write to a range
>>>> that's bound as read-only?
>>>>
>>> It will be mapped as read-only in device page table. Hence any
>>> write access will fail. I would expect a CAT error reported.
>> What's a CAT error? Does this lead to machine freeze or a GPU hang?
>> Let's make sure we document this.
>>
> Catastrophic error.
>
Reading the documentation, it seems the behavior depends on the context 
type.

With the Legacy 64bit context type, writes are ignored (BSpec 531) :

     - "For legacy context, the access rights are not applicable and 
should not be considered during page walk."

For Advanced 64bit context type, I think the HW will generate a pagefault.


-Lionel

[-- Attachment #2: Type: text/html, Size: 2378 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-07-01  0:31 [PATCH v8 0/3] " Niranjana Vishwanathapura
@ 2022-07-01  1:19 ` Patchwork
  0 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2022-07-01  1:19 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 10161 bytes --]

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105845/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11837 -> Patchwork_105845v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/index.html

Participating hosts (42 -> 43)
------------------------------

  Additional (1): fi-icl-u2 

Known issues
------------

  Here are the changes found in Patchwork_105845v1 that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - fi-bxt-dsi:         [PASS][1] -> [FAIL][2] ([i915#6003])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-bxt-dsi/boot.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-bxt-dsi/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-icl-u2:          NOTRUN -> [SKIP][3] ([i915#2190])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@random-engines:
    - fi-icl-u2:          NOTRUN -> [SKIP][4] ([i915#4613]) +3 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@gem_lmem_swapping@random-engines.html

  * igt@i915_selftest@live@gem:
    - fi-blb-e6850:       [PASS][5] -> [DMESG-FAIL][6] ([i915#4528])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-blb-e6850/igt@i915_selftest@live@gem.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-blb-e6850/igt@i915_selftest@live@gem.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [PASS][7] -> [INCOMPLETE][8] ([i915#3921])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
    - fi-bdw-5557u:       NOTRUN -> [INCOMPLETE][9] ([i915#3921])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-bdw-5557u/igt@i915_selftest@live@hangcheck.html
    - bat-dg1-6:          [PASS][10] -> [DMESG-FAIL][11] ([i915#4494] / [i915#4957])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-icl-u2:          NOTRUN -> [SKIP][12] ([i915#5903])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-rkl-guc:         NOTRUN -> [SKIP][13] ([fdo#111827])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-rkl-guc/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          NOTRUN -> [SKIP][14] ([fdo#111827]) +8 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-icl-u2:          NOTRUN -> [SKIP][15] ([i915#4103])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size:
    - fi-bsw-kefka:       [PASS][16] -> [FAIL][17] ([i915#6298])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - fi-tgl-u2:          [PASS][18] -> [DMESG-WARN][19] ([i915#402]) +2 similar issues
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_force_connector_basic@force-connector-state:
    - fi-icl-u2:          NOTRUN -> [WARN][20] ([i915#6008])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-icl-u2:          NOTRUN -> [SKIP][21] ([fdo#109285])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-icl-u2:          NOTRUN -> [SKIP][22] ([i915#3555])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-userptr:
    - fi-icl-u2:          NOTRUN -> [SKIP][23] ([fdo#109295] / [i915#3301])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-icl-u2/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-blb-e6850:       NOTRUN -> [FAIL][24] ([fdo#109271] / [i915#2403] / [i915#4312])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-blb-e6850/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_lrc:
    - fi-rkl-guc:         [INCOMPLETE][25] ([i915#4983]) -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-rkl-guc/igt@i915_selftest@live@gt_lrc.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-rkl-guc/igt@i915_selftest@live@gt_lrc.html

  * igt@i915_selftest@live@gt_timelines:
    - {bat-dg2-9}:        [DMESG-WARN][27] ([i915#5763]) -> [PASS][28] +1 similar issue
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/bat-dg2-9/igt@i915_selftest@live@gt_timelines.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/bat-dg2-9/igt@i915_selftest@live@gt_timelines.html

  * igt@i915_selftest@live@workarounds:
    - fi-bdw-5557u:       [INCOMPLETE][29] ([i915#6307]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-bdw-5557u/igt@i915_selftest@live@workarounds.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-bdw-5557u/igt@i915_selftest@live@workarounds.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions:
    - fi-bsw-kefka:       [FAIL][31] ([i915#6298]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html

  * igt@kms_flip@basic-flip-vs-modeset@b-edp1:
    - {bat-adlp-6}:       [DMESG-WARN][33] ([i915#3576]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11837/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5270]: https://gitlab.freedesktop.org/drm/intel/issues/5270
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6003]: https://gitlab.freedesktop.org/drm/intel/issues/6003
  [i915#6008]: https://gitlab.freedesktop.org/drm/intel/issues/6008
  [i915#6297]: https://gitlab.freedesktop.org/drm/intel/issues/6297
  [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298
  [i915#6307]: https://gitlab.freedesktop.org/drm/intel/issues/6307


Build changes
-------------

  * Linux: CI_DRM_11837 -> Patchwork_105845v1

  CI-20190529: 20190529
  CI_DRM_11837: e19040cd831f5ac1c94bb265ebd846c94f6fed80 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6553: 3cf110f8dcd1f4f02cf84339664b413abdaebf7d @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105845v1: e19040cd831f5ac1c94bb265ebd846c94f6fed80 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

1579e6b297bb drm/doc/rfc: VM_BIND uapi definition
c7637d3e75fc drm/i915: Update i915 uapi documentation
c91256b6179f drm/doc/rfc: VM_BIND feature design document

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105845v1/index.html

[-- Attachment #2: Type: text/html, Size: 11328 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/doc/rfc: i915 VM_BIND feature design + uapi
  2022-06-24  5:32 [PATCH v5 0/3] " Niranjana Vishwanathapura
@ 2022-06-24  6:32 ` Patchwork
  0 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2022-06-24  6:32 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 6252 bytes --]

== Series Details ==

Series: drm/doc/rfc: i915 VM_BIND feature design + uapi
URL   : https://patchwork.freedesktop.org/series/105577/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11802 -> Patchwork_105577v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/index.html

Participating hosts (36 -> 36)
------------------------------

  Additional (1): fi-pnv-d510 
  Missing    (1): fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105577v1:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_pm_rpm@module-reload:
    - {bat-adln-1}:       NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/bat-adln-1/igt@i915_pm_rpm@module-reload.html

  
Known issues
------------

  Here are the changes found in Patchwork_105577v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          NOTRUN -> [DMESG-FAIL][2] ([i915#4494] / [i915#4957])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - fi-pnv-d510:        NOTRUN -> [DMESG-FAIL][3] ([i915#4528])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/fi-pnv-d510/igt@i915_selftest@live@requests.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - bat-dg1-6:          NOTRUN -> [INCOMPLETE][4] ([i915#6011])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/bat-dg1-6/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-rkl-11600:       [PASS][5] -> [INCOMPLETE][6] ([i915#5982])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11802/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-hsw-4770:        NOTRUN -> [SKIP][7] ([fdo#109271] / [fdo#111827])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/fi-hsw-4770/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_psr@primary_page_flip:
    - fi-pnv-d510:        NOTRUN -> [SKIP][8] ([fdo#109271]) +42 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/fi-pnv-d510/igt@kms_psr@primary_page_flip.html

  * igt@runner@aborted:
    - fi-pnv-d510:        NOTRUN -> [FAIL][9] ([fdo#109271] / [i915#2403] / [i915#4312])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/fi-pnv-d510/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_module_load@reload:
    - {bat-adln-1}:       [DMESG-WARN][10] -> [PASS][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11802/bat-adln-1/igt@i915_module_load@reload.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/bat-adln-1/igt@i915_module_load@reload.html

  * igt@i915_selftest@live@gt_engines:
    - bat-dg1-6:          [INCOMPLETE][12] ([i915#4418]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11802/bat-dg1-6/igt@i915_selftest@live@gt_engines.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/bat-dg1-6/igt@i915_selftest@live@gt_engines.html

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-4770:        [INCOMPLETE][14] ([i915#3303] / [i915#4785]) -> [PASS][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11802/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
    - bat-dg1-5:          [DMESG-FAIL][16] ([i915#4494] / [i915#4957]) -> [PASS][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11802/bat-dg1-5/igt@i915_selftest@live@hangcheck.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/bat-dg1-5/igt@i915_selftest@live@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4418]: https://gitlab.freedesktop.org/drm/intel/issues/4418
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#5982]: https://gitlab.freedesktop.org/drm/intel/issues/5982
  [i915#6011]: https://gitlab.freedesktop.org/drm/intel/issues/6011


Build changes
-------------

  * Linux: CI_DRM_11802 -> Patchwork_105577v1

  CI-20190529: 20190529
  CI_DRM_11802: a9cd66449a986ed9cd1e90f0dbda3bf1a11619d9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6541: 02153f109bd422d93cfce7f5aa9d7b0e22fab13c @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105577v1: a9cd66449a986ed9cd1e90f0dbda3bf1a11619d9 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

c0e698153eb7 drm/doc/rfc: VM_BIND uapi definition
e9e9c8edbec6 drm/i915: Update i915 uapi documentation
899a3305e771 drm/doc/rfc: VM_BIND feature design document

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105577v1/index.html

[-- Attachment #2: Type: text/html, Size: 7163 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2022-07-04 19:35 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-26  1:49 [PATCH v7 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi Niranjana Vishwanathapura
2022-06-26  1:49 ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-26  1:49 ` [PATCH v6 1/3] drm/doc/rfc: VM_BIND feature design document Niranjana Vishwanathapura
2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-27 16:12   ` Daniel Vetter
2022-06-30  0:38   ` Zanoni, Paulo R
2022-06-30  0:38     ` [Intel-gfx] " Zanoni, Paulo R
2022-06-30  5:39     ` Niranjana Vishwanathapura
2022-06-30  5:39       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-26  1:49 ` [PATCH v6 2/3] drm/i915: Update i915 uapi documentation Niranjana Vishwanathapura
2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-26  1:49 ` [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition Niranjana Vishwanathapura
2022-06-26  1:49   ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-30  0:33   ` Zanoni, Paulo R
2022-06-30  0:33     ` [Intel-gfx] " Zanoni, Paulo R
2022-06-30  6:08     ` Niranjana Vishwanathapura
2022-06-30  6:08       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-30  6:39       ` Zanoni, Paulo R
2022-06-30  6:39         ` [Intel-gfx] " Zanoni, Paulo R
2022-06-30 16:18         ` Niranjana Vishwanathapura
2022-06-30 16:18           ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-30 17:12           ` Zanoni, Paulo R
2022-06-30 17:12             ` [Intel-gfx] " Zanoni, Paulo R
2022-06-30 18:30             ` Niranjana Vishwanathapura
2022-06-30 18:30               ` [Intel-gfx] " Niranjana Vishwanathapura
2022-07-04 19:35             ` Lionel Landwerlin
2022-07-04 19:35               ` [Intel-gfx] " Lionel Landwerlin
2022-06-30  7:59       ` Tvrtko Ursulin
2022-06-30 16:22         ` Niranjana Vishwanathapura
2022-07-01  8:11           ` Tvrtko Ursulin
2022-06-30  5:11   ` Jason Ekstrand
2022-06-30  5:11     ` [Intel-gfx] " Jason Ekstrand
2022-06-30  6:15     ` Niranjana Vishwanathapura
2022-06-30  6:15       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-30  6:24       ` Niranjana Vishwanathapura
2022-06-30 15:14     ` Matthew Auld
2022-06-30 15:14       ` [Intel-gfx] " Matthew Auld
2022-06-30 15:34       ` Jason Ekstrand
2022-06-30 15:34         ` [Intel-gfx] " Jason Ekstrand
2022-06-30 16:23         ` Matthew Auld
2022-06-30 16:23           ` [Intel-gfx] " Matthew Auld
2022-07-01  8:45           ` Matthew Auld
2022-07-01  8:45             ` [Intel-gfx] " Matthew Auld
2022-06-30 15:45   ` Jason Ekstrand
2022-06-30 15:45     ` [Intel-gfx] " Jason Ekstrand
2022-06-30 18:32     ` Niranjana Vishwanathapura
2022-06-30 18:32       ` [Intel-gfx] " Niranjana Vishwanathapura
2022-06-26  2:03 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/doc/rfc: i915 VM_BIND feature design + uapi Patchwork
2022-06-26  2:03 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-06-26  2:25 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-06-27 21:34 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2022-07-01  0:31 [PATCH v8 0/3] " Niranjana Vishwanathapura
2022-07-01  1:19 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-06-24  5:32 [PATCH v5 0/3] " Niranjana Vishwanathapura
2022-06-24  6:32 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.