All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	Zanoni@vger.kernel.org, "Paulo R" <paulo.r.zanoni@intel.com>,
	"Nirmoy Das" <nirmoy.das@intel.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>,
	"Oak Zeng" <oak.zeng@intel.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6] Documentation/gpu: Add a VM_BIND async document
Date: Tue, 15 Aug 2023 15:11:40 +0200	[thread overview]
Message-ID: <20230815131140.37844-1-thomas.hellstrom@linux.intel.com> (raw)

Add a motivation for and description of asynchronous VM_BIND operation

v2:
- Fix typos (Nirmoy Das)
- Improve the description of a memory fence (Oak Zeng)
- Add a reference to the document in the Xe RFC.
- Add pointers to sample uAPI suggestions
v3:
- Address review comments (Danilo Krummrich)
- Formatting fixes
v4:
- Address typos (Francois Dugast)
- Explain why in-fences are not allowed for VM_BIND operations for long-
  running workloads (Matthew Brost)
v5:
- More typo- and style fixing
- Further clarify the implications of disallowing in-fences for VM_BIND
  operations for long-running workloads (Matthew Brost)
v6:
- Point out that a gpu_vm is a virtual GPU Address space.
  (Danilo Krummrich)
- For an explanation of dma-fences point to the dma-fence documentation.
  (Paolo Zanoni)
- Clarify that VM_BIND errors are reported synchronously. (Paolo Zanoni)
- Use an rst doc reference when pointing to the async vm_bind document
  from the xe merge plan.
- Add the VM_BIND documentation to the drm documentation table-of-content,
  using an intermediate "Misc DRM driver uAPI- and feature implementation
  guidelines"

Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 Documentation/gpu/drm-vm-bind-async.rst       | 178 ++++++++++++++++++
 .../gpu/implementation_guidelines.rst         |  11 ++
 Documentation/gpu/index.rst                   |   1 +
 Documentation/gpu/rfc/xe.rst                  |   4 +-
 4 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/gpu/drm-vm-bind-async.rst
 create mode 100644 Documentation/gpu/implementation_guidelines.rst

diff --git a/Documentation/gpu/drm-vm-bind-async.rst b/Documentation/gpu/drm-vm-bind-async.rst
new file mode 100644
index 000000000000..80cd93a5b59c
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-async.rst
@@ -0,0 +1,178 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+====================
+Asynchronous VM_BIND
+====================
+
+Nomenclature:
+=============
+
+* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
+
+* ``gpu_vm``: A virtual GPU address space. Typically per process, but
+  can be shared by multiple processes.
+
+* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
+  an IOCTL. The operations include mapping and unmapping system- or
+  VRAM memory.
+
+* ``syncobj``: A container that abstracts synchronization objects. The
+  synchronization objects can be either generic, like dma-fences or
+  driver specific. A syncobj typically indicates the type of the
+  underlying synchronization object.
+
+* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
+  for these before starting.
+
+* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
+  signals these when the bind operation is complete.
+
+* ``dma-fence``: A cross-driver synchronization object. A basic
+  understanding of dma-fences is required to digest this
+  document. Please refer to the ``DMA Fences`` section of the
+  :doc:`dma-buf doc </driver-api/dma-buf>`.
+
+* ``memory fence``: A synchronization object, different from a dma-fence.
+  A memory fence uses the value of a specified memory location to determine
+  signaled status. A memory fence can be awaited and signaled by both
+  the GPU and CPU. Memory fences are sometimes referred to as
+  user-fences, userspace-fences or gpu futexes and do not necessarily obey
+  the dma-fence rule of signaling within a "reasonable amount of time".
+  The kernel should thus avoid waiting for memory fences with locks held.
+
+* ``long-running workload``: A workload that may take more than the
+  current stipulated dma-fence maximum signal delay to complete and
+  which therefore needs to set the gpu_vm or the GPU execution context in
+  a certain mode that disallows completion dma-fences.
+
+* ``exec function``: An exec function is a function that revalidates all
+  affected gpu_vmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+
+* ``bind context``: A context identifier used for the VM_BIND
+  operation. VM_BIND operations that use the same bind context can be
+  assumed, where it matters, to complete in order of submission. No such
+  assumptions can be made for VM_BIND operations using separate bind contexts.
+
+* ``UMD``: User-mode driver.
+
+* ``KMD``: Kernel-mode driver.
+
+
+Synchronous / Asynchronous VM_BIND operation
+============================================
+
+Synchronous VM_BIND
+___________________
+With Synchronous VM_BIND, the VM_BIND operations all complete before the
+IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
+out-fences. Synchronous VM_BIND may block and wait for GPU operations;
+for example swap-in or clearing, or even previous binds.
+
+Asynchronous VM_BIND
+____________________
+Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
+IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
+before modifying the GPU page-tables, and signal the out-syncobjs when
+the modification is done in the sense that the next exec function that
+awaits for the out-syncobjs will see the change. Errors are reported
+synchronously.
+In low-memory situations the implementation may block, performing the
+VM_BIND synchronously, because there might not be enough memory
+immediately available for preparing the asynchronous operation.
+
+If the VM_BIND IOCTL takes a list or an array of operations as an argument,
+the in-syncobjs needs to signal before the first operation starts to
+execute, and the out-syncobjs signal after the last operation
+completes. Operations in the operation list can be assumed, where it
+matters, to complete in order.
+
+Since asynchronous VM_BIND operations may use dma-fences embedded in
+out-syncobjs and internally in KMD to signal bind completion,  any
+memory fences given as VM_BIND in-fences need to be awaited
+synchronously before the VM_BIND ioctl returns, since dma-fences,
+required to signal in a reasonable amount of time, can never be made
+to depend on memory fences that don't have such a restriction.
+
+To aid in supporting user-space queues, the VM_BIND may take a bind context.
+
+The purpose of an Asynchronous VM_BIND operation is for user-mode
+drivers to be able to pipeline interleaved gpu_vm modifications and
+exec functions. For long-running workloads, such pipelining of a bind
+operation is not allowed and any in-fences need to be awaited
+synchronously. The reason for this is twofold. First, any memory
+fences gated by a long-running workload and used as in-syncobjs for the
+VM_BIND operation will need to be awaited synchronously anyway (see
+above). Second, any dma-fences used as in-syncobjs for VM_BIND
+operations for long-running workloads will not allow for pipelining
+anyway since long-running workloads don't allow for dma-fences as
+out-syncobjs, so while theoretically possible the use of them is
+questionable and should be rejected until there is a valuable use-case.
+Note that this is not a limitation imposed by dma-fence rules, but
+rather a limitation imposed to keep KMD implementation simple. It does
+not affect using dma-fences as dependencies for the long-running
+workload itself, which is allowed by dma-fence rules, but rather for
+the VM_BIND operation only.
+
+Also for VM_BINDS for long-running gpu_vms the user-mode driver should typically
+select memory fences as out-fences since that gives greater flexibility for
+the kernel mode driver to inject other operations into the bind /
+unbind operations. Like for example inserting breakpoints into batch
+buffers. The workload execution can then easily be pipelined behind
+the bind completion using the memory out-fence as the signal condition
+for a GPU semaphore embedded by UMD in the workload.
+
+Multi-operation VM_BIND IOCTL error handling and interrupts
+===========================================================
+
+The VM_BIND operations of the IOCTL may error due to lack of resources
+to complete and also due to interrupted waits. In both situations UMD
+should preferably restart the IOCTL after taking suitable action. If
+UMD has over-committed a memory resource, an -ENOSPC error will be
+returned, and UMD may then unbind resources that are not used at the
+moment and restart the IOCTL. On -EINTR, UMD should simply restart the
+IOCTL and on -ENOMEM user-space may either attempt to free known
+system memory resources or abort the operation. If aborting as a
+result of a failed operation in a list of operations, some operations
+may still have completed, and to get back to a known state, user-space
+should therefore attempt to unbind all virtual memory regions touched
+by the failing IOCTL.
+Unbind operations are guaranteed not to cause any errors due to
+resource constraints.
+In between a failed VM_BIND IOCTL and a successful restart there may
+be implementation defined restrictions on the use of the gpu_vm. For a
+description why, please see KMD implementation details under `error
+state saving`_.
+
+Sample uAPI implementations
+===========================
+Suggested uAPI implementations at the moment of writing can be found for
+the `Nouveau driver
+<https://patchwork.freedesktop.org/patch/543260/?series=112994&rev=6>`_.
+and for the `Xe driver
+<https://cgit.freedesktop.org/drm/drm-xe/diff/include/uapi/drm/xe_drm.h?h=drm-xe-next&id=9cb016ebbb6a275f57b1cb512b95d5a842391ad7>`_.
+
+KMD implementation details
+==========================
+
+Error state saving
+__________________
+Open: When the VM_BIND IOCTL returns an error, some or even parts of
+an operation may have been completed. If the IOCTL is restarted, in
+order to know where to restart, the KMD can either put the gpu_vm in
+an error state and save one instance of the needed restart state
+internally. In this case, KMD needs to block further modifications of
+the gpu_vm state that may cause additional failures requiring a
+restart state save, until the error has been fully resolved. If the
+uAPI instead defines a pointer to a UMD allocated cookie in the IOCTL
+struct, it could also choose to store the restart state in that cookie.
+
+The restart state may, for example, be the number of successfully
+completed operations.
+
+Easiest for UMD would of course be if KMD did a full unwind on error
+so that no error state needs to be saved.
diff --git a/Documentation/gpu/implementation_guidelines.rst b/Documentation/gpu/implementation_guidelines.rst
new file mode 100644
index 000000000000..32907bbf4c99
--- /dev/null
+++ b/Documentation/gpu/implementation_guidelines.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+===========================================================
+Misc DRM driver uAPI- and feature implementation guidelines
+===========================================================
+
+.. toctree::
+
+   drm-vm-bind-async
+
+  
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index eee5996acf2c..015b3fe726de 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -17,6 +17,7 @@ GPU Driver Developer's Guide
    backlight
    vga-switcheroo
    vgaarbiter
+   implementation_guidelines
    todo
    rfc/index
 
diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst
index 2516fe141db6..e095b23e3dfd 100644
--- a/Documentation/gpu/rfc/xe.rst
+++ b/Documentation/gpu/rfc/xe.rst
@@ -138,8 +138,8 @@ memory fences. Ideally with helper support so people don't get it wrong in all
 possible ways.
 
 As a key measurable result, the benefits of ASYNC VM_BIND and a discussion of
-various flavors, error handling and a sample API should be documented here or in
-a separate document pointed to by this document.
+various flavors, error handling and sample API suggestions are documented in
+:doc:`The ASYNC VM_BIND document </gpu/drm-vm-bind-async>`.
 
 Userptr integration and vm_bind
 -------------------------------
-- 
2.41.0


WARNING: multiple messages have this Message-ID (diff)
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Matthew Brost" <matthew.brost@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	"Paulo R" <paulo.r.zanoni@intel.com>,
	linux-kernel@vger.kernel.org, "Oak Zeng" <oak.zeng@intel.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	dri-devel@lists.freedesktop.org,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	Zanoni@freedesktop.org, "Nirmoy Das" <nirmoy.das@intel.com>
Subject: [PATCH v6] Documentation/gpu: Add a VM_BIND async document
Date: Tue, 15 Aug 2023 15:11:40 +0200	[thread overview]
Message-ID: <20230815131140.37844-1-thomas.hellstrom@linux.intel.com> (raw)

Add a motivation for and description of asynchronous VM_BIND operation

v2:
- Fix typos (Nirmoy Das)
- Improve the description of a memory fence (Oak Zeng)
- Add a reference to the document in the Xe RFC.
- Add pointers to sample uAPI suggestions
v3:
- Address review comments (Danilo Krummrich)
- Formatting fixes
v4:
- Address typos (Francois Dugast)
- Explain why in-fences are not allowed for VM_BIND operations for long-
  running workloads (Matthew Brost)
v5:
- More typo- and style fixing
- Further clarify the implications of disallowing in-fences for VM_BIND
  operations for long-running workloads (Matthew Brost)
v6:
- Point out that a gpu_vm is a virtual GPU Address space.
  (Danilo Krummrich)
- For an explanation of dma-fences point to the dma-fence documentation.
  (Paolo Zanoni)
- Clarify that VM_BIND errors are reported synchronously. (Paolo Zanoni)
- Use an rst doc reference when pointing to the async vm_bind document
  from the xe merge plan.
- Add the VM_BIND documentation to the drm documentation table-of-content,
  using an intermediate "Misc DRM driver uAPI- and feature implementation
  guidelines"

Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 Documentation/gpu/drm-vm-bind-async.rst       | 178 ++++++++++++++++++
 .../gpu/implementation_guidelines.rst         |  11 ++
 Documentation/gpu/index.rst                   |   1 +
 Documentation/gpu/rfc/xe.rst                  |   4 +-
 4 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/gpu/drm-vm-bind-async.rst
 create mode 100644 Documentation/gpu/implementation_guidelines.rst

diff --git a/Documentation/gpu/drm-vm-bind-async.rst b/Documentation/gpu/drm-vm-bind-async.rst
new file mode 100644
index 000000000000..80cd93a5b59c
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-async.rst
@@ -0,0 +1,178 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+====================
+Asynchronous VM_BIND
+====================
+
+Nomenclature:
+=============
+
+* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
+
+* ``gpu_vm``: A virtual GPU address space. Typically per process, but
+  can be shared by multiple processes.
+
+* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
+  an IOCTL. The operations include mapping and unmapping system- or
+  VRAM memory.
+
+* ``syncobj``: A container that abstracts synchronization objects. The
+  synchronization objects can be either generic, like dma-fences or
+  driver specific. A syncobj typically indicates the type of the
+  underlying synchronization object.
+
+* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
+  for these before starting.
+
+* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
+  signals these when the bind operation is complete.
+
+* ``dma-fence``: A cross-driver synchronization object. A basic
+  understanding of dma-fences is required to digest this
+  document. Please refer to the ``DMA Fences`` section of the
+  :doc:`dma-buf doc </driver-api/dma-buf>`.
+
+* ``memory fence``: A synchronization object, different from a dma-fence.
+  A memory fence uses the value of a specified memory location to determine
+  signaled status. A memory fence can be awaited and signaled by both
+  the GPU and CPU. Memory fences are sometimes referred to as
+  user-fences, userspace-fences or gpu futexes and do not necessarily obey
+  the dma-fence rule of signaling within a "reasonable amount of time".
+  The kernel should thus avoid waiting for memory fences with locks held.
+
+* ``long-running workload``: A workload that may take more than the
+  current stipulated dma-fence maximum signal delay to complete and
+  which therefore needs to set the gpu_vm or the GPU execution context in
+  a certain mode that disallows completion dma-fences.
+
+* ``exec function``: An exec function is a function that revalidates all
+  affected gpu_vmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+
+* ``bind context``: A context identifier used for the VM_BIND
+  operation. VM_BIND operations that use the same bind context can be
+  assumed, where it matters, to complete in order of submission. No such
+  assumptions can be made for VM_BIND operations using separate bind contexts.
+
+* ``UMD``: User-mode driver.
+
+* ``KMD``: Kernel-mode driver.
+
+
+Synchronous / Asynchronous VM_BIND operation
+============================================
+
+Synchronous VM_BIND
+___________________
+With Synchronous VM_BIND, the VM_BIND operations all complete before the
+IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
+out-fences. Synchronous VM_BIND may block and wait for GPU operations;
+for example swap-in or clearing, or even previous binds.
+
+Asynchronous VM_BIND
+____________________
+Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
+IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
+before modifying the GPU page-tables, and signal the out-syncobjs when
+the modification is done in the sense that the next exec function that
+awaits for the out-syncobjs will see the change. Errors are reported
+synchronously.
+In low-memory situations the implementation may block, performing the
+VM_BIND synchronously, because there might not be enough memory
+immediately available for preparing the asynchronous operation.
+
+If the VM_BIND IOCTL takes a list or an array of operations as an argument,
+the in-syncobjs needs to signal before the first operation starts to
+execute, and the out-syncobjs signal after the last operation
+completes. Operations in the operation list can be assumed, where it
+matters, to complete in order.
+
+Since asynchronous VM_BIND operations may use dma-fences embedded in
+out-syncobjs and internally in KMD to signal bind completion,  any
+memory fences given as VM_BIND in-fences need to be awaited
+synchronously before the VM_BIND ioctl returns, since dma-fences,
+required to signal in a reasonable amount of time, can never be made
+to depend on memory fences that don't have such a restriction.
+
+To aid in supporting user-space queues, the VM_BIND may take a bind context.
+
+The purpose of an Asynchronous VM_BIND operation is for user-mode
+drivers to be able to pipeline interleaved gpu_vm modifications and
+exec functions. For long-running workloads, such pipelining of a bind
+operation is not allowed and any in-fences need to be awaited
+synchronously. The reason for this is twofold. First, any memory
+fences gated by a long-running workload and used as in-syncobjs for the
+VM_BIND operation will need to be awaited synchronously anyway (see
+above). Second, any dma-fences used as in-syncobjs for VM_BIND
+operations for long-running workloads will not allow for pipelining
+anyway since long-running workloads don't allow for dma-fences as
+out-syncobjs, so while theoretically possible the use of them is
+questionable and should be rejected until there is a valuable use-case.
+Note that this is not a limitation imposed by dma-fence rules, but
+rather a limitation imposed to keep KMD implementation simple. It does
+not affect using dma-fences as dependencies for the long-running
+workload itself, which is allowed by dma-fence rules, but rather for
+the VM_BIND operation only.
+
+Also for VM_BINDS for long-running gpu_vms the user-mode driver should typically
+select memory fences as out-fences since that gives greater flexibility for
+the kernel mode driver to inject other operations into the bind /
+unbind operations. Like for example inserting breakpoints into batch
+buffers. The workload execution can then easily be pipelined behind
+the bind completion using the memory out-fence as the signal condition
+for a GPU semaphore embedded by UMD in the workload.
+
+Multi-operation VM_BIND IOCTL error handling and interrupts
+===========================================================
+
+The VM_BIND operations of the IOCTL may error due to lack of resources
+to complete and also due to interrupted waits. In both situations UMD
+should preferably restart the IOCTL after taking suitable action. If
+UMD has over-committed a memory resource, an -ENOSPC error will be
+returned, and UMD may then unbind resources that are not used at the
+moment and restart the IOCTL. On -EINTR, UMD should simply restart the
+IOCTL and on -ENOMEM user-space may either attempt to free known
+system memory resources or abort the operation. If aborting as a
+result of a failed operation in a list of operations, some operations
+may still have completed, and to get back to a known state, user-space
+should therefore attempt to unbind all virtual memory regions touched
+by the failing IOCTL.
+Unbind operations are guaranteed not to cause any errors due to
+resource constraints.
+In between a failed VM_BIND IOCTL and a successful restart there may
+be implementation defined restrictions on the use of the gpu_vm. For a
+description why, please see KMD implementation details under `error
+state saving`_.
+
+Sample uAPI implementations
+===========================
+Suggested uAPI implementations at the moment of writing can be found for
+the `Nouveau driver
+<https://patchwork.freedesktop.org/patch/543260/?series=112994&rev=6>`_.
+and for the `Xe driver
+<https://cgit.freedesktop.org/drm/drm-xe/diff/include/uapi/drm/xe_drm.h?h=drm-xe-next&id=9cb016ebbb6a275f57b1cb512b95d5a842391ad7>`_.
+
+KMD implementation details
+==========================
+
+Error state saving
+__________________
+Open: When the VM_BIND IOCTL returns an error, some or even parts of
+an operation may have been completed. If the IOCTL is restarted, in
+order to know where to restart, the KMD can either put the gpu_vm in
+an error state and save one instance of the needed restart state
+internally. In this case, KMD needs to block further modifications of
+the gpu_vm state that may cause additional failures requiring a
+restart state save, until the error has been fully resolved. If the
+uAPI instead defines a pointer to a UMD allocated cookie in the IOCTL
+struct, it could also choose to store the restart state in that cookie.
+
+The restart state may, for example, be the number of successfully
+completed operations.
+
+Easiest for UMD would of course be if KMD did a full unwind on error
+so that no error state needs to be saved.
diff --git a/Documentation/gpu/implementation_guidelines.rst b/Documentation/gpu/implementation_guidelines.rst
new file mode 100644
index 000000000000..32907bbf4c99
--- /dev/null
+++ b/Documentation/gpu/implementation_guidelines.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+===========================================================
+Misc DRM driver uAPI- and feature implementation guidelines
+===========================================================
+
+.. toctree::
+
+   drm-vm-bind-async
+
+  
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index eee5996acf2c..015b3fe726de 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -17,6 +17,7 @@ GPU Driver Developer's Guide
    backlight
    vga-switcheroo
    vgaarbiter
+   implementation_guidelines
    todo
    rfc/index
 
diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst
index 2516fe141db6..e095b23e3dfd 100644
--- a/Documentation/gpu/rfc/xe.rst
+++ b/Documentation/gpu/rfc/xe.rst
@@ -138,8 +138,8 @@ memory fences. Ideally with helper support so people don't get it wrong in all
 possible ways.
 
 As a key measurable result, the benefits of ASYNC VM_BIND and a discussion of
-various flavors, error handling and a sample API should be documented here or in
-a separate document pointed to by this document.
+various flavors, error handling and sample API suggestions are documented in
+:doc:`The ASYNC VM_BIND document </gpu/drm-vm-bind-async>`.
 
 Userptr integration and vm_bind
 -------------------------------
-- 
2.41.0


WARNING: multiple messages have this Message-ID (diff)
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: Francois Dugast <francois.dugast@intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	linux-kernel@vger.kernel.org, Danilo Krummrich <dakr@redhat.com>,
	dri-devel@lists.freedesktop.org, Daniel Vetter <daniel@ffwll.ch>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Zanoni@freedesktop.org, Nirmoy Das <nirmoy.das@intel.com>
Subject: [Intel-xe] [PATCH v6] Documentation/gpu: Add a VM_BIND async document
Date: Tue, 15 Aug 2023 15:11:40 +0200	[thread overview]
Message-ID: <20230815131140.37844-1-thomas.hellstrom@linux.intel.com> (raw)

Add a motivation for and description of asynchronous VM_BIND operation

v2:
- Fix typos (Nirmoy Das)
- Improve the description of a memory fence (Oak Zeng)
- Add a reference to the document in the Xe RFC.
- Add pointers to sample uAPI suggestions
v3:
- Address review comments (Danilo Krummrich)
- Formatting fixes
v4:
- Address typos (Francois Dugast)
- Explain why in-fences are not allowed for VM_BIND operations for long-
  running workloads (Matthew Brost)
v5:
- More typo- and style fixing
- Further clarify the implications of disallowing in-fences for VM_BIND
  operations for long-running workloads (Matthew Brost)
v6:
- Point out that a gpu_vm is a virtual GPU Address space.
  (Danilo Krummrich)
- For an explanation of dma-fences point to the dma-fence documentation.
  (Paolo Zanoni)
- Clarify that VM_BIND errors are reported synchronously. (Paolo Zanoni)
- Use an rst doc reference when pointing to the async vm_bind document
  from the xe merge plan.
- Add the VM_BIND documentation to the drm documentation table-of-content,
  using an intermediate "Misc DRM driver uAPI- and feature implementation
  guidelines"

Cc: Zanoni, Paulo R <paulo.r.zanoni@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 Documentation/gpu/drm-vm-bind-async.rst       | 178 ++++++++++++++++++
 .../gpu/implementation_guidelines.rst         |  11 ++
 Documentation/gpu/index.rst                   |   1 +
 Documentation/gpu/rfc/xe.rst                  |   4 +-
 4 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/gpu/drm-vm-bind-async.rst
 create mode 100644 Documentation/gpu/implementation_guidelines.rst

diff --git a/Documentation/gpu/drm-vm-bind-async.rst b/Documentation/gpu/drm-vm-bind-async.rst
new file mode 100644
index 000000000000..80cd93a5b59c
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-async.rst
@@ -0,0 +1,178 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+====================
+Asynchronous VM_BIND
+====================
+
+Nomenclature:
+=============
+
+* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
+
+* ``gpu_vm``: A virtual GPU address space. Typically per process, but
+  can be shared by multiple processes.
+
+* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
+  an IOCTL. The operations include mapping and unmapping system- or
+  VRAM memory.
+
+* ``syncobj``: A container that abstracts synchronization objects. The
+  synchronization objects can be either generic, like dma-fences or
+  driver specific. A syncobj typically indicates the type of the
+  underlying synchronization object.
+
+* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
+  for these before starting.
+
+* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
+  signals these when the bind operation is complete.
+
+* ``dma-fence``: A cross-driver synchronization object. A basic
+  understanding of dma-fences is required to digest this
+  document. Please refer to the ``DMA Fences`` section of the
+  :doc:`dma-buf doc </driver-api/dma-buf>`.
+
+* ``memory fence``: A synchronization object, different from a dma-fence.
+  A memory fence uses the value of a specified memory location to determine
+  signaled status. A memory fence can be awaited and signaled by both
+  the GPU and CPU. Memory fences are sometimes referred to as
+  user-fences, userspace-fences or gpu futexes and do not necessarily obey
+  the dma-fence rule of signaling within a "reasonable amount of time".
+  The kernel should thus avoid waiting for memory fences with locks held.
+
+* ``long-running workload``: A workload that may take more than the
+  current stipulated dma-fence maximum signal delay to complete and
+  which therefore needs to set the gpu_vm or the GPU execution context in
+  a certain mode that disallows completion dma-fences.
+
+* ``exec function``: An exec function is a function that revalidates all
+  affected gpu_vmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+
+* ``bind context``: A context identifier used for the VM_BIND
+  operation. VM_BIND operations that use the same bind context can be
+  assumed, where it matters, to complete in order of submission. No such
+  assumptions can be made for VM_BIND operations using separate bind contexts.
+
+* ``UMD``: User-mode driver.
+
+* ``KMD``: Kernel-mode driver.
+
+
+Synchronous / Asynchronous VM_BIND operation
+============================================
+
+Synchronous VM_BIND
+___________________
+With Synchronous VM_BIND, the VM_BIND operations all complete before the
+IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
+out-fences. Synchronous VM_BIND may block and wait for GPU operations;
+for example swap-in or clearing, or even previous binds.
+
+Asynchronous VM_BIND
+____________________
+Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
+IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
+before modifying the GPU page-tables, and signal the out-syncobjs when
+the modification is done in the sense that the next exec function that
+awaits for the out-syncobjs will see the change. Errors are reported
+synchronously.
+In low-memory situations the implementation may block, performing the
+VM_BIND synchronously, because there might not be enough memory
+immediately available for preparing the asynchronous operation.
+
+If the VM_BIND IOCTL takes a list or an array of operations as an argument,
+the in-syncobjs needs to signal before the first operation starts to
+execute, and the out-syncobjs signal after the last operation
+completes. Operations in the operation list can be assumed, where it
+matters, to complete in order.
+
+Since asynchronous VM_BIND operations may use dma-fences embedded in
+out-syncobjs and internally in KMD to signal bind completion,  any
+memory fences given as VM_BIND in-fences need to be awaited
+synchronously before the VM_BIND ioctl returns, since dma-fences,
+required to signal in a reasonable amount of time, can never be made
+to depend on memory fences that don't have such a restriction.
+
+To aid in supporting user-space queues, the VM_BIND may take a bind context.
+
+The purpose of an Asynchronous VM_BIND operation is for user-mode
+drivers to be able to pipeline interleaved gpu_vm modifications and
+exec functions. For long-running workloads, such pipelining of a bind
+operation is not allowed and any in-fences need to be awaited
+synchronously. The reason for this is twofold. First, any memory
+fences gated by a long-running workload and used as in-syncobjs for the
+VM_BIND operation will need to be awaited synchronously anyway (see
+above). Second, any dma-fences used as in-syncobjs for VM_BIND
+operations for long-running workloads will not allow for pipelining
+anyway since long-running workloads don't allow for dma-fences as
+out-syncobjs, so while theoretically possible the use of them is
+questionable and should be rejected until there is a valuable use-case.
+Note that this is not a limitation imposed by dma-fence rules, but
+rather a limitation imposed to keep KMD implementation simple. It does
+not affect using dma-fences as dependencies for the long-running
+workload itself, which is allowed by dma-fence rules, but rather for
+the VM_BIND operation only.
+
+Also for VM_BINDS for long-running gpu_vms the user-mode driver should typically
+select memory fences as out-fences since that gives greater flexibility for
+the kernel mode driver to inject other operations into the bind /
+unbind operations. Like for example inserting breakpoints into batch
+buffers. The workload execution can then easily be pipelined behind
+the bind completion using the memory out-fence as the signal condition
+for a GPU semaphore embedded by UMD in the workload.
+
+Multi-operation VM_BIND IOCTL error handling and interrupts
+===========================================================
+
+The VM_BIND operations of the IOCTL may error due to lack of resources
+to complete and also due to interrupted waits. In both situations UMD
+should preferably restart the IOCTL after taking suitable action. If
+UMD has over-committed a memory resource, an -ENOSPC error will be
+returned, and UMD may then unbind resources that are not used at the
+moment and restart the IOCTL. On -EINTR, UMD should simply restart the
+IOCTL and on -ENOMEM user-space may either attempt to free known
+system memory resources or abort the operation. If aborting as a
+result of a failed operation in a list of operations, some operations
+may still have completed, and to get back to a known state, user-space
+should therefore attempt to unbind all virtual memory regions touched
+by the failing IOCTL.
+Unbind operations are guaranteed not to cause any errors due to
+resource constraints.
+In between a failed VM_BIND IOCTL and a successful restart there may
+be implementation defined restrictions on the use of the gpu_vm. For a
+description why, please see KMD implementation details under `error
+state saving`_.
+
+Sample uAPI implementations
+===========================
+Suggested uAPI implementations at the moment of writing can be found for
+the `Nouveau driver
+<https://patchwork.freedesktop.org/patch/543260/?series=112994&rev=6>`_.
+and for the `Xe driver
+<https://cgit.freedesktop.org/drm/drm-xe/diff/include/uapi/drm/xe_drm.h?h=drm-xe-next&id=9cb016ebbb6a275f57b1cb512b95d5a842391ad7>`_.
+
+KMD implementation details
+==========================
+
+Error state saving
+__________________
+Open: When the VM_BIND IOCTL returns an error, some or even parts of
+an operation may have been completed. If the IOCTL is restarted, in
+order to know where to restart, the KMD can either put the gpu_vm in
+an error state and save one instance of the needed restart state
+internally. In this case, KMD needs to block further modifications of
+the gpu_vm state that may cause additional failures requiring a
+restart state save, until the error has been fully resolved. If the
+uAPI instead defines a pointer to a UMD allocated cookie in the IOCTL
+struct, it could also choose to store the restart state in that cookie.
+
+The restart state may, for example, be the number of successfully
+completed operations.
+
+Easiest for UMD would of course be if KMD did a full unwind on error
+so that no error state needs to be saved.
diff --git a/Documentation/gpu/implementation_guidelines.rst b/Documentation/gpu/implementation_guidelines.rst
new file mode 100644
index 000000000000..32907bbf4c99
--- /dev/null
+++ b/Documentation/gpu/implementation_guidelines.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+===========================================================
+Misc DRM driver uAPI- and feature implementation guidelines
+===========================================================
+
+.. toctree::
+
+   drm-vm-bind-async
+
+  
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index eee5996acf2c..015b3fe726de 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -17,6 +17,7 @@ GPU Driver Developer's Guide
    backlight
    vga-switcheroo
    vgaarbiter
+   implementation_guidelines
    todo
    rfc/index
 
diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst
index 2516fe141db6..e095b23e3dfd 100644
--- a/Documentation/gpu/rfc/xe.rst
+++ b/Documentation/gpu/rfc/xe.rst
@@ -138,8 +138,8 @@ memory fences. Ideally with helper support so people don't get it wrong in all
 possible ways.
 
 As a key measurable result, the benefits of ASYNC VM_BIND and a discussion of
-various flavors, error handling and a sample API should be documented here or in
-a separate document pointed to by this document.
+various flavors, error handling and sample API suggestions are documented in
+:doc:`The ASYNC VM_BIND document </gpu/drm-vm-bind-async>`.
 
 Userptr integration and vm_bind
 -------------------------------
-- 
2.41.0


             reply	other threads:[~2023-08-15 13:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-15 13:11 Thomas Hellström [this message]
2023-08-15 13:11 ` [Intel-xe] [PATCH v6] Documentation/gpu: Add a VM_BIND async document Thomas Hellström
2023-08-15 13:11 ` Thomas Hellström
2023-08-15 13:14 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-08-15 13:14 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-08-15 13:15 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-08-15 13:19 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-08-15 13:20 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-08-15 13:20 ` [Intel-xe] ✗ CI.checksparse: warning " Patchwork
2023-08-15 13:57 ` [Intel-xe] ✗ CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230815131140.37844-1-thomas.hellstrom@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=Zanoni@vger.kernel.org \
    --cc=dakr@redhat.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=francois.dugast@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.brost@intel.com \
    --cc=nirmoy.das@intel.com \
    --cc=oak.zeng@intel.com \
    --cc=paulo.r.zanoni@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.