From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	"Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>,
	"Oak Zeng" <oak.zeng@intel.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH] Documentation/gpu: Draft VM_BIND locking document
Date: Fri, 30 Jun 2023 18:44:52 +0200
Message-ID: <20230630164452.9228-1-thomas.hellstrom@linux.intel.com>

Add the first version of the VM_BIND locking document which is
intended to be part of the xe driver upstreaming agreement.

The document describes and discusses the locking used during exec
functions, eviction and for userptr gvmas. The intention is to use the
same nomenclature as drm-vm-bind-async.rst, but to keep naming a
little shorter, gvm and gvma are used instead of the gpu_vm and gpu_vma
used in the previous document, with the intention of also modifying
that document.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 Documentation/gpu/drm-vm-bind-locking.rst | 339 ++++++++++++++++++++++
 1 file changed, 339 insertions(+)
 create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
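
A minimal user-space sketch of the mark-then-move scheme that the
document describes for shared objects: eviction only flags a gvma, and
the next exec function moves the flagged gvmas of its own gvm to the
revalidate list. All names here (shared_gvma, shared_obj,
model_evict_shared, model_move_marked) are hypothetical and not kernel
API. The model is single-threaded and takes no locks; the comments only
note which dma_resv a driver would hold at each step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types for illustration; not kernel structures. */
struct shared_gvma {
	int gvm_id;          /* which gvm this gvma belongs to */
	bool evicted;        /* set at eviction; formally guarded by the object's resv */
	bool on_revalidate;  /* true once placed on its gvm's revalidate list */
};

struct shared_obj {
	struct shared_gvma *gvmas;  /* all gvmas mapping this shared object */
	size_t num_gvmas;
};

/* Eviction side: only the object's own resv would be held, so just flag. */
static void model_evict_shared(struct shared_obj *obj)
{
	for (size_t i = 0; i < obj->num_gvmas; i++)
		obj->gvmas[i].evicted = true;
}

/*
 * Exec side for one gvm: both the gvm's resv and the object's resv would
 * be held, so flagged gvmas of that gvm can be moved to its revalidate
 * list. Returns the number of gvmas moved.
 */
static size_t model_move_marked(struct shared_obj *obj, int gvm_id)
{
	size_t moved = 0;

	for (size_t i = 0; i < obj->num_gvmas; i++) {
		struct shared_gvma *gvma = &obj->gvmas[i];

		if (gvma->gvm_id != gvm_id || !gvma->evicted)
			continue;
		gvma->evicted = false;
		gvma->on_revalidate = true;
		moved++;
	}
	return moved;
}
```

Note that gvmas belonging to other gvms stay flagged until those gvms
run their own exec function, which is why eviction never needs the
other gvms' dma_resvs.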

diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
new file mode 100644
index 000000000000..f5d1a40a2906
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-locking.rst
@@ -0,0 +1,339 @@
+===============
+VM_BIND locking
+===============
+
+This document attempts to describe what's needed to get VM_BIND locking right,
+including the userptr mmu_notifier locking. It also discusses some
+optimizations that avoid looping through all userptr mappings and
+external / shared object mappings, which the simplest implementation
+requires. Finally, it discusses some implications for faulting gvms.
+
+Nomenclature
+============
+
+* ``Context``: GPU execution context.
+* ``gvm``: Abstraction of a GPU address space with meta-data. Typically
+  one per client (DRM file-private), or one per context.
+* ``gvma``: Abstraction of a GPU address range within a gvm with
+  associated meta-data. The backing storage of a gvma can either be
+  a GEM buffer object or anonymous pages that are also mapped into the
+  CPU address space of the process.
+* ``userptr gvma or just userptr``: A gvma whose backing store is
+  anonymous pages as described above.
+* ``revalidating``: Revalidating a gvma means making the latest version
+  of the backing store resident and making sure the gvma's
+  page-table entries point to that backing store.
+* ``dma_fence``: A struct dma_fence that is similar to a struct completion
+  and which tracks GPU activity. When the GPU activity is finished,
+  the dma_fence signals.
+* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
+  to track GPU activity in the form of multiple dma_fences on a
+  gvm or a gem buffer object. The dma_resv contains an array / list
+  of dma_fences and a lock that needs to be held when adding
+  additional dma_fences to the dma_resv. The lock is of a type that
+  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
+* ``exec function``: An exec function is a function that revalidates all
+  affected gvmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+* ``local object``: A GEM object which is local to a gvm. Local GEM
+  objects share the gvm's dma_resv.
+* ``shared object``: AKA external object: A GEM object which may be shared
+  by multiple gvms and whose backing storage may be shared with
+  other drivers.
+
+
+Introducing the locks
+=====================
+
+One of the benefits of VM_BIND is that local GEM objects share the gvm's
+dma_resv object and hence the dma_resv lock. So even with a huge
+number of local GEM objects, only one lock is needed to make the exec
+sequence atomic.
+
+The following locks and locking orders are used:
+
+* The ``gvm->lock`` (optionally an rwsem). Protects how the gvm is
+  partitioned into gvmas, protects the gvm's list of external objects,
+  and can also with some simplification protect the gvm's list of
+  userptr gvmas. With the CPU mm analogy this would correspond to the
+  mmap_lock.
+* The ``userptr_seqlock``. This lock is taken in read mode for each
+  userptr gvma on the gvm's userptr list, and in write mode during mmu
+  notifier invalidation.
+* The ``gvm->resv`` lock. Protects the gvm's list of gvmas needing
+  rebinding, and also the residency of all the gvm's local GEM objects.
+* The ``gvm->userptr_notifier_lock``. This is an rwsem that is taken in read
+  mode during exec and write mode during an MMU notifier invalidation. In
+  the absence of a separate page-table lock, this lock can serve
+  together with the gvm's dma_resv lock as a page-table lock. More on
+  this below. The userptr notifier lock is per gvm.
+* The ``gvm->page_table_lock``. Protects the gvm's page-table updates. For
+  simplicity the gvm's dma_resv lock can be reused as page-table lock.
+
+Certain optimizations described below require additional locks; more on
+that later. With only local objects, the exec function looks like this:
+
+.. code-block:: C
+
+   dma_resv_lock(&gvm->resv);
+
+   for_each_gvma_on_revalidate_list(gvm, &gvma) {
+           revalidate_gvma(&gvma);
+           remove_from_revalidate_list(&gvma);
+   }
+
+   add_dependencies(&gpu_job, &gvm->resv);
+   job_dma_fence = gpu_submit(&gpu_job);
+
+   add_dma_fence(job_dma_fence, &gvm->resv);
+   dma_resv_unlock(&gvm->resv);
+
+Eviction of one of these local objects will then be something like the
+following:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(&obj->resv);
+   for_each_gvma_of_obj(obj, &gvma)
+           put_gvma_on_revalidate_list(&gvma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Note that since the object is local to the gvm, it will share the gvm's
+``dma_resv`` lock so that ``obj->resv == gvm->resv``. Invalidated gvmas are put
+on the gvm's revalidation list, which is protected by ``gvm->resv``, which
+is always locked while evicting, due to the above equality.
+
+Does the gvma need to be unbound before eviction? For VM_BIND gvms
+the answer is no. The eviction blit or copy waits for GPU idle, and
+any later attempt by the GPU to access freed memory through the
+gvma will be preceded by a new exec function, which makes sure the
+gvma is revalidated first. Leaving the gvma bound is thus not an issue.
+
+Introducing external (or shared) buffer objects
+===============================================
+
+Since shared buffer objects may be shared by multiple gvms, they
+can't share their reservation object with a single gvm, but will
+rather have a reservation object of their own. The shared objects
+bound to a gvm using one or many gvmas are therefore typically
+put on a per-gvm list which is protected by the ``gvm->lock``.
+One could in theory protect it also with the ``gvm->resv``, but
+that is typically not done: due to a limitation in the current
+locking helpers, the list of dma_resvs to take is typically
+built before the ``gvm->resv`` is locked. Also see below for
+userptr gvmas.
+
+At eviction time we now need to invalidate *all* gvmas of a shared
+object, but we can no longer be certain that we hold the dma_resvs
+of all the gvms of the object's gvmas. We can only be certain that we
+hold the object's private dma_resv. We can trylock the dma_resvs for
+the affected gvms, but that might be unnecessarily complex. If we
+have a ww_acquire context at hand at eviction time we can also perform
+sleeping locks of those dma_resvs, but that could cause expensive
+rollbacks. One option is to just mark the invalidated gvmas with a bool
+which is inspected on the next exec function, when the gvm's
+dma_resv and the object's dma_resv are held, and the invalidated
+gvmas could then be put on the gvm's list of invalidated
+gvmas. That bool would then, although being per-gvma, formally be
+protected by the object's dma_resv.
+
+The exec function would then look something like the following:
+
+.. code-block:: C
+
+   down_read(&gvm->lock);
+
+   dma_resv_lock(&gvm->resv);
+
+   // Shared object list is protected by the gvm->lock.
+   for_each_shared_obj(gvm, &obj) {
+           dma_resv_lock(&obj->resv);
+           move_marked_gvmas_to_revalidate_gvma_list(obj, &gvm);
+   }
+
+   for_each_gvma_to_revalidate(gvm, &gvma) {
+           revalidate_gvma(&gvma);
+           remove_from_revalidate_list(&gvma);
+   }
+
+   add_dependencies(&gpu_job, &gvm->resv);
+   job_dma_fence = gpu_submit(&gpu_job);
+
+   add_dma_fence(job_dma_fence, &gvm->resv);
+   for_each_shared_obj(gvm, &obj)
+           add_dma_fence(job_dma_fence, &obj->resv);
+   dma_resv_unlock_all_resv_locks();
+
+   up_read(&gvm->lock);
+
+And the corresponding shared-object aware eviction would look like:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(&obj->resv);
+   for_each_gvma_of_obj(obj, &gvma)
+           if (object_is_vm_local(obj))
+                   put_gvma_on_revalidate_list(&gvma, &gvm);
+           else
+                   mark_gvma_for_revalidation(&gvma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Yet another option is to put the gvmas to be invalidated on a separate
+gvm list protected by a lower level lock that can be taken both at eviction
+time and at transfer-to-revalidate list time. The details are not covered
+in this document, but for reference this is implemented in the Intel xe
+driver.
+
+Introducing userptr gvmas
+=========================
+
+A userptr gvma is a gvma that, instead of mapping a buffer object to a
+GPU virtual address range, directly maps a CPU mm range of anonymous-
+or file page-cache pages.
+A very simple approach would be to just pin the pages using
+pin_user_pages() at bind time and unpin them at unbind time, but this
+creates a Denial-Of-Service vector since a single user-space process
+would be able to pin down all of system memory, which is not
+desirable. (For special use-cases and with proper accounting, pinning might
+still be a desirable feature, though.) What we need to do in the general case is
+to obtain a reference to the desired pages, make sure we are notified
+using an MMU notifier just before the CPU mm unmaps the pages, dirty
+them if they are not mapped read-only to the GPU, and then drop the reference.
+When we are notified by the MMU notifier that the CPU mm is about to drop the
+pages, we need to stop GPU access to the pages
+and make sure that before the next time the GPU tries to access
+whatever is now present in the CPU mm range, we unmap the old pages
+from the GPU page tables and repeat the process of obtaining new page
+references. Note that when the core mm decides to launder pages, we get such
+an unmap MMU notification and can mark the pages dirty again before the
+next GPU access. We also get similar MMU notifications for NUMA accounting
+which the GPU driver doesn't really need to care about, but so far
+it's proven difficult to exclude certain notifications.
+
+Using an MMU notifier for device DMA (and other methods) is described in
+`this document
+<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
+
+Now the method of obtaining struct page references using
+get_user_pages() unfortunately can't be used under a dma_resv lock
+since that would violate the locking order of the dma_resv lock vs the
+mmap_lock that is grabbed when resolving a CPU pagefault. This means the gvm's
+list of userptr gvmas needs to be protected by an outer lock, and this
+is the first time we strictly need the gvm->lock. While it was
+previously used also to protect the list of the gvm's shared objects,
+we could in theory have used the gvm->resv for that.
+
+The MMU interval seqlock for a userptr gvma is used in the following
+way:
+
+.. code-block:: C
+
+   down_read(&gvm->lock);
+
+   retry:
+
+   // Note: mmu_interval_read_begin() blocks until there is no
+   // invalidation notifier running anymore.
+   seq = mmu_interval_read_begin(&gvma->userptr_interval);
+   if (seq != gvma->saved_seq) {
+           obtain_new_page_pointers(&gvma);
+           dma_resv_lock(&gvm->resv);
+           put_gvma_on_revalidate_list(&gvma, &gvm);
+           dma_resv_unlock(&gvm->resv);
+           gvma->saved_seq = seq;
+   }
+
+   // The usual revalidation goes here.
+
+   // From the point of view of the MMU invalidation notifier, the final
+   // userptr sequence validation and the addition of the submission
+   // dma_fence to the gvm's resv must appear atomic. Hence the
+   // userptr_notifier_lock.
+
+   add_dependencies(&gpu_job, &gvm->resv);
+   down_read(&gvm->userptr_notifier_lock);
+   if (mmu_interval_read_retry(&gvma->userptr_interval, gvma->saved_seq)) {
+           up_read(&gvm->userptr_notifier_lock);
+           goto retry;
+   }
+
+   job_dma_fence = gpu_submit(&gpu_job);
+
+   add_dma_fence(job_dma_fence, &gvm->resv);
+
+   for_each_shared_obj(gvm, &obj)
+           add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock_all_resv_locks();
+   up_read(&gvm->userptr_notifier_lock);
+   up_read(&gvm->lock);
+
+The code between ``mmu_interval_read_begin()`` and the
+``mmu_interval_read_retry()`` marks the read side critical section of
+what we call the ``userptr_seqlock``. In reality the gvm's userptr
+gvma list is looped through, and the check is done for *all* of its
+userptr gvmas, although we only show a single one here.
+
+The userptr gvma MMU invalidation notifier might be called from
+reclaim context and, again to avoid locking order violations, we can't
+take any dma_resv lock nor the gvm->lock from within it.
+
+.. code-block:: C
+
+  bool gvma_userptr_invalidate(userptr_interval, cur_seq)
+  {
+          // Make sure the exec function either sees the new sequence
+          // and backs off or we wait for the dma-fence:
+
+          down_write(&gvm->userptr_notifier_lock);
+          mmu_interval_set_seq(userptr_interval, cur_seq);
+          up_write(&gvm->userptr_notifier_lock);
+
+          dma_resv_wait_timeout(&gvm->resv, DMA_RESV_USAGE_BOOKKEEP,
+                                false, MAX_SCHEDULE_TIMEOUT);
+          return true;
+  }
+
+When this invalidation notifier returns, the GPU can no longer be
+accessing the old pages of the userptr gvma, and the driver needs to redo
+the page-binding before a new GPU submission can succeed.
+
+Optimizing gvma iteration
+-------------------------
+
+Iterating through all of a gvm's userptr gvmas to check the validity
+on each exec function may be very costly. There is a scheme to avoid
+this and only iterate through the userptr gvmas that actually saw an
+invalidation notifier call since the last exec.
+
+TODO: describe that scheme here. It's implemented in the xe driver.
+
+Locking for page-table updates at bind- and unbind time
+=======================================================
+
+TODO.
+
+Recoverable page-fault implications
+===================================
+
+TODO.
-- 
2.40.1
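
A minimal user-space sketch of the sequence check that the userptr
section of the document builds on: the exec path re-obtains pages when
the notifier sequence has moved, and retries if an invalidation raced
with it before the final check. Everything here (userptr_model,
model_invalidate, model_exec, the race_cb hook) is hypothetical and
single-threaded; it models neither the real mmu_interval_notifier
implementation nor the userptr_notifier_lock, only the retry logic.

```c
#include <stddef.h>

/* Hypothetical model of one userptr gvma; not a kernel structure. */
struct userptr_model {
	unsigned long notifier_seq;  /* bumped by every invalidation */
	unsigned long saved_seq;     /* sequence the current pages were obtained under */
	int repins;                  /* how many times pages were re-obtained */
};

/* Models the MMU notifier: the CPU mapping changed, old pages are stale. */
static void model_invalidate(struct userptr_model *u)
{
	u->notifier_seq++;
}

/*
 * Models the exec path. race_cb, if non-NULL, is called once between
 * obtaining the pages and the final check, to simulate a racing
 * invalidation. Returns the number of passes through the retry loop.
 */
static int model_exec(struct userptr_model *u,
		      void (*race_cb)(struct userptr_model *))
{
	int passes = 0;

	for (;;) {
		unsigned long seq = u->notifier_seq;  /* "read begin" */

		passes++;
		if (seq != u->saved_seq) {
			u->repins++;                  /* obtain new page pointers */
			u->saved_seq = seq;
		}

		if (race_cb) {
			race_cb(u);                   /* simulated racing invalidation */
			race_cb = NULL;               /* race only once */
		}

		if (u->notifier_seq == u->saved_seq)  /* "read retry" check passed */
			return passes;                /* safe to submit */
	}
}
```

In a driver the final check and the fence addition additionally need to
appear atomic to the notifier, which is what the userptr_notifier_lock
in the document provides; the sketch leaves that out.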


WARNING: multiple messages have this Message-ID (diff)
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Matthew Brost" <matthew.brost@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	linux-kernel@vger.kernel.org, "Oak Zeng" <oak.zeng@intel.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	dri-devel@lists.freedesktop.org
Subject: [RFC PATCH] Documentation/gpu: Draft VM_BIND locking document
Date: Fri, 30 Jun 2023 18:44:52 +0200	[thread overview]
Message-ID: <20230630164452.9228-1-thomas.hellstrom@linux.intel.com> (raw)

Add the first version of the VM_BIND locking document which is
intended to be part of the xe driver upstreaming agreement.

The document describes and discuss the locking used during exec-
functions, evicton and for userptr gmvas. Intention is to be using the
same nomenclature as the drm-vm-bind-async.rst, but to keep naming a
little shorter, use gvm and gmva instead of gpu_vm and gpu_vma which
is used in the previous document, with an intention to modify also
that document.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 Documentation/gpu/drm-vm-bind-locking.rst | 339 ++++++++++++++++++++++
 1 file changed, 339 insertions(+)
 create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst

diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
new file mode 100644
index 000000000000..f5d1a40a2906
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-locking.rst
@@ -0,0 +1,339 @@
+===============
+VM_BIND locking
+===============
+
+This document attempts to describe what's needed to get VM_BIND locking right,
+including the userptr mmu_notifier locking and it will also discuss some
+optimizations to get rid of the looping through of all userptr mappings and
+external / shared object mappings that is needed in the simplest
+implementation. It will also discuss some implications for faulting gvms.
+
+Nomenclature
+============
+
+* ``Context``: GPU execution context.
+* ``gvm``: Abstraction of a GPU address space with meta-data. Typically
+  one per client (DRM file-private), or one per context. 
+* ``gvma``: Abstraction of a GPU address range within a gvma with
+  associated meta-data. The backing storage of a gvma can either be
+  a gem buffer object or anonymous pages mapped also into the CPU
+  address space for the process.
+* ``userptr gvma or just userptr``: A gvma, the backing store of
+  which is anonymous pages as described above.
+* ``revalidating``: Revalidating a gvma means making the latest version
+  of the backing store resident and making sure the gvma's
+  page-table entries point to that backing store.
+* ``dma_fence``: A struct dma_fence that is similar to a struct completion
+  and which tracks GPU activity. When the GPU activity is finished,
+  the dma_fence signals.
+* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
+  to track GPU activity in the form of multiple dma_fences on a
+  gvm or a gem buffer object. The dma_resv contains an array / list
+  of dma_fences and a lock that needs to be held when adding
+  additional dma_fences to the dma_resv. The lock is of a type that
+  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
+* ``exec function``: An exec function is a function that revalidates all
+  affected gvmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+* ``local object``: A GEM object which is local to a gvm. Shared gem
+  objects also share the gvm's dma_resv.
+* ``shared object``: AKA external object: A GEM object which may be shared
+  by multiple gvms and whose backing storage may be shared with
+  other drivers.
+
+
+Introducing the locks
+=====================
+
+One of the benefits of VM_BIND is that local GEM objects share the gvm's
+dma_resv object and hence the dma_resv lock. So even with a huge
+number of local GEM objects, only one lock is needed to make the exec
+sequence atomic.
+
+The following locks and locking orders are used:
+
+* The ``gvm->lock`` (optionally an rwsem). Protects how the gvm is
+  partitioned into gvmas, protects the gvm's list of external objects,
+  and can also with some simplification protect the gvm's list of
+  userptr gvmas. With the CPU mm analogy this would correspond to the
+  mmap_lock.
+* The ``userptr_seqlock``. This lock is taken in read mode for each
+  userptr gvma on the gvm's userptr list, and in write mode during mmu
+  notifier invalidation.
+* The ``gvm->resv`` lock. Protects the gvm's list of gvmas needing
+  rebinding, and also the residency of all the gvm's local GEM object.
+* The ``gvm->userptr_notifier_lock``. This is an rwsem that is taken in read
+  mode during exec and write mode during a mmu notifier invalidation. In
+  the absence of a separate page-table lock, this lock can serve
+  together with the gvm's dma_resv lock as a page-table lock. More on
+  this below. The userptr notifier lock is per gvm.
+* The ``gvm->page_table_lock``. Protects the gvm's page-table updates. For
+  simplicity the gvm's dma_resv lock can be reused as page-table lock.
+
+There are certain optimizations described below that require
+additional locks. More on that later.
+
+.. code-block:: C
+
+   dma_resv_lock(&gvm->resv);
+
+   for_each_gvma_on_revalidate_list(gvm, &gvma) {
+		revalidate_gvma(&gvma);
+		remove_from_revalidate_list(&gvma);
+   }
+
+   add_dependencies(&gpu_job, &gvm->resv);
+   job_dma_fence = gpu_submit(&gpu_job));
+
+   add_dma_fence(job_dma_fence, &gvm->resv);
+   dma_resv_unlock(&gvm->resv);
+
+Eviction of one of these local objects will then be something like the
+following:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(obj->resv);
+   for_each_gvma_of_obj(obj, &gvma);
+		put_gvma_on_revalidate_list(&gvma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(&obj->resv, job_dma_fence);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Note that since the object is local to the gvm, it will share the gvm's
+``dma_resv`` lock so that ``obj->resv == gvm->resv``. Invalidated gvmas are put
+on the gvm's revalidation list, which is protected by ``gvm->resv``, which
+is always locked while evicting, due to the above equality.
+
+Does the gvma need to be unbound before eviction? For VM_BIND gvms
+the answer is no. Since the eviction blit or copy will wait for GPU
+idle, any attempt by the GPU to access freed memory through the
+gvma will be preceded by a new exec function, which will
+make sure the gvma is revalidated, that is not an issue.
+
+Introducing external (or shared) buffer objects
+===============================================
+
+Since shared buffer objects may be shared by multiple gvm's they
+can't share their reservation object with a single gvm, but will rather
+have a reservation object of their own. The shared objects bound to a
+gvm using one or many
+gvmas are therefore typically put on a per-gvm list which is
+protected by the gvm lock. One could in theory protect it also with
+the ``gvm->resv``, but since the list of dma_resvs to take is typically
+built before the ``gvm->resv`` is locked due to a limitation in
+the current locking helpers, that is typically not done. Also see
+below for userptr gvmas.
+
+At eviction time we now need to invalidate *all* gvmas of a shared
+object, but we can no longer be certain that we hold the gvm's
+dma_resv of all the object's gvmas. We can only be certain that we
+hold the object's private dma_resv. We can trylock the dma_resvs for
+the affected gvm's but that might be unnecessarily complex. If we
+have a ww_acquire context at hand at eviction time we can also perform
+sleeping locks of those dma_resvs but that could cause expensive
+rollbacks. One option is to just mark the invalidated gvmas with a bool
+which is inspected on the next exec function, when the gvm's
+dma_resv and the object's dma_resv is held, and the invalidated
+gvmas could then be put on the gvm's list of invalidated
+gvmas. That bool would then, although being per-gvma formally be
+protected by the object's dma_resv.
+
+The exec function would then look something like the following:
+
+.. code-block:: C
+
+   read_lock(&gvm->lock);
+		
+   dma_resv_lock(&gvm->resv);
+
+   // Shared object list is protected by the gvm->lock.
+   for_each_shared_obj(gvm, &obj) {
+		dma_resv_lock(&obj->resv);
+		move_marked_gvmas_to_revalidate_gvma_list(obj, &gvm);
+   }
+
+   for_each_gvma_to_revalidate(gvm, &gvma) {
+		revalidate_gvma(&gvma);
+		remove_from_revalidate_list(&gvma);
+   }
+
+   add_dependencies(&gpu_job, &gvm->resv);
+   job_dma_fence = gpu_submit(&gpu_job));
+
+   add_dma_fence(job_dma_fence, &gvm->resv);
+   for_each_shared_obj(gvm, &obj)
+          add_dma_fence(job_dma_fence, &obj->resv);
+   dma_resv_unlock_all_resv_locks();
+
+   read_unlock(&gvm->lock);
+
+And the corresponding shared-object aware eviction would look like:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(obj->resv);
+   for_each_gvma_of_obj(obj, &gvma);
+		if (object_is_vm_local(obj))
+		             put_gvma_on_revalidate_list(&gvma, &gvm);
+		else
+		             mark_gvma_for_revalidation(&gvma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(&obj->resv, job_dma_fence);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Yet another option is to put the gvmas to be invalidated on a separate
+gvm list protected by a lower level lock that can be taken both at eviction
+time and at transfer-to-revalidate list time. The details are not in
+this document, but this for reference implemented in the Intel xe
+driver.
+
+Introducing userptr gvmas
+=========================
+
+A userptr gvma is a gvma that, instead of mapping a buffer object to a
+GPU virtual address range, directly maps a CPU mm range of anonymous-
+or file page-cache pages.
+A very simple approach would be to just pin the pages using
+pin_user_pages() at bind time and unpin them at unbind time, but this
+creates a Denial-Of-Service vector since a single user-space process
+would be able to pin down all of system memory, which is not
+desirable. (For special use-cases and with proper accounting pinning might
+still be a desirable feature, though). What we need to do in the general case is
+to obtain a reference to the desired pages, make sure we are notified
+using a MMU notifier just before the CPU mm unmaps the pages, dirty
+them if they are not mapped read-only to the GPU, and then drop the reference.
+When we are notified by the MMU notifier that CPU mm is about to drop the
+pages, we need to stop GPU access to the pages,
+GPU page-table and make sure that before the next time the GPU tries to access
+whatever is now present in the CPU mm range, we unmap the old pages
+from the GPU page tables and repeat the process of obtaining new page
+references. Note that when the core mm decides to laundry pages, we get such
+an unmap MMU notification and can mark the pages dirty again before the
+next GPU access. We also get similar MMU notifications for NUMA accounting
+which the GPU driver doesn't really need to care about, but so far
+it's proven difficult to exclude certain notifications.
+
+Using a MMU notifier for device DMA (and other methods) is described in
+`this document 
+<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
+
+Now the method of obtaining struct page references using
+get_user_pages() unfortunately can't be used under a dma_resv lock
+since that would violate the locking order of the dma_resv lock vs the
+mmap_lock that is grabbed when resolving a CPU pagefault. This means the gvm's
+list of userptr gvmas needs to be protected by an outer lock, and this
+is the first time we strictly need the gvm->lock. While it was
+previously used also to protect the list of the gvm's shared objects,
+we could in theory have used the gvm->resv for that.
+
+The MMU interval seqlock for a userptr gvma is used in the following
+way:
+
+.. code-block:: C
+
+   down_read(&gvm->lock);
+
+   retry:
+
+   // Note: mmu_interval_read_begin() blocks until there is no
+   // invalidation notifier running anymore.
+   seq = mmu_interval_read_begin(&gvma->userptr_interval);
+   if (seq != gvma->saved_seq) {
+           obtain_new_page_pointers(&gvma);
+	   dma_resv_lock(&gvm->resv);
+	   put_gvma_on_revalidate_list(&gvma, &gvm);
+	   dma_resv_unlock(&gvm->resv);
+	   gvma->saved_seq = seq;
+   }
+
+   // The usual revalidation goes here.
+
+   // Final userptr sequence validation may not happen before the
+   // submission dma_fence is added to the gvm's resv, from the POW
+   // of the MMU invalidation notifier. Hence the
+   // userptr_notifier_lock that will make them appear atomic.
+   
+   add_dependencies(&gpu_job, &gvm->resv);
+   down_read(&gvm->userptr_notifier_lock);
+   if (mmu_interval_read_retry(&gvma->userptr_interval, gvma->saved_seq)) {
+          up_read(&gvm->userptr_notifier_lock);
+	  goto retry;
+   }
+
+   job_dma_fence = gpu_submit(&gpu_job));
+
+   add_dma_fence(job_dma_fence, &gvm->resv);
+
+   for_each_shared_obj(gvm, &obj)
+          add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock_all_resv_locks();
+   up_read(&gvm->userptr_notifier_lock);
+   up_read(&gvm->lock);
+
+The code between ``mmu_interval_read_begin()`` and the
+``mmu_interval_read_retry()`` marks the read side critical section of
+what we call the ``userptr_seqlock``. In reality the gvm's userptr
+gvma list is looped through, and the check is done for *all* of its
+userptr gvmas, although we only show a single one here.
+
+The userptr gvma MMU invalidation notifier might be called from
+reclaim context and, again to avoid locking order violations, we can't
+take any dma_resv lock nor the gvm->lock from within it.
+
+.. code-block:: C
+
+  bool gvma_userptr_invalidate(userptr_interval, cur_seq)
+  {
+          // Make sure the exec function either sees the new sequence
+	  // and backs off or we wait for the dma-fence:
+	  
+          down_write(&gvm->userptr_notifier_lock);
+	  mmu_interval_set_seq(userptr_interval, cur_seq);
+	  up_write(&gvm->userptr_notifier_lock);
+
+	  dma_resv_wait_timeout(&gvm->resv, DMA_RESV_USAGE_BOOKKEEP,
+		                false, MAX_SCHEDULE_TIMEOUT);
+	  return true;
+  }
+
+When this invalidation notifier returns, the GPU can no longer be
+accessing the old pages of the userptr gvma and needs to redo the page-binding
+before a new GPU submission can succeed.
+
+Optimizing gvma iteration
+-------------------------
+
+Iterating through all of a gvm's userptr gvmas to check the validity
+on each exec function may be very costly. There is a scheme to avoid
+this and only iterate through the userptr gvmas that actually saw an
+invalidation notifier call since the last exec. T
+
+TODO: describe that scheme here. It's implemented in the xe driver.
+
+Locking for page-table updates at bind- and unbind time
+=======================================================
+
+TODO.
+
+Recoverable page-fault implications
+===================================
+
+TODO.
-- 
2.40.1

