* [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-08-16  9:15 ` Thomas Hellström
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-08-16  9:15 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Rodrigo Vivi, Matthew Brost,
	Danilo Krummrich, Joonas Lahtinen, Oak Zeng, Daniel Vetter,
	Maarten Lankhorst, Francois Dugast, dri-devel, linux-kernel

Add the first version of the VM_BIND locking document which is
intended to be part of the xe driver upstreaming agreement.

The document describes and discusses the locking used during exec
functions, eviction and for userptr gpu-vmas. The intention is to use the
same nomenclature as drm-vm-bind-async.rst.

v2:
- s/gvm/gpu_vm/g (Rodrigo Vivi)
- Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
  (Rodrigo Vivi)
- Adjust commit message accordingly.
- Add SPDX license header.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
 1 file changed, 351 insertions(+)
 create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst

diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
new file mode 100644
index 000000000000..b813961a9ec2
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-locking.rst
@@ -0,0 +1,351 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+===============
+VM_BIND locking
+===============
+
+This document attempts to describe what's needed to get VM_BIND locking right,
+including the userptr mmu_notifier locking. It also discusses some
+optimizations that get rid of the looping through all userptr mappings and
+external / shared object mappings that is needed in the simplest
+implementation, as well as some implications for faulting gpu_vms.
+
+Nomenclature
+============
+
+* ``Context``: GPU execution context.
+* ``gpu_vm``: Abstraction of a virtual GPU address space with
+  meta-data. Typically one per client (DRM file-private), or one per
+  context.
+* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
+  associated meta-data. The backing storage of a gpu_vma can either be
+  a gem buffer object or anonymous pages mapped also into the CPU
+  address space for the process.
+* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
+  which is anonymous pages as described above.
+* ``revalidating``: Revalidating a gpu_vma means making the latest version
+  of the backing store resident and making sure the gpu_vma's
+  page-table entries point to that backing store.
+* ``dma_fence``: A struct dma_fence that is similar to a struct completion
+  and which tracks GPU activity. When the GPU activity is finished,
+  the dma_fence signals.
+* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
+  to track GPU activity in the form of multiple dma_fences on a
+  gpu_vm or a gem buffer object. The dma_resv contains an array / list
+  of dma_fences and a lock that needs to be held when adding
+  additional dma_fences to the dma_resv. The lock is of a type that
+  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
+* ``exec function``: An exec function is a function that revalidates all
+  affected gpu_vmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+* ``local object``: A GEM object which is local to a single gpu_vm. Local
+  GEM objects share the gpu_vm's dma_resv.
+* ``shared object``: AKA external object: A GEM object which may be shared
+  by multiple gpu_vms and whose backing storage may be shared with
+  other drivers.
+
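+The ``dma_resv`` entry above mentions deadlock-safe locking of multiple
+dma_resvs. As an illustration only, with ``obj1`` and ``obj2`` being
+hypothetical objects, the underlying ww_mutex machinery is typically used
+as sketched below. The ``dma_resv_*`` and ``ww_acquire_*`` calls are the
+real kernel interfaces; everything else is illustrative:
+
+.. code-block:: C
+
+   struct ww_acquire_ctx ctx;
+   int err;
+
+   ww_acquire_init(&ctx, &reservation_ww_class);
+
+   err = dma_resv_lock(obj1->resv, &ctx);
+   if (!err) {
+           err = dma_resv_lock(obj2->resv, &ctx);
+           if (err == -EDEADLK) {
+                   // obj2 is contended: back off obj1, then sleep on
+                   // obj2 before retrying obj1. The wound / wait stamps
+                   // in the ctx guarantee forward progress.
+                   dma_resv_unlock(obj1->resv);
+                   dma_resv_lock_slow(obj2->resv, &ctx);
+                   err = dma_resv_lock(obj1->resv, &ctx);
+           }
+   }
+   ww_acquire_done(&ctx);
+
+   // Register GPU activity with both dma_resvs here.
+
+   dma_resv_unlock(obj1->resv);
+   dma_resv_unlock(obj2->resv);
+   ww_acquire_fini(&ctx);
+
+A full implementation would loop on the backoff path until all locks are
+held; helpers for that exist in the kernel and the details are out of
+scope here.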
+
+Introducing the locks
+=====================
+
+One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
+dma_resv object and hence the dma_resv lock. So even with a huge
+number of local GEM objects, only one lock is needed to make the exec
+sequence atomic.
+
+The following locks and locking orders are used:
+
+* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
+  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
+  and can also with some simplification protect the gpu_vm's list of
+  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
+  mmap_lock.
+* The ``userptr_seqlock``. This lock is taken in read mode for each
+  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
+  notifier invalidation. This is not a real seqlock but is described in
+  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
+  'lock' a lot like a seqcount, however this allows multiple
+  write-sides to hold it at once...". The read side critical section
+  is enclosed by ``mmu_interval_read_begin() /
+  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
+  sleeping uninterruptibly if the write side is held.
+  The write side is held by the core mm while calling mmu interval
+  invalidation notifiers.
+* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
+  rebinding, and also the residency of all the gpu_vm's local GEM objects.
+* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
+  mode during exec and in write mode during an mmu notifier invalidation. In
+  the absence of a separate page-table lock, this lock can serve
+  together with the gpu_vm's dma_resv lock as a page-table lock. More on
+  this below. The userptr notifier lock is per gpu_vm.
+* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
+  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
+
+There are certain optimizations described below that require
+additional locks. More on that later.
+
+With VM_BIND and only local objects, the exec function would then,
+somewhat simplified, look like the following:
+
+.. code-block:: C
+
+   dma_resv_lock(&gpu_vm->resv);
+
+   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
+		revalidate_gpu_vma(&gpu_vma);
+		remove_from_revalidate_list(&gpu_vma);
+   }
+
+   add_dependencies(&gpu_job, &gpu_vm->resv);
+   job_dma_fence = gpu_submit(&gpu_job);
+
+   add_dma_fence(job_dma_fence, &gpu_vm->resv);
+   dma_resv_unlock(&gpu_vm->resv);
+
+Eviction of one of these local objects will then be something like the
+following:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(obj->resv);
+   for_each_gpu_vma_of_obj(obj, &gpu_vma)
+		put_gpu_vma_on_revalidate_list(&gpu_vma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Note that since the object is local to the gpu_vm, it will share the gpu_vm's
+``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
+on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
+is always locked while evicting, due to the above equality.
+
+For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
+Since the eviction blit or copy will wait for GPU idle, any attempt by
+the GPU to access freed memory through the gpu_vma will be preceded by
+a new exec function, which will make sure the gpu_vma is
+revalidated. The eviction code holding the object's dma_resv while
+revalidating will ensure a new exec function may not race with the eviction.
+
+Introducing external (or shared) buffer objects
+===============================================
+
+Since shared buffer objects may be shared by multiple gpu_vms, they
+can't share their reservation object with a single gpu_vm, but instead
+have a reservation object of their own. The shared objects bound to a
+gpu_vm using one or many gpu_vmas are therefore typically put on a
+per-gpu_vm list which is protected by the gpu_vm lock. One could in
+theory protect it also with the ``gpu_vm->resv``, but since the list of
+dma_resvs to take is typically built before the ``gpu_vm->resv`` is
+locked, due to a limitation in the current locking helpers, that is
+typically not done. Also see below for userptr gpu_vmas.
+
+At eviction time we now need to invalidate *all* gpu_vmas of a shared
+object, but we can no longer be certain that we hold the gpu_vm's
+dma_resv of all the object's gpu_vmas. We can only be certain that we
+hold the object's private dma_resv. We can trylock the dma_resvs of
+the affected gpu_vms, but that might be unnecessarily complex. If we
+have a ww_acquire context at hand at eviction time we can also perform
+sleeping locks of those dma_resvs, but that could cause expensive
+rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
+which is inspected on the next exec function, when the gpu_vm's
+dma_resv and the object's dma_resv are held, and the invalidated
+gpu_vmas could then be put on the gpu_vm's list of invalidated
+gpu_vmas. That bool would then, although being per-gpu_vma, formally be
+protected by the object's dma_resv.
+
+The exec function would then look something like the following:
+
+.. code-block:: C
+
+   read_lock(&gpu_vm->lock);
+
+   dma_resv_lock(&gpu_vm->resv);
+
+   // Shared object list is protected by the gpu_vm->lock.
+   for_each_shared_obj(gpu_vm, &obj) {
+		dma_resv_lock(&obj->resv);
+		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
+   }
+
+   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
+		revalidate_gpu_vma(&gpu_vma);
+		remove_from_revalidate_list(&gpu_vma);
+   }
+
+   add_dependencies(&gpu_job, &gpu_vm->resv);
+   job_dma_fence = gpu_submit(&gpu_job);
+
+   add_dma_fence(job_dma_fence, &gpu_vm->resv);
+   for_each_shared_obj(gpu_vm, &obj)
+          add_dma_fence(job_dma_fence, &obj->resv);
+   dma_resv_unlock_all_resv_locks();
+
+   read_unlock(&gpu_vm->lock);
+
+And the corresponding shared-object aware eviction would look like:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(obj->resv);
+   for_each_gpu_vma_of_obj(obj, &gpu_vma)
+		if (object_is_vm_local(obj))
+		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
+		else
+		             mark_gpu_vma_for_revalidation(&gpu_vma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Yet another option is to put the gpu_vmas to be invalidated on a separate
+gpu_vm list protected by a lower-level lock that can be taken both at eviction
+time and at transfer-to-revalidate-list time. The details are not in
+this document, but this is, for reference, implemented in the Intel xe
+driver.
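+
+A minimal sketch of that approach, using a hypothetical per-gpu_vm
+spinlock and list names that are illustrative only and not taken from
+any actual driver, could look like:
+
+.. code-block:: C
+
+   // Eviction time: only the object's dma_resv is held.
+   spin_lock(&gpu_vm->invalidated_lock);
+   list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
+   spin_unlock(&gpu_vm->invalidated_lock);
+
+   // Exec time: the gpu_vm's dma_resv is held, so the revalidate
+   // list may be accessed. The spinlock protects only the
+   // invalidated list itself.
+   spin_lock(&gpu_vm->invalidated_lock);
+   while (!list_empty(&gpu_vm->invalidated_list)) {
+           gpu_vma = first_gpu_vma(&gpu_vm->invalidated_list);
+           list_move_tail(&gpu_vma->invalidated_link,
+                          &gpu_vm->revalidate_list);
+   }
+   spin_unlock(&gpu_vm->invalidated_lock);
+
+Since the spinlock protects only the list, it can be taken from eviction
+context without holding any of the other gpu_vm's dma_resvs.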
+
+Introducing userptr gpu_vmas
+============================
+
+A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
+GPU virtual address range, directly maps a CPU mm range of anonymous or
+file page-cache pages.
+A very simple approach would be to just pin the pages using
+pin_user_pages() at bind time and unpin them at unbind time, but this
+creates a Denial-Of-Service vector, since a single user-space process
+would be able to pin down all of system memory, which is not
+desirable. (For special use-cases and with proper accounting, pinning might
+still be a desirable feature, though). What we need to do in the general case is
+to obtain a reference to the desired pages, make sure we are notified
+using an MMU notifier just before the CPU mm unmaps the pages, dirty
+them if they are not mapped read-only to the GPU, and then drop the reference.
+When we are notified by the MMU notifier that the CPU mm is about to drop the
+pages, we need to stop GPU access to the pages and make sure that, before
+the next time the GPU tries to access whatever is now present in the CPU
+mm range, we unmap the old pages from the GPU page tables and repeat the
+process of obtaining new page references. Note that when the core mm decides
+to launder pages, we get such an unmap MMU notification and can mark the
+pages dirty again before the next GPU access. We also get similar MMU
+notifications for NUMA accounting, which the GPU driver doesn't really need
+to care about, but so far it has proven difficult to exclude certain
+notifications.
+
+Using an MMU notifier for device DMA (and other methods) is described in
+`this document
+<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
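+
+Hooking a userptr gpu_vma up to that machinery means registering an mmu
+interval notifier covering the CPU virtual address range at bind time.
+As a sketch only, with the ``gpu_vma`` member and range names being
+assumptions of this document rather than any real driver, and with
+``gpu_vma_userptr_invalidate()`` being the invalidation notifier detailed
+further below:
+
+.. code-block:: C
+
+   static const struct mmu_interval_notifier_ops gpu_vma_userptr_notifier_ops = {
+           .invalidate = gpu_vma_userptr_invalidate,
+   };
+
+   // At userptr gpu_vma bind time, for the user-space supplied range:
+   err = mmu_interval_notifier_insert(&gpu_vma->userptr_interval,
+                                      current->mm, userptr_start,
+                                      userptr_size,
+                                      &gpu_vma_userptr_notifier_ops);
+   if (err)
+           return err;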
+
+Now the method of obtaining struct page references using
+get_user_pages() unfortunately can't be used under a dma_resv lock
+since that would violate the locking order of the dma_resv lock vs the
+mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
+list of userptr gpu_vmas needs to be protected by an outer lock, and this
+is the first time we strictly need the gpu_vm->lock. While it was
+previously used also to protect the list of the gpu_vm's shared objects,
+we could in theory have used the gpu_vm->resv for that.
+
+The MMU interval seqlock for a userptr gpu_vma is used in the following
+way:
+
+.. code-block:: C
+
+   down_read(&gpu_vm->lock);
+
+   retry:
+
+   // Note: mmu_interval_read_begin() blocks until there is no
+   // invalidation notifier running anymore.
+   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
+   if (seq != gpu_vma->saved_seq) {
+           obtain_new_page_pointers(&gpu_vma);
+	   dma_resv_lock(&gpu_vm->resv);
+	   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
+	   dma_resv_unlock(&gpu_vm->resv);
+	   gpu_vma->saved_seq = seq;
+   }
+
+   // The usual revalidation goes here.
+
+   // Final userptr sequence validation may not happen before the
+   // submission dma_fence is added to the gpu_vm's resv, from the POV
+   // of the MMU invalidation notifier. Hence the
+   // userptr_notifier_lock that will make them appear atomic.
+
+   add_dependencies(&gpu_job, &gpu_vm->resv);
+   down_read(&gpu_vm->userptr_notifier_lock);
+   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
+          up_read(&gpu_vm->userptr_notifier_lock);
+	  goto retry;
+   }
+
+   job_dma_fence = gpu_submit(&gpu_job);
+
+   add_dma_fence(job_dma_fence, &gpu_vm->resv);
+
+   for_each_shared_obj(gpu_vm, &obj)
+          add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock_all_resv_locks();
+   up_read(&gpu_vm->userptr_notifier_lock);
+   up_read(&gpu_vm->lock);
+
+The code between ``mmu_interval_read_begin()`` and the
+``mmu_interval_read_retry()`` marks the read side critical section of
+what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
+gpu_vma list is looped through, and the check is done for *all* of its
+userptr gpu_vmas, although we only show a single one here.
+
+The userptr gpu_vma MMU invalidation notifier might be called from
+reclaim context and, again to avoid locking order violations, we can't
+take any dma_resv lock nor the gpu_vm->lock from within it.
+
+.. code-block:: C
+
+  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
+  {
+          // Make sure the exec function either sees the new sequence
+	  // and backs off or we wait for the dma-fence:
+
+          down_write(&gpu_vm->userptr_notifier_lock);
+	  mmu_interval_set_seq(userptr_interval, cur_seq);
+	  up_write(&gpu_vm->userptr_notifier_lock);
+
+	  dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
+		                false, MAX_SCHEDULE_TIMEOUT);
+	  return true;
+  }
+
+When this invalidation notifier returns, the GPU can no longer be
+accessing the old pages of the userptr gpu_vma, and the driver needs to
+redo the page-binding before a new GPU submission can succeed.
+
+Optimizing gpu_vma iteration
+----------------------------
+
+Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
+on each exec function may be very costly. There is a scheme to avoid
+this and only iterate through the userptr gpu_vmas that actually saw an
+invalidation notifier call since the last exec.
+
+TODO: describe that scheme here. It's implemented in the xe driver.
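+
+Until that description materializes, a rough sketch, with names that are
+illustrative only and not necessarily matching the xe driver, could be to
+have the invalidation notifier move the invalidated gpu_vma to a separate
+list while it holds the ``userptr_notifier_lock`` in write mode:
+
+.. code-block:: C
+
+   // In the invalidation notifier, under the write side of the
+   // userptr_notifier_lock:
+   list_move_tail(&gpu_vma->userptr_link,
+                  &gpu_vm->invalidated_userptr_list);
+
+   // In the exec function, under the read side of the
+   // userptr_notifier_lock, only the gpu_vmas on that list need
+   // their sequence numbers rechecked.
+
+Whether the list itself is protected by the notifier lock or by a
+separate lower-level lock is a driver implementation detail.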
+
+Locking for page-table updates at bind- and unbind time
+=======================================================
+
+TODO.
+
+Recoverable page-fault implications
+===================================
+
+TODO.
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-08-16  9:15 ` Thomas Hellström
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-08-16  9:15 UTC (permalink / raw)
  To: intel-xe
  Cc: Matthew Brost, Thomas Hellström, Francois Dugast,
	linux-kernel, Oak Zeng, Danilo Krummrich, dri-devel,
	Rodrigo Vivi

Add the first version of the VM_BIND locking document which is
intended to be part of the xe driver upstreaming agreement.

The document describes and discuss the locking used during exec-
functions, evicton and for userptr gpu-vmas. Intention is to be using the
same nomenclature as the drm-vm-bind-async.rst.

v2:
- s/gvm/gpu_vm/g (Rodrigo Vivi)
- Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
  (Rodrigo Vivi)
- Adjust commit message accordingly.
- Add SPDX license header.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
 1 file changed, 351 insertions(+)
 create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst

diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
new file mode 100644
index 000000000000..b813961a9ec2
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-locking.rst
@@ -0,0 +1,351 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+===============
+VM_BIND locking
+===============
+
+This document attempts to describe what's needed to get VM_BIND locking right,
+including the userptr mmu_notifier locking and it will also discuss some
+optimizations to get rid of the looping through of all userptr mappings and
+external / shared object mappings that is needed in the simplest
+implementation. It will also discuss some implications for faulting gpu_vms.
+
+Nomenclature
+============
+
+* ``Context``: GPU execution context.
+* ``gpu_vm``: Abstraction of a virtual GPU address space with
+  meta-data. Typically one per client (DRM file-private), or one per
+  context.
+* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
+  associated meta-data. The backing storage of a gpu_vma can either be
+  a gem buffer object or anonymous pages mapped also into the CPU
+  address space for the process.
+* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
+  which is anonymous pages as described above.
+* ``revalidating``: Revalidating a gpu_vma means making the latest version
+  of the backing store resident and making sure the gpu_vma's
+  page-table entries point to that backing store.
+* ``dma_fence``: A struct dma_fence that is similar to a struct completion
+  and which tracks GPU activity. When the GPU activity is finished,
+  the dma_fence signals.
+* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
+  to track GPU activity in the form of multiple dma_fences on a
+  gpu_vm or a gem buffer object. The dma_resv contains an array / list
+  of dma_fences and a lock that needs to be held when adding
+  additional dma_fences to the dma_resv. The lock is of a type that
+  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
+* ``exec function``: An exec function is a function that revalidates all
+  affected gpu_vmas, submits a GPU command batch and registers the
+  dma_fence representing the GPU command's activity with all affected
+  dma_resvs. For completeness, although not covered by this document,
+  it's worth mentioning that an exec function may also be the
+  revalidation worker that is used by some drivers in compute /
+  long-running mode.
+* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
+  objects also share the gpu_vm's dma_resv.
+* ``shared object``: AKA external object: A GEM object which may be shared
+  by multiple gpu_vms and whose backing storage may be shared with
+  other drivers.
+
+
+Introducing the locks
+=====================
+
+One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
+dma_resv object and hence the dma_resv lock. So even with a huge
+number of local GEM objects, only one lock is needed to make the exec
+sequence atomic.
+
+The following locks and locking orders are used:
+
+* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
+  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
+  and can also with some simplification protect the gpu_vm's list of
+  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
+  mmap_lock.
+* The ``userptr_seqlock``. This lock is taken in read mode for each
+  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
+  notifier invalidation. This is not a real seqlock but described in
+  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
+  'lock' a lot like a seqcount, however this allows multiple
+  write-sides to hold it at once...". The read side critical section
+  is enclosed by ``mmu_interval_read_begin() /
+  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
+  sleeping uninterruptibly if the write side is held.
+  The write side is held by the core mm while calling mmu interval
+  invalidation notifiers.
+* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
+  rebinding, and also the residency of all the gpu_vm's local GEM object.
+* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
+  mode during exec and write mode during a mmu notifier invalidation. In
+  the absence of a separate page-table lock, this lock can serve
+  together with the gpu_vm's dma_resv lock as a page-table lock. More on
+  this below. The userptr notifier lock is per gpu_vm.
+* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
+  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
+
+There are certain optimizations described below that require
+additional locks. More on that later.
+
+.. code-block:: C
+
+   dma_resv_lock(&gpu_vm->resv);
+
+   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
+		revalidate_gpu_vma(&gpu_vma);
+		remove_from_revalidate_list(&gpu_vma);
+   }
+
+   add_dependencies(&gpu_job, &gpu_vm->resv);
+   job_dma_fence = gpu_submit(&gpu_job));
+
+   add_dma_fence(job_dma_fence, &gpu_vm->resv);
+   dma_resv_unlock(&gpu_vm->resv);
+
+Eviction of one of these local objects will then be something like the
+following:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(obj->resv);
+   for_each_gpu_vma_of_obj(obj, &gpu_vma);
+		put_gpu_vma_on_revalidate_list(&gpu_vma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(&obj->resv, job_dma_fence);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Note that since the object is local to the gpu_vm, it will share the gpu_vm's
+``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
+on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
+is always locked while evicting, due to the above equality.
+
+For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction,
+Since the eviction blit or copy will wait for GPU idle, any attempt by
+the GPU to access freed memory through the gpu_vma will be preceded by
+a new exec function, which will make sure the gpu_vma is
+revalidated. The eviction code holding the object's dma_resv while
+revalidating will ensure a new exec function may not race with the eviction.
+
+Introducing external (or shared) buffer objects
+===============================================
+
+Since shared buffer objects may be shared by multiple gpu_vm's they
+can't share their reservation object with a single gpu_vm, but will rather
+have a reservation object of their own. The shared objects bound to a
+gpu_vm using one or many
+gpu_vmas are therefore typically put on a per-gpu_vm list which is
+protected by the gpu_vm lock. One could in theory protect it also with
+the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
+built before the ``gpu_vm->resv`` is locked due to a limitation in
+the current locking helpers, that is typically not done. Also see
+below for userptr gpu_vmas.
+
+At eviction time we now need to invalidate *all* gpu_vmas of a shared
+object, but we can no longer be certain that we hold the gpu_vm's
+dma_resv of all the object's gpu_vmas. We can only be certain that we
+hold the object's private dma_resv. We can trylock the dma_resvs for
+the affected gpu_vm's but that might be unnecessarily complex. If we
+have a ww_acquire context at hand at eviction time we can also perform
+sleeping locks of those dma_resvs but that could cause expensive
+rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
+which is inspected on the next exec function, when the gpu_vm's
+dma_resv and the object's dma_resv is held, and the invalidated
+gpu_vmas could then be put on the gpu_vm's list of invalidated
+gpu_vmas. That bool would then, although being per-gpu_vma formally be
+protected by the object's dma_resv.
+
+The exec function would then look something like the following:
+
+.. code-block:: C
+
+   read_lock(&gpu_vm->lock);
+
+   dma_resv_lock(&gpu_vm->resv);
+
+   // Shared object list is protected by the gpu_vm->lock.
+   for_each_shared_obj(gpu_vm, &obj) {
+		dma_resv_lock(&obj->resv);
+		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
+   }
+
+   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
+		revalidate_gpu_vma(&gpu_vma);
+		remove_from_revalidate_list(&gpu_vma);
+   }
+
+   add_dependencies(&gpu_job, &gpu_vm->resv);
+   job_dma_fence = gpu_submit(&gpu_job));
+
+   add_dma_fence(job_dma_fence, &gpu_vm->resv);
+   for_each_shared_obj(gpu_vm, &obj)
+          add_dma_fence(job_dma_fence, &obj->resv);
+   dma_resv_unlock_all_resv_locks();
+
+   read_unlock(&gpu_vm->lock);
+
+And the corresponding shared-object aware eviction would look like:
+
+.. code-block:: C
+
+   obj = get_object_from_lru();
+
+   dma_resv_lock(obj->resv);
+   for_each_gpu_vma_of_obj(obj, &gpu_vma);
+		if (object_is_vm_local(obj))
+		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
+		else
+		             mark_gpu_vma_for_revalidation(&gpu_vma);
+
+   add_dependencies(&eviction_job, &obj->resv);
+   job_dma_fence = gpu_submit(&eviction_job);
+   add_dma_fence(&obj->resv, job_dma_fence);
+
+   dma_resv_unlock(&obj->resv);
+   put_object(obj);
+
+Yet another option is to put the gpu_vmas to be invalidated on a separate
+gpu_vm list protected by a lower level lock that can be taken both at eviction
+time and at transfer-to-revalidate list time. The details are not in
+this document, but this for reference implemented in the Intel xe
+driver.
+
+Introducing userptr gpu_vmas
+============================
+
+A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
+GPU virtual address range, directly maps a CPU mm range of anonymous-
+or file page-cache pages.
+A very simple approach would be to just pin the pages using
+pin_user_pages() at bind time and unpin them at unbind time, but this
+creates a Denial-Of-Service vector since a single user-space process
+would be able to pin down all of system memory, which is not
+desirable. (For special use-cases and with proper accounting pinning might
+still be a desirable feature, though). What we need to do in the general case is
+to obtain a reference to the desired pages, make sure we are notified
+using a MMU notifier just before the CPU mm unmaps the pages, dirty
+them if they are not mapped read-only to the GPU, and then drop the reference.
+When we are notified by the MMU notifier that CPU mm is about to drop the
+pages, we need to stop GPU access to the pages,
+GPU page-table and make sure that before the next time the GPU tries to access
+whatever is now present in the CPU mm range, we unmap the old pages
+from the GPU page tables and repeat the process of obtaining new page
+references. Note that when the core mm decides to laundry pages, we get such
+an unmap MMU notification and can mark the pages dirty again before the
+next GPU access. We also get similar MMU notifications for NUMA accounting
+which the GPU driver doesn't really need to care about, but so far
+it's proven difficult to exclude certain notifications.
+
+Using a MMU notifier for device DMA (and other methods) is described in
+`this document
+<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
+
+Now the method of obtaining struct page references using
+get_user_pages() unfortunately can't be used under a dma_resv lock
+since that would violate the locking order of the dma_resv lock vs the
+mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
+list of userptr gpu_vmas needs to be protected by an outer lock, and this
+is the first time we strictly need the gpu_vm->lock. While it was
+previously used also to protect the list of the gpu_vm's shared objects,
+we could in theory have used the gpu_vm->resv for that.
+
+The MMU interval seqlock for a userptr gpu_vma is used in the following
+way:
+
+.. code-block:: C
+
+   down_read(&gpu_vm->lock);
+
+   retry:
+
+   // Note: mmu_interval_read_begin() blocks until there is no
+   // invalidation notifier running anymore.
+   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
+   if (seq != gpu_vma->saved_seq) {
+           obtain_new_page_pointers(&gpu_vma);
+	   dma_resv_lock(&gpu_vm->resv);
+	   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
+	   dma_resv_unlock(&gpu_vm->resv);
+	   gpu_vma->saved_seq = seq;
+   }
+
+   // The usual revalidation goes here.
+
+   // Final userptr sequence validation may not happen before the
+   // submission dma_fence is added to the gpu_vm's resv, from the POW
+   // of the MMU invalidation notifier. Hence the
+   // userptr_notifier_lock that will make them appear atomic.
+
+   add_dependencies(&gpu_job, &gpu_vm->resv);
+   down_read(&gpu_vm->userptr_notifier_lock);
+   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
+          up_read(&gpu_vm->userptr_notifier_lock);
+	  goto retry;
+   }
+
+   job_dma_fence = gpu_submit(&gpu_job));
+
+   add_dma_fence(job_dma_fence, &gpu_vm->resv);
+
+   for_each_shared_obj(gpu_vm, &obj)
+          add_dma_fence(job_dma_fence, &obj->resv);
+
+   dma_resv_unlock_all_resv_locks();
+   up_read(&gpu_vm->userptr_notifier_lock);
+   up_read(&gpu_vm->lock);
+
+The code between ``mmu_interval_read_begin()`` and the
+``mmu_interval_read_retry()`` marks the read side critical section of
+what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
+gpu_vma list is looped through, and the check is done for *all* of its
+userptr gpu_vmas, although we only show a single one here.
+
+The userptr gpu_vma MMU invalidation notifier might be called from
+reclaim context and, again to avoid locking order violations, we can't
+take any dma_resv lock nor the gpu_vm->lock from within it.
+
+.. code-block:: C
+
+  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
+  {
+          // Make sure the exec function either sees the new sequence
+          // and backs off, or we wait for the dma-fence:
+
+          down_write(&gpu_vm->userptr_notifier_lock);
+          mmu_interval_set_seq(userptr_interval, cur_seq);
+          up_write(&gpu_vm->userptr_notifier_lock);
+
+          dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
+                                false, MAX_SCHEDULE_TIMEOUT);
+          return true;
+  }
+
+When this invalidation notifier returns, the GPU can no longer be
+accessing the old pages of the userptr gpu_vma, and the page-binding
+needs to be redone before a new GPU submission can succeed.
+
+Optimizing gpu_vma iteration
+----------------------------
+
+Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
+on each exec function may be very costly. There is a scheme to avoid
+this and only iterate through the userptr gpu_vmas that actually saw an
+invalidation notifier call since the last exec.
+
+TODO: describe that scheme here. It's implemented in the xe driver.
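+
+Meanwhile, as a rough sketch of the idea, with made-up list and lock
+names: the invalidation notifier, which already holds the write side of
+the userptr_notifier_lock, moves the gpu_vma to a separate *invalidated*
+list protected by a spinlock that is safe to take from reclaim context.
+The exec function then only needs to walk that list:
+
+.. code-block:: C
+
+   // In gpu_vma_userptr_invalidate(), with the userptr_notifier_lock
+   // held in write mode:
+   spin_lock(&gpu_vm->userptr_invalidated_lock);
+   list_move_tail(&gpu_vma->userptr_link, &gpu_vm->userptr_invalidated_list);
+   spin_unlock(&gpu_vm->userptr_invalidated_lock);
+
+   // In the exec function, splice out the invalidated gpu_vmas and
+   // process only those, instead of walking all userptr gpu_vmas:
+   spin_lock(&gpu_vm->userptr_invalidated_lock);
+   list_splice_init(&gpu_vm->userptr_invalidated_list, &local_list);
+   spin_unlock(&gpu_vm->userptr_invalidated_lock);
+
+   list_for_each_entry(gpu_vma, &local_list, userptr_link)
+           // Re-obtain page pointers and revalidate as described above.
+           revalidate_userptr_gpu_vma(&gpu_vma);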
+
+Locking for page-table updates at bind- and unbind time
+=======================================================
+
+TODO.
+
+Recoverable page-fault implications
+===================================
+
+TODO.
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread


* [Intel-xe] ✓ CI.Patch_applied: success for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
@ 2023-08-16  9:56 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16  9:56 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 9829aba16 drm/xe/dg2: Remove Wa_15010599737
=== git am output follows ===
Applying: Documentation/gpu: VM_BIND locking document




* [Intel-xe] ✗ CI.checkpatch: warning for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
@ 2023-08-16  9:57 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16  9:57 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
e700ea2f248a75138759bcb443affeef4a2d1991
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 0c030b258b33f57def9802c9145f7c152f210c6f
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date:   Wed Aug 16 11:15:47 2023 +0200

    Documentation/gpu: VM_BIND locking document
    
    Add the first version of the VM_BIND locking document which is
    intended to be part of the xe driver upstreaming agreement.
    
    The document describes and discuss the locking used during exec-
    functions, evicton and for userptr gpu-vmas. Intention is to be using the
    same nomenclature as the drm-vm-bind-async.rst.
    
    v2:
    - s/gvm/gpu_vm/g (Rodrigo Vivi)
    - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
      (Rodrigo Vivi)
    - Adjust commit message accordingly.
    - Add SPDX license header.
    
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
+ /mt/dim checkpatch 9829aba16e62fcfba150f72d5d492fd778e0150e drm-intel
/mt/dim: line 50: /root/.dimrc: No such file or directory




* [Intel-xe] ✓ CI.KUnit: success for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
@ 2023-08-16  9:58 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16  9:58 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
stty: 'standard input': Inappropriate ioctl for device
[09:57:02] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[09:57:06] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
[09:57:25] Starting KUnit Kernel (1/1)...
[09:57:25] ============================================================
[09:57:25] ==================== xe_bo (2 subtests) ====================
[09:57:25] [SKIPPED] xe_ccs_migrate_kunit
[09:57:25] [SKIPPED] xe_bo_evict_kunit
[09:57:25] ===================== [SKIPPED] xe_bo ======================
[09:57:25] ================== xe_dma_buf (1 subtest) ==================
[09:57:25] [SKIPPED] xe_dma_buf_kunit
[09:57:25] =================== [SKIPPED] xe_dma_buf ===================
[09:57:25] ================== xe_migrate (1 subtest) ==================
[09:57:25] [SKIPPED] xe_migrate_sanity_kunit
[09:57:25] =================== [SKIPPED] xe_migrate ===================
[09:57:25] =================== xe_pci (2 subtests) ====================
[09:57:25] [PASSED] xe_gmdid_graphics_ip
[09:57:25] [PASSED] xe_gmdid_media_ip
[09:57:25] ===================== [PASSED] xe_pci ======================
[09:57:25] ==================== xe_rtp (1 subtest) ====================
[09:57:25] ================== xe_rtp_process_tests  ===================
[09:57:25] [PASSED] coalesce-same-reg
[09:57:25] [PASSED] no-match-no-add
[09:57:25] [PASSED] no-match-no-add-multiple-rules
[09:57:25] [PASSED] two-regs-two-entries
[09:57:25] [PASSED] clr-one-set-other
[09:57:25] [PASSED] set-field
[09:57:25] [PASSED] conflict-duplicate
[09:57:25] [PASSED] conflict-not-disjoint
[09:57:25] [PASSED] conflict-reg-type
[09:57:25] ============== [PASSED] xe_rtp_process_tests ===============
[09:57:25] ===================== [PASSED] xe_rtp ======================
[09:57:25] ==================== xe_wa (1 subtest) =====================
[09:57:25] ======================== xe_wa_gt  =========================
[09:57:25] [PASSED] TIGERLAKE (B0)
[09:57:25] [PASSED] DG1 (A0)
[09:57:25] [PASSED] DG1 (B0)
[09:57:25] [PASSED] ALDERLAKE_S (A0)
[09:57:25] [PASSED] ALDERLAKE_S (B0)
[09:57:25] [PASSED] ALDERLAKE_S (C0)
[09:57:25] [PASSED] ALDERLAKE_S (D0)
[09:57:25] [PASSED] ALDERLAKE_P (A0)
[09:57:25] [PASSED] ALDERLAKE_P (B0)
[09:57:25] [PASSED] ALDERLAKE_P (C0)
[09:57:25] [PASSED] DG2_G10 (A0)
[09:57:25] [PASSED] DG2_G10 (A1)
[09:57:25] [PASSED] DG2_G10 (B0)
[09:57:25] [PASSED] DG2_G10 (C0)
[09:57:25] [PASSED] DG2_G11 (A0)
[09:57:25] [PASSED] DG2_G11 (B0)
[09:57:25] [PASSED] DG2_G11 (B1)
[09:57:25] [PASSED] DG2_G12 (A0)
[09:57:25] [PASSED] DG2_G12 (A1)
[09:57:25] [PASSED] PVC (B0)
[09:57:25] [PASSED] PVC (B1)
[09:57:25] [PASSED] PVC (C0)
[09:57:25] ==================== [PASSED] xe_wa_gt =====================
[09:57:25] ====================== [PASSED] xe_wa ======================
[09:57:25] ============================================================
[09:57:25] Testing complete. Ran 37 tests: passed: 33, skipped: 4
[09:57:25] Elapsed time: 23.722s total, 4.245s configuring, 19.307s building, 0.136s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[09:57:26] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[09:57:27] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
[09:57:46] Starting KUnit Kernel (1/1)...
[09:57:46] ============================================================
[09:57:46] ============ drm_test_pick_cmdline (2 subtests) ============
[09:57:46] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[09:57:46] =============== drm_test_pick_cmdline_named  ===============
[09:57:46] [PASSED] NTSC
[09:57:46] [PASSED] NTSC-J
[09:57:46] [PASSED] PAL
[09:57:46] [PASSED] PAL-M
[09:57:46] =========== [PASSED] drm_test_pick_cmdline_named ===========
[09:57:46] ============== [PASSED] drm_test_pick_cmdline ==============
[09:57:46] ================== drm_buddy (6 subtests) ==================
[09:57:46] [PASSED] drm_test_buddy_alloc_limit
[09:57:46] [PASSED] drm_test_buddy_alloc_range
[09:57:46] [PASSED] drm_test_buddy_alloc_optimistic
[09:57:46] [PASSED] drm_test_buddy_alloc_pessimistic
[09:57:46] [PASSED] drm_test_buddy_alloc_smoke
[09:57:46] [PASSED] drm_test_buddy_alloc_pathological
[09:57:46] ==================== [PASSED] drm_buddy ====================
[09:57:46] ============= drm_cmdline_parser (40 subtests) =============
[09:57:46] [PASSED] drm_test_cmdline_force_d_only
[09:57:46] [PASSED] drm_test_cmdline_force_D_only_dvi
[09:57:46] [PASSED] drm_test_cmdline_force_D_only_hdmi
[09:57:46] [PASSED] drm_test_cmdline_force_D_only_not_digital
[09:57:46] [PASSED] drm_test_cmdline_force_e_only
[09:57:46] [PASSED] drm_test_cmdline_res
[09:57:46] [PASSED] drm_test_cmdline_res_vesa
[09:57:46] [PASSED] drm_test_cmdline_res_vesa_rblank
[09:57:46] [PASSED] drm_test_cmdline_res_rblank
[09:57:46] [PASSED] drm_test_cmdline_res_bpp
[09:57:46] [PASSED] drm_test_cmdline_res_refresh
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[09:57:46] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[09:57:46] [PASSED] drm_test_cmdline_res_margins_force_on
[09:57:46] [PASSED] drm_test_cmdline_res_vesa_margins
[09:57:46] [PASSED] drm_test_cmdline_name
[09:57:46] [PASSED] drm_test_cmdline_name_bpp
[09:57:46] [PASSED] drm_test_cmdline_name_option
[09:57:46] [PASSED] drm_test_cmdline_name_bpp_option
[09:57:46] [PASSED] drm_test_cmdline_rotate_0
[09:57:46] [PASSED] drm_test_cmdline_rotate_90
[09:57:46] [PASSED] drm_test_cmdline_rotate_180
[09:57:46] [PASSED] drm_test_cmdline_rotate_270
[09:57:46] [PASSED] drm_test_cmdline_hmirror
[09:57:46] [PASSED] drm_test_cmdline_vmirror
[09:57:46] [PASSED] drm_test_cmdline_margin_options
[09:57:46] [PASSED] drm_test_cmdline_multiple_options
[09:57:46] [PASSED] drm_test_cmdline_bpp_extra_and_option
[09:57:46] [PASSED] drm_test_cmdline_extra_and_option
[09:57:46] [PASSED] drm_test_cmdline_freestanding_options
[09:57:46] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[09:57:46] [PASSED] drm_test_cmdline_panel_orientation
[09:57:46] ================ drm_test_cmdline_invalid  =================
[09:57:46] [PASSED] margin_only
[09:57:46] [PASSED] interlace_only
[09:57:46] [PASSED] res_missing_x
[09:57:46] [PASSED] res_missing_y
[09:57:46] [PASSED] res_bad_y
[09:57:46] [PASSED] res_missing_y_bpp
[09:57:46] [PASSED] res_bad_bpp
[09:57:46] [PASSED] res_bad_refresh
[09:57:46] [PASSED] res_bpp_refresh_force_on_off
[09:57:46] [PASSED] res_invalid_mode
[09:57:46] [PASSED] res_bpp_wrong_place_mode
[09:57:46] [PASSED] name_bpp_refresh
[09:57:46] [PASSED] name_refresh
[09:57:46] [PASSED] name_refresh_wrong_mode
[09:57:46] [PASSED] name_refresh_invalid_mode
[09:57:46] [PASSED] rotate_multiple
[09:57:46] [PASSED] rotate_invalid_val
[09:57:46] [PASSED] rotate_truncated
[09:57:46] [PASSED] invalid_option
[09:57:46] [PASSED] invalid_tv_option
[09:57:46] [PASSED] truncated_tv_option
[09:57:46] ============ [PASSED] drm_test_cmdline_invalid =============
[09:57:46] =============== drm_test_cmdline_tv_options  ===============
[09:57:46] [PASSED] NTSC
[09:57:46] [PASSED] NTSC_443
[09:57:46] [PASSED] NTSC_J
[09:57:46] [PASSED] PAL
[09:57:46] [PASSED] PAL_M
[09:57:46] [PASSED] PAL_N
[09:57:46] [PASSED] SECAM
[09:57:46] =========== [PASSED] drm_test_cmdline_tv_options ===========
[09:57:46] =============== [PASSED] drm_cmdline_parser ================
[09:57:46] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[09:57:46] ========== drm_test_get_tv_mode_from_name_valid  ===========
[09:57:46] [PASSED] NTSC
[09:57:46] [PASSED] NTSC-443
[09:57:46] [PASSED] NTSC-J
[09:57:46] [PASSED] PAL
[09:57:46] [PASSED] PAL-M
[09:57:46] [PASSED] PAL-N
[09:57:46] [PASSED] SECAM
[09:57:46] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[09:57:46] [PASSED] drm_test_get_tv_mode_from_name_truncated
[09:57:46] ============ [PASSED] drm_get_tv_mode_from_name ============
[09:57:46] ============= drm_damage_helper (21 subtests) ==============
[09:57:46] [PASSED] drm_test_damage_iter_no_damage
[09:57:46] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[09:57:46] [PASSED] drm_test_damage_iter_no_damage_src_moved
[09:57:46] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[09:57:46] [PASSED] drm_test_damage_iter_no_damage_not_visible
[09:57:46] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[09:57:46] [PASSED] drm_test_damage_iter_no_damage_no_fb
[09:57:46] [PASSED] drm_test_damage_iter_simple_damage
[09:57:46] [PASSED] drm_test_damage_iter_single_damage
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_outside_src
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_src_moved
[09:57:46] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[09:57:46] [PASSED] drm_test_damage_iter_damage
[09:57:46] [PASSED] drm_test_damage_iter_damage_one_intersect
[09:57:46] [PASSED] drm_test_damage_iter_damage_one_outside
[09:57:46] [PASSED] drm_test_damage_iter_damage_src_moved
[09:57:46] [PASSED] drm_test_damage_iter_damage_not_visible
[09:57:46] ================ [PASSED] drm_damage_helper ================
[09:57:46] ============== drm_dp_mst_helper (2 subtests) ==============
[09:57:46] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[09:57:46] [PASSED] Clock 154000 BPP 30 DSC disabled
[09:57:46] [PASSED] Clock 234000 BPP 30 DSC disabled
[09:57:46] [PASSED] Clock 297000 BPP 24 DSC disabled
[09:57:46] [PASSED] Clock 332880 BPP 24 DSC enabled
[09:57:46] [PASSED] Clock 324540 BPP 24 DSC enabled
[09:57:46] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[09:57:46] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[09:57:46] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[09:57:46] [PASSED] DP_POWER_UP_PHY with port number
[09:57:46] [PASSED] DP_POWER_DOWN_PHY with port number
[09:57:46] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[09:57:46] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[09:57:46] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[09:57:46] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[09:57:46] [PASSED] DP_QUERY_PAYLOAD with port number
[09:57:46] [PASSED] DP_QUERY_PAYLOAD with VCPI
[09:57:46] [PASSED] DP_REMOTE_DPCD_READ with port number
[09:57:46] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[09:57:46] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[09:57:46] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[09:57:46] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[09:57:46] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[09:57:46] [PASSED] DP_REMOTE_I2C_READ with port number
[09:57:46] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[09:57:46] [PASSED] DP_REMOTE_I2C_READ with transactions array
[09:57:46] [PASSED] DP_REMOTE_I2C_WRITE with port number
[09:57:46] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[09:57:46] [PASSED] DP_REMOTE_I2C_WRITE with data array
[09:57:46] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[09:57:46] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[09:57:46] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[09:57:46] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[09:57:46] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[09:57:46] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[09:57:46] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[09:57:46] ================ [PASSED] drm_dp_mst_helper ================
[09:57:46] =========== drm_format_helper_test (11 subtests) ===========
[09:57:46] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[09:57:46] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[09:57:46] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[09:57:46] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[09:57:46] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[09:57:46] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[09:57:46] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[09:57:46] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[09:57:46] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[09:57:46] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[09:57:46] ============== drm_test_fb_xrgb8888_to_mono  ===============
[09:57:46] [PASSED] single_pixel_source_buffer
[09:57:46] [PASSED] single_pixel_clip_rectangle
[09:57:46] [PASSED] well_known_colors
[09:57:46] [PASSED] destination_pitch
[09:57:46] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[09:57:46] ============= [PASSED] drm_format_helper_test ==============
[09:57:46] ================= drm_format (18 subtests) =================
[09:57:46] [PASSED] drm_test_format_block_width_invalid
[09:57:46] [PASSED] drm_test_format_block_width_one_plane
[09:57:46] [PASSED] drm_test_format_block_width_two_plane
[09:57:46] [PASSED] drm_test_format_block_width_three_plane
[09:57:46] [PASSED] drm_test_format_block_width_tiled
[09:57:46] [PASSED] drm_test_format_block_height_invalid
[09:57:46] [PASSED] drm_test_format_block_height_one_plane
[09:57:46] [PASSED] drm_test_format_block_height_two_plane
[09:57:46] [PASSED] drm_test_format_block_height_three_plane
[09:57:46] [PASSED] drm_test_format_block_height_tiled
[09:57:46] [PASSED] drm_test_format_min_pitch_invalid
[09:57:46] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[09:57:46] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[09:57:46] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[09:57:46] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[09:57:46] [PASSED] drm_test_format_min_pitch_two_plane
[09:57:46] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[09:57:46] [PASSED] drm_test_format_min_pitch_tiled
[09:57:46] =================== [PASSED] drm_format ====================
[09:57:46] =============== drm_framebuffer (1 subtest) ================
[09:57:46] =============== drm_test_framebuffer_create  ===============
[09:57:46] [PASSED] ABGR8888 normal sizes
[09:57:46] [PASSED] ABGR8888 max sizes
[09:57:46] [PASSED] ABGR8888 pitch greater than min required
[09:57:46] [PASSED] ABGR8888 pitch less than min required
[09:57:46] [PASSED] ABGR8888 Invalid width
[09:57:46] [PASSED] ABGR8888 Invalid buffer handle
[09:57:46] [PASSED] No pixel format
[09:57:46] [PASSED] ABGR8888 Width 0
[09:57:46] [PASSED] ABGR8888 Height 0
[09:57:47] [PASSED] ABGR8888 Out of bound height * pitch combination
[09:57:47] [PASSED] ABGR8888 Large buffer offset
[09:57:47] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[09:57:47] [PASSED] ABGR8888 Valid buffer modifier
[09:57:47] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[09:57:47] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] NV12 Normal sizes
[09:57:47] [PASSED] NV12 Max sizes
[09:57:47] [PASSED] NV12 Invalid pitch
[09:57:47] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[09:57:47] [PASSED] NV12 different  modifier per-plane
[09:57:47] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[09:57:47] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] NV12 Modifier for inexistent plane
[09:57:47] [PASSED] NV12 Handle for inexistent plane
[09:57:47] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[09:57:47] [PASSED] YVU420 Normal sizes
[09:57:47] [PASSED] YVU420 Max sizes
[09:57:47] [PASSED] YVU420 Invalid pitch
[09:57:47] [PASSED] YVU420 Different pitches
[09:57:47] [PASSED] YVU420 Different buffer offsets/pitches
[09:57:47] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[09:57:47] [PASSED] YVU420 Valid modifier
[09:57:47] [PASSED] YVU420 Different modifiers per plane
[09:57:47] [PASSED] YVU420 Modifier for inexistent plane
[09:57:47] [PASSED] X0L2 Normal sizes
[09:57:47] [PASSED] X0L2 Max sizes
[09:57:47] [PASSED] X0L2 Invalid pitch
[09:57:47] [PASSED] X0L2 Pitch greater than minimum required
stty: 'standard input': Inappropriate ioctl for device
[09:57:47] [PASSED] X0L2 Handle for inexistent plane
[09:57:47] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[09:57:47] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[09:57:47] [PASSED] X0L2 Valid modifier
[09:57:47] [PASSED] X0L2 Modifier for inexistent plane
[09:57:47] =========== [PASSED] drm_test_framebuffer_create ===========
[09:57:47] ================= [PASSED] drm_framebuffer =================
[09:57:47] =============== drm-test-managed (1 subtest) ===============
[09:57:47] [PASSED] drm_test_managed_run_action
[09:57:47] ================ [PASSED] drm-test-managed =================
[09:57:47] =================== drm_mm (19 subtests) ===================
[09:57:47] [PASSED] drm_test_mm_init
[09:57:47] [PASSED] drm_test_mm_debug
[09:57:57] [PASSED] drm_test_mm_reserve
[09:58:07] [PASSED] drm_test_mm_insert
[09:58:08] [PASSED] drm_test_mm_replace
[09:58:08] [PASSED] drm_test_mm_insert_range
[09:58:08] [PASSED] drm_test_mm_frag
[09:58:08] [PASSED] drm_test_mm_align
[09:58:08] [PASSED] drm_test_mm_align32
[09:58:08] [PASSED] drm_test_mm_align64
[09:58:09] [PASSED] drm_test_mm_evict
[09:58:09] [PASSED] drm_test_mm_evict_range
[09:58:09] [PASSED] drm_test_mm_topdown
[09:58:09] [PASSED] drm_test_mm_bottomup
[09:58:09] [PASSED] drm_test_mm_lowest
[09:58:09] [PASSED] drm_test_mm_highest
[09:58:09] [PASSED] drm_test_mm_color
[09:58:10] [PASSED] drm_test_mm_color_evict
[09:58:10] [PASSED] drm_test_mm_color_evict_range
[09:58:10] ===================== [PASSED] drm_mm ======================
[09:58:10] ============= drm_modes_analog_tv (4 subtests) =============
[09:58:10] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[09:58:10] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[09:58:10] [PASSED] drm_test_modes_analog_tv_pal_576i
[09:58:10] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[09:58:10] =============== [PASSED] drm_modes_analog_tv ===============
[09:58:10] ============== drm_plane_helper (2 subtests) ===============
[09:58:10] =============== drm_test_check_plane_state  ================
[09:58:10] [PASSED] clipping_simple
[09:58:10] [PASSED] clipping_rotate_reflect
[09:58:10] [PASSED] positioning_simple
[09:58:10] [PASSED] upscaling
[09:58:10] [PASSED] downscaling
[09:58:10] [PASSED] rounding1
[09:58:10] [PASSED] rounding2
[09:58:10] [PASSED] rounding3
[09:58:10] [PASSED] rounding4
[09:58:10] =========== [PASSED] drm_test_check_plane_state ============
[09:58:10] =========== drm_test_check_invalid_plane_state  ============
[09:58:10] [PASSED] positioning_invalid
[09:58:10] [PASSED] upscaling_invalid
[09:58:10] [PASSED] downscaling_invalid
[09:58:10] ======= [PASSED] drm_test_check_invalid_plane_state ========
[09:58:10] ================ [PASSED] drm_plane_helper =================
[09:58:10] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[09:58:10] ====== drm_test_connector_helper_tv_get_modes_check  =======
[09:58:10] [PASSED] None
[09:58:10] [PASSED] PAL
[09:58:10] [PASSED] NTSC
[09:58:10] [PASSED] Both, NTSC Default
[09:58:10] [PASSED] Both, PAL Default
[09:58:10] [PASSED] Both, NTSC Default, with PAL on command-line
[09:58:10] [PASSED] Both, PAL Default, with NTSC on command-line
[09:58:10] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[09:58:10] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[09:58:10] ================== drm_rect (9 subtests) ===================
[09:58:10] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[09:58:10] [PASSED] drm_test_rect_clip_scaled_not_clipped
[09:58:10] [PASSED] drm_test_rect_clip_scaled_clipped
[09:58:10] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[09:58:10] ================= drm_test_rect_intersect  =================
[09:58:10] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[09:58:10] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[09:58:10] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[09:58:10] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[09:58:10] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[09:58:10] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[09:58:10] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[09:58:10] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[09:58:10] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[09:58:10] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[09:58:10] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[09:58:10] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[09:58:10] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[09:58:10] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[09:58:10] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[09:58:10] ============= [PASSED] drm_test_rect_intersect =============
[09:58:10] ================ drm_test_rect_calc_hscale  ================
[09:58:10] [PASSED] normal use
[09:58:10] [PASSED] out of max range
[09:58:10] [PASSED] out of min range
[09:58:10] [PASSED] zero dst
[09:58:10] [PASSED] negative src
[09:58:10] [PASSED] negative dst
[09:58:10] ============ [PASSED] drm_test_rect_calc_hscale ============
[09:58:10] ================ drm_test_rect_calc_vscale  ================
[09:58:10] [PASSED] normal use
[09:58:10] [PASSED] out of max range
[09:58:10] [PASSED] out of min range
[09:58:10] [PASSED] zero dst
[09:58:10] [PASSED] negative src
[09:58:10] [PASSED] negative dst
[09:58:10] ============ [PASSED] drm_test_rect_calc_vscale ============
[09:58:10] ================== drm_test_rect_rotate  ===================
[09:58:10] [PASSED] reflect-x
[09:58:10] [PASSED] reflect-y
[09:58:10] [PASSED] rotate-0
[09:58:10] [PASSED] rotate-90
[09:58:10] [PASSED] rotate-180
[09:58:10] [PASSED] rotate-270
[09:58:10] ============== [PASSED] drm_test_rect_rotate ===============
[09:58:10] ================ drm_test_rect_rotate_inv  =================
[09:58:10] [PASSED] reflect-x
[09:58:10] [PASSED] reflect-y
[09:58:10] [PASSED] rotate-0
[09:58:10] [PASSED] rotate-90
[09:58:10] [PASSED] rotate-180
[09:58:10] [PASSED] rotate-270
[09:58:10] ============ [PASSED] drm_test_rect_rotate_inv =============
[09:58:10] ==================== [PASSED] drm_rect =====================
[09:58:10] ============================================================
[09:58:10] Testing complete. Ran 333 tests: passed: 333
[09:58:10] Elapsed time: 44.508s total, 1.702s configuring, 18.662s building, 24.092s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel




* [Intel-xe] ✓ CI.Build: success for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
@ 2023-08-16 10:02 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16 10:02 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : success

== Summary ==

+ trap cleanup EXIT
+ cd /kernel
+ git clone https://gitlab.freedesktop.org/drm/xe/ci.git .ci
Cloning into '.ci'...
++ date +%s
+ echo -e '\e[0Ksection_start:1692179901:build_x86_64[collapsed=true]\r\e[0KBuild x86-64'
+ mkdir -p build64
section_start:1692179901:build_x86_64[collapsed=true]
Build x86-64
+ cat .ci/kernel/kconfig
+ make O=build64 olddefconfig
make[1]: Entering directory '/kernel/build64'
  GEN     Makefile
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
make[1]: Leaving directory '/kernel/build64'
++ nproc
+ make O=build64 -j48
make[1]: Entering directory '/kernel/build64'
  GEN     Makefile
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/errno.h
  WRAP    arch/x86/include/generated/uapi/asm/fcntl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctls.h
  WRAP    arch/x86/include/generated/uapi/asm/ipcbuf.h
  WRAP    arch/x86/include/generated/uapi/asm/param.h
  WRAP    arch/x86/include/generated/uapi/asm/poll.h
  WRAP    arch/x86/include/generated/uapi/asm/resource.h
  WRAP    arch/x86/include/generated/uapi/asm/socket.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  WRAP    arch/x86/include/generated/uapi/asm/sockios.h
  WRAP    arch/x86/include/generated/uapi/asm/termbits.h
  GEN     arch/x86/include/generated/asm/orc_hash.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  WRAP    arch/x86/include/generated/uapi/asm/termios.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  WRAP    arch/x86/include/generated/uapi/asm/types.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  WRAP    arch/x86/include/generated/asm/early_ioremap.h
  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
  WRAP    arch/x86/include/generated/asm/export.h
  WRAP    arch/x86/include/generated/asm/irq_regs.h
  WRAP    arch/x86/include/generated/asm/kmap_size.h
  WRAP    arch/x86/include/generated/asm/local64.h
  WRAP    arch/x86/include/generated/asm/mmiowb.h
  WRAP    arch/x86/include/generated/asm/module.lds.h
  WRAP    arch/x86/include/generated/asm/rwonce.h
  WRAP    arch/x86/include/generated/asm/unaligned.h
  UPD     include/generated/uapi/linux/version.h
  UPD     include/config/kernel.release
  UPD     include/generated/compile.h
  HOSTCC  scripts/kallsyms
  UPD     include/generated/utsrelease.h
  HOSTCC  scripts/sorttable
  HOSTCC  scripts/unifdef
  HOSTCC  scripts/asn1_compiler
  DESCEND objtool
  HOSTCC  /kernel/build64/tools/objtool/fixdep.o
  HOSTLD  /kernel/build64/tools/objtool/fixdep-in.o
  LINK    /kernel/build64/tools/objtool/fixdep
  HOSTLD  arch/x86/tools/relocs
  INSTALL /kernel/build64/tools/objtool/libsubcmd/include/subcmd/exec-cmd.h
  INSTALL /kernel/build64/tools/objtool/libsubcmd/include/subcmd/help.h
  INSTALL /kernel/build64/tools/objtool/libsubcmd/include/subcmd/parse-options.h
  INSTALL /kernel/build64/tools/objtool/libsubcmd/include/subcmd/pager.h
  INSTALL /kernel/build64/tools/objtool/libsubcmd/include/subcmd/run-command.h
  CC      /kernel/build64/tools/objtool/libsubcmd/exec-cmd.o
  INSTALL libsubcmd_headers
  CC      /kernel/build64/tools/objtool/libsubcmd/help.o
  CC      /kernel/build64/tools/objtool/libsubcmd/pager.o
  CC      /kernel/build64/tools/objtool/libsubcmd/parse-options.o
  CC      /kernel/build64/tools/objtool/libsubcmd/run-command.o
  CC      /kernel/build64/tools/objtool/libsubcmd/sigchain.o
  CC      /kernel/build64/tools/objtool/libsubcmd/subcmd-config.o
  CC      scripts/mod/empty.o
  HOSTCC  scripts/mod/mk_elfconfig
  CC      scripts/mod/devicetable-offsets.s
  HDRINST usr/include/video/edid.h
  HDRINST usr/include/video/sisfb.h
  HDRINST usr/include/video/uvesafb.h
  HDRINST usr/include/drm/amdgpu_drm.h
  HDRINST usr/include/drm/qaic_accel.h
  HDRINST usr/include/drm/i915_drm.h
  HDRINST usr/include/drm/vgem_drm.h
  HDRINST usr/include/drm/virtgpu_drm.h
  HDRINST usr/include/drm/xe_drm.h
  HDRINST usr/include/drm/omap_drm.h
  HDRINST usr/include/drm/radeon_drm.h
  HDRINST usr/include/drm/tegra_drm.h
  HDRINST usr/include/drm/drm_mode.h
  HDRINST usr/include/drm/ivpu_accel.h
  HDRINST usr/include/drm/exynos_drm.h
  HDRINST usr/include/drm/drm_sarea.h
  HDRINST usr/include/drm/v3d_drm.h
  HDRINST usr/include/drm/drm_fourcc.h
  HDRINST usr/include/drm/qxl_drm.h
  HDRINST usr/include/drm/nouveau_drm.h
  HDRINST usr/include/drm/habanalabs_accel.h
  HDRINST usr/include/drm/vmwgfx_drm.h
  HDRINST usr/include/drm/msm_drm.h
  HDRINST usr/include/drm/etnaviv_drm.h
  HDRINST usr/include/drm/panfrost_drm.h
  HDRINST usr/include/drm/vc4_drm.h
  HDRINST usr/include/drm/lima_drm.h
  HDRINST usr/include/drm/drm.h
  HDRINST usr/include/drm/armada_drm.h
  HDRINST usr/include/mtd/inftl-user.h
  HDRINST usr/include/mtd/nftl-user.h
  HDRINST usr/include/mtd/mtd-user.h
  HDRINST usr/include/mtd/ubi-user.h
  HDRINST usr/include/mtd/mtd-abi.h
  HDRINST usr/include/xen/gntdev.h
  HDRINST usr/include/xen/gntalloc.h
  HDRINST usr/include/xen/evtchn.h
  HDRINST usr/include/xen/privcmd.h
  HDRINST usr/include/asm-generic/auxvec.h
  HDRINST usr/include/asm-generic/bitsperlong.h
  HDRINST usr/include/asm-generic/posix_types.h
  HDRINST usr/include/asm-generic/ioctls.h
  HDRINST usr/include/asm-generic/mman.h
  HDRINST usr/include/asm-generic/shmbuf.h
  HDRINST usr/include/asm-generic/bpf_perf_event.h
  HDRINST usr/include/asm-generic/types.h
  HDRINST usr/include/asm-generic/poll.h
  HDRINST usr/include/asm-generic/msgbuf.h
  HDRINST usr/include/asm-generic/swab.h
  HDRINST usr/include/asm-generic/statfs.h
  HDRINST usr/include/asm-generic/unistd.h
  UPD     scripts/mod/devicetable-offsets.h
  HDRINST usr/include/asm-generic/hugetlb_encode.h
  HDRINST usr/include/asm-generic/resource.h
  HDRINST usr/include/asm-generic/param.h
  HDRINST usr/include/asm-generic/termbits-common.h
  HDRINST usr/include/asm-generic/sockios.h
  HDRINST usr/include/asm-generic/kvm_para.h
  HDRINST usr/include/asm-generic/errno.h
  HDRINST usr/include/asm-generic/termios.h
  HDRINST usr/include/asm-generic/mman-common.h
  HDRINST usr/include/asm-generic/ioctl.h
  HDRINST usr/include/asm-generic/socket.h
  HDRINST usr/include/asm-generic/signal-defs.h
  HDRINST usr/include/asm-generic/termbits.h
  HDRINST usr/include/asm-generic/int-ll64.h
  HDRINST usr/include/asm-generic/signal.h
  HDRINST usr/include/asm-generic/siginfo.h
  HDRINST usr/include/asm-generic/stat.h
  HDRINST usr/include/asm-generic/int-l64.h
  HDRINST usr/include/asm-generic/errno-base.h
  HDRINST usr/include/asm-generic/fcntl.h
  HDRINST usr/include/asm-generic/setup.h
  HDRINST usr/include/asm-generic/ipcbuf.h
  HDRINST usr/include/asm-generic/sembuf.h
  HDRINST usr/include/asm-generic/ucontext.h
  HDRINST usr/include/rdma/mlx5_user_ioctl_cmds.h
  HDRINST usr/include/rdma/irdma-abi.h
  HDRINST usr/include/rdma/mana-abi.h
  HDRINST usr/include/rdma/hfi/hfi1_user.h
  HDRINST usr/include/rdma/hfi/hfi1_ioctl.h
  HDRINST usr/include/rdma/rdma_user_rxe.h
  HDRINST usr/include/rdma/rdma_user_ioctl.h
  HDRINST usr/include/rdma/mlx5_user_ioctl_verbs.h
  HDRINST usr/include/rdma/bnxt_re-abi.h
  HDRINST usr/include/rdma/hns-abi.h
  HDRINST usr/include/rdma/qedr-abi.h
  HDRINST usr/include/rdma/ib_user_ioctl_cmds.h
  HDRINST usr/include/rdma/vmw_pvrdma-abi.h
  HDRINST usr/include/rdma/ib_user_sa.h
  HDRINST usr/include/rdma/ib_user_ioctl_verbs.h
  HDRINST usr/include/rdma/rvt-abi.h
  HDRINST usr/include/rdma/mlx5-abi.h
  HDRINST usr/include/rdma/rdma_netlink.h
  HDRINST usr/include/rdma/erdma-abi.h
  HDRINST usr/include/rdma/rdma_user_ioctl_cmds.h
  HDRINST usr/include/rdma/rdma_user_cm.h
  HDRINST usr/include/rdma/ib_user_verbs.h
  HDRINST usr/include/rdma/efa-abi.h
  HDRINST usr/include/rdma/siw-abi.h
  HDRINST usr/include/rdma/mlx4-abi.h
  HDRINST usr/include/rdma/mthca-abi.h
  HDRINST usr/include/rdma/ib_user_mad.h
  HDRINST usr/include/rdma/ocrdma-abi.h
  MKELF   scripts/mod/elfconfig.h
  HDRINST usr/include/rdma/cxgb4-abi.h
  HDRINST usr/include/misc/xilinx_sdfec.h
  HDRINST usr/include/misc/uacce/hisi_qm.h
  HOSTCC  scripts/mod/modpost.o
  HDRINST usr/include/misc/uacce/uacce.h
  HOSTCC  scripts/mod/file2alias.o
  HDRINST usr/include/misc/cxl.h
  HOSTCC  scripts/mod/sumversion.o
  HDRINST usr/include/misc/ocxl.h
  HDRINST usr/include/misc/fastrpc.h
  HDRINST usr/include/misc/pvpanic.h
  HDRINST usr/include/linux/i8k.h
  HDRINST usr/include/linux/acct.h
  HDRINST usr/include/linux/atmmpc.h
  HDRINST usr/include/linux/fs.h
  HDRINST usr/include/linux/cifs/cifs_mount.h
  HDRINST usr/include/linux/cifs/cifs_netlink.h
  HDRINST usr/include/linux/if_packet.h
  HDRINST usr/include/linux/route.h
  HDRINST usr/include/linux/patchkey.h
  HDRINST usr/include/linux/tc_ematch/tc_em_cmp.h
  HDRINST usr/include/linux/tc_ematch/tc_em_ipt.h
  HDRINST usr/include/linux/tc_ematch/tc_em_meta.h
  HDRINST usr/include/linux/tc_ematch/tc_em_nbyte.h
  HDRINST usr/include/linux/tc_ematch/tc_em_text.h
  HDRINST usr/include/linux/virtio_pmem.h
  HDRINST usr/include/linux/rkisp1-config.h
  HDRINST usr/include/linux/vhost.h
  HDRINST usr/include/linux/cec-funcs.h
  HDRINST usr/include/linux/ppdev.h
  HDRINST usr/include/linux/isdn/capicmd.h
  HDRINST usr/include/linux/virtio_fs.h
  HDRINST usr/include/linux/netfilter_ipv6.h
  HDRINST usr/include/linux/lirc.h
  HDRINST usr/include/linux/mroute6.h
  HDRINST usr/include/linux/nl80211-vnd-intel.h
  HDRINST usr/include/linux/ivtvfb.h
  HDRINST usr/include/linux/auxvec.h
  HDRINST usr/include/linux/dm-log-userspace.h
  HDRINST usr/include/linux/dccp.h
  HDRINST usr/include/linux/virtio_scmi.h
  HDRINST usr/include/linux/atmarp.h
  HDRINST usr/include/linux/arcfb.h
  HDRINST usr/include/linux/nbd-netlink.h
  HDRINST usr/include/linux/sched/types.h
  HDRINST usr/include/linux/tcp.h
  HDRINST usr/include/linux/neighbour.h
  HDRINST usr/include/linux/dlm_device.h
  HDRINST usr/include/linux/wmi.h
  HDRINST usr/include/linux/btrfs_tree.h
  HDRINST usr/include/linux/virtio_crypto.h
  HDRINST usr/include/linux/vbox_err.h
  HDRINST usr/include/linux/edd.h
  HDRINST usr/include/linux/loop.h
  HDRINST usr/include/linux/nvme_ioctl.h
  HDRINST usr/include/linux/mmtimer.h
  HDRINST usr/include/linux/if_pppol2tp.h
  HDRINST usr/include/linux/mtio.h
  HDRINST usr/include/linux/if_arcnet.h
  HDRINST usr/include/linux/romfs_fs.h
  HDRINST usr/include/linux/posix_types.h
  HDRINST usr/include/linux/rtc.h
  HDRINST usr/include/linux/landlock.h
  HDRINST usr/include/linux/gpio.h
  HDRINST usr/include/linux/selinux_netlink.h
  HDRINST usr/include/linux/pps.h
  HDRINST usr/include/linux/ndctl.h
  HDRINST usr/include/linux/virtio_gpu.h
  HDRINST usr/include/linux/android/binderfs.h
  HDRINST usr/include/linux/android/binder.h
  HDRINST usr/include/linux/virtio_vsock.h
  HDRINST usr/include/linux/sound.h
  HDRINST usr/include/linux/vtpm_proxy.h
  HDRINST usr/include/linux/nfs_fs.h
  HDRINST usr/include/linux/elf-fdpic.h
  HDRINST usr/include/linux/adfs_fs.h
  HDRINST usr/include/linux/target_core_user.h
  HDRINST usr/include/linux/netlink_diag.h
  HDRINST usr/include/linux/const.h
  HDRINST usr/include/linux/firewire-cdev.h
  HDRINST usr/include/linux/vdpa.h
  HDRINST usr/include/linux/if_infiniband.h
  HDRINST usr/include/linux/serial.h
  HDRINST usr/include/linux/iio/types.h
  HDRINST usr/include/linux/iio/buffer.h
  HDRINST usr/include/linux/iio/events.h
  HDRINST usr/include/linux/baycom.h
  HDRINST usr/include/linux/major.h
  HDRINST usr/include/linux/atmppp.h
  HDRINST usr/include/linux/ipv6_route.h
  HDRINST usr/include/linux/spi/spidev.h
  HDRINST usr/include/linux/spi/spi.h
  HDRINST usr/include/linux/virtio_ring.h
  HDRINST usr/include/linux/hdlc/ioctl.h
  HDRINST usr/include/linux/hyperv.h
  HDRINST usr/include/linux/remoteproc_cdev.h
  HDRINST usr/include/linux/rpl_iptunnel.h
  HDRINST usr/include/linux/sync_file.h
  HDRINST usr/include/linux/igmp.h
  HDRINST usr/include/linux/v4l2-dv-timings.h
  HDRINST usr/include/linux/virtio_i2c.h
  HDRINST usr/include/linux/xfrm.h
  HDRINST usr/include/linux/capability.h
  HDRINST usr/include/linux/gtp.h
  HDRINST usr/include/linux/xdp_diag.h
  HDRINST usr/include/linux/pkt_cls.h
  HDRINST usr/include/linux/suspend_ioctls.h
  HDRINST usr/include/linux/vt.h
  HDRINST usr/include/linux/loadpin.h
  HDRINST usr/include/linux/dlm_plock.h
  HDRINST usr/include/linux/fb.h
  HDRINST usr/include/linux/max2175.h
  HDRINST usr/include/linux/sunrpc/debug.h
  HDRINST usr/include/linux/gsmmux.h
  HDRINST usr/include/linux/watchdog.h
  HDRINST usr/include/linux/vhost_types.h
  HDRINST usr/include/linux/vduse.h
  HDRINST usr/include/linux/ila.h
  HDRINST usr/include/linux/tdx-guest.h
  HDRINST usr/include/linux/close_range.h
  HDRINST usr/include/linux/ivtv.h
  HDRINST usr/include/linux/cryptouser.h
  HDRINST usr/include/linux/netfilter/xt_string.h
  HDRINST usr/include/linux/netfilter/nfnetlink_compat.h
  HDRINST usr/include/linux/netfilter/nf_nat.h
  HDRINST usr/include/linux/netfilter/xt_recent.h
  HDRINST usr/include/linux/netfilter/xt_addrtype.h
  HDRINST usr/include/linux/netfilter/nf_conntrack_tcp.h
  HDRINST usr/include/linux/netfilter/xt_MARK.h
  HDRINST usr/include/linux/netfilter/xt_SYNPROXY.h
  HDRINST usr/include/linux/netfilter/xt_multiport.h
  HDRINST usr/include/linux/netfilter/nfnetlink.h
  HDRINST usr/include/linux/netfilter/xt_cgroup.h
  HDRINST usr/include/linux/netfilter/nf_synproxy.h
  HDRINST usr/include/linux/netfilter/xt_TCPOPTSTRIP.h
  HDRINST usr/include/linux/netfilter/nfnetlink_log.h
  HDRINST usr/include/linux/netfilter/xt_TPROXY.h
  HDRINST usr/include/linux/netfilter/xt_u32.h
  HDRINST usr/include/linux/netfilter/nfnetlink_osf.h
  HDRINST usr/include/linux/netfilter/xt_ecn.h
  HDRINST usr/include/linux/netfilter/xt_esp.h
  HDRINST usr/include/linux/netfilter/nfnetlink_hook.h
  HDRINST usr/include/linux/netfilter/xt_mac.h
  HDRINST usr/include/linux/netfilter/xt_comment.h
  HDRINST usr/include/linux/netfilter/xt_NFQUEUE.h
  HDRINST usr/include/linux/netfilter/xt_osf.h
  HDRINST usr/include/linux/netfilter/xt_hashlimit.h
  HDRINST usr/include/linux/netfilter/nf_conntrack_sctp.h
  HDRINST usr/include/linux/netfilter/xt_socket.h
  HDRINST usr/include/linux/netfilter/xt_connmark.h
  HDRINST usr/include/linux/netfilter/xt_sctp.h
  HDRINST usr/include/linux/netfilter/xt_tcpudp.h
  HDRINST usr/include/linux/netfilter/xt_DSCP.h
  HDRINST usr/include/linux/netfilter/xt_time.h
  HDRINST usr/include/linux/netfilter/xt_IDLETIMER.h
  HDRINST usr/include/linux/netfilter/xt_policy.h
  HDRINST usr/include/linux/netfilter/xt_rpfilter.h
  HDRINST usr/include/linux/netfilter/xt_nfacct.h
  HDRINST usr/include/linux/netfilter/xt_SECMARK.h
  HDRINST usr/include/linux/netfilter/xt_length.h
  HDRINST usr/include/linux/netfilter/nfnetlink_cthelper.h
  HDRINST usr/include/linux/netfilter/xt_quota.h
  HDRINST usr/include/linux/netfilter/xt_CLASSIFY.h
  HDRINST usr/include/linux/netfilter/xt_ipcomp.h
  HDRINST usr/include/linux/netfilter/xt_iprange.h
  HDRINST usr/include/linux/netfilter/xt_bpf.h
  HDRINST usr/include/linux/netfilter/xt_LOG.h
  HDRINST usr/include/linux/netfilter/xt_rateest.h
  HDRINST usr/include/linux/netfilter/xt_CONNSECMARK.h
  HDRINST usr/include/linux/netfilter/xt_HMARK.h
  HDRINST usr/include/linux/netfilter/xt_CONNMARK.h
  HDRINST usr/include/linux/netfilter/xt_pkttype.h
  HDRINST usr/include/linux/netfilter/xt_ipvs.h
  HDRINST usr/include/linux/netfilter/xt_devgroup.h
  HDRINST usr/include/linux/netfilter/xt_AUDIT.h
  HDRINST usr/include/linux/netfilter/xt_realm.h
  HDRINST usr/include/linux/netfilter/nf_conntrack_common.h
  HDRINST usr/include/linux/netfilter/xt_set.h
  HDRINST usr/include/linux/netfilter/xt_LED.h
  HDRINST usr/include/linux/netfilter/xt_connlabel.h
  HDRINST usr/include/linux/netfilter/xt_owner.h
  HDRINST usr/include/linux/netfilter/xt_dccp.h
  HDRINST usr/include/linux/netfilter/xt_limit.h
  HDRINST usr/include/linux/netfilter/xt_conntrack.h
  HDRINST usr/include/linux/netfilter/xt_TEE.h
  HDRINST usr/include/linux/netfilter/xt_RATEEST.h
  HDRINST usr/include/linux/netfilter/xt_connlimit.h
  HDRINST usr/include/linux/netfilter/ipset/ip_set.h
  HDRINST usr/include/linux/netfilter/ipset/ip_set_list.h
  HDRINST usr/include/linux/netfilter/ipset/ip_set_hash.h
  HDRINST usr/include/linux/netfilter/ipset/ip_set_bitmap.h
  HDRINST usr/include/linux/netfilter/x_tables.h
  HDRINST usr/include/linux/netfilter/xt_dscp.h
  HDRINST usr/include/linux/netfilter/nf_conntrack_ftp.h
  HDRINST usr/include/linux/netfilter/xt_cluster.h
  HDRINST usr/include/linux/netfilter/nf_conntrack_tuple_common.h
  HDRINST usr/include/linux/netfilter/nf_log.h
  HDRINST usr/include/linux/netfilter/xt_tcpmss.h
  HDRINST usr/include/linux/netfilter/xt_NFLOG.h
  HDRINST usr/include/linux/netfilter/xt_l2tp.h
  HDRINST usr/include/linux/netfilter/xt_helper.h
  HDRINST usr/include/linux/netfilter/xt_statistic.h
  HDRINST usr/include/linux/netfilter/nfnetlink_queue.h
  HDRINST usr/include/linux/netfilter/nfnetlink_cttimeout.h
  HDRINST usr/include/linux/netfilter/xt_CT.h
  HDRINST usr/include/linux/netfilter/xt_CHECKSUM.h
  HDRINST usr/include/linux/netfilter/xt_connbytes.h
  HDRINST usr/include/linux/netfilter/xt_state.h
  HDRINST usr/include/linux/netfilter/nf_tables.h
  HDRINST usr/include/linux/netfilter/xt_mark.h
  HDRINST usr/include/linux/netfilter/xt_cpu.h
  HDRINST usr/include/linux/netfilter/nf_tables_compat.h
  HDRINST usr/include/linux/netfilter/xt_physdev.h
  HDRINST usr/include/linux/netfilter/nfnetlink_conntrack.h
  HDRINST usr/include/linux/netfilter/nfnetlink_acct.h
  HDRINST usr/include/linux/netfilter/xt_TCPMSS.h
  HDRINST usr/include/linux/tty_flags.h
  HDRINST usr/include/linux/if_phonet.h
  HDRINST usr/include/linux/elf-em.h
  HDRINST usr/include/linux/vm_sockets.h
  HDRINST usr/include/linux/dlmconstants.h
  HDRINST usr/include/linux/bsg.h
  HDRINST usr/include/linux/matroxfb.h
  HDRINST usr/include/linux/sysctl.h
  HDRINST usr/include/linux/unix_diag.h
  HDRINST usr/include/linux/pcitest.h
  HDRINST usr/include/linux/mman.h
  HDRINST usr/include/linux/if_plip.h
  HDRINST usr/include/linux/virtio_balloon.h
  HDRINST usr/include/linux/pidfd.h
  HDRINST usr/include/linux/f2fs.h
  HDRINST usr/include/linux/x25.h
  HDRINST usr/include/linux/if_cablemodem.h
  HDRINST usr/include/linux/utsname.h
  HDRINST usr/include/linux/counter.h
  HDRINST usr/include/linux/atm_tcp.h
  HDRINST usr/include/linux/atalk.h
  HDRINST usr/include/linux/virtio_rng.h
  HDRINST usr/include/linux/vboxguest.h
  HDRINST usr/include/linux/bpf_perf_event.h
  HDRINST usr/include/linux/ipmi_ssif_bmc.h
  HDRINST usr/include/linux/nfs_mount.h
  HDRINST usr/include/linux/sonet.h
  HDRINST usr/include/linux/netfilter.h
  HDRINST usr/include/linux/keyctl.h
  HDRINST usr/include/linux/nl80211.h
  HDRINST usr/include/linux/misc/bcm_vk.h
  HDRINST usr/include/linux/audit.h
  HDRINST usr/include/linux/tipc_config.h
  HDRINST usr/include/linux/tipc_sockets_diag.h
  HDRINST usr/include/linux/futex.h
  HDRINST usr/include/linux/sev-guest.h
  HDRINST usr/include/linux/ublk_cmd.h
  HDRINST usr/include/linux/virtio_input.h
  HDRINST usr/include/linux/types.h
  HDRINST usr/include/linux/if_slip.h
  HDRINST usr/include/linux/personality.h
  HDRINST usr/include/linux/openat2.h
  HDRINST usr/include/linux/poll.h
  HDRINST usr/include/linux/posix_acl.h
  HDRINST usr/include/linux/smc_diag.h
  HDRINST usr/include/linux/snmp.h
  HDRINST usr/include/linux/errqueue.h
  HDRINST usr/include/linux/if_tunnel.h
  HDRINST usr/include/linux/fanotify.h
  HDRINST usr/include/linux/kernel.h
  HDRINST usr/include/linux/rtnetlink.h
  HDRINST usr/include/linux/rpl.h
  HDRINST usr/include/linux/memfd.h
  HDRINST usr/include/linux/serial_core.h
  HDRINST usr/include/linux/dns_resolver.h
  HDRINST usr/include/linux/pr.h
  HDRINST usr/include/linux/atm_eni.h
  HDRINST usr/include/linux/lp.h
  HDRINST usr/include/linux/virtio_mem.h
  HDRINST usr/include/linux/ultrasound.h
  HDRINST usr/include/linux/sctp.h
  HDRINST usr/include/linux/uio.h
  HDRINST usr/include/linux/tcp_metrics.h
  HDRINST usr/include/linux/wwan.h
  HDRINST usr/include/linux/atmbr2684.h
  HDRINST usr/include/linux/in_route.h
  HDRINST usr/include/linux/qemu_fw_cfg.h
  HDRINST usr/include/linux/if_macsec.h
  HDRINST usr/include/linux/usb/charger.h
  HDRINST usr/include/linux/usb/g_uvc.h
  HDRINST usr/include/linux/usb/gadgetfs.h
  HDRINST usr/include/linux/usb/raw_gadget.h
  HDRINST usr/include/linux/usb/cdc-wdm.h
  HDRINST usr/include/linux/usb/g_printer.h
  HDRINST usr/include/linux/usb/midi.h
  HDRINST usr/include/linux/usb/tmc.h
  HDRINST usr/include/linux/usb/video.h
  HDRINST usr/include/linux/usb/functionfs.h
  HDRINST usr/include/linux/usb/audio.h
  HDRINST usr/include/linux/usb/ch11.h
  HDRINST usr/include/linux/usb/ch9.h
  HDRINST usr/include/linux/usb/cdc.h
  HDRINST usr/include/linux/jffs2.h
  HDRINST usr/include/linux/ax25.h
  HDRINST usr/include/linux/auto_fs.h
  HDRINST usr/include/linux/tiocl.h
  HDRINST usr/include/linux/scc.h
  HDRINST usr/include/linux/psci.h
  HDRINST usr/include/linux/swab.h
  HDRINST usr/include/linux/cec.h
  HDRINST usr/include/linux/kfd_ioctl.h
  HDRINST usr/include/linux/smc.h
  HDRINST usr/include/linux/qrtr.h
  HDRINST usr/include/linux/screen_info.h
  HDRINST usr/include/linux/nfsacl.h
  HDRINST usr/include/linux/seg6_hmac.h
  HDRINST usr/include/linux/gameport.h
  HDRINST usr/include/linux/wireless.h
  HDRINST usr/include/linux/fdreg.h
  HDRINST usr/include/linux/cciss_defs.h
  HDRINST usr/include/linux/serial_reg.h
  HDRINST usr/include/linux/perf_event.h
  HDRINST usr/include/linux/in6.h
  HDRINST usr/include/linux/hid.h
  HDRINST usr/include/linux/netlink.h
  HDRINST usr/include/linux/fuse.h
  HDRINST usr/include/linux/magic.h
  HDRINST usr/include/linux/ioam6_iptunnel.h
  HDRINST usr/include/linux/stm.h
  HDRINST usr/include/linux/vsockmon.h
  HDRINST usr/include/linux/seg6.h
  HDRINST usr/include/linux/idxd.h
  HDRINST usr/include/linux/nitro_enclaves.h
  HDRINST usr/include/linux/ptrace.h
  HDRINST usr/include/linux/ioam6_genl.h
  HDRINST usr/include/linux/qnx4_fs.h
  HDRINST usr/include/linux/fsl_mc.h
  HDRINST usr/include/linux/net_tstamp.h
  HDRINST usr/include/linux/msg.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_TTL.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_ttl.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_ah.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_ECN.h
  HDRINST usr/include/linux/netfilter_ipv4/ip_tables.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_ecn.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_CLUSTERIP.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_REJECT.h
  HDRINST usr/include/linux/netfilter_ipv4/ipt_LOG.h
  HDRINST usr/include/linux/sem.h
  HDRINST usr/include/linux/net_namespace.h
  HDRINST usr/include/linux/radeonfb.h
  HDRINST usr/include/linux/tee.h
  HDRINST usr/include/linux/udp.h
  HDRINST usr/include/linux/virtio_bt.h
  HDRINST usr/include/linux/v4l2-subdev.h
  HDRINST usr/include/linux/posix_acl_xattr.h
  HDRINST usr/include/linux/v4l2-mediabus.h
  HDRINST usr/include/linux/atmapi.h
  HDRINST usr/include/linux/raid/md_p.h
  HDRINST usr/include/linux/raid/md_u.h
  HDRINST usr/include/linux/zorro_ids.h
  HDRINST usr/include/linux/nbd.h
  HDRINST usr/include/linux/isst_if.h
  HDRINST usr/include/linux/rxrpc.h
  HDRINST usr/include/linux/unistd.h
  HDRINST usr/include/linux/atm_zatm.h
  HDRINST usr/include/linux/if_arp.h
  HDRINST usr/include/linux/io_uring.h
  HDRINST usr/include/linux/if_fddi.h
  HDRINST usr/include/linux/bpqether.h
  HDRINST usr/include/linux/sysinfo.h
  HDRINST usr/include/linux/auto_dev-ioctl.h
  HDRINST usr/include/linux/nfs4_mount.h
  HDRINST usr/include/linux/keyboard.h
  HDRINST usr/include/linux/virtio_mmio.h
  HDRINST usr/include/linux/input.h
  HDRINST usr/include/linux/qnxtypes.h
  HDRINST usr/include/linux/mdio.h
  HDRINST usr/include/linux/lwtunnel.h
  HDRINST usr/include/linux/gfs2_ondisk.h
  HDRINST usr/include/linux/nfs4.h
  HDRINST usr/include/linux/ptp_clock.h
  HDRINST usr/include/linux/nubus.h
  HDRINST usr/include/linux/if_bonding.h
  HDRINST usr/include/linux/kcov.h
  HDRINST usr/include/linux/fadvise.h
  HDRINST usr/include/linux/taskstats.h
  HDRINST usr/include/linux/veth.h
  HDRINST usr/include/linux/atm.h
  HDRINST usr/include/linux/ipmi.h
  HDRINST usr/include/linux/kdev_t.h
  HDRINST usr/include/linux/mount.h
  HDRINST usr/include/linux/shm.h
  HDRINST usr/include/linux/resource.h
  HDRINST usr/include/linux/prctl.h
  LD      /kernel/build64/tools/objtool/libsubcmd/libsubcmd-in.o
  HDRINST usr/include/linux/watch_queue.h
  HDRINST usr/include/linux/sched.h
  HDRINST usr/include/linux/phonet.h
  HDRINST usr/include/linux/random.h
  HDRINST usr/include/linux/tty.h
  HDRINST usr/include/linux/apm_bios.h
  HDRINST usr/include/linux/fd.h
  HDRINST usr/include/linux/um_timetravel.h
  HDRINST usr/include/linux/tls.h
  HDRINST usr/include/linux/rpmsg_types.h
  HDRINST usr/include/linux/pfrut.h
  HDRINST usr/include/linux/mei.h
  HDRINST usr/include/linux/fsi.h
  HDRINST usr/include/linux/rds.h
  HDRINST usr/include/linux/if_x25.h
  HDRINST usr/include/linux/netdevice.h
  HDRINST usr/include/linux/param.h
  HDRINST usr/include/linux/binfmts.h
  HDRINST usr/include/linux/if_pppox.h
  HDRINST usr/include/linux/sockios.h
  HDRINST usr/include/linux/kcm.h
  HDRINST usr/include/linux/virtio_9p.h
  HDRINST usr/include/linux/genwqe/genwqe_card.h
  HDRINST usr/include/linux/if_tun.h
  HDRINST usr/include/linux/ext4.h
  HDRINST usr/include/linux/if_ether.h
  HDRINST usr/include/linux/kvm_para.h
  HDRINST usr/include/linux/kernel-page-flags.h
  HDRINST usr/include/linux/cdrom.h
  HDRINST usr/include/linux/un.h
  HDRINST usr/include/linux/module.h
  HDRINST usr/include/linux/mqueue.h
  HDRINST usr/include/linux/a.out.h
  HDRINST usr/include/linux/input-event-codes.h
  HDRINST usr/include/linux/coda.h
  HDRINST usr/include/linux/rio_mport_cdev.h
  HDRINST usr/include/linux/ipsec.h
  HDRINST usr/include/linux/blkpg.h
  AR      /kernel/build64/tools/objtool/libsubcmd/libsubcmd.a
  HDRINST usr/include/linux/blkzoned.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_arpreply.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_redirect.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_nflog.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_802_3.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_nat.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_mark_m.h
  HDRINST usr/include/linux/netfilter_bridge/ebtables.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_vlan.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_limit.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_log.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_stp.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_pkttype.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_ip.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_ip6.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_arp.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_mark_t.h
  HDRINST usr/include/linux/netfilter_bridge/ebt_among.h
  HDRINST usr/include/linux/reiserfs_fs.h
  HDRINST usr/include/linux/cciss_ioctl.h
  HDRINST usr/include/linux/fsmap.h
  HDRINST usr/include/linux/smiapp.h
  HDRINST usr/include/linux/switchtec_ioctl.h
  HDRINST usr/include/linux/atmdev.h
  HDRINST usr/include/linux/hpet.h
  HDRINST usr/include/linux/virtio_config.h
  HDRINST usr/include/linux/string.h
  HDRINST usr/include/linux/kfd_sysfs.h
  HDRINST usr/include/linux/inet_diag.h
  HDRINST usr/include/linux/netdev.h
  HDRINST usr/include/linux/xattr.h
  HDRINST usr/include/linux/iommufd.h
  HDRINST usr/include/linux/user_events.h
  HDRINST usr/include/linux/errno.h
  HDRINST usr/include/linux/icmp.h
  HDRINST usr/include/linux/i2o-dev.h
  HDRINST usr/include/linux/pg.h
  HDRINST usr/include/linux/if_bridge.h
  HDRINST usr/include/linux/thermal.h
  HDRINST usr/include/linux/uinput.h
  HDRINST usr/include/linux/handshake.h
  HDRINST usr/include/linux/dqblk_xfs.h
  HDRINST usr/include/linux/v4l2-common.h
  HDRINST usr/include/linux/nvram.h
  HDRINST usr/include/linux/if_vlan.h
  HDRINST usr/include/linux/uhid.h
  HDRINST usr/include/linux/omap3isp.h
  HDRINST usr/include/linux/rose.h
  HDRINST usr/include/linux/phantom.h
  HDRINST usr/include/linux/ipmi_msgdefs.h
  HDRINST usr/include/linux/bcm933xx_hcs.h
  HDRINST usr/include/linux/bpf.h
  HDRINST usr/include/linux/mempolicy.h
  HDRINST usr/include/linux/efs_fs_sb.h
  HDRINST usr/include/linux/nexthop.h
  HDRINST usr/include/linux/net_dropmon.h
  HDRINST usr/include/linux/surface_aggregator/cdev.h
  HDRINST usr/include/linux/surface_aggregator/dtx.h
  HDRINST usr/include/linux/net.h
  HDRINST usr/include/linux/mii.h
  HDRINST usr/include/linux/virtio_pcidev.h
  HDRINST usr/include/linux/termios.h
  HDRINST usr/include/linux/cgroupstats.h
  HDRINST usr/include/linux/mpls.h
  HDRINST usr/include/linux/iommu.h
  HDRINST usr/include/linux/toshiba.h
  CC      /kernel/build64/tools/objtool/weak.o
  HDRINST usr/include/linux/virtio_scsi.h
  HDRINST usr/include/linux/zorro.h
  CC      /kernel/build64/tools/objtool/check.o
  HDRINST usr/include/linux/chio.h
  CC      /kernel/build64/tools/objtool/special.o
  HDRINST usr/include/linux/pkt_sched.h
  HDRINST usr/include/linux/cramfs_fs.h
  CC      /kernel/build64/tools/objtool/builtin-check.o
  HDRINST usr/include/linux/nfs3.h
  CC      /kernel/build64/tools/objtool/elf.o
  HDRINST usr/include/linux/vfio_ccw.h
  MKDIR   /kernel/build64/tools/objtool/arch/x86/
  HDRINST usr/include/linux/atm_nicstar.h
  CC      /kernel/build64/tools/objtool/objtool.o
  HDRINST usr/include/linux/ncsi.h
  MKDIR   /kernel/build64/tools/objtool/arch/x86/lib/
  HDRINST usr/include/linux/virtio_net.h
  HDRINST usr/include/linux/ioctl.h
  CC      /kernel/build64/tools/objtool/orc_gen.o
  HDRINST usr/include/linux/stddef.h
  CC      /kernel/build64/tools/objtool/arch/x86/special.o
  HDRINST usr/include/linux/limits.h
  HDRINST usr/include/linux/ipmi_bmc.h
  HDRINST usr/include/linux/netfilter_arp.h
  CC      /kernel/build64/tools/objtool/orc_dump.o
  GEN     /kernel/build64/tools/objtool/arch/x86/lib/inat-tables.c
  HDRINST usr/include/linux/if_addr.h
  CC      /kernel/build64/tools/objtool/libstring.o
  HDRINST usr/include/linux/rpmsg.h
  HDRINST usr/include/linux/media-bus-format.h
  HDRINST usr/include/linux/kernelcapi.h
  HDRINST usr/include/linux/ppp_defs.h
  HDRINST usr/include/linux/ethtool.h
  CC      /kernel/build64/tools/objtool/libctype.o
  HDRINST usr/include/linux/aspeed-video.h
  CC      /kernel/build64/tools/objtool/str_error_r.o
  HDRINST usr/include/linux/hdlc.h
  HDRINST usr/include/linux/fscrypt.h
  CC      /kernel/build64/tools/objtool/librbtree.o
  HDRINST usr/include/linux/batadv_packet.h
  HDRINST usr/include/linux/uuid.h
  HDRINST usr/include/linux/capi.h
  HDRINST usr/include/linux/mptcp.h
  HDRINST usr/include/linux/hidraw.h
  HDRINST usr/include/linux/virtio_console.h
  HDRINST usr/include/linux/irqnr.h
  HDRINST usr/include/linux/coresight-stm.h
  HDRINST usr/include/linux/cxl_mem.h
  HDRINST usr/include/linux/iso_fs.h
  HDRINST usr/include/linux/virtio_blk.h
  HDRINST usr/include/linux/udf_fs_i.h
  HDRINST usr/include/linux/coff.h
  HDRINST usr/include/linux/dma-buf.h
  HDRINST usr/include/linux/ife.h
  HDRINST usr/include/linux/agpgart.h
  HDRINST usr/include/linux/socket.h
  HDRINST usr/include/linux/nilfs2_ondisk.h
  HDRINST usr/include/linux/connector.h
  HDRINST usr/include/linux/auto_fs4.h
  HDRINST usr/include/linux/bt-bmc.h
  HDRINST usr/include/linux/map_to_7segment.h
  HDRINST usr/include/linux/tc_act/tc_skbedit.h
  HDRINST usr/include/linux/tc_act/tc_ctinfo.h
  HDRINST usr/include/linux/tc_act/tc_defact.h
  HDRINST usr/include/linux/tc_act/tc_gact.h
  HDRINST usr/include/linux/tc_act/tc_vlan.h
  HDRINST usr/include/linux/tc_act/tc_skbmod.h
  HDRINST usr/include/linux/tc_act/tc_sample.h
  HDRINST usr/include/linux/tc_act/tc_tunnel_key.h
  HDRINST usr/include/linux/tc_act/tc_gate.h
  HDRINST usr/include/linux/tc_act/tc_mirred.h
  HDRINST usr/include/linux/tc_act/tc_nat.h
  HDRINST usr/include/linux/tc_act/tc_csum.h
  HDRINST usr/include/linux/tc_act/tc_connmark.h
  HDRINST usr/include/linux/tc_act/tc_ife.h
  HDRINST usr/include/linux/tc_act/tc_mpls.h
  HDRINST usr/include/linux/tc_act/tc_ct.h
  HDRINST usr/include/linux/tc_act/tc_pedit.h
  HDRINST usr/include/linux/tc_act/tc_bpf.h
  HDRINST usr/include/linux/tc_act/tc_ipt.h
  HDRINST usr/include/linux/netrom.h
  HDRINST usr/include/linux/joystick.h
  HDRINST usr/include/linux/falloc.h
  HDRINST usr/include/linux/cycx_cfm.h
  HDRINST usr/include/linux/omapfb.h
  HDRINST usr/include/linux/msdos_fs.h
  HDRINST usr/include/linux/virtio_types.h
  HDRINST usr/include/linux/mroute.h
  HDRINST usr/include/linux/psample.h
  HDRINST usr/include/linux/ipv6.h
  HDRINST usr/include/linux/dw100.h
  HDRINST usr/include/linux/psp-sev.h
  HDRINST usr/include/linux/vfio.h
  HDRINST usr/include/linux/if_ppp.h
  HDRINST usr/include/linux/byteorder/big_endian.h
  HDRINST usr/include/linux/byteorder/little_endian.h
  HDRINST usr/include/linux/comedi.h
  HDRINST usr/include/linux/scif_ioctl.h
  HDRINST usr/include/linux/timerfd.h
  HDRINST usr/include/linux/time_types.h
  HDRINST usr/include/linux/firewire-constants.h
  HDRINST usr/include/linux/virtio_snd.h
  HDRINST usr/include/linux/ppp-ioctl.h
  HDRINST usr/include/linux/fib_rules.h
  HDRINST usr/include/linux/gen_stats.h
  HDRINST usr/include/linux/virtio_iommu.h
  HDRINST usr/include/linux/genetlink.h
  HDRINST usr/include/linux/uvcvideo.h
  HDRINST usr/include/linux/pfkeyv2.h
  HDRINST usr/include/linux/soundcard.h
  HDRINST usr/include/linux/times.h
  CC      /kernel/build64/tools/objtool/arch/x86/decode.o
  HDRINST usr/include/linux/nfc.h
  HDRINST usr/include/linux/affs_hardblocks.h
  HDRINST usr/include/linux/nilfs2_api.h
  HDRINST usr/include/linux/rseq.h
  HDRINST usr/include/linux/caif/caif_socket.h
  HDRINST usr/include/linux/caif/if_caif.h
  HDRINST usr/include/linux/i2c-dev.h
  HDRINST usr/include/linux/cuda.h
  HDRINST usr/include/linux/mei_uuid.h
  HDRINST usr/include/linux/cn_proc.h
  HDRINST usr/include/linux/parport.h
  HDRINST usr/include/linux/v4l2-controls.h
  HDRINST usr/include/linux/hsi/cs-protocol.h
  HDRINST usr/include/linux/hsi/hsi_char.h
  HDRINST usr/include/linux/seg6_genl.h
  HDRINST usr/include/linux/am437x-vpfe.h
  HDRINST usr/include/linux/amt.h
  HDRINST usr/include/linux/netconf.h
  HDRINST usr/include/linux/erspan.h
  HDRINST usr/include/linux/nsfs.h
  HDRINST usr/include/linux/xilinx-v4l2-controls.h
  HDRINST usr/include/linux/aspeed-p2a-ctrl.h
  HDRINST usr/include/linux/vfio_zdev.h
  HDRINST usr/include/linux/serio.h
  HDRINST usr/include/linux/acrn.h
  HDRINST usr/include/linux/nfs2.h
  HDRINST usr/include/linux/virtio_pci.h
  HDRINST usr/include/linux/ipc.h
  HDRINST usr/include/linux/ethtool_netlink.h
  HDRINST usr/include/linux/kd.h
  HDRINST usr/include/linux/elf.h
  HDRINST usr/include/linux/videodev2.h
  HDRINST usr/include/linux/if_alg.h
  HDRINST usr/include/linux/sonypi.h
  HDRINST usr/include/linux/fsverity.h
  HDRINST usr/include/linux/if.h
  HDRINST usr/include/linux/btrfs.h
  HDRINST usr/include/linux/vm_sockets_diag.h
  HDRINST usr/include/linux/netfilter_bridge.h
  HDRINST usr/include/linux/packet_diag.h
  HDRINST usr/include/linux/netfilter_ipv4.h
  HDRINST usr/include/linux/kvm.h
  HDRINST usr/include/linux/pci.h
  HDRINST usr/include/linux/if_addrlabel.h
  HDRINST usr/include/linux/hdlcdrv.h
  HDRINST usr/include/linux/cfm_bridge.h
  HDRINST usr/include/linux/fiemap.h
  HDRINST usr/include/linux/dm-ioctl.h
  HDRINST usr/include/linux/aspeed-lpc-ctrl.h
  HDRINST usr/include/linux/atmioc.h
  HDRINST usr/include/linux/dlm.h
  HDRINST usr/include/linux/pci_regs.h
  HDRINST usr/include/linux/cachefiles.h
  HDRINST usr/include/linux/membarrier.h
  HDRINST usr/include/linux/nfs_idmap.h
  HDRINST usr/include/linux/ip.h
  HDRINST usr/include/linux/atm_he.h
  HDRINST usr/include/linux/nfsd/export.h
  HDRINST usr/include/linux/nfsd/stats.h
  HDRINST usr/include/linux/nfsd/debug.h
  HDRINST usr/include/linux/nfsd/cld.h
  HDRINST usr/include/linux/ip_vs.h
  HDRINST usr/include/linux/vmcore.h
  HDRINST usr/include/linux/vbox_vmmdev_types.h
  HDRINST usr/include/linux/dvb/osd.h
  HDRINST usr/include/linux/dvb/dmx.h
  HDRINST usr/include/linux/dvb/net.h
  HDRINST usr/include/linux/dvb/frontend.h
  HDRINST usr/include/linux/dvb/ca.h
  HDRINST usr/include/linux/dvb/version.h
  HDRINST usr/include/linux/dvb/video.h
  HDRINST usr/include/linux/dvb/audio.h
  HDRINST usr/include/linux/nfs.h
  HDRINST usr/include/linux/if_link.h
  HDRINST usr/include/linux/wait.h
  HDRINST usr/include/linux/icmpv6.h
  HDRINST usr/include/linux/media.h
  HDRINST usr/include/linux/seg6_local.h
  HDRINST usr/include/linux/openvswitch.h
  HDRINST usr/include/linux/atmsap.h
  HDRINST usr/include/linux/bpfilter.h
  HDRINST usr/include/linux/fpga-dfl.h
  HDRINST usr/include/linux/userio.h
  HDRINST usr/include/linux/signal.h
  HDRINST usr/include/linux/map_to_14segment.h
  HDRINST usr/include/linux/hdreg.h
  HDRINST usr/include/linux/utime.h
  HDRINST usr/include/linux/usbdevice_fs.h
  HDRINST usr/include/linux/timex.h
  HDRINST usr/include/linux/if_fc.h
  HDRINST usr/include/linux/reiserfs_xattr.h
  HDRINST usr/include/linux/hw_breakpoint.h
  HDRINST usr/include/linux/quota.h
  HDRINST usr/include/linux/ioprio.h
  HDRINST usr/include/linux/eventpoll.h
  HDRINST usr/include/linux/atmclip.h
  HDRINST usr/include/linux/can.h
  HDRINST usr/include/linux/if_team.h
  HDRINST usr/include/linux/usbip.h
  HDRINST usr/include/linux/stat.h
  HDRINST usr/include/linux/fou.h
  HDRINST usr/include/linux/hash_info.h
  HDRINST usr/include/linux/ppp-comp.h
  HDRINST usr/include/linux/ip6_tunnel.h
  HDRINST usr/include/linux/tipc_netlink.h
  HDRINST usr/include/linux/in.h
  HDRINST usr/include/linux/wireguard.h
  HDRINST usr/include/linux/btf.h
  HDRINST usr/include/linux/batman_adv.h
  HDRINST usr/include/linux/fcntl.h
  HDRINST usr/include/linux/if_ltalk.h
  HDRINST usr/include/linux/i2c.h
  HDRINST usr/include/linux/atm_idt77105.h
  HDRINST usr/include/linux/kexec.h
  HDRINST usr/include/linux/arm_sdei.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6_tables.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_ah.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_NPT.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_rt.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_REJECT.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_opts.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_srh.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_LOG.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_mh.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_HL.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_hl.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_frag.h
  HDRINST usr/include/linux/netfilter_ipv6/ip6t_ipv6header.h
  HDRINST usr/include/linux/minix_fs.h
  HDRINST usr/include/linux/aio_abi.h
  HDRINST usr/include/linux/pktcdvd.h
  HDRINST usr/include/linux/libc-compat.h
  HDRINST usr/include/linux/atmlec.h
  HDRINST usr/include/linux/signalfd.h
  HDRINST usr/include/linux/bpf_common.h
  HDRINST usr/include/linux/seg6_iptunnel.h
  HDRINST usr/include/linux/synclink.h
  HDRINST usr/include/linux/mpls_iptunnel.h
  HDRINST usr/include/linux/mctp.h
  HDRINST usr/include/linux/if_xdp.h
  HDRINST usr/include/linux/llc.h
  HDRINST usr/include/linux/atmsvc.h
  HDRINST usr/include/linux/sed-opal.h
  HDRINST usr/include/linux/sock_diag.h
  HDRINST usr/include/linux/time.h
  HDRINST usr/include/linux/securebits.h
  HDRINST usr/include/linux/fsl_hypervisor.h
  HDRINST usr/include/linux/if_hippi.h
  HDRINST usr/include/linux/seccomp.h
  HDRINST usr/include/linux/oom.h
  HDRINST usr/include/linux/filter.h
  HDRINST usr/include/linux/inotify.h
  HDRINST usr/include/linux/rfkill.h
  HDRINST usr/include/linux/reboot.h
  HDRINST usr/include/linux/can/vxcan.h
  HDRINST usr/include/linux/can/j1939.h
  HDRINST usr/include/linux/can/netlink.h
  HDRINST usr/include/linux/can/bcm.h
  HDRINST usr/include/linux/can/raw.h
  HDRINST usr/include/linux/can/gw.h
  HDRINST usr/include/linux/can/error.h
  HDRINST usr/include/linux/can/isotp.h
  HDRINST usr/include/linux/if_eql.h
  HDRINST usr/include/linux/hiddev.h
  HDRINST usr/include/linux/blktrace_api.h
  HDRINST usr/include/linux/ccs.h
  HDRINST usr/include/linux/ioam6.h
  HDRINST usr/include/linux/hsr_netlink.h
  HDRINST usr/include/linux/mmc/ioctl.h
  HDRINST usr/include/linux/bfs_fs.h
  HDRINST usr/include/linux/rio_cm_cdev.h
  HDRINST usr/include/linux/uleds.h
  HDRINST usr/include/linux/mrp_bridge.h
  HDRINST usr/include/linux/adb.h
  HDRINST usr/include/linux/pmu.h
  HDRINST usr/include/linux/udmabuf.h
  HDRINST usr/include/linux/kcmp.h
  HDRINST usr/include/linux/dma-heap.h
  HDRINST usr/include/linux/userfaultfd.h
  HDRINST usr/include/linux/netfilter_arp/arpt_mangle.h
  HDRINST usr/include/linux/netfilter_arp/arp_tables.h
  HDRINST usr/include/linux/tipc.h
  HDRINST usr/include/linux/virtio_ids.h
  HDRINST usr/include/linux/l2tp.h
  HDRINST usr/include/linux/devlink.h
  HDRINST usr/include/linux/virtio_gpio.h
  HDRINST usr/include/linux/dcbnl.h
  HDRINST usr/include/linux/cyclades.h
  HDRINST usr/include/sound/intel/avs/tokens.h
  HDRINST usr/include/sound/sof/fw.h
  HDRINST usr/include/sound/sof/abi.h
  HDRINST usr/include/sound/sof/tokens.h
  HDRINST usr/include/sound/sof/header.h
  HDRINST usr/include/sound/usb_stream.h
  HDRINST usr/include/sound/sfnt_info.h
  HDRINST usr/include/sound/asequencer.h
  HDRINST usr/include/sound/tlv.h
  HDRINST usr/include/sound/asound.h
  HDRINST usr/include/sound/asoc.h
  HDRINST usr/include/sound/sb16_csp.h
  HDRINST usr/include/sound/compress_offload.h
  HDRINST usr/include/sound/hdsp.h
  HDRINST usr/include/sound/emu10k1.h
  HDRINST usr/include/sound/snd_ar_tokens.h
  HDRINST usr/include/sound/snd_sst_tokens.h
  HDRINST usr/include/sound/asound_fm.h
  HDRINST usr/include/sound/hdspm.h
  HDRINST usr/include/sound/compress_params.h
  HDRINST usr/include/sound/firewire.h
  HDRINST usr/include/sound/skl-tplg-interface.h
  HDRINST usr/include/scsi/scsi_bsg_ufs.h
  HDRINST usr/include/scsi/scsi_netlink_fc.h
  HDRINST usr/include/scsi/scsi_bsg_mpi3mr.h
  HDRINST usr/include/scsi/fc/fc_ns.h
  HDRINST usr/include/scsi/fc/fc_fs.h
  HDRINST usr/include/scsi/fc/fc_els.h
  HDRINST usr/include/scsi/fc/fc_gs.h
  HDRINST usr/include/scsi/scsi_bsg_fc.h
  HDRINST usr/include/scsi/cxlflash_ioctl.h
  HDRINST usr/include/scsi/scsi_netlink.h
  HDRINST usr/include/linux/version.h
  HDRINST usr/include/asm/processor-flags.h
  HDRINST usr/include/asm/auxvec.h
  HDRINST usr/include/asm/svm.h
  HDRINST usr/include/asm/bitsperlong.h
  HDRINST usr/include/asm/kvm_perf.h
  HDRINST usr/include/asm/mce.h
  HDRINST usr/include/asm/posix_types.h
  HDRINST usr/include/asm/msr.h
  HDRINST usr/include/asm/sigcontext32.h
  HDRINST usr/include/asm/mman.h
  HDRINST usr/include/asm/shmbuf.h
  HDRINST usr/include/asm/e820.h
  HDRINST usr/include/asm/posix_types_64.h
  HDRINST usr/include/asm/vsyscall.h
  HDRINST usr/include/asm/msgbuf.h
  HDRINST usr/include/asm/swab.h
  HDRINST usr/include/asm/statfs.h
  HDRINST usr/include/asm/ptrace.h
  HDRINST usr/include/asm/posix_types_x32.h
  HDRINST usr/include/asm/unistd.h
  HDRINST usr/include/asm/ist.h
  HDRINST usr/include/asm/prctl.h
  HDRINST usr/include/asm/boot.h
  HDRINST usr/include/asm/sigcontext.h
  HDRINST usr/include/asm/posix_types_32.h
  HDRINST usr/include/asm/kvm_para.h
  HDRINST usr/include/asm/a.out.h
  HDRINST usr/include/asm/mtrr.h
  HDRINST usr/include/asm/amd_hsmp.h
  HDRINST usr/include/asm/hwcap2.h
  HDRINST usr/include/asm/ptrace-abi.h
  HDRINST usr/include/asm/vm86.h
  HDRINST usr/include/asm/vmx.h
  HDRINST usr/include/asm/ldt.h
  HDRINST usr/include/asm/perf_regs.h
  HDRINST usr/include/asm/kvm.h
  HDRINST usr/include/asm/debugreg.h
  HDRINST usr/include/asm/signal.h
  HDRINST usr/include/asm/bootparam.h
  HDRINST usr/include/asm/siginfo.h
  HDRINST usr/include/asm/hw_breakpoint.h
  HDRINST usr/include/asm/stat.h
  HDRINST usr/include/asm/setup.h
  HDRINST usr/include/asm/sembuf.h
  HDRINST usr/include/asm/sgx.h
  HDRINST usr/include/asm/ucontext.h
  HDRINST usr/include/asm/byteorder.h
  HDRINST usr/include/asm/unistd_64.h
  HDRINST usr/include/asm/ioctls.h
  HDRINST usr/include/asm/bpf_perf_event.h
  HDRINST usr/include/asm/types.h
  HDRINST usr/include/asm/poll.h
  HDRINST usr/include/asm/resource.h
  HDRINST usr/include/asm/param.h
  HDRINST usr/include/asm/sockios.h
  HDRINST usr/include/asm/errno.h
  HDRINST usr/include/asm/unistd_x32.h
  HDRINST usr/include/asm/termios.h
  HDRINST usr/include/asm/ioctl.h
  HDRINST usr/include/asm/socket.h
  HDRINST usr/include/asm/unistd_32.h
  HDRINST usr/include/asm/termbits.h
  HDRINST usr/include/asm/fcntl.h
  HDRINST usr/include/asm/ipcbuf.h
  HOSTLD  scripts/mod/modpost
  CC      kernel/bounds.s
  CHKSHA1 ../include/linux/atomic/atomic-arch-fallback.h
  CHKSHA1 ../include/linux/atomic/atomic-instrumented.h
  CHKSHA1 ../include/linux/atomic/atomic-long.h
  UPD     include/generated/timeconst.h
  UPD     include/generated/bounds.h
  CC      arch/x86/kernel/asm-offsets.s
  LD      /kernel/build64/tools/objtool/arch/x86/objtool-in.o
  UPD     include/generated/asm-offsets.h
  CALL    ../scripts/checksyscalls.sh
  LD      /kernel/build64/tools/objtool/objtool-in.o
  LINK    /kernel/build64/tools/objtool/objtool
  LDS     scripts/module.lds
  CC      ipc/compat.o
  CC      ipc/util.o
  CC      ipc/msgutil.o
  CC      ipc/msg.o
  CC      ipc/sem.o
  AR      certs/built-in.a
  HOSTCC  usr/gen_init_cpio
  CC      ipc/shm.o
  CC      ipc/syscall.o
  AS      arch/x86/lib/clear_page_64.o
  CC      ipc/ipc_sysctl.o
  CC      io_uring/io_uring.o
  CC      arch/x86/lib/cmdline.o
  CC      init/main.o
  CC      security/commoncap.o
  CC      io_uring/xattr.o
  CC      mm/filemap.o
  AS      arch/x86/lib/cmpxchg16b_emu.o
  CC      arch/x86/lib/copy_mc.o
  CC      io_uring/nop.o
  AR      arch/x86/video/built-in.a
  CC      ipc/mqueue.o
  CC      security/min_addr.o
  CC      arch/x86/power/cpu.o
  AR      virt/lib/built-in.a
  UPD     init/utsversion-tmp.h
  CC      arch/x86/pci/i386.o
  CC      arch/x86/realmode/init.o
  AR      arch/x86/ia32/built-in.a
  AS      arch/x86/crypto/aesni-intel_asm.o
  CC [M]  virt/lib/irqbypass.o
  AR      sound/ppc/built-in.a
  CC      block/partitions/core.o
  AR      drivers/irqchip/built-in.a
  CC      security/keys/gc.o
  CC [M]  arch/x86/video/fbdev.o
  CC      net/core/sock.o
  CC      sound/core/seq/seq.o
  AR      sound/pci/ac97/built-in.a
  CC      fs/notify/dnotify/dnotify.o
  AR      sound/isa/ad1816a/built-in.a
  CC      arch/x86/mm/pat/set_memory.o
  AR      sound/i2c/other/built-in.a
  AR      sound/drivers/opl3/built-in.a
  CC      ipc/namespace.o
  CC      arch/x86/events/amd/core.o
  CC      ipc/mq_sysctl.o
  AR      sound/i2c/built-in.a
  CC      arch/x86/kernel/fpu/init.o
  CC      net/core/request_sock.o
  AR      sound/arm/built-in.a
  AR      sound/pci/ali5451/built-in.a
  AR      drivers/bus/mhi/built-in.a
  AR      sound/drivers/opl4/built-in.a
  CC      lib/kunit/test.o
  AR      sound/isa/ad1848/built-in.a
  CC      arch/x86/entry/vdso/vma.o
  AS      arch/x86/realmode/rm/header.o
  AR      drivers/bus/built-in.a
  AR      sound/isa/cs423x/built-in.a
  AR      sound/pci/asihpi/built-in.a
  CC      mm/kasan/common.o
  AR      sound/drivers/mpu401/built-in.a
  CC      lib/math/div64.o
  AR      sound/pci/au88x0/built-in.a
  AR      sound/isa/es1688/built-in.a
  AR      sound/isa/galaxy/built-in.a
  CC      kernel/sched/core.o
  CC      arch/x86/crypto/aesni-intel_glue.o
  AR      sound/drivers/vx/built-in.a
  AS      arch/x86/realmode/rm/trampoline_64.o
  AR      drivers/phy/allwinner/built-in.a
  AR      sound/isa/gus/built-in.a
  AR      sound/pci/aw2/built-in.a
  AR      sound/isa/msnd/built-in.a
  AR      sound/drivers/pcsp/built-in.a
  CC      arch/x86/pci/init.o
  CC      crypto/api.o
  AR      drivers/phy/amlogic/built-in.a
  AR      sound/drivers/built-in.a
  AR      sound/pci/ctxfi/built-in.a
  AR      sound/isa/opti9xx/built-in.a
  AR      drivers/phy/broadcom/built-in.a
  AR      sound/pci/ca0106/built-in.a
  AS      arch/x86/realmode/rm/stack.o
  AR      sound/isa/sb/built-in.a
  CC      lib/math/gcd.o
  AR      drivers/phy/cadence/built-in.a
  AR      sound/pci/cs46xx/built-in.a
  AR      sound/isa/wavefront/built-in.a
  AR      drivers/phy/freescale/built-in.a
  AS      arch/x86/realmode/rm/reboot.o
  CC      arch/x86/events/amd/lbr.o
  AR      sound/pci/cs5535audio/built-in.a
  AR      sound/pci/lola/built-in.a
  AR      sound/isa/wss/built-in.a
  AR      drivers/phy/hisilicon/built-in.a
  AR      sound/isa/built-in.a
  AS      arch/x86/lib/copy_mc_64.o
  AR      sound/pci/lx6464es/built-in.a
  AR      drivers/phy/ingenic/built-in.a
  AR      sound/pci/echoaudio/built-in.a
  AR      drivers/phy/intel/built-in.a
  AS      arch/x86/realmode/rm/wakeup_asm.o
  CC      arch/x86/realmode/rm/wakemain.o
  CC      block/bdev.o
  AR      sound/pci/emu10k1/built-in.a
  AR      drivers/phy/lantiq/built-in.a
  AR      drivers/phy/marvell/built-in.a
  CC      lib/math/lcm.o
  CC      lib/math/int_pow.o
  AR      sound/pci/hda/built-in.a
  AR      drivers/phy/mediatek/built-in.a
  CC [M]  sound/pci/hda/hda_bind.o
  AR      drivers/phy/microchip/built-in.a
  AR      drivers/phy/motorola/built-in.a
  AR      drivers/phy/mscc/built-in.a
  CC [M]  sound/pci/hda/hda_codec.o
  CC      arch/x86/realmode/rm/video-mode.o
  CC      lib/math/int_sqrt.o
  AR      drivers/phy/qualcomm/built-in.a
  AR      drivers/phy/ralink/built-in.a
  AR      drivers/phy/renesas/built-in.a
  GEN     usr/initramfs_data.cpio
  AR      drivers/phy/rockchip/built-in.a
  AS      arch/x86/lib/copy_page_64.o
  COPY    usr/initramfs_inc_data
  AS      usr/initramfs_data.o
  AR      drivers/phy/samsung/built-in.a
  AR      drivers/phy/socionext/built-in.a
  CC      lib/math/reciprocal_div.o
  AS      arch/x86/lib/copy_user_64.o
  AR      usr/built-in.a
  AR      drivers/phy/st/built-in.a
  CC [M]  sound/pci/hda/hda_jack.o
  AR      drivers/phy/sunplus/built-in.a
  AR      drivers/phy/tegra/built-in.a
  AS      arch/x86/lib/copy_user_uncached_64.o
  AR      drivers/phy/ti/built-in.a
  AS      arch/x86/realmode/rm/copy.o
  CC      arch/x86/kernel/fpu/bugs.o
  AR      drivers/phy/xilinx/built-in.a
  CC      drivers/phy/phy-core.o
  CC      arch/x86/lib/cpu.o
  AS      arch/x86/realmode/rm/bioscall.o
  CC      lib/math/rational.o
  CC      arch/x86/realmode/rm/regs.o
  CC      kernel/locking/mutex.o
  AR      virt/built-in.a
  CC      sound/core/seq/seq_lock.o
  CC      arch/x86/kernel/fpu/core.o
  CC      arch/x86/realmode/rm/video-vga.o
  CC      kernel/locking/semaphore.o
  CC      kernel/locking/rwsem.o
  CC      sound/core/seq/seq_clientmgr.o
  CC      kernel/sched/fair.o
  CC      kernel/power/qos.o
  CC      arch/x86/realmode/rm/video-vesa.o
  CC      kernel/printk/printk.o
  CC      kernel/locking/percpu-rwsem.o
  CC      kernel/irq/irqdesc.o
  CC      security/keys/key.o
  AR      fs/notify/dnotify/built-in.a
  CC      kernel/rcu/update.o
  CC      kernel/rcu/sync.o
  CC      fs/notify/inotify/inotify_fsnotify.o
  CC      arch/x86/realmode/rm/video-bios.o
  CC      arch/x86/pci/mmconfig_64.o
  CC      fs/notify/inotify/inotify_user.o
  CC      lib/kunit/resource.o
  CC      io_uring/fs.o
  PASYMS  arch/x86/realmode/rm/pasyms.h
  CC      arch/x86/lib/delay.o
  LDS     arch/x86/realmode/rm/realmode.lds
  CC [M]  lib/math/prime_numbers.o
  LD      arch/x86/realmode/rm/realmode.elf
  RELOCS  arch/x86/realmode/rm/realmode.relocs
  OBJCOPY arch/x86/realmode/rm/realmode.bin
  AS      arch/x86/realmode/rmpiggy.o
  CC      crypto/cipher.o
  CC      mm/kasan/report.o
  CC      arch/x86/entry/vdso/extable.o
  AR      arch/x86/realmode/built-in.a
  CC      arch/x86/power/hibernate_64.o
  CC      lib/crypto/memneq.o
  CC      block/partitions/ldm.o
  CC      kernel/rcu/srcutree.o
  CC      arch/x86/pci/direct.o
  CC      lib/crypto/utils.o
  CC      arch/x86/events/amd/ibs.o
  CC      io_uring/splice.o
  AS      arch/x86/crypto/aesni-intel_avx-x86_64.o
  CC      init/do_mounts.o
  AS      arch/x86/lib/getuser.o
  GEN     arch/x86/lib/inat-tables.c
  CC      arch/x86/lib/insn-eval.o
  CC [M]  sound/pci/hda/hda_auto_parser.o
  CC      kernel/rcu/tree.o
  CC      security/keys/keyring.o
  AR      kernel/livepatch/built-in.a
  CC      lib/kunit/static_stub.o
  CC      kernel/locking/irqflag-debug.o
  CC      kernel/dma/mapping.o
  CC      kernel/dma/direct.o
  CC      lib/crypto/chacha.o
  CC      block/fops.o
  CC      crypto/compress.o
  CC      kernel/dma/ops_helpers.o
  AS      arch/x86/crypto/aes_ctrby8_avx-x86_64.o
  CC [M]  sound/pci/hda/hda_sysfs.o
  CC      kernel/dma/dummy.o
  AR      lib/math/built-in.a
  AR      drivers/phy/built-in.a
  CC      arch/x86/pci/mmconfig-shared.o
  AR      drivers/pinctrl/actions/built-in.a
  AR      drivers/pinctrl/bcm/built-in.a
  CC      block/bio.o
  AS [M]  arch/x86/crypto/ghash-clmulni-intel_asm.o
  AR      drivers/pinctrl/cirrus/built-in.a
  CC      lib/crypto/aes.o
  CC      kernel/irq/handle.o
  AR      drivers/pinctrl/freescale/built-in.a
  CC      drivers/gpio/gpiolib.o
  CC [M]  arch/x86/crypto/ghash-clmulni-intel_glue.o
  CC      drivers/pinctrl/intel/pinctrl-baytrail.o
  CC      init/do_mounts_initrd.o
  CC      arch/x86/entry/vdso/vdso32-setup.o
  AS      arch/x86/power/hibernate_asm_64.o
  CC      kernel/power/main.o
  CC      arch/x86/power/hibernate.o
  CC      kernel/irq/manage.o
  CC      lib/crypto/gf128mul.o
  CC      arch/x86/mm/pat/memtype.o
  CC      drivers/gpio/gpiolib-devres.o
  CC      mm/kasan/init.o
  CC      crypto/algapi.o
  AR      fs/notify/inotify/built-in.a
  CC      kernel/irq/spurious.o
  CC      lib/kunit/string-stream.o
  CC      fs/notify/fanotify/fanotify.o
  LDS     arch/x86/entry/vdso/vdso.lds
  CC      fs/notify/fanotify/fanotify_user.o
  CC      arch/x86/kernel/fpu/regset.o
  AS      arch/x86/entry/vdso/vdso-note.o
  CC      arch/x86/entry/vdso/vclock_gettime.o
  CC      kernel/dma/contiguous.o
  CC      fs/notify/fsnotify.o
  CC      kernel/dma/swiotlb.o
  CC      sound/core/seq/seq_memory.o
  AS [M]  arch/x86/crypto/crc32-pclmul_asm.o
  CC [M]  arch/x86/crypto/crc32-pclmul_glue.o
  CC [M]  sound/pci/hda/hda_controller.o
  CC      block/partitions/msdos.o
  CC      arch/x86/lib/insn.o
  CC      fs/nfs_common/grace.o
  CC      fs/iomap/trace.o
  AR      fs/quota/built-in.a
  CC      init/initramfs.o
  CC      arch/x86/events/amd/uncore.o
  CC      lib/crypto/blake2s.o
  AR      ipc/built-in.a
  CC      kernel/dma/remap.o
  CC      kernel/locking/mutex-debug.o
  CC      io_uring/sync.o
  CC      lib/kunit/assert.o
  CC      fs/iomap/iter.o
  CC      fs/iomap/buffered-io.o
  CC      fs/proc/task_mmu.o
  CC      lib/crypto/blake2s-generic.o
  AR      arch/x86/power/built-in.a
  CC      security/keys/keyctl.o
  CC      fs/notify/notification.o
  CC      arch/x86/entry/vdso/vgetcpu.o
  CC      arch/x86/pci/fixup.o
  CC      fs/notify/group.o
  AR      arch/x86/platform/atom/built-in.a
  AR      arch/x86/platform/ce4100/built-in.a
  CC      arch/x86/platform/efi/memmap.o
  HOSTCC  arch/x86/entry/vdso/vdso2c
  CC      arch/x86/platform/efi/quirks.o
  CC      kernel/irq/resend.o
  AS      arch/x86/lib/memcpy_64.o
  AS [M]  arch/x86/crypto/crct10dif-pcl-asm_64.o
  CC      arch/x86/kernel/fpu/signal.o
  CC      arch/x86/mm/pat/memtype_interval.o
  CC      mm/kasan/generic.o
  AS      arch/x86/lib/memmove_64.o
  AR      arch/x86/platform/geode/built-in.a
  CC [M]  arch/x86/crypto/crct10dif-pclmul_glue.o
  CC      drivers/pinctrl/intel/pinctrl-intel.o
  CC      kernel/power/console.o
  CC      arch/x86/platform/efi/efi.o
  CC [M]  sound/pci/hda/hda_proc.o
  AS      arch/x86/lib/memset_64.o
  CC      arch/x86/platform/efi/efi_64.o
  CC      arch/x86/lib/misc.o
  CC      arch/x86/lib/pc-conf-reg.o
  CC      kernel/printk/printk_safe.o
  CC      lib/kunit/try-catch.o
  CC      lib/crypto/blake2s-selftest.o
  CC      lib/crypto/des.o
  CC      sound/core/seq/seq_queue.o
  CC      security/keys/permission.o
  LDS     arch/x86/entry/vdso/vdso32/vdso32.lds
  CC      kernel/locking/lockdep.o
  AS      arch/x86/entry/vdso/vdso32/note.o
  AS      arch/x86/lib/putuser.o
  AR      fs/nfs_common/built-in.a
  AS      arch/x86/entry/vdso/vdso32/sigreturn.o
  AS      arch/x86/entry/vdso/vdso32/system_call.o
  CC      net/llc/llc_core.o
  AS      arch/x86/lib/retpoline.o
  CC      net/llc/llc_input.o
  CC      arch/x86/entry/vdso/vdso32/vclock_gettime.o
  CC      block/partitions/efi.o
  CC      arch/x86/lib/usercopy.o
  CC      net/llc/llc_output.o
  CC      crypto/scatterwalk.o
  CC      kernel/irq/chip.o
  LD [M]  arch/x86/crypto/ghash-clmulni-intel.o
  CC      io_uring/advise.o
  LD [M]  arch/x86/crypto/crc32-pclmul.o
  CC      kernel/irq/dummychip.o
  CC      kernel/printk/printk_ringbuffer.o
  LD [M]  arch/x86/crypto/crct10dif-pclmul.o
  AR      arch/x86/crypto/built-in.a
  CC      arch/x86/lib/usercopy_64.o
  AR      arch/x86/platform/iris/built-in.a
  CC      lib/crypto/sha1.o
  CC      init/calibrate.o
  CC      init/init_task.o
  CC      sound/core/sound.o
  AR      arch/x86/mm/pat/built-in.a
  AR      arch/x86/events/amd/built-in.a
  CC      arch/x86/mm/init.o
  CC      lib/kunit/executor.o
  CC      arch/x86/events/intel/core.o
  CC      arch/x86/events/intel/bts.o
  AR      kernel/dma/built-in.a
  AS      arch/x86/platform/efi/efi_stub_64.o
  CC      arch/x86/pci/acpi.o
  CC      arch/x86/events/intel/ds.o
  CC      arch/x86/events/intel/knc.o
  CC      kernel/power/process.o
  CC      block/elevator.o
  CC      io_uring/filetable.o
  AR      fs/notify/fanotify/built-in.a
  CC      fs/notify/mark.o
  CC      kernel/irq/devres.o
  CC      sound/core/seq/seq_fifo.o
  CC      arch/x86/kernel/fpu/xstate.o
  AR      drivers/pinctrl/mediatek/built-in.a
  CC      mm/kasan/report_generic.o
  CC      lib/crypto/sha256.o
  CC      block/blk-core.o
  CC      mm/mempool.o
  CC      sound/core/seq/seq_prioq.o
  AR      arch/x86/platform/efi/built-in.a
  CC      net/core/skbuff.o
  CC      arch/x86/platform/intel/iosf_mbi.o
  CC      sound/core/seq/seq_timer.o
  CC      sound/core/init.o
  CC      arch/x86/entry/vdso/vdso32/vgetcpu.o
  CC      security/keys/process_keys.o
  CC      crypto/proc.o
  CC      arch/x86/lib/msr-smp.o
  VDSO    arch/x86/entry/vdso/vdso64.so.dbg
  CC [M]  sound/pci/hda/hda_hwdep.o
  VDSO    arch/x86/entry/vdso/vdso32.so.dbg
  CC      crypto/aead.o
  OBJCOPY arch/x86/entry/vdso/vdso64.so
  OBJCOPY arch/x86/entry/vdso/vdso32.so
  CC      lib/kunit/hooks.o
  VDSO2C  arch/x86/entry/vdso/vdso-image-64.c
  VDSO2C  arch/x86/entry/vdso/vdso-image-32.c
  CC      arch/x86/entry/vdso/vdso-image-64.o
  CC      kernel/printk/sysctl.o
  CC      sound/core/memory.o
  CC      sound/core/control.o
  CC      arch/x86/events/zhaoxin/core.o
  AR      net/llc/built-in.a
  CC      sound/core/misc.o
  CC      init/version.o
  CC      net/ethernet/eth.o
  AR      block/partitions/built-in.a
  CC      block/blk-sysfs.o
  CC      security/keys/request_key.o
  CC [M]  drivers/pinctrl/intel/pinctrl-cherryview.o
  CC      net/802/p8022.o
  CC      arch/x86/entry/vdso/vdso-image-32.o
  CC      arch/x86/lib/cache-smp.o
  AR      lib/kunit/built-in.a
  CC      arch/x86/pci/legacy.o
  AR      kernel/printk/built-in.a
  CC      kernel/entry/common.o
  CC      arch/x86/kernel/cpu/mce/core.o
  CC      kernel/module/main.o
  AR      init/built-in.a
  CC      kernel/time/time.o
  CC [M]  lib/crypto/arc4.o
  CC      kernel/futex/core.o
  CC      fs/proc/inode.o
  CC      kernel/irq/autoprobe.o
  CC      arch/x86/lib/msr.o
  CC      kernel/futex/syscalls.o
  CC      kernel/time/timer.o
  CC      arch/x86/kernel/cpu/mtrr/mtrr.o
  AR      arch/x86/entry/vdso/built-in.a
  CC      arch/x86/entry/vsyscall/vsyscall_64.o
  CC      mm/kasan/shadow.o
  CC      drivers/gpio/gpiolib-legacy.o
  AS      arch/x86/entry/vsyscall/vsyscall_emu_64.o
  CC      kernel/power/suspend.o
  CC      kernel/power/hibernate.o
  AR      arch/x86/platform/intel/built-in.a
  AR      arch/x86/platform/intel-mid/built-in.a
  CC      arch/x86/mm/init_64.o
  CC      fs/iomap/direct-io.o
  CC      sound/core/seq/seq_system.o
  AR      arch/x86/platform/intel-quark/built-in.a
  CC      fs/notify/fdinfo.o
  AR      arch/x86/platform/olpc/built-in.a
  CC [M]  sound/pci/hda/hda_generic.o
  AR      arch/x86/platform/scx200/built-in.a
  AR      arch/x86/platform/ts5500/built-in.a
  CC      kernel/locking/lockdep_proc.o
  AR      arch/x86/platform/uv/built-in.a
  AR      arch/x86/platform/built-in.a
  AR      drivers/pinctrl/mvebu/built-in.a
  CC      kernel/entry/syscall_user_dispatch.o
  AR      drivers/pinctrl/nomadik/built-in.a
  CC      kernel/entry/kvm.o
  AR      drivers/pinctrl/nuvoton/built-in.a
  CC      net/core/datagram.o
  CC      arch/x86/kernel/cpu/mce/severity.o
  AR      lib/crypto/built-in.a
  LD [M]  lib/crypto/libarc4.o
  CC      lib/zlib_inflate/inffast.o
  CC      crypto/geniv.o
  CC      block/blk-flush.o
  CC      arch/x86/pci/irq.o
  CC      arch/x86/kernel/cpu/mtrr/if.o
  AR      arch/x86/events/zhaoxin/built-in.a
  CC      fs/kernfs/mount.o
  CC      kernel/irq/irqdomain.o
  CC      lib/zlib_inflate/inflate.o
  CC      net/802/psnap.o
  AR      arch/x86/kernel/fpu/built-in.a
  CC      fs/kernfs/inode.o
  CC      drivers/gpio/gpiolib-cdev.o
  CC      security/keys/request_key_auth.o
  CC      io_uring/openclose.o
  CC      mm/kasan/quarantine.o
  CC      sound/core/seq/seq_ports.o
  AR      sound/sh/built-in.a
  AR      sound/synth/emux/built-in.a
  AR      sound/synth/built-in.a
  CC      fs/proc/root.o
  AR      sound/usb/misc/built-in.a
  AR      sound/usb/usx2y/built-in.a
  AS      arch/x86/lib/msr-reg.o
  AR      fs/notify/built-in.a
  AR      sound/usb/caiaq/built-in.a
  CC      kernel/futex/pi.o
  AR      sound/usb/6fire/built-in.a
  CC      arch/x86/kernel/cpu/mtrr/generic.o
  CC      arch/x86/lib/msr-reg-export.o
  AR      sound/usb/hiface/built-in.a
  AR      sound/usb/bcd2000/built-in.a
  CC      sound/core/device.o
  AR      sound/usb/built-in.a
  AR      sound/firewire/built-in.a
  CC      fs/proc/base.o
  AR      arch/x86/entry/vsyscall/built-in.a
  AS      arch/x86/entry/entry.o
  CC      lib/zlib_deflate/deflate.o
  CC      lib/zlib_deflate/deftree.o
  AR      net/ethernet/built-in.a
  CC      lib/zlib_deflate/deflate_syms.o
  AS      arch/x86/lib/hweight.o
  AS      arch/x86/entry/entry_64.o
  CC [M]  drivers/pinctrl/intel/pinctrl-broxton.o
  CC      kernel/time/hrtimer.o
  CC      fs/sysfs/file.o
  CC      arch/x86/entry/syscall_64.o
  CC      arch/x86/lib/iomem.o
  CC      fs/sysfs/dir.o
  CC      sound/core/seq/seq_info.o
  CC      net/802/stp.o
  AR      kernel/entry/built-in.a
  CC      drivers/gpio/gpiolib-sysfs.o
  CC      fs/configfs/inode.o
  CC      crypto/skcipher.o
  CC      fs/configfs/file.o
  CC      fs/iomap/fiemap.o
  CC      kernel/time/timekeeping.o
  CC      kernel/cgroup/cgroup.o
  CC      security/keys/user_defined.o
  CC      kernel/power/snapshot.o
  CC      lib/zlib_inflate/infutil.o
  CC      kernel/trace/trace_clock.o
  CC      lib/zlib_inflate/inftrees.o
  CC      fs/kernfs/dir.o
  AS      arch/x86/lib/iomap_copy_64.o
  CC      fs/kernfs/file.o
  CC      arch/x86/lib/inat.o
  CC      kernel/power/swap.o
  CC      kernel/time/ntp.o
  AR      mm/kasan/built-in.a
  CC      mm/oom_kill.o
  CC [M]  drivers/pinctrl/intel/pinctrl-geminilake.o
  CC [M]  drivers/pinctrl/intel/pinctrl-sunrisepoint.o
  CC      fs/configfs/dir.o
  CC      fs/devpts/inode.o
  CC      kernel/futex/requeue.o
  AR      arch/x86/lib/built-in.a
  AR      arch/x86/lib/lib.a
  AR      sound/core/seq/built-in.a
  CC      sound/core/info.o
  CC      kernel/locking/spinlock.o
  CC      arch/x86/kernel/cpu/mce/genpool.o
  CC      block/blk-settings.o
  CC      arch/x86/entry/common.o
  CC      arch/x86/mm/fault.o
  CC      arch/x86/kernel/cpu/mce/intel.o
  CC      arch/x86/pci/common.o
  CC      lib/zlib_inflate/inflate_syms.o
  CC      arch/x86/kernel/cpu/mtrr/cleanup.o
  CC      arch/x86/events/intel/lbr.o
  CC      kernel/trace/ftrace.o
  CC      kernel/irq/proc.o
  CC      kernel/rcu/rcu_segcblist.o
  CC      fs/sysfs/symlink.o
  CC      fs/iomap/seek.o
  CC      fs/kernfs/symlink.o
  AR      lib/zlib_deflate/built-in.a
  AR      net/802/built-in.a
  CC      io_uring/uring_cmd.o
  AR      sound/sparc/built-in.a
  CC      fs/configfs/symlink.o
  AR      sound/spi/built-in.a
  CC      net/sched/sch_generic.o
  CC      net/netlink/af_netlink.o
  AR      arch/x86/net/built-in.a
  CC      security/keys/compat.o
  CC      net/netlink/genetlink.o
  CC      net/sched/sch_mq.o
  CC      net/netlink/policy.o
  CC      kernel/cgroup/rstat.o
  CC      kernel/irq/migration.o
  AR      lib/zlib_inflate/built-in.a
  AR      drivers/pinctrl/intel/built-in.a
  CC      lib/lzo/lzo1x_compress.o
  AR      drivers/pinctrl/nxp/built-in.a
  AR      drivers/pinctrl/sprd/built-in.a
  AR      drivers/pinctrl/sunplus/built-in.a
  AR      drivers/pinctrl/ti/built-in.a
  CC      kernel/irq/cpuhotplug.o
  CC      drivers/pinctrl/core.o
  CC      kernel/futex/waitwake.o
  CC      kernel/module/strict_rwx.o
  AS      arch/x86/entry/thunk_64.o
  AR      drivers/pwm/built-in.a
  CC      kernel/trace/ring_buffer.o
  CC      drivers/pci/msi/pcidev_msi.o
  CC      arch/x86/kernel/cpu/mce/threshold.o
  CC      crypto/seqiv.o
  AS      arch/x86/entry/entry_64_compat.o
  AR      fs/devpts/built-in.a
  CC      drivers/gpio/gpiolib-acpi.o
  CC      drivers/gpio/gpiolib-swnode.o
  CC      arch/x86/entry/syscall_32.o
  CC      fs/sysfs/mount.o
  CC      drivers/pci/msi/api.o
  AR      kernel/rcu/built-in.a
  CC      drivers/pci/pcie/portdrv.o
  CC      kernel/trace/trace.o
  CC      kernel/trace/trace_output.o
  CC      sound/core/isadma.o
  CC      arch/x86/pci/early.o
  CC      fs/iomap/swapfile.o
  CC      block/blk-ioc.o
  CC      fs/configfs/mount.o
  CC      crypto/echainiv.o
  CC      arch/x86/kernel/cpu/mce/apei.o
  AR      fs/kernfs/built-in.a
  CC      security/keys/proc.o
  CC      kernel/trace/trace_seq.o
  CC      lib/lzo/lzo1x_decompress_safe.o
  CC      kernel/locking/osq_lock.o
  AR      arch/x86/kernel/cpu/mtrr/built-in.a
  CC      kernel/irq/pm.o
  CC      kernel/trace/trace_stat.o
  CC      kernel/time/clocksource.o
  AR      kernel/futex/built-in.a
  CC      kernel/cgroup/namespace.o
  CC      kernel/module/kmod.o
  CC      arch/x86/kernel/cpu/cacheinfo.o
  AR      sound/parisc/built-in.a
  CC      kernel/power/user.o
  CC      kernel/power/poweroff.o
  CC      arch/x86/kernel/cpu/scattered.o
  CC      block/blk-map.o
  CC      fs/sysfs/group.o
  CC      kernel/trace/trace_printk.o
  CC      sound/core/vmaster.o
  CC      io_uring/epoll.o
  CC      kernel/sched/build_policy.o
  CC      kernel/locking/qspinlock.o
  CC      kernel/trace/pid_list.o
  CC      arch/x86/events/intel/p4.o
  CC      arch/x86/mm/ioremap.o
  AR      arch/x86/entry/built-in.a
  CC      io_uring/statx.o
  CC      kernel/cgroup/cgroup-v1.o
  CC      block/blk-merge.o
  CC      drivers/pci/msi/msi.o
  CC      arch/x86/pci/bus_numa.o
  AR      net/bpf/built-in.a
  CC      kernel/locking/rtmutex_api.o
  CC      kernel/trace/trace_sched_switch.o
  AR      lib/lzo/built-in.a
  CC      lib/lz4/lz4_compress.o
  CC      fs/configfs/item.o
  CC      crypto/ahash.o
  CC      fs/proc/generic.o
  AR      fs/iomap/built-in.a
  AR      arch/x86/kernel/cpu/mce/built-in.a
  CC      kernel/cgroup/freezer.o
  CC      kernel/locking/spinlock_debug.o
  CC      net/sched/sch_frag.o
  CC      drivers/pci/pcie/rcec.o
  CC      arch/x86/mm/extable.o
  CC      security/keys/sysctl.o
  CC      block/blk-timeout.o
  CC      mm/fadvise.o
  CC      kernel/cgroup/legacy_freezer.o
  AR      drivers/gpio/built-in.a
  AR      sound/pcmcia/vx/built-in.a
  CC      drivers/pinctrl/pinctrl-utils.o
  AR      sound/pcmcia/pdaudiocf/built-in.a
  AR      sound/pcmcia/built-in.a
  CC      kernel/irq/msi.o
  CC      mm/maccess.o
  CC      drivers/pci/hotplug/pci_hotplug_core.o
  CC      kernel/module/tree_lookup.o
  AR      drivers/pci/controller/dwc/built-in.a
  AR      drivers/pci/controller/mobiveil/built-in.a
  CC      drivers/pci/controller/vmd.o
  AR      fs/sysfs/built-in.a
  AR      sound/pci/ice1712/built-in.a
  CC      sound/core/ctljack.o
  CC      drivers/pci/hotplug/acpi_pcihp.o
  AR      kernel/power/built-in.a
  AR      fs/configfs/built-in.a
  CC      kernel/module/debug_kmemleak.o
  CC      fs/ext4/balloc.o
  CC      kernel/module/kallsyms.o
  CC      kernel/module/procfs.o
  CC      kernel/time/jiffies.o
  CC      fs/ext4/bitmap.o
  CC      kernel/module/sysfs.o
  CC      arch/x86/pci/amd_bus.o
  CC      io_uring/net.o
  AR      security/keys/built-in.a
  CC      security/inode.o
  CC      kernel/cgroup/pids.o
  CC      net/ethtool/ioctl.o
  CC      arch/x86/mm/mmap.o
  CC      arch/x86/kernel/cpu/topology.o
  CC      drivers/pci/pcie/aspm.o
  CC      arch/x86/mm/pgtable.o
  CC      net/ethtool/common.o
  CC      drivers/pinctrl/pinmux.o
  CC      arch/x86/events/intel/p6.o
  CC      sound/core/jack.o
  AR      sound/pci/korg1212/built-in.a
  CC [M]  sound/pci/hda/patch_realtek.o
  AR      sound/pci/mixart/built-in.a
  CC      arch/x86/mm/physaddr.o
  CC      drivers/pinctrl/pinconf.o
  CC      drivers/pinctrl/pinconf-generic.o
  CC      drivers/pci/msi/irqdomain.o
  CC      kernel/cgroup/cpuset.o
  CC      kernel/time/timer_list.o
  CC      fs/proc/array.o
  CC      crypto/shash.o
  CC      kernel/time/timeconv.o
  CC      security/device_cgroup.o
  CC      arch/x86/events/intel/pt.o
  CC      mm/page-writeback.o
  CC      arch/x86/events/intel/uncore.o
  CC      kernel/locking/qrwlock.o
  CC      drivers/pci/hotplug/pciehp_core.o
  CC      arch/x86/kernel/cpu/common.o
  CC      kernel/trace/trace_functions.o
  CC      net/sched/sch_api.o
  CC      kernel/trace/trace_preemptirq.o
  CC      fs/proc/fd.o
  CC      fs/proc/proc_tty.o
  CC      fs/proc/cmdline.o
  CC      kernel/time/timecounter.o
  AR      arch/x86/pci/built-in.a
  CC      net/ethtool/netlink.o
  AR      sound/mips/built-in.a
  CC      fs/proc/consoles.o
  CC      kernel/sched/build_utility.o
  CC      fs/proc/cpuinfo.o
  AR      kernel/module/built-in.a
  CC      kernel/time/alarmtimer.o
  CC      net/core/stream.o
  AR      drivers/pci/switch/built-in.a
  CC      arch/x86/kernel/cpu/rdrand.o
  CC      block/blk-lib.o
  CC      net/sched/sch_blackhole.o
  AR      drivers/pci/controller/built-in.a
  CC      lib/lz4/lz4hc_compress.o
  CC      net/ethtool/bitset.o
  CC      block/blk-mq.o
  CC      lib/zstd/zstd_compress_module.o
  CC      sound/core/timer.o
  CC      kernel/irq/affinity.o
  CC      lib/zstd/compress/fse_compress.o
  AR      kernel/locking/built-in.a
  CC      sound/core/hrtimer.o
  CC      arch/x86/kernel/cpu/match.o
  AR      drivers/pinctrl/built-in.a
  CC      drivers/pci/access.o
  CC      drivers/pci/bus.o
  CC      arch/x86/mm/tlb.o
  CC      net/netlink/diag.o
  AR      drivers/pci/msi/built-in.a
  CC      kernel/trace/trace_nop.o
  CC      arch/x86/kernel/acpi/boot.o
  CC      arch/x86/kernel/acpi/sleep.o
  CC      lib/zstd/compress/hist.o
  CC      fs/proc/devices.o
  CC      arch/x86/kernel/cpu/bugs.o
  CC      drivers/pci/pcie/aer.o
  CC      kernel/trace/trace_functions_graph.o
  AR      sound/soc/built-in.a
  CC      drivers/pci/hotplug/pciehp_ctrl.o
  CC      kernel/trace/fgraph.o
  CC      crypto/akcipher.o
  AR      sound/pci/nm256/built-in.a
  CC      lib/zstd/compress/huf_compress.o
  CC      net/ethtool/strset.o
  CC      drivers/video/console/dummycon.o
  CC      drivers/pci/probe.o
  CC      kernel/irq/matrix.o
  CC      drivers/pci/host-bridge.o
  CC      crypto/kpp.o
  CC      lib/zstd/compress/zstd_compress.o
  CC      arch/x86/mm/cpu_entry_area.o
  CC      fs/ext4/block_validity.o
  CC      arch/x86/kernel/cpu/aperfmperf.o
  CC      lib/lz4/lz4_decompress.o
  CC      net/ethtool/linkinfo.o
  AR      security/built-in.a
  CC      drivers/video/logo/logo.o
  CC      net/ethtool/linkmodes.o
  CC      io_uring/msg_ring.o
  CC      kernel/trace/blktrace.o
  CC      fs/ext4/dir.o
  CC      kernel/time/posix-timers.o
  CC      fs/proc/interrupts.o
  HOSTCC  drivers/video/logo/pnmtologo
  CC      net/core/scm.o
  CC      drivers/pci/remove.o
  CC      drivers/video/console/vgacon.o
  AR      sound/atmel/built-in.a
  CC      drivers/pci/hotplug/pciehp_pci.o
  CC      arch/x86/kernel/cpu/cpuid-deps.o
  AR      sound/hda/built-in.a
  CC      crypto/acompress.o
  CC [M]  sound/hda/hda_bus_type.o
  CC      crypto/scompress.o
  AR      net/netlink/built-in.a
  CC [M]  net/netfilter/ipvs/ip_vs_conn.o
  AR      net/ipv4/netfilter/built-in.a
  CC      net/xfrm/xfrm_policy.o
  CC [M]  net/ipv4/netfilter/nf_defrag_ipv4.o
  CC      arch/x86/events/intel/uncore_nhmex.o
  CC      net/xfrm/xfrm_state.o
  CC      arch/x86/mm/maccess.o
  CC      arch/x86/events/intel/uncore_snb.o
  LOGO    drivers/video/logo/logo_linux_clut224.c
  CC      drivers/pci/pci.o
  CC      drivers/video/logo/logo_linux_clut224.o
  CC      net/xfrm/xfrm_hash.o
  AR      drivers/video/logo/built-in.a
  CC      net/netfilter/core.o
  CC      arch/x86/mm/pgprot.o
  AS      arch/x86/kernel/acpi/wakeup_64.o
  CC      arch/x86/kernel/acpi/apei.o
  CC      fs/proc/loadavg.o
  CC      net/ethtool/rss.o
  CC      drivers/pci/hotplug/pciehp_hpc.o
  CC [M]  sound/hda/hdac_bus.o
  AR      sound/x86/built-in.a
  CC      kernel/trace/trace_events.o
  CC      drivers/pci/pcie/err.o
  CC      fs/proc/meminfo.o
  CC      arch/x86/events/intel/uncore_snbep.o
  CC      mm/folio-compat.o
  CC      sound/core/seq_device.o
  CC      arch/x86/kernel/cpu/umwait.o
  CC      arch/x86/kernel/cpu/proc.o
  CC      net/ethtool/linkstate.o
  AR      sound/xen/built-in.a
  CC      arch/x86/events/core.o
  CC      net/core/gen_stats.o
  CC      crypto/algboss.o
  AR      kernel/irq/built-in.a
  CC [M]  net/netfilter/ipvs/ip_vs_core.o
  CC      kernel/trace/trace_export.o
  CC      kernel/trace/trace_event_perf.o
  CC      arch/x86/events/probe.o
  CC      kernel/trace/trace_events_filter.o
  AR      lib/lz4/built-in.a
  CC      net/sched/sch_fifo.o
  CC      arch/x86/mm/hugetlbpage.o
  CC      arch/x86/events/utils.o
  CC      arch/x86/events/rapl.o
  CC [M]  net/netfilter/ipvs/ip_vs_ctl.o
  CC      io_uring/timeout.o
  CC      fs/ext4/ext4_jbd2.o
  CC      kernel/bpf/core.o
  CC      arch/x86/kernel/acpi/cppc.o
  CC      kernel/time/posix-cpu-timers.o
  CC [M]  sound/pci/hda/patch_analog.o
  AR      drivers/video/console/built-in.a
  CC      drivers/video/backlight/backlight.o
  CC [M]  sound/pci/hda/patch_hdmi.o
  CC      fs/proc/stat.o
  AR      kernel/cgroup/built-in.a
  CC      net/core/gen_estimator.o
  CC      arch/x86/mm/kasan_init_64.o
  CC [M]  sound/core/control_led.o
  CC [M]  net/ipv4/netfilter/nf_reject_ipv4.o
  CC      drivers/pci/pcie/aer_inject.o
  CC [M]  sound/hda/hdac_device.o
  CC      io_uring/sqpoll.o
  CC      net/xfrm/xfrm_input.o
  CC      drivers/pci/pci-driver.o
  CC      arch/x86/events/intel/uncore_discovery.o
  MKCAP   arch/x86/kernel/cpu/capflags.c
  CC      arch/x86/kernel/apic/apic.o
  CC      mm/readahead.o
  CC      arch/x86/kernel/kprobes/core.o
  CC      arch/x86/kernel/apic/apic_common.o
  CC      arch/x86/kernel/acpi/cstate.o
  CC      arch/x86/events/msr.o
  CC      kernel/time/posix-clock.o
  CC      crypto/testmgr.o
  CC      drivers/pci/hotplug/acpiphp_core.o
  CC      net/ethtool/debug.o
  CC      crypto/cmac.o
  CC      lib/xz/xz_dec_syms.o
  CC      kernel/time/itimer.o
  CC      kernel/events/core.o
  CC      fs/proc/uptime.o
  AR      net/sched/built-in.a
  CC      kernel/fork.o
  LDS     arch/x86/kernel/vmlinux.lds
  CC [M]  sound/pci/hda/hda_eld.o
  CC      arch/x86/mm/pkeys.o
  CC      io_uring/fdinfo.o
  CC      arch/x86/kernel/kprobes/opt.o
  CC      net/unix/af_unix.o
  AR      drivers/video/backlight/built-in.a
  CC [M]  sound/core/hwdep.o
  CC      drivers/video/fbdev/core/fb_notify.o
  CC      lib/xz/xz_dec_stream.o
  CC      drivers/pci/pcie/pme.o
  AR      arch/x86/kernel/acpi/built-in.a
  CC      arch/x86/kernel/kprobes/ftrace.o
  AR      net/ipv6/netfilter/built-in.a
  CC [M]  net/ipv6/netfilter/nf_defrag_ipv6_hooks.o
  CC      net/core/net_namespace.o
  CC [M]  sound/hda/hdac_sysfs.o
  CC      arch/x86/events/intel/cstate.o
  CC      lib/zstd/compress/zstd_compress_literals.o
  CC      lib/zstd/compress/zstd_compress_sequences.o
  CC [M]  drivers/video/fbdev/core/fb_backlight.o
  CC      kernel/events/ring_buffer.o
  CC      lib/zstd/compress/zstd_compress_superblock.o
  CC      fs/ext4/extents.o
  CC      fs/proc/util.o
  CC      fs/ext4/extents_status.o
  CC      drivers/pci/hotplug/acpiphp_glue.o
  CC      io_uring/tctx.o
  CC      lib/zstd/compress/zstd_double_fast.o
  CC      mm/swap.o
  CC      arch/x86/mm/pti.o
  CC      net/ethtool/wol.o
  CC      fs/proc/version.o
  CC [M]  net/ipv4/netfilter/ip_tables.o
  CC      lib/xz/xz_dec_lzma2.o
  CC      net/netfilter/nf_log.o
  CC      net/netfilter/nf_queue.o
  CC      block/blk-mq-tag.o
  CC      lib/zstd/compress/zstd_fast.o
  CC      net/xfrm/xfrm_output.o
  CC      kernel/time/clockevents.o
  CC [M]  drivers/video/fbdev/core/fb_info.o
  CC      net/ethtool/features.o
  CC      arch/x86/kernel/apic/apic_noop.o
  CC      drivers/idle/intel_idle.o
  CC [M]  sound/core/pcm.o
  AR      arch/x86/kernel/kprobes/built-in.a
  CC [M]  drivers/video/fbdev/core/fbmem.o
  CC      drivers/pci/pcie/dpc.o
  AR      drivers/char/ipmi/built-in.a
  CC      net/ethtool/privflags.o
  CC      fs/jbd2/transaction.o
  CC      fs/jbd2/commit.o
  CC [M]  drivers/video/fbdev/core/fbmon.o
  CC [M]  sound/hda/hdac_regmap.o
  AR      arch/x86/events/intel/built-in.a
  AR      arch/x86/events/built-in.a
  CC      kernel/trace/trace_events_trigger.o
  CC [M]  sound/hda/hdac_controller.o
  CC      kernel/trace/trace_eprobe.o
  CC      fs/proc/softirqs.o
  CC      drivers/pci/search.o
  CC      kernel/trace/trace_kprobe.o
  CC      arch/x86/kernel/apic/ipi.o
  CC      io_uring/poll.o
  CC [M]  sound/pci/hda/hda_intel.o
  AR      arch/x86/mm/built-in.a
  CC      arch/x86/kernel/apic/vector.o
  CC [M]  net/ipv6/netfilter/nf_conntrack_reasm.o
  CC      drivers/pci/pci-sysfs.o
  CC      lib/xz/xz_dec_bcj.o
  CC      kernel/time/tick-common.o
  AR      drivers/pci/hotplug/built-in.a
  CC      drivers/pci/rom.o
  LD [M]  sound/pci/hda/snd-hda-codec.o
  CC [M]  drivers/video/fbdev/core/fbcmap.o
  LD [M]  sound/pci/hda/snd-hda-codec-generic.o
  AR      kernel/bpf/built-in.a
  LD [M]  sound/pci/hda/snd-hda-codec-realtek.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/kvm_main.o
  CC      kernel/events/callchain.o
  CC      fs/proc/namespaces.o
  CC [M]  sound/hda/hdac_stream.o
  AR      drivers/pci/pcie/built-in.a
  CC      arch/x86/kernel/apic/hw_nmi.o
  CC [M]  sound/hda/array.o
  CC      net/core/secure_seq.o
  CC [M]  sound/core/pcm_native.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/eventfd.o
  CC      block/blk-stat.o
  LD [M]  sound/pci/hda/snd-hda-codec-analog.o
  CC      net/ipv6/af_inet6.o
  CC      drivers/acpi/acpica/dsargs.o
  CC      net/ethtool/rings.o
  CC      net/ipv6/anycast.o
  CC [M]  net/netfilter/ipvs/ip_vs_sched.o
  AR      lib/xz/built-in.a
  CC [M]  arch/x86/kvm/../../../virt/kvm/binary_stats.o
  CC      drivers/acpi/apei/apei-base.o
  CC      crypto/hmac.o
  AR      drivers/idle/built-in.a
  CC      mm/truncate.o
  CC      net/xfrm/xfrm_sysctl.o
  AR      kernel/sched/built-in.a
  CC      mm/vmscan.o
  CC      drivers/video/aperture.o
  CC [M]  net/ipv4/netfilter/iptable_filter.o
  CC      net/ipv6/ip6_output.o
  CC      fs/jbd2/recovery.o
  CC [M]  net/ipv4/netfilter/iptable_mangle.o
  CC      drivers/acpi/acpica/dscontrol.o
  AR      drivers/acpi/pmic/built-in.a
  CC [M]  net/ipv4/netfilter/iptable_nat.o
  CC      net/ethtool/channels.o
  CC      net/packet/af_packet.o
  CC      fs/proc/self.o
  CC [M]  net/ipv4/netfilter/ipt_REJECT.o
  CC [M]  drivers/video/fbdev/core/modedb.o
  CC      drivers/acpi/dptf/int340x_thermal.o
  CC      kernel/time/tick-broadcast.o
  CC [M]  sound/hda/hdmi_chmap.o
  CC      drivers/acpi/acpica/dsdebug.o
  CC      net/xfrm/xfrm_replay.o
  CC [M]  net/netfilter/ipvs/ip_vs_xmit.o
  CC      block/blk-mq-sysfs.o
  CC [M]  net/netfilter/ipvs/ip_vs_app.o
  CC [M]  net/netfilter/ipvs/ip_vs_sync.o
  CC      io_uring/cancel.o
  CC      arch/x86/kernel/cpu/powerflags.o
  CC      net/ipv6/ip6_input.o
  CC      drivers/pci/setup-res.o
  CC      net/unix/garbage.o
  CC      fs/jbd2/checkpoint.o
  CC [M]  net/netfilter/ipvs/ip_vs_est.o
  CC      crypto/vmac.o
  CC      arch/x86/kernel/cpu/feat_ctl.o
  LD [M]  net/ipv6/netfilter/nf_defrag_ipv6.o
  CC      arch/x86/kernel/apic/io_apic.o
  CC      net/packet/diag.o
  CC      drivers/acpi/apei/hest.o
  CC      fs/jbd2/revoke.o
  CC      drivers/acpi/acpica/dsfield.o
  CC      net/core/flow_dissector.o
  AR      drivers/acpi/dptf/built-in.a
  CC      fs/jbd2/journal.o
  CC      fs/proc/thread_self.o
  CC [M]  sound/hda/trace.o
  LD [M]  sound/pci/hda/snd-hda-codec-hdmi.o
  CC [M]  net/netfilter/ipvs/ip_vs_proto.o
  CC      net/unix/sysctl_net_unix.o
  LD [M]  sound/pci/hda/snd-hda-intel.o
  AR      sound/pci/oxygen/built-in.a
  AR      sound/pci/pcxhr/built-in.a
  CC      kernel/time/tick-broadcast-hrtimer.o
  AR      sound/pci/riptide/built-in.a
  AR      sound/pci/rme9652/built-in.a
  CC      fs/proc/proc_sysctl.o
  AR      sound/pci/trident/built-in.a
  AR      sound/pci/ymfpci/built-in.a
  CC      kernel/trace/error_report-traces.o
  AR      sound/pci/vx222/built-in.a
  CC      arch/x86/kernel/apic/msi.o
  AR      sound/pci/built-in.a
  CC      kernel/events/hw_breakpoint.o
  CC      net/core/sysctl_net_core.o
  AR      sound/virtio/built-in.a
  CC      sound/sound_core.o
  CC      net/xfrm/xfrm_device.o
  CC      arch/x86/kernel/cpu/intel.o
  CC      kernel/events/uprobes.o
  CC      net/ethtool/coalesce.o
  CC      block/blk-mq-cpumap.o
  CC      kernel/trace/power-traces.o
  CC      net/ipv4/route.o
  CC      net/ipv6/addrconf.o
  CC [M]  drivers/video/fbdev/core/fbcvt.o
  CC      drivers/acpi/acpica/dsinit.o
  CC      io_uring/kbuf.o
  CC      drivers/acpi/apei/erst.o
  CC      kernel/time/tick-oneshot.o
  CC      drivers/pci/irq.o
  CC      net/ethtool/pause.o
  CC      kernel/exec_domain.o
  CC      net/netfilter/nf_sockopt.o
  CC      crypto/xcbc.o
  CC      crypto/crypto_null.o
  CC      net/unix/diag.o
  CC      drivers/pci/vpd.o
  CC      net/ethtool/eee.o
  CC      drivers/acpi/acpica/dsmethod.o
  CC [M]  sound/hda/hdac_component.o
  CC      kernel/time/tick-sched.o
  CC      fs/proc/proc_net.o
  CC      net/xfrm/xfrm_algo.o
  CC      drivers/pnp/pnpacpi/core.o
  CC      net/unix/scm.o
  CC      drivers/pnp/core.o
  CC      block/blk-mq-sched.o
  CC      drivers/pci/setup-bus.o
  CC      drivers/video/cmdline.o
  CC      kernel/panic.o
  CC [M]  drivers/video/fbdev/core/fb_cmdline.o
  CC      lib/zstd/compress/zstd_lazy.o
  CC      drivers/pnp/pnpacpi/rsparser.o
  CC      net/key/af_key.o
  CC      crypto/md5.o
  CC      net/ethtool/tsinfo.o
  CC      fs/ext4/file.o
  CC      drivers/acpi/acpica/dsmthdat.o
  CC      net/ipv6/addrlabel.o
  CC      drivers/pnp/card.o
  CC      arch/x86/kernel/cpu/intel_pconfig.o
  CC [M]  drivers/video/fbdev/core/fb_io_fops.o
  CC [M]  net/netfilter/ipvs/ip_vs_pe.o
  CC      drivers/acpi/apei/bert.o
  CC      drivers/acpi/apei/ghes.o
  CC      net/netfilter/utils.o
  CC      net/core/dev.o
  CC      arch/x86/kernel/apic/x2apic_phys.o
  CC      io_uring/rsrc.o
  CC      crypto/sha1_generic.o
  CC      arch/x86/kernel/cpu/tsx.o
  CC [M]  sound/hda/hdac_i915.o
  CC      drivers/pci/vc.o
  CC      net/ethtool/cabletest.o
  CC      net/core/dev_addr_lists.o
  CC      arch/x86/kernel/cpu/intel_epb.o
  CC [M]  sound/core/pcm_lib.o
  CC      fs/proc/kcore.o
  AR      net/bridge/netfilter/built-in.a
  CC      net/bridge/br.o
  CC      fs/proc/kmsg.o
  CC      fs/proc/page.o
  CC      drivers/acpi/acpica/dsobject.o
  AR      net/unix/built-in.a
  CC      crypto/sha256_generic.o
  CC      net/core/dst.o
  CC      kernel/trace/rpm-traces.o
  CC      block/ioctl.o
  CC      net/ethtool/tunnels.o
  AR      net/dsa/built-in.a
  CC      kernel/time/vsyscall.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/vfio.o
  CC [M]  net/netfilter/nfnetlink.o
  CC      net/xfrm/xfrm_user.o
  CC      drivers/pci/mmap.o
  CC      net/bridge/br_device.o
  CC [M]  sound/core/pcm_misc.o
  CC      drivers/pci/setup-irq.o
  CC      arch/x86/kernel/apic/x2apic_cluster.o
  CC      arch/x86/kernel/cpu/amd.o
  CC      sound/last.o
  CC [M]  net/sunrpc/auth_gss/auth_gss.o
  AR      drivers/pnp/pnpacpi/built-in.a
  CC [M]  sound/hda/intel-dsp-config.o
  CC      drivers/pnp/driver.o
  CC [M]  drivers/video/fbdev/core/fb_defio.o
  CC [M]  net/sunrpc/auth_gss/gss_generic_token.o
  CC      net/sunrpc/clnt.o
  CC      drivers/acpi/acpica/dsopcode.o
  CC [M]  net/sunrpc/auth_gss/gss_mech_switch.o
  CC      kernel/time/timekeeping_debug.o
  CC      fs/ext4/fsmap.o
  CC      kernel/cpu.o
  CC      crypto/sha512_generic.o
  CC [M]  net/netfilter/ipvs/ip_vs_proto_tcp.o
  CC [M]  net/netfilter/ipvs/ip_vs_proto_udp.o
  CC      net/8021q/vlan_core.o
  CC      drivers/pci/proc.o
  CC      drivers/pci/slot.o
  AR      fs/jbd2/built-in.a
  AR      drivers/acpi/apei/built-in.a
  CC [M]  net/8021q/vlan.o
  CC      drivers/pci/pci-acpi.o
  CC      net/dcb/dcbnl.o
  CC      drivers/acpi/tables.o
  CC      io_uring/rw.o
  AR      fs/proc/built-in.a
  CC      kernel/trace/trace_dynevent.o
  CC      net/dcb/dcbevent.o
  CC [M]  net/8021q/vlan_dev.o
  CC      io_uring/opdef.o
  CC      crypto/blake2b_generic.o
  CC      net/ethtool/fec.o
  CC      arch/x86/kernel/apic/apic_flat_64.o
  CC      drivers/pnp/resource.o
  CC      block/genhd.o
  CC      drivers/acpi/acpica/dspkginit.o
  AR      net/packet/built-in.a
  CC      drivers/acpi/acpica/dsutils.o
  CC      drivers/pci/quirks.o
  CC [M]  sound/hda/intel-nhlt.o
  CC [M]  net/sunrpc/auth_gss/svcauth_gss.o
  CC      kernel/time/namespace.o
  CC      arch/x86/kernel/cpu/hygon.o
  CC      net/ipv4/inetpeer.o
  CC      mm/shmem.o
  CC      net/ethtool/eeprom.o
  CC      drivers/pnp/manager.o
  CC [M]  drivers/video/fbdev/core/fb_chrdev.o
  CC      net/bridge/br_fdb.o
  CC      io_uring/notif.o
  CC [M]  net/netfilter/nf_conntrack_core.o
  CC      net/core/netevent.o
  CC      mm/util.o
  CC      drivers/acpi/acpica/dswexec.o
  CC      block/ioprio.o
  CC      arch/x86/kernel/apic/probe_64.o
  CC [M]  sound/core/pcm_memory.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/coalesced_mmio.o
  CC      net/l3mdev/l3mdev.o
  CC      kernel/trace/trace_probe.o
  CC      arch/x86/kernel/cpu/centaur.o
  CC [M]  sound/hda/intel-sdw-acpi.o
  CC      arch/x86/kernel/cpu/zhaoxin.o
  CC      kernel/trace/trace_uprobe.o
  CC      kernel/trace/rethook.o
  AR      arch/x86/kernel/apic/built-in.a
  CC      crypto/ecb.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/async_pf.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/irqchip.o
  CC      net/ipv4/protocol.o
  CC      drivers/pci/ats.o
  AR      kernel/time/built-in.a
  CC      arch/x86/kernel/cpu/perfctr-watchdog.o
  CC      fs/ext4/fsync.o
  LD [M]  sound/hda/snd-hda-core.o
  CC      drivers/pnp/support.o
  CC      mm/mmzone.o
  CC      arch/x86/kernel/cpu/vmware.o
  CC [M]  net/netfilter/ipvs/ip_vs_nfct.o
  CC      kernel/exit.o
  CC      drivers/pci/iov.o
  CC      drivers/acpi/acpica/dswload.o
  AR      net/key/built-in.a
  CC      mm/vmstat.o
  CC [M]  net/8021q/vlan_netlink.o
  AR      kernel/events/built-in.a
  CC      net/core/neighbour.o
  CC      drivers/pnp/interface.o
  LD [M]  sound/hda/snd-intel-dspcfg.o
  LD [M]  sound/hda/snd-intel-sdw-acpi.o
  CC      arch/x86/kernel/cpu/hypervisor.o
  CC [M]  sound/core/memalloc.o
  CC [M]  sound/core/pcm_timer.o
  CC      net/ethtool/stats.o
  LD [M]  sound/core/snd-ctl-led.o
  CC      crypto/cbc.o
  LD [M]  sound/core/snd-hwdep.o
  CC      net/core/rtnetlink.o
  CC      fs/ramfs/inode.o
  CC [M]  drivers/video/fbdev/core/fb_procfs.o
  CC      fs/hugetlbfs/inode.o
  AR      sound/core/built-in.a
  CC [M]  arch/x86/kvm/../../../virt/kvm/dirty_ring.o
  CC      io_uring/io-wq.o
  CC      block/badblocks.o
  CC [M]  net/8021q/vlanproc.o
  CC      mm/backing-dev.o
  CC      block/blk-rq-qos.o
  AR      net/l3mdev/built-in.a
  CC      fs/ext4/hash.o
  CC      net/core/utils.o
  CC      crypto/pcbc.o
  CC [M]  net/netfilter/nf_conntrack_standalone.o
  CC      drivers/acpi/acpica/dswload2.o
  CC      net/handshake/genl.o
  CC      arch/x86/kernel/cpu/mshyperv.o
  AS      arch/x86/kernel/head_64.o
  CC      drivers/pci/pci-label.o
  CC [M]  net/bluetooth/af_bluetooth.o
  CC [M]  arch/x86/kvm/../../../virt/kvm/pfncache.o
  CC      net/bridge/br_forward.o
  AR      net/xfrm/built-in.a
  CC      drivers/video/nomodeset.o
  AR      net/dcb/built-in.a
  CC      net/handshake/netlink.o
  CC      drivers/pnp/quirks.o
  CC [M]  net/sunrpc/auth_gss/gss_rpc_upcall.o
  CC      fs/ext4/ialloc.o
  CC      net/ipv4/ip_input.o
  CC      kernel/softirq.o
  AR      net/8021q/built-in.a
  CC      kernel/resource.o
  CC      crypto/cts.o
  CC [M]  drivers/video/fbdev/core/fbsysfs.o
  CC      crypto/lrw.o
  CC      fs/ramfs/file-mmu.o
  CC      drivers/acpi/acpica/dswscope.o
  CC      drivers/pci/pci-stub.o
  CC [M]  net/netfilter/ipvs/ip_vs_rr.o
  CC      net/ipv4/ip_fragment.o
  CC      net/ipv4/ip_forward.o
  CC [M]  net/dns_resolver/dns_key.o
  CC      net/ipv6/route.o
  LD [M]  sound/core/snd-pcm.o
  CC      net/ipv4/ip_options.o
  CC      block/disk-events.o
  AR      sound/built-in.a
  CC      drivers/pnp/system.o
  CC      crypto/xts.o
  CC      net/handshake/request.o
  CC [M]  net/dns_resolver/dns_query.o
  CC      drivers/video/hdmi.o
  CC      net/sunrpc/xprt.o
  LD [M]  net/8021q/8021q.o
  CC      arch/x86/kernel/cpu/capflags.o
  CC      net/devres.o
  CC      net/ethtool/phc_vclocks.o
  AR      arch/x86/kernel/cpu/built-in.a
  CC      arch/x86/kernel/head64.o
  CC      block/blk-ia-ranges.o
  CC      drivers/acpi/acpica/dswstate.o
  CC      net/handshake/tlshd.o
  AR      fs/ramfs/built-in.a
  CC      fs/fat/cache.o
  AR      kernel/trace/built-in.a
  CC      fs/fat/dir.o
  CC      net/sunrpc/socklib.o
  CC      drivers/acpi/blacklist.o
  CC [M]  arch/x86/kvm/x86.o
  CC      fs/fat/fatent.o
  AR      fs/hugetlbfs/built-in.a
  CC      arch/x86/kernel/ebda.o
  CC      net/sunrpc/xprtsock.o
  CC      drivers/pci/vgaarb.o
  AR      drivers/pnp/built-in.a
  CC [M]  arch/x86/kvm/emulate.o
  CC      drivers/acpi/acpica/evevent.o
  CC      arch/x86/kernel/platform-quirks.o
  CC [M]  net/netfilter/nf_conntrack_expect.o
  CC      drivers/acpi/osi.o
  CC [M]  net/sunrpc/auth_gss/gss_rpc_xdr.o
  CC [M]  drivers/video/fbdev/core/fbcon.o
  CC      net/sunrpc/sched.o
  LD [M]  net/dns_resolver/dns_resolver.o
  AR      drivers/video/fbdev/omap/built-in.a
  CC [M]  drivers/video/fbdev/core/bitblit.o
  CC      mm/mm_init.o
  CC      net/bridge/br_if.o
  AR      io_uring/built-in.a
  CC [M]  net/bluetooth/hci_core.o
  CC      mm/percpu.o
  CC      kernel/sysctl.o
  CC [M]  drivers/video/fbdev/core/softcursor.o
  CC [M]  drivers/video/fbdev/core/tileblit.o
  CC      crypto/ctr.o
  CC      net/socket.o
  AR      drivers/video/fbdev/omap2/omapfb/dss/built-in.a
  CC      net/sunrpc/auth.o
  AR      drivers/video/fbdev/omap2/omapfb/displays/built-in.a
  AR      drivers/video/fbdev/omap2/omapfb/built-in.a
  AR      drivers/video/fbdev/omap2/built-in.a
  CC      net/sunrpc/auth_null.o
  CC [M]  drivers/video/fbdev/core/cfbfillrect.o
  CC [M]  drivers/video/fbdev/uvesafb.o
  CC      mm/slab_common.o
  LD [M]  net/netfilter/ipvs/ip_vs.o
  CC      lib/raid6/algos.o
  CC      block/bsg.o
  CC      lib/raid6/recov.o
  CC      drivers/acpi/acpica/evgpe.o
  CC      arch/x86/kernel/process_64.o
  HOSTCC  lib/raid6/mktables
  CC      net/ethtool/mm.o
  CC [M]  net/bluetooth/hci_conn.o
  CC      arch/x86/kernel/signal.o
  UNROLL  lib/raid6/int1.c
  UNROLL  lib/raid6/int2.c
  CC      drivers/acpi/osl.o
  CC      net/ipv4/ip_output.o
  UNROLL  lib/raid6/int4.c
  CC      net/ipv4/ip_sockglue.o
  CC      net/ipv4/inet_hashtables.o
  CC      net/handshake/trace.o
  CC      lib/fonts/fonts.o
  CC      crypto/gcm.o
  CC      lib/fonts/font_8x8.o
  CC      lib/fonts/font_8x16.o
  CC      drivers/acpi/acpica/evgpeblk.o
  CC [M]  net/sunrpc/auth_gss/trace.o
  CC      drivers/acpi/utils.o
  CC      kernel/capability.o
  CC      block/bsg-lib.o
  CC      kernel/ptrace.o
  UNROLL  lib/raid6/int8.c
  UNROLL  lib/raid6/int16.c
  UNROLL  lib/raid6/int32.c
  CC      lib/raid6/recov_ssse3.o
  CC [M]  net/bluetooth/hci_event.o
  AR      drivers/pci/built-in.a
  CC      arch/x86/kernel/signal_64.o
  CC      arch/x86/kernel/traps.o
  CC      fs/ext4/indirect.o
  CC      net/compat.o
  CC      fs/ext4/inline.o
  CC      net/sysctl_net.o
  CC      fs/fat/file.o
  CC      net/sunrpc/auth_unix.o
  CC      arch/x86/kernel/idt.o
  AR      lib/fonts/built-in.a
  CC      lib/raid6/recov_avx2.o
  CC      drivers/acpi/acpica/evgpeinit.o
  CC      net/ethtool/module.o
  CC      arch/x86/kernel/irq.o
  CC      arch/x86/kernel/irq_64.o
  CC      net/bridge/br_input.o
  CC [M]  net/netfilter/nf_conntrack_helper.o
  CC      lib/raid6/mmx.o
  CC      lib/raid6/sse1.o
  CC      lib/raid6/sse2.o
  CC [M]  net/bluetooth/mgmt.o
  CC      block/blk-cgroup.o
  CC      net/core/link_watch.o
  CC      mm/compaction.o
  CC      kernel/user.o
  CC      fs/nfs/client.o
  CC      arch/x86/kernel/dumpstack_64.o
  CC      crypto/pcrypt.o
  AR      net/handshake/built-in.a
  CC      arch/x86/kernel/time.o
  CC      drivers/acpi/acpica/evgpeutil.o
  CC [M]  drivers/video/fbdev/core/cfbcopyarea.o
  CC      arch/x86/kernel/ioport.o
  CC      net/core/filter.o
  CC [M]  net/bluetooth/hci_sock.o
  CC      fs/fat/inode.o
  CC [M]  net/bluetooth/hci_sysfs.o
  CC [M]  net/sunrpc/auth_gss/gss_krb5_mech.o
  CC      crypto/cryptd.o
  AR      drivers/amba/built-in.a
  CC      kernel/signal.o
  CC [M]  net/bluetooth/l2cap_core.o
  AR      drivers/clk/actions/built-in.a
  CC [M]  net/bluetooth/l2cap_sock.o
  AR      drivers/clk/analogbits/built-in.a
  AR      drivers/clk/bcm/built-in.a
  AR      drivers/clk/imgtec/built-in.a
  AR      drivers/clk/imx/built-in.a
  CC      kernel/sys.o
  AR      drivers/clk/ingenic/built-in.a
  AR      drivers/clk/mediatek/built-in.a
  AR      drivers/clk/microchip/built-in.a
  CC [M]  net/bluetooth/smp.o
  AR      drivers/clk/mstar/built-in.a
  AR      drivers/clk/mvebu/built-in.a
  AR      drivers/clk/ralink/built-in.a
  AR      drivers/clk/renesas/built-in.a
  CC      drivers/acpi/acpica/evglock.o
  CC      arch/x86/kernel/dumpstack.o
  AR      drivers/clk/socfpga/built-in.a
  AR      drivers/clk/sprd/built-in.a
  AR      drivers/clk/starfive/built-in.a
  AR      drivers/clk/sunxi-ng/built-in.a
  CC      lib/raid6/avx2.o
  CC      net/ethtool/pse-pd.o
  AR      drivers/clk/ti/built-in.a
  AR      drivers/clk/versatile/built-in.a
  CC      lib/raid6/avx512.o
  CC      drivers/clk/x86/clk-lpss-atom.o
  CC      arch/x86/kernel/nmi.o
  CC      crypto/des_generic.o
  CC      net/ipv4/inet_timewait_sock.o
  CC      net/core/sock_diag.o
  CC      drivers/dma/dw/core.o
  CC      net/sunrpc/svc.o
  CC      mm/interval_tree.o
  CC      crypto/aes_generic.o
  CC      drivers/acpi/acpica/evhandler.o
  CC      arch/x86/kernel/ldt.o
  CC      fs/ext4/inode.o
  CC      drivers/dma/hsu/hsu.o
  CC [M]  drivers/video/fbdev/core/cfbimgblt.o
  CC      drivers/clk/x86/clk-pmc-atom.o
  CC [M]  net/netfilter/nf_conntrack_proto.o
  CC      net/bridge/br_ioctl.o
  AR      drivers/soc/apple/built-in.a
  AR      drivers/soc/aspeed/built-in.a
  AR      drivers/soc/bcm/bcm63xx/built-in.a
  CC      net/bridge/br_stp.o
  AR      drivers/soc/bcm/built-in.a
  CC [M]  net/bluetooth/lib.o
  CC      net/core/dev_ioctl.o
  AR      drivers/soc/fsl/built-in.a
  AR      drivers/soc/fujitsu/built-in.a
  CC [M]  net/sunrpc/auth_gss/gss_krb5_seal.o
  AR      drivers/soc/imx/built-in.a
  AR      drivers/soc/ixp4xx/built-in.a
  AR      drivers/soc/loongson/built-in.a
  CC [M]  net/sunrpc/auth_gss/gss_krb5_unseal.o
  AR      drivers/soc/mediatek/built-in.a
  AR      drivers/soc/microchip/built-in.a
  AR      drivers/soc/nuvoton/built-in.a
  AR      drivers/soc/pxa/built-in.a
  AR      drivers/soc/amlogic/built-in.a
  CC      arch/x86/kernel/setup.o
  CC      lib/raid6/recov_avx512.o
  AR      drivers/soc/qcom/built-in.a
  CC      mm/list_lru.o
  AR      drivers/soc/renesas/built-in.a
  AR      drivers/soc/rockchip/built-in.a
  AR      drivers/soc/sifive/built-in.a
  CC      net/bridge/br_stp_bpdu.o
  AR      drivers/soc/sunxi/built-in.a
  CC [M]  net/bluetooth/ecdh_helper.o
  AR      drivers/soc/ti/built-in.a
  CC      net/ipv6/ip6_fib.o
  AR      drivers/soc/xilinx/built-in.a
  AR      drivers/soc/built-in.a
  CC [M]  drivers/video/fbdev/core/sysfillrect.o
  CC      drivers/virtio/virtio.o
  CC      drivers/acpi/acpica/evmisc.o
  CC      fs/fat/misc.o
  CC      net/ethtool/plca.o
  CC [M]  net/netfilter/nf_conntrack_proto_generic.o
  AR      drivers/clk/x86/built-in.a
  CC      drivers/dma/dw/dw.o
  AR      drivers/clk/xilinx/built-in.a
  CC      drivers/clk/clk-devres.o
  CC      fs/nfs/dir.o
  CC      block/blk-cgroup-rwstat.o
  CC      mm/workingset.o
  AR      drivers/dma/hsu/built-in.a
  CC [M]  net/sunrpc/auth_gss/gss_krb5_seqnum.o
  CC      drivers/acpi/acpica/evregion.o
  CC      net/bridge/br_stp_if.o
  CC      drivers/dma/dw/idma32.o
  CC [M]  net/netfilter/nf_conntrack_proto_tcp.o
  TABLE   lib/raid6/tables.c
  CC      lib/raid6/int1.o
  CC [M]  net/netfilter/nf_conntrack_proto_udp.o
  CC      net/bridge/br_stp_timer.o
  CC      drivers/clk/clk-bulk.o
  CC      net/ipv4/inet_connection_sock.o
  CC      fs/fat/nfs.o
  CC      lib/raid6/int2.o
  CC      crypto/deflate.o
  CC      drivers/virtio/virtio_ring.o
  CC      fs/ext4/ioctl.o
  CC      arch/x86/kernel/x86_init.o
  CC      drivers/acpi/reboot.o
  CC      net/core/tso.o
  CC      net/core/sock_reuseport.o
  CC [M]  drivers/video/fbdev/core/syscopyarea.o
  CC      drivers/virtio/virtio_anchor.o
  CC      drivers/virtio/virtio_pci_modern_dev.o
  CC      mm/debug.o
  CC      drivers/virtio/virtio_pci_legacy_dev.o
  CC      net/ipv4/tcp.o
  CC [M]  net/sunrpc/auth_gss/gss_krb5_wrap.o
  CC      drivers/acpi/acpica/evrgnini.o
  CC      fs/nfs/file.o
  CC      block/blk-throttle.o
  CC [M]  drivers/video/fbdev/core/sysimgblt.o
  CC      drivers/clk/clkdev.o
  AR      net/ethtool/built-in.a
  CC      net/core/fib_notifier.o
  CC      arch/x86/kernel/i8259.o
  CC      net/ipv4/tcp_input.o
  CC [M]  net/netfilter/nf_conntrack_proto_icmp.o
  CC      drivers/dma/dw/acpi.o
  CC      lib/raid6/int4.o
  CC      fs/fat/namei_vfat.o
  CC      crypto/crc32c_generic.o
  CC      arch/x86/kernel/irqinit.o
  CC [M]  drivers/video/fbdev/simplefb.o
  CC      drivers/acpi/acpica/evsci.o
  CC      drivers/virtio/virtio_mmio.o
  CC      fs/fat/namei_msdos.o
  CC      net/ipv6/ipv6_sockglue.o
  CC      drivers/virtio/virtio_pci_modern.o
  CC [M]  net/sunrpc/auth_gss/gss_krb5_crypto.o
  CC      drivers/clk/clk.o
  CC      drivers/clk/clk-divider.o
  CC      drivers/acpi/acpica/evxface.o
  CC      net/bridge/br_netlink.o
  CC      crypto/crct10dif_common.o
  CC [M]  net/bluetooth/hci_request.o
  CC      fs/ext4/mballoc.o
  CC      mm/gup.o
  CC      arch/x86/kernel/jump_label.o
  CC      net/bridge/br_netlink_tunnel.o
  CC      lib/raid6/int8.o
  CC      fs/ext4/migrate.o
  CC      arch/x86/kernel/irq_work.o
  CC      crypto/crct10dif_generic.o
  CC [M]  drivers/video/fbdev/core/fb_sys_fops.o
  CC      drivers/dma/dw/pci.o
  CC      lib/raid6/int16.o
  CC      crypto/authenc.o
  CC      drivers/virtio/virtio_pci_common.o
  CC [M]  net/netfilter/nf_conntrack_extend.o
  CC      drivers/acpi/acpica/evxfevnt.o
  CC      drivers/tty/vt/vt_ioctl.o
  CC      kernel/umh.o
  CC      drivers/char/hw_random/core.o
  CC      drivers/char/agp/backend.o
  CC      drivers/char/agp/generic.o
  CC      drivers/char/agp/isoch.o
  AR      drivers/iommu/amd/built-in.a
  CC      drivers/char/agp/intel-agp.o
  CC      drivers/char/agp/intel-gtt.o
  CC      net/core/xdp.o
  CC      drivers/iommu/intel/dmar.o
  CC      drivers/tty/vt/vc_screen.o
  CC      arch/x86/kernel/probe_roms.o
  CC      fs/nfs/getroot.o
  CC      drivers/char/tpm/tpm-chip.o
  AR      fs/fat/built-in.a
  CC      drivers/char/mem.o
  AR      drivers/dma/dw/built-in.a
  AR      drivers/dma/idxd/built-in.a
  CC      net/ipv4/tcp_output.o
  AR      drivers/dma/mediatek/built-in.a
  CC      lib/raid6/int32.o
  AR      drivers/dma/qcom/built-in.a
  AR      drivers/dma/ti/built-in.a
  AR      drivers/dma/xilinx/built-in.a
  CC [M]  drivers/dma/ioat/init.o
  CC      fs/ext4/mmp.o
  CC      block/mq-deadline.o
  CC      net/ipv6/ndisc.o
  CC      drivers/acpi/acpica/evxfgpe.o
  LD [M]  drivers/video/fbdev/core/fb.o
  CC [M]  net/sunrpc/auth_gss/gss_krb5_keys.o
  AR      drivers/video/fbdev/core/built-in.a
  AR      drivers/video/fbdev/built-in.a
  AR      drivers/video/built-in.a
  CC      fs/ext4/move_extent.o
  CC      lib/argv_split.o
  CC      fs/nfs/inode.o
  CC      drivers/virtio/virtio_pci_legacy.o
  CC      net/ipv6/udp.o
  CC      net/ipv4/tcp_timer.o
  CC      crypto/authencesn.o
  CC      kernel/workqueue.o
  CC      fs/ext4/namei.o
  CC      drivers/char/hw_random/intel-rng.o
  LD [M]  net/sunrpc/auth_gss/auth_rpcgss.o
  CC      arch/x86/kernel/sys_ia32.o
  CC      net/bridge/br_arp_nd_proxy.o
  CC      lib/raid6/tables.o
  CC      fs/nfs/super.o
  CC      fs/nfs/io.o
  CC      drivers/char/tpm/tpm-dev-common.o
  CC [M]  net/netfilter/nf_conntrack_acct.o
  CC      fs/ext4/page-io.o
  CC      fs/ext4/readpage.o
  CC      drivers/iommu/intel/iommu.o
  CC      drivers/acpi/acpica/evxfregn.o
  CC      net/ipv6/udplite.o
  CC      fs/ext4/resize.o
  CC      drivers/char/random.o
  CC      fs/nfs/direct.o
  CC      drivers/tty/vt/selection.o
  CC [M]  net/netfilter/nf_conntrack_seqadj.o
  AR      drivers/char/agp/built-in.a
  CC      lib/zstd/compress/zstd_ldm.o
  CC      lib/zstd/compress/zstd_opt.o
  CC      fs/nfs/pagelist.o
  CC      net/sunrpc/svcsock.o
  LD [M]  net/sunrpc/auth_gss/rpcsec_gss_krb5.o
  CC [M]  drivers/virtio/virtio_mem.o
  CC      net/sunrpc/svcauth.o
  CC      drivers/acpi/acpica/exconcat.o
  AR      drivers/char/hw_random/built-in.a
  CC      drivers/char/misc.o
  CC      drivers/iommu/intel/pasid.o
  CC [M]  drivers/dma/ioat/dma.o
  AR      lib/raid6/built-in.a
  CC      lib/bug.o
  CC      drivers/char/tpm/tpm-dev.o
  CC      block/kyber-iosched.o
  CC      net/core/flow_offload.o
  CC      arch/x86/kernel/signal_32.o
  CC      lib/zstd/zstd_decompress_module.o
  CC      crypto/lzo.o
  CC      mm/mmap_lock.o
  CC      lib/zstd/decompress/huf_decompress.o
  CC      mm/highmem.o
  CC      drivers/iommu/intel/trace.o
  CC [M]  net/netfilter/nf_conntrack_proto_icmpv6.o
  CC      drivers/acpi/acpica/exconfig.o
  CC      drivers/acpi/acpica/exconvrt.o
  CC      crypto/lzo-rle.o
  CC      drivers/tty/vt/keyboard.o
  CC      drivers/char/tpm/tpm-interface.o
  CC      fs/exportfs/expfs.o
  CC      net/bridge/br_sysfs_if.o
  CC      lib/zstd/decompress/zstd_ddict.o
  CC      drivers/char/tpm/tpm1-cmd.o
  CC      drivers/char/tpm/tpm2-cmd.o
  CC      drivers/char/virtio_console.o
  CC      drivers/iommu/intel/cap_audit.o
  CC      lib/zstd/decompress/zstd_decompress.o
  CC      crypto/lz4.o
  CC [M]  net/bluetooth/mgmt_util.o
  CC      crypto/lz4hc.o
  CC      drivers/tty/vt/consolemap.o
  CC      crypto/xxhash_generic.o
  CC      drivers/acpi/acpica/excreate.o
  CC      drivers/acpi/acpica/exdebug.o
  CC      mm/memory.o
  CC      drivers/char/tpm/tpmrm-dev.o
  CC      fs/nfs/read.o
  CC      arch/x86/kernel/sys_x86_64.o
  CC      block/bfq-iosched.o
  CC [M]  drivers/dma/ioat/prep.o
  AR      fs/exportfs/built-in.a
  CC      drivers/clk/clk-fixed-factor.o
  CC      fs/lockd/clntlock.o
  CC      net/ipv6/raw.o
  CC      fs/nls/nls_base.o
  CC      fs/nls/nls_cp437.o
  CC      fs/nls/nls_ascii.o
  CC      fs/nls/nls_iso8859-1.o
  CC      net/sunrpc/svcauth_unix.o
  CC [M]  net/bluetooth/mgmt_config.o
  CC      crypto/rng.o
  CC      fs/nfs/symlink.o
  CC      drivers/acpi/acpica/exdump.o
  HOSTCC  drivers/tty/vt/conmakehash
  CC      fs/ext4/super.o
  CC      kernel/pid.o
  CC      fs/nfs/unlink.o
  AR      drivers/virtio/built-in.a
  AR      drivers/gpu/host1x/built-in.a
  CC      mm/mincore.o
  AR      drivers/gpu/drm/tests/built-in.a
  CC      mm/mlock.o
  CC [M]  drivers/gpu/drm/tests/drm_kunit_helpers.o
  CC [M]  net/netfilter/nf_conntrack_proto_dccp.o
  CC [M]  drivers/gpu/drm/tests/drm_buddy_test.o
  CC      mm/mmap.o
  CC      drivers/char/tpm/tpm2-space.o
  CC      arch/x86/kernel/espfix_64.o
  CC      drivers/clk/clk-fixed-rate.o
  CC      drivers/clk/clk-gate.o
  CC [M]  net/netfilter/nf_conntrack_proto_sctp.o
  AR      drivers/gpu/drm/arm/built-in.a
  CC      net/ipv4/tcp_ipv4.o
  CC      fs/nls/nls_utf8.o
  CC      lib/buildid.o
  CC      arch/x86/kernel/ksysfs.o
  CC      net/bridge/br_sysfs_br.o
  CC      drivers/acpi/acpica/exfield.o
  CC      drivers/acpi/acpica/exfldio.o
  CC      net/ipv6/icmp.o
  CC      drivers/tty/vt/vt.o
  CC [M]  net/bluetooth/hci_codec.o
  CC      drivers/connector/cn_queue.o
  CC      crypto/drbg.o
  AR      fs/nls/built-in.a
  CC      drivers/connector/connector.o
  CC      drivers/base/power/sysfs.o
  CC      drivers/base/power/generic_ops.o
  CC      drivers/char/hpet.o
  CC      drivers/base/power/common.o
  CC      lib/zstd/decompress/zstd_decompress_block.o
  CC      drivers/clk/clk-multiplier.o
  CC      drivers/base/firmware_loader/builtin/main.o
  CC [M]  drivers/dma/ioat/dca.o
  CC      net/ipv4/tcp_minisocks.o
  CC      fs/lockd/clntproc.o
  CC      drivers/acpi/acpica/exmisc.o
  CC      fs/nfs/write.o
  CC      mm/mmu_gather.o
  CC [M]  drivers/dma/ioat/sysfs.o
  CC      drivers/iommu/intel/irq_remapping.o
  CC      mm/mprotect.o
  CC      arch/x86/kernel/bootflag.o
  CC      arch/x86/kernel/e820.o
  CC      drivers/base/regmap/regmap.o
  CC      drivers/char/nvram.o
  CC      drivers/char/tpm/tpm-sysfs.o
  CC [M]  net/netfilter/nf_conntrack_netlink.o
  AR      drivers/base/firmware_loader/builtin/built-in.a
  CC      drivers/base/firmware_loader/main.o
  CC      drivers/base/power/qos.o
  CC [M]  drivers/gpu/drm/tests/drm_cmdline_parser_test.o
  CC      net/core/gro.o
  CC      drivers/clk/clk-mux.o
  CC      drivers/base/power/runtime.o
  AR      drivers/base/test/built-in.a
  CC      drivers/base/component.o
  CC [M]  drivers/gpu/drm/tests/drm_connector_test.o
  CC [M]  drivers/gpu/drm/tests/drm_damage_helper_test.o
  CC      drivers/acpi/acpica/exmutex.o
  CC      kernel/task_work.o
  CC      drivers/connector/cn_proc.o
  CC [M]  drivers/gpu/drm/tests/drm_dp_mst_helper_test.o
  CC      kernel/extable.o
  CC [M]  net/bluetooth/eir.o
  CC      drivers/acpi/acpica/exnames.o
  CC      drivers/base/power/wakeirq.o
  CC      drivers/acpi/acpica/exoparg1.o
  LD [M]  drivers/dma/ioat/ioatdma.o
  CC      drivers/clk/clk-composite.o
  CC      net/bridge/br_nf_core.o
  CC      drivers/clk/clk-fractional-divider.o
  CC      drivers/dma/dmaengine.o
  AR      drivers/gpu/drm/display/built-in.a
  CC      drivers/dma/virt-dma.o
  CC [M]  drivers/gpu/drm/display/drm_display_helper_mod.o
  CC      drivers/dma/acpi-dma.o
  CC [M]  drivers/gpu/drm/display/drm_dp_dual_mode_helper.o
  CC [M]  arch/x86/kvm/i8259.o
  CC      net/ipv6/mcast.o
  CC      net/sunrpc/addr.o
  CC      lib/zstd/zstd_common_module.o
  CC      drivers/char/tpm/eventlog/common.o
  CC      drivers/char/tpm/eventlog/tpm1.o
  CC      drivers/acpi/acpica/exoparg2.o
  CC [M]  arch/x86/kvm/irq.o
  CC      crypto/jitterentropy.o
  CC      crypto/jitterentropy-kcapi.o
  CC      net/bridge/br_multicast.o
  CC [M]  arch/x86/kvm/lapic.o
  CC      arch/x86/kernel/pci-dma.o
  CC      drivers/acpi/acpica/exoparg3.o
  CC      crypto/ghash-generic.o
  CC      drivers/base/power/main.o
  AR      drivers/base/firmware_loader/built-in.a
  CC      drivers/base/core.o
  CC      lib/zstd/common/debug.o
  CC      drivers/base/bus.o
  CC      drivers/clk/clk-gpio.o
  CC      net/bridge/br_mdb.o
  CC      fs/lockd/clntxdr.o
  CC      drivers/acpi/acpica/exoparg6.o
  CC      drivers/iommu/intel/perfmon.o
  CC [M]  arch/x86/kvm/i8254.o
  CC      fs/nfs/namespace.o
  AR      drivers/iommu/arm/arm-smmu/built-in.a
  CC      drivers/char/tpm/eventlog/tpm2.o
  AR      drivers/iommu/arm/arm-smmu-v3/built-in.a
  AR      drivers/iommu/arm/built-in.a
  CC [M]  drivers/gpu/drm/display/drm_dp_helper.o
  CC [M]  arch/x86/kvm/ioapic.o
  CC      kernel/params.o
  CC      net/ipv4/tcp_cong.o
  CC      mm/mremap.o
  CC [M]  drivers/gpu/drm/display/drm_dp_mst_topology.o
  CC      fs/lockd/host.o
  CC      fs/lockd/svc.o
  CC      lib/zstd/common/entropy_common.o
  CC      drivers/acpi/acpica/exprep.o
  CC [M]  net/bluetooth/hci_sync.o
  CC [M]  net/bluetooth/coredump.o
  CC      drivers/base/dd.o
  CC      net/sunrpc/rpcb_clnt.o
  AR      drivers/connector/built-in.a
  CC      crypto/af_alg.o
  CC      net/bridge/br_multicast_eht.o
  CC      kernel/kthread.o
  AR      drivers/iommu/iommufd/built-in.a
  CC      drivers/acpi/acpica/exregion.o
  AR      drivers/clk/built-in.a
  CC      mm/msync.o
  CC      net/core/netdev-genl.o
  CC      arch/x86/kernel/quirks.o
  CC      net/core/netdev-genl-gen.o
  CC      block/bfq-wf2q.o
  CC [M]  drivers/gpu/drm/display/drm_dsc_helper.o
  CC [M]  drivers/gpu/drm/tests/drm_format_helper_test.o
  CC      net/core/net-sysfs.o
  AR      drivers/dma/built-in.a
  CC      drivers/base/power/wakeup.o
  CC      lib/zstd/common/error_private.o
  CC      drivers/iommu/iommu.o
  CC      net/sunrpc/timer.o
  CC      drivers/char/tpm/tpm_ppi.o
  CC      arch/x86/kernel/topology.o
  COPY    drivers/tty/vt/defkeymap.c
  CONMK   drivers/tty/vt/consolemap_deftbl.c
  CC      drivers/tty/vt/defkeymap.o
  CC      mm/page_vma_mapped.o
  CC      kernel/sys_ni.o
  CC      drivers/acpi/acpica/exresnte.o
  CC      fs/lockd/svclock.o
  CC [M]  net/bluetooth/sco.o
  CC      drivers/tty/vt/consolemap_deftbl.o
  AR      drivers/tty/vt/built-in.a
  CC      drivers/tty/hvc/hvc_console.o
  CC      drivers/tty/serial/8250/8250_core.o
  CC [M]  net/bluetooth/iso.o
  CC      fs/nfs/mount_clnt.o
  CC      drivers/acpi/acpica/exresolv.o
  CC      drivers/tty/serial/serial_core.o
  AR      drivers/iommu/intel/built-in.a
  CC      mm/pagewalk.o
  CC      drivers/char/tpm/eventlog/acpi.o
  CC      drivers/acpi/acpica/exresop.o
  CC      drivers/base/syscore.o
  CC      arch/x86/kernel/kdebugfs.o
  CC [M]  net/netfilter/nf_nat_core.o
  CC      drivers/base/regmap/regcache.o
  CC [M]  drivers/gpu/drm/display/drm_hdcp_helper.o
  CC      drivers/acpi/nvs.o
  CC      drivers/acpi/wakeup.o
  CC      drivers/acpi/sleep.o
  CC      fs/lockd/svcshare.o
  CC      drivers/acpi/device_sysfs.o
  CC      drivers/base/driver.o
  CC [M]  drivers/gpu/drm/tests/drm_format_test.o
  CC [M]  arch/x86/kvm/irq_comm.o
  CC      mm/pgtable-generic.o
  CC      net/ipv4/tcp_metrics.o
  CC      net/ipv4/tcp_fastopen.o
  CC      block/bfq-cgroup.o
  CC      lib/zstd/common/fse_decompress.o
  CC      drivers/base/regmap/regcache-rbtree.o
  CC      drivers/iommu/iommu-traces.o
  CC      mm/rmap.o
  CC      drivers/base/power/wakeup_stats.o
  CC      kernel/nsproxy.o
  CC      drivers/acpi/acpica/exserial.o
  CC      mm/vmalloc.o
  CC      arch/x86/kernel/alternative.o
  CC      drivers/char/tpm/eventlog/efi.o
  CC      drivers/char/tpm/tpm_crb.o
  CC      crypto/algif_hash.o
  CC      mm/page_alloc.o
  CC [M]  drivers/gpu/drm/display/drm_hdmi_helper.o
  CC      drivers/base/class.o
  CC      net/sunrpc/xdr.o
  AR      drivers/tty/hvc/built-in.a
  CC      net/sunrpc/sunrpc_syms.o
  CC      net/sunrpc/cache.o
  AR      drivers/tty/ipwireless/built-in.a
  CC      drivers/base/regmap/regcache-flat.o
  CC      drivers/base/regmap/regcache-maple.o
  CC [M]  drivers/gpu/drm/tests/drm_framebuffer_test.o
  CC      fs/nfs/nfstrace.o
  CC      crypto/algif_skcipher.o
  CC      drivers/base/power/domain.o
  CC      drivers/tty/serial/8250/8250_pnp.o
  CC      drivers/tty/serial/8250/8250_port.o
  CC      drivers/acpi/acpica/exstore.o
  CC [M]  drivers/gpu/drm/tests/drm_managed_test.o
  CC      kernel/notifier.o
  CC [M]  arch/x86/kvm/cpuid.o
  CC [M]  net/bluetooth/a2mp.o
  CC      fs/lockd/svcproc.o
  CC      net/core/page_pool.o
  CC [M]  drivers/gpu/drm/tests/drm_mm_test.o
  CC [M]  arch/x86/kvm/pmu.o
  CC      lib/zstd/common/zstd_common.o
  CC      fs/nfs/export.o
  CC      net/sunrpc/rpc_pipe.o
  CC      net/bridge/br_vlan.o
  CC      mm/init-mm.o
  AR      lib/zstd/built-in.a
  CC      lib/cmdline.o
  CC      block/blk-mq-pci.o
  CC      drivers/iommu/iommu-sysfs.o
  CC [M]  arch/x86/kvm/mtrr.o
  CC      drivers/base/platform.o
  CC      drivers/tty/tty_io.o
  CC      drivers/acpi/acpica/exstoren.o
  CC      drivers/tty/serial/earlycon.o
  CC      drivers/acpi/device_pm.o
  CC      drivers/base/cpu.o
  AR      drivers/char/tpm/built-in.a
  CC      drivers/base/regmap/regmap-debugfs.o
  AR      drivers/char/built-in.a
  CC      lib/cpumask.o
  CC      drivers/block/loop.o
  AR      drivers/misc/eeprom/built-in.a
  AR      drivers/misc/cb710/built-in.a
  AR      drivers/misc/ti-st/built-in.a
  CC      drivers/tty/n_tty.o
  AR      drivers/misc/lis3lv02d/built-in.a
  AR      drivers/misc/cardreader/built-in.a
  CC [M]  net/netfilter/nf_nat_proto.o
  CC [M]  drivers/misc/mei/hdcp/mei_hdcp.o
  CC      drivers/tty/tty_ioctl.o
  CC      fs/ext4/symlink.o
  CC      net/ipv6/reassembly.o
  CC [M]  drivers/gpu/drm/display/drm_scdc_helper.o
  CC [M]  arch/x86/kvm/hyperv.o
  CC      drivers/acpi/acpica/exstorob.o
  CC [M]  drivers/gpu/drm/tests/drm_modes_test.o
  CC      crypto/xor.o
  CC      crypto/hash_info.o
  CC      arch/x86/kernel/i8253.o
  CC      crypto/simd.o
  CC      drivers/block/virtio_blk.o
  CC      kernel/ksysfs.o
  CC      drivers/iommu/dma-iommu.o
  CC      block/blk-mq-virtio.o
  CC      drivers/mfd/mfd-core.o
  CC      lib/ctype.o
  CC      drivers/mfd/intel-lpss.o
  CC      drivers/tty/tty_ldisc.o
  CC      lib/dec_and_lock.o
  CC      net/ipv4/tcp_rate.o
  CC      fs/lockd/svcsubs.o
  CC      drivers/acpi/acpica/exsystem.o
  CC      drivers/base/regmap/regmap-i2c.o
  CC [M]  arch/x86/kvm/debugfs.o
  CC      lib/decompress.o
  CC      kernel/cred.o
  CC      arch/x86/kernel/hw_breakpoint.o
  CC      drivers/base/firmware.o
  CC      lib/decompress_bunzip2.o
  CC [M]  drivers/misc/mei/pxp/mei_pxp.o
  CC [M]  crypto/md4.o
  CC      net/core/net-procfs.o
  CC      kernel/reboot.o
  CC      mm/memblock.o
  CC [M]  net/bluetooth/amp.o
  CC      drivers/base/power/domain_governor.o
  CC      net/core/netpoll.o
  CC [M]  drivers/gpu/drm/display/drm_dp_aux_dev.o
  CC      fs/ext4/sysfs.o
  CC      drivers/tty/tty_buffer.o
  CC [M]  net/bluetooth/hci_debugfs.o
  CC      block/blk-mq-debugfs.o
  AR      drivers/nfc/built-in.a
  CC      block/blk-pm.o
  CC      drivers/acpi/acpica/extrace.o
  CC      drivers/base/regmap/regmap-irq.o
  AR      fs/unicode/built-in.a
  CC      fs/ntfs/aops.o
  CC      drivers/mfd/intel-lpss-pci.o
  CC      fs/ntfs/attrib.o
  CC      drivers/tty/serial/8250/8250_dma.o
  CC [M]  net/netfilter/nf_nat_helper.o
  CC      fs/lockd/mon.o
  AR      drivers/dax/hmem/built-in.a
  CC      drivers/dax/super.o
  CC      drivers/tty/tty_port.o
  CC [M]  crypto/ccm.o
  CC      drivers/base/power/clock_ops.o
  CC [M]  arch/x86/kvm/mmu/mmu.o
  CC      net/bridge/br_vlan_tunnel.o
  CC [M]  drivers/gpu/drm/tests/drm_plane_helper_test.o
  CC      drivers/acpi/acpica/exutils.o
  CC [M]  arch/x86/kvm/mmu/page_track.o
  CC [M]  drivers/misc/mei/init.o
  CC      net/ipv6/tcp_ipv6.o
  CC [M]  net/netfilter/nf_nat_redirect.o
  CC      lib/decompress_inflate.o
  CC      arch/x86/kernel/tsc.o
  CC      drivers/tty/tty_mutex.o
  CC      arch/x86/kernel/tsc_msr.o
  CC [M]  crypto/arc4.o
  CC      fs/autofs/init.o
  CC      net/sunrpc/sysfs.o
  CC [M]  drivers/block/nbd.o
  CC      fs/autofs/inode.o
  CC      drivers/iommu/iova.o
  CC      fs/autofs/root.o
  CC      block/holder.o
  CC      net/ipv4/tcp_recovery.o
  CC [M]  net/netfilter/nf_nat_masquerade.o
  CC      drivers/dax/bus.o
  CC      drivers/mfd/intel-lpss-acpi.o
  CC      kernel/async.o
  CC      drivers/mfd/intel_soc_pmic_crc.o
  LD [M]  drivers/gpu/drm/display/drm_display_helper.o
  CC      arch/x86/kernel/io_delay.o
  CC      drivers/acpi/acpica/hwacpi.o
  AR      drivers/gpu/drm/renesas/rcar-du/built-in.a
  AR      drivers/gpu/drm/renesas/built-in.a
  CC      fs/autofs/symlink.o
  CC [M]  net/netfilter/x_tables.o
  CC [M]  crypto/ecc.o
  CC      drivers/tty/serial/8250/8250_dwlib.o
  AR      drivers/misc/built-in.a
  CC      drivers/tty/tty_ldsem.o
  AR      drivers/base/power/built-in.a
  CC      lib/decompress_unlz4.o
  CC      drivers/base/init.o
  CC      fs/autofs/waitq.o
  CC [M]  net/netfilter/xt_tcpudp.o
  CC [M]  net/netfilter/xt_mark.o
  CC      drivers/base/map.o
  CC [M]  drivers/misc/mei/hbm.o
  AR      drivers/gpu/vga/built-in.a
  CC      drivers/tty/tty_baudrate.o
  CC      fs/ntfs/collate.o
  CC      fs/ntfs/compress.o
  CC [M]  drivers/misc/mei/interrupt.o
  CC      fs/autofs/expire.o
  CC [M]  drivers/gpu/drm/tests/drm_probe_helper_test.o
  CC      net/sunrpc/svc_xprt.o
  AR      drivers/block/built-in.a
  CC      fs/lockd/trace.o
  CC      drivers/acpi/acpica/hwesleep.o
  CC      drivers/acpi/acpica/hwgpe.o
  AR      block/built-in.a
  AR      drivers/base/regmap/built-in.a
  CC [M]  drivers/gpu/drm/tests/drm_rect_test.o
  CC      net/sunrpc/xprtmultipath.o
  CC      kernel/range.o
  CC      net/ipv4/tcp_ulp.o
  CC [M]  drivers/misc/mei/client.o
  CC      fs/autofs/dev-ioctl.o
  CC      fs/lockd/xdr.o
  CC [M]  drivers/mfd/lpc_sch.o
  CC      kernel/smpboot.o
  LD [M]  net/bluetooth/bluetooth.o
  CC      drivers/iommu/irq_remapping.o
  CC      net/core/fib_rules.o
  CC      net/core/net-traces.o
  CC      net/sunrpc/stats.o
  CC      lib/decompress_unlzma.o
  CC      arch/x86/kernel/rtc.o
  CC      drivers/base/devres.o
  CC      drivers/base/attribute_container.o
  CC      net/ipv6/ping.o
  CC      net/bridge/br_vlan_options.o
  CC      drivers/tty/serial/8250/8250_pcilib.o
  CC      fs/lockd/clnt4xdr.o
  CC      net/bridge/br_mst.o
  CC      drivers/acpi/acpica/hwregs.o
  CC      fs/ntfs/debug.o
  CC      arch/x86/kernel/resource.o
  CC      net/ipv6/exthdrs.o
  CC      net/ipv6/datagram.o
  CC      drivers/tty/tty_jobctrl.o
  CC      fs/lockd/xdr4.o
  CC      lib/decompress_unlzo.o
  CC      drivers/tty/serial/8250/8250_pci.o
  CC      net/sunrpc/sysctl.o
  AR      drivers/dax/built-in.a
  CC      drivers/base/transport_class.o
  CC [M]  drivers/misc/mei/main.o
  CC [M]  crypto/essiv.o
  CC      drivers/dma-buf/dma-buf.o
  AR      drivers/gpu/drm/omapdrm/built-in.a
  CC [M]  drivers/misc/mei/dma-ring.o
  AR      drivers/gpu/drm/tilcdc/built-in.a
  AR      drivers/gpu/drm/imx/built-in.a
  CC [M]  drivers/mfd/lpc_ich.o
  AR      drivers/gpu/drm/i2c/built-in.a
  AR      drivers/gpu/drm/panel/built-in.a
  AR      fs/autofs/built-in.a
  CC      drivers/dma-buf/dma-fence.o
  AR      drivers/gpu/drm/bridge/analogix/built-in.a
  CC      kernel/ucount.o
  AR      drivers/gpu/drm/bridge/cadence/built-in.a
  AR      drivers/gpu/drm/bridge/imx/built-in.a
  AR      drivers/gpu/drm/bridge/synopsys/built-in.a
  AR      drivers/gpu/drm/bridge/built-in.a
  AR      drivers/iommu/built-in.a
  AR      drivers/gpu/drm/hisilicon/built-in.a
  AR      drivers/cxl/core/built-in.a
  AR      drivers/gpu/drm/mxsfb/built-in.a
  AR      drivers/cxl/built-in.a
  AR      drivers/gpu/drm/tiny/built-in.a
  CC      drivers/base/topology.o
  AR      drivers/gpu/drm/xlnx/built-in.a
  AS      arch/x86/kernel/irqflags.o
  AR      drivers/gpu/drm/gud/built-in.a
  AR      drivers/macintosh/built-in.a
  CC      fs/ntfs/dir.o
  CC      drivers/dma-buf/dma-fence-array.o
  AR      drivers/gpu/drm/solomon/built-in.a
  CC      drivers/scsi/scsi.o
  CC      arch/x86/kernel/static_call.o
  CC [M]  drivers/gpu/drm/ttm/ttm_tt.o
  CC      drivers/acpi/acpica/hwsleep.o
  CC      drivers/acpi/acpica/hwvalid.o
  CC      drivers/acpi/acpica/hwxface.o
  CC [M]  drivers/gpu/drm/scheduler/sched_main.o
  CC [M]  crypto/ecdh.o
  CC      net/ipv4/tcp_offload.o
  CC [M]  crypto/ecdh_helper.o
  CC      fs/nfs/sysfs.o
  CC [M]  drivers/gpu/drm/scheduler/sched_fence.o
  CC      lib/decompress_unxz.o
  CC      drivers/scsi/hosts.o
  CC [M]  arch/x86/kvm/mmu/spte.o
  CC      drivers/tty/serial/8250/8250_exar.o
  CC [M]  drivers/gpu/drm/ttm/ttm_bo.o
  CC      mm/memory_hotplug.o
  CC      drivers/tty/n_null.o
  CC      drivers/scsi/scsi_ioctl.o
  CC      lib/decompress_unzstd.o
  CC      kernel/regset.o
  CC      arch/x86/kernel/process.o
  CC [M]  drivers/gpu/drm/ttm/ttm_bo_util.o
  CC      fs/ext4/xattr.o
  CC      drivers/tty/pty.o
  CC      drivers/tty/sysrq.o
  CC      drivers/acpi/acpica/hwxfsleep.o
  CC      drivers/scsi/scsicam.o
  CC      mm/madvise.o
  CC [M]  net/netfilter/xt_nat.o
  CC      fs/ext4/xattr_hurd.o
  CC      drivers/base/container.o
  CC      drivers/scsi/scsi_error.o
  AR      drivers/mfd/built-in.a
  CC [M]  drivers/misc/mei/bus.o
  CC [M]  net/bridge/br_netfilter_hooks.o
  CC [M]  net/bridge/br_netfilter_ipv6.o
  CC      fs/lockd/svc4proc.o
  LD [M]  crypto/ecdh_generic.o
  AR      crypto/built-in.a
  CC [M]  net/netfilter/xt_REDIRECT.o
  CC      drivers/tty/serial/serial_mctrl_gpio.o
  CC      net/ipv6/ip6_flowlabel.o
  CC      drivers/nvme/host/core.o
  AR      drivers/nvme/target/built-in.a
  CC      drivers/ata/libata-core.o
  CC      kernel/groups.o
  CC [M]  drivers/misc/mei/bus-fixup.o
  CC      kernel/vhost_task.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o
  CC [M]  drivers/misc/mei/debugfs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_device.o
  CC      drivers/acpi/acpica/hwpci.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.o
  CC      drivers/tty/serial/8250/8250_early.o
  CC      lib/dump_stack.o
  CC      drivers/base/property.o
  CC [M]  drivers/misc/mei/mei-trace.o
  CC      drivers/dma-buf/dma-fence-chain.o
  CC [M]  drivers/gpu/drm/scheduler/sched_entity.o
  CC      fs/nfs/fs_context.o
  CC      drivers/tty/serial/8250/8250_dw.o
  CC      fs/ntfs/file.o
  CC      drivers/tty/serial/8250/8250_lpss.o
  CC      drivers/ata/libata-scsi.o
  CC      drivers/tty/serial/8250/8250_mid.o
  CC      drivers/acpi/acpica/nsaccess.o
  CC      drivers/scsi/scsi_lib.o
  AR      net/sunrpc/built-in.a
  CC      drivers/scsi/scsi_lib_dma.o
  CC      drivers/scsi/scsi_scan.o
  CC [M]  drivers/gpu/drm/ttm/ttm_bo_vm.o
  CC [M]  drivers/gpu/drm/ttm/ttm_module.o
  CC      kernel/kcmp.o
  CC [M]  drivers/gpu/drm/ttm/ttm_execbuf_util.o
  CC      drivers/acpi/acpica/nsalloc.o
  CC      net/ipv4/tcp_plb.o
  CC      fs/nfs/sysctl.o
  CC      lib/earlycpio.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.o
  GEN     drivers/scsi/scsi_devinfo_tbl.c
  CC      drivers/nvme/host/ioctl.o
  CC      drivers/dma-buf/dma-fence-unwrap.o
  CC      fs/ext4/xattr_trusted.o
  AR      net/bridge/built-in.a
  CC      drivers/acpi/acpica/nsarguments.o
  CC      lib/extable.o
  CC      lib/flex_proportions.o
  CC      lib/idr.o
  CC      drivers/acpi/acpica/nsconvert.o
  CC      drivers/spi/spi.o
  CC      drivers/net/phy/mdio-boardinfo.o
  AR      drivers/net/pse-pd/built-in.a
  CC      drivers/acpi/acpica/nsdump.o
  CC      arch/x86/kernel/ptrace.o
  CC      fs/lockd/procfs.o
  CC [M]  drivers/misc/mei/pci-me.o
  CC      fs/ntfs/index.o
  CC      drivers/net/phy/mdio_devres.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/atombios_crtc.o
  AR      fs/hostfs/built-in.a
  CC      fs/ntfs/inode.o
  CC      drivers/scsi/scsi_devinfo.o
  CC [M]  net/netfilter/xt_MASQUERADE.o
  CC      drivers/net/phy/phy.o
  CC [M]  net/netfilter/xt_addrtype.o
  CC      drivers/tty/serial/8250/8250_pericom.o
  LD [M]  drivers/gpu/drm/scheduler/gpu-sched.o
  CC      drivers/dma-buf/dma-resv.o
  CC [M]  drivers/gpu/drm/ttm/ttm_range_manager.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.o
  CC      drivers/acpi/acpica/nseval.o
  CC      drivers/acpi/acpica/nsinit.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/atom.o
  CC [M]  net/netfilter/xt_conntrack.o
  CC      kernel/freezer.o
  CC      arch/x86/kernel/tls.o
  CC      mm/page_io.o
  CC      drivers/base/cacheinfo.o
  CC      drivers/base/swnode.o
  CC      drivers/dma-buf/sync_file.o
  CC      arch/x86/kernel/step.o
  CC      lib/irq_regs.o
  CC      fs/ext4/xattr_user.o
  CC      net/ipv6/inet6_connection_sock.o
  CC      lib/is_single_threaded.o
  CC [M]  drivers/gpu/drm/ttm/ttm_resource.o
  CC      fs/nfs/nfs2super.o
  CC      drivers/scsi/scsi_sysctl.o
  CC [M]  drivers/misc/mei/hw-me.o
  AR      fs/lockd/built-in.a
  CC [M]  drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.o
  CC [M]  net/netfilter/xt_ipvs.o
  CC      net/ipv4/datagram.o
  CC      drivers/net/phy/phy-c45.o
  CC      drivers/acpi/acpica/nsload.o
  CC      net/ipv4/raw.o
  CC      net/ipv6/udp_offload.o
  CC      kernel/stacktrace.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.o
  CC      drivers/acpi/acpica/nsnames.o
  CC      drivers/dma-buf/sw_sync.o
  CC      lib/klist.o
  AR      drivers/tty/serial/8250/built-in.a
  AR      drivers/tty/serial/built-in.a
  AR      drivers/tty/built-in.a
  CC      drivers/ata/libata-eh.o
  CC      drivers/base/auxiliary.o
  CC      kernel/dma.o
  CC      fs/ntfs/mft.o
  LD [M]  net/bridge/br_netfilter.o
  CC      kernel/smp.o
  CC      drivers/net/phy/phy-core.o
  CC      mm/swap_state.o
  CC      arch/x86/kernel/i8237.o
  CC      drivers/nvme/host/trace.o
  CC      fs/ext4/fast_commit.o
  CC      drivers/base/devtmpfs.o
  LD [M]  drivers/gpu/drm/amd/amdxcp/amdxcp.o
  CC      drivers/ata/libata-transport.o
  CC      fs/ext4/orphan.o
  CC      fs/debugfs/inode.o
  CC      fs/debugfs/file.o
  CC      drivers/base/memory.o
  CC      drivers/dma-buf/sync_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_driver.o
  CC      drivers/base/module.o
  CC      lib/kobject.o
  CC [M]  drivers/gpu/drm/i915/i915_drm_client.o
  CC      drivers/acpi/acpica/nsobject.o
  CC      drivers/acpi/acpica/nsparse.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.o
  CC      drivers/scsi/scsi_debugfs.o
  CC      arch/x86/kernel/stacktrace.o
  CC [M]  drivers/gpu/drm/ttm/ttm_pool.o
  CC [M]  drivers/gpu/drm/ttm/ttm_device.o
  CC      fs/nfs/proc.o
  CC      kernel/uid16.o
  CC [M]  drivers/gpu/drm/i915/i915_config.o
  CC [M]  drivers/misc/mei/gsc-me.o
  AR      drivers/firewire/built-in.a
  CC      drivers/net/phy/phy_device.o
  AR      drivers/cdrom/built-in.a
  CC [M]  drivers/gpu/drm/i915/i915_getparam.o
  CC      drivers/scsi/scsi_trace.o
  CC      net/core/selftests.o
  CC      drivers/base/pinctrl.o
  CC      fs/ntfs/mst.o
  CC      fs/ntfs/namei.o
  CC [M]  drivers/dma-buf/selftest.o
  CC [M]  drivers/dma-buf/st-dma-fence.o
  CC      drivers/acpi/acpica/nspredef.o
  LD [M]  net/netfilter/nf_conntrack.o
  CC      net/ipv6/seg6.o
  CC      fs/nfs/nfs2xdr.o
  CC      mm/swapfile.o
  LD [M]  net/netfilter/nf_nat.o
  CC      fs/ntfs/runlist.o
  CC      drivers/ata/libata-trace.o
  AR      net/netfilter/built-in.a
  CC      drivers/scsi/scsi_logging.o
  CC      mm/swap_slots.o
  CC      drivers/net/phy/linkmode.o
  CC      drivers/net/phy/mdio_bus.o
  CC      drivers/base/devcoredump.o
  CC      drivers/acpi/acpica/nsprepkg.o
  CC      fs/ntfs/super.o
  CC      net/ipv6/fib6_notifier.o
  CC      lib/kobject_uevent.o
  CC [M]  drivers/dma-buf/st-dma-fence-chain.o
  CC      arch/x86/kernel/reboot.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_object.o
  CC      drivers/net/phy/mdio_device.o
  AR      fs/debugfs/built-in.a
  CC [M]  drivers/gpu/drm/ttm/ttm_sys_manager.o
  CC      net/ipv4/udp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.o
  CC [M]  drivers/dma-buf/st-dma-fence-unwrap.o
  LD [M]  drivers/misc/mei/mei.o
  CC      arch/x86/kernel/msr.o
  CC      arch/x86/kernel/cpuid.o
  CC      net/ipv6/rpl.o
  CC      mm/dmapool.o
  CC      net/ipv4/udplite.o
  CC      arch/x86/kernel/early-quirks.o
  LD [M]  drivers/misc/mei/mei-gsc.o
  LD [M]  drivers/misc/mei/mei-me.o
  CC [M]  drivers/gpu/drm/i915/i915_ioctl.o
  CC      drivers/scsi/scsi_pm.o
  CC      drivers/base/platform-msi.o
  CC      net/ipv6/ioam6.o
  CC      kernel/kallsyms.o
  CC      drivers/net/phy/swphy.o
  CC      drivers/acpi/acpica/nsrepair.o
  CC      drivers/nvme/host/fault_inject.o
  AR      drivers/auxdisplay/built-in.a
  CC      arch/x86/kernel/smp.o
  CC      fs/ntfs/sysctl.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC      fs/ntfs/unistr.o
  CC [M]  drivers/gpu/drm/i915/i915_mitigations.o
  CC [M]  drivers/dma-buf/st-dma-resv.o
  CC      fs/tracefs/inode.o
  CC      drivers/nvme/host/pci.o
  CC      fs/nfs/nfs3super.o
  CC [M]  drivers/gpu/drm/ttm/ttm_agp_backend.o
  AR      drivers/spi/built-in.a
  CC      kernel/acct.o
  CC      drivers/acpi/acpica/nsrepair2.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.o
  CC      drivers/scsi/scsi_bsg.o
  CC [M]  arch/x86/kvm/mmu/tdp_iter.o
  CC      net/ipv4/udp_offload.o
  CC      arch/x86/kernel/smpboot.o
  CC      drivers/net/phy/fixed_phy.o
  CC      net/ipv4/arp.o
  CC      drivers/acpi/acpica/nssearch.o
  CC      drivers/base/physical_location.o
  CC      lib/logic_pio.o
  CC      arch/x86/kernel/tsc_sync.o
  CC      net/core/ptp_classifier.o
  CC      arch/x86/kernel/setup_percpu.o
  CC      mm/hugetlb.o
  CC      net/ipv4/icmp.o
  CC [M]  arch/x86/kvm/mmu/tdp_mmu.o
  AR      drivers/dma-buf/built-in.a
  LD [M]  drivers/dma-buf/dmabuf_selftests.o
  CC      drivers/scsi/scsi_common.o
  CC      fs/ntfs/upcase.o
  CC [M]  drivers/gpu/drm/i915/i915_module.o
  CC      lib/maple_tree.o
  CC      net/ipv4/devinet.o
  CC [M]  arch/x86/kvm/smm.o
  CC      lib/memcat_p.o
  CC      drivers/usb/common/common.o
  CC      drivers/input/serio/serio.o
  LD [M]  drivers/gpu/drm/ttm/ttm.o
  CC      drivers/input/keyboard/atkbd.o
  CC      drivers/rtc/lib.o
  CC      drivers/input/input.o
  AR      drivers/input/mouse/built-in.a
  CC      drivers/acpi/acpica/nsutils.o
  AR      fs/ext4/built-in.a
  CC      drivers/rtc/class.o
  CC      drivers/input/input-compat.o
  CC      lib/nmi_backtrace.o
  CC      drivers/input/input-mt.o
  CC      arch/x86/kernel/ftrace.o
  AR      fs/tracefs/built-in.a
  CC      drivers/rtc/interface.o
  CC      drivers/base/trace.o
  CC      drivers/rtc/nvmem.o
  CC [M]  drivers/net/phy/phylink.o
  CC      drivers/ata/libata-sata.o
  CC      fs/nfs/nfs3client.o
  CC      net/ipv6/sysctl_net_ipv6.o
  CC      drivers/scsi/sd.o
  CC      drivers/input/input-poller.o
  CC      lib/plist.o
  CC      fs/btrfs/super.o
  CC      lib/radix-tree.o
  CC      lib/ratelimit.o
  CC      fs/nfs/nfs3proc.o
  AS      arch/x86/kernel/ftrace_64.o
  CC      kernel/crash_core.o
  CC      lib/rbtree.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
  CC      arch/x86/kernel/trace_clock.o
  CC [M]  drivers/gpu/drm/xe/xe_bb.o
  AR      fs/ntfs/built-in.a
  CC      lib/seq_buf.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_display.o
  CC      drivers/input/ff-core.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_dma_buf_test.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_migrate_test.o
  CC      drivers/acpi/acpica/nswalk.o
  CC      drivers/rtc/dev.o
  CC      fs/nfs/nfs3xdr.o
  CC      net/core/netprio_cgroup.o
  CC      drivers/input/touchscreen.o
  CC      drivers/usb/common/debug.o
  CC      drivers/input/ff-memless.o
  CC      drivers/input/vivaldi-fmap.o
  CC      drivers/input/serio/i8042.o
  AR      drivers/usb/common/built-in.a
  CC      drivers/usb/core/usb.o
  AR      drivers/usb/phy/built-in.a
  AR      drivers/base/built-in.a
  CC      drivers/usb/core/hub.o
  CC      drivers/acpi/proc.o
  CC      drivers/usb/core/hcd.o
  CC [M]  drivers/gpu/drm/xe/xe_bo.o
  CC      drivers/usb/core/urb.o
  CC [M]  drivers/gpu/drm/xe/xe_bo_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_params.o
  CC      arch/x86/kernel/trace.o
  CC      net/ipv6/xfrm6_policy.o
  CC      drivers/usb/core/message.o
  CC      lib/show_mem.o
  CC      net/ipv4/af_inet.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_pci_test.o
  CC      drivers/acpi/acpica/nsxfeval.o
  CC      kernel/compat.o
  CC      net/core/dst_cache.o
  CC      drivers/usb/host/pci-quirks.o
  AR      drivers/input/keyboard/built-in.a
  CC      drivers/input/input-leds.o
  CC      drivers/usb/host/ehci-hcd.o
  CC      net/core/gro_cells.o
  CC      drivers/usb/storage/scsiglue.o
  CC      net/core/failover.o
  CC      lib/siphash.o
  CC [M]  drivers/gpu/drm/vgem/vgem_drv.o
  CC      drivers/net/mdio/acpi_mdio.o
  CC      drivers/acpi/acpica/nsxfname.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/object.o
  CC      drivers/ata/libata-sff.o
  CC [M]  drivers/gpu/drm/ast/ast_drv.o
  CC      arch/x86/kernel/rethook.o
  CC [M]  drivers/gpu/drm/ast/ast_i2c.o
  CC      net/ipv4/igmp.o
  CC      drivers/acpi/acpica/nsxfobj.o
  CC      kernel/utsname.o
  CC      drivers/usb/storage/protocol.o
  CC      drivers/rtc/proc.o
  CC      net/ipv4/fib_frontend.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_rtp_test.o
  CC      lib/string.o
  CC      drivers/input/mousedev.o
  AR      drivers/net/pcs/built-in.a
  CC      lib/timerqueue.o
  CC [M]  drivers/gpu/drm/ast/ast_main.o
  AR      drivers/nvme/host/built-in.a
  AR      drivers/nvme/built-in.a
  CC      drivers/acpi/bus.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_wa_test.o
  CC      net/ipv6/xfrm6_state.o
  CC [M]  drivers/gpu/drm/vgem/vgem_fence.o
  CC      kernel/user_namespace.o
  CC [M]  drivers/gpu/drm/i915/i915_pci.o
  CC      arch/x86/kernel/crash_core_64.o
  CC [M]  arch/x86/kvm/vmx/vmx.o
  CC [M]  drivers/gpu/drm/xe/xe_debugfs.o
  CC      drivers/usb/storage/transport.o
  CC      drivers/ata/libata-pmp.o
  CC      drivers/input/serio/libps2.o
  CC      drivers/acpi/acpica/psargs.o
  CC      drivers/net/mdio/fwnode_mdio.o
  CC      net/ipv4/fib_semantics.o
  CC      lib/vsprintf.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/client.o
  CC      lib/win_minmax.o
  CC      drivers/usb/storage/usb.o
  AR      net/core/built-in.a
  CC      drivers/usb/storage/initializers.o
  CC      drivers/gpu/drm/drm_mipi_dsi.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.o
  CC [M]  drivers/gpu/drm/ast/ast_mm.o
  CC      net/ipv4/fib_trie.o
  CC      drivers/rtc/sysfs.o
  CC      drivers/usb/core/driver.o
  CC      net/ipv4/fib_notifier.o
  CC [M]  drivers/gpu/drm/xe/xe_devcoredump.o
  CC      kernel/pid_namespace.o
  CC [M]  drivers/net/phy/aquantia_main.o
  CC      arch/x86/kernel/module.o
  AR      fs/nfs/built-in.a
  CC      arch/x86/kernel/early_printk.o
  CC [M]  drivers/net/phy/aquantia_hwmon.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/conn.o
  CC      drivers/usb/core/config.o
  LD [M]  drivers/gpu/drm/vgem/vgem.o
  CC [M]  drivers/gpu/drm/xe/xe_device.o
  CC      drivers/scsi/sg.o
  CC      drivers/acpi/acpica/psloop.o
  CC      fs/pstore/inode.o
  CC      arch/x86/kernel/hpet.o
  CC      drivers/input/evdev.o
  CC      fs/pstore/platform.o
  AR      drivers/input/serio/built-in.a
  AR      drivers/i2c/algos/built-in.a
  CC [M]  drivers/i2c/algos/i2c-algo-bit.o
  CC      drivers/usb/storage/sierra_ms.o
  CC [M]  drivers/gpu/drm/xe/xe_device_sysfs.o
  CC      net/ipv6/xfrm6_input.o
  AR      drivers/net/mdio/built-in.a
  CC      drivers/i2c/busses/i2c-designware-common.o
  AR      drivers/net/ethernet/adi/built-in.a
  AR      drivers/net/ethernet/alacritech/built-in.a
  AR      drivers/i2c/muxes/built-in.a
  CC [M]  drivers/i2c/muxes/i2c-mux-gpio.o
  AR      drivers/net/ethernet/amazon/built-in.a
  AR      drivers/net/ethernet/aquantia/built-in.a
  AR      drivers/net/ethernet/asix/built-in.a
  AR      drivers/net/ethernet/cadence/built-in.a
  CC      drivers/i2c/i2c-boardinfo.o
  AR      drivers/net/ethernet/broadcom/built-in.a
  UPD     kernel/config_data
  CC [M]  drivers/net/ethernet/broadcom/b44.o
  CC      drivers/rtc/rtc-mc146818-lib.o
  CC      drivers/usb/storage/option_ms.o
  CC [M]  drivers/gpu/drm/i915/i915_scatterlist.o
  CC [M]  drivers/net/ethernet/broadcom/bnx2.o
  CC [M]  drivers/gpu/drm/ast/ast_mode.o
  AR      drivers/i3c/built-in.a
  CC [M]  drivers/gpu/drm/ast/ast_post.o
  CC      drivers/scsi/scsi_sysfs.o
  CC [M]  drivers/gpu/drm/drm_aperture.o
  CC      drivers/rtc/rtc-cmos.o
  CC      drivers/acpi/acpica/psobject.o
  CC      drivers/acpi/acpica/psopcode.o
  CC      drivers/ata/libata-acpi.o
  CC      kernel/stop_machine.o
  AR      drivers/media/i2c/built-in.a
  CC      fs/btrfs/ctree.o
  AR      drivers/media/tuners/built-in.a
  CC      fs/btrfs/extent-tree.o
  AR      drivers/media/rc/keymaps/built-in.a
  AR      drivers/media/rc/built-in.a
  AR      drivers/media/common/b2c2/built-in.a
  AR      drivers/media/common/saa7146/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvif/device.o
  CC      fs/btrfs/print-tree.o
  AR      drivers/media/common/siano/built-in.a
  AR      drivers/media/common/v4l2-tpg/built-in.a
  CC [M]  drivers/net/phy/ax88796b.o
  AR      drivers/media/common/videobuf2/built-in.a
  AR      drivers/media/common/built-in.a
  CC      net/ipv4/inet_fragment.o
  CC      fs/pstore/pmsg.o
  AR      drivers/media/platform/allegro-dvt/built-in.a
  AR      drivers/media/platform/amlogic/meson-ge2d/built-in.a
  AR      drivers/media/platform/amlogic/built-in.a
  AR      drivers/media/platform/amphion/built-in.a
  AR      drivers/media/platform/aspeed/built-in.a
  AR      drivers/media/platform/atmel/built-in.a
  AR      drivers/media/platform/cadence/built-in.a
  AR      drivers/media/platform/chips-media/built-in.a
  AR      drivers/media/platform/intel/built-in.a
  CC      drivers/acpi/acpica/psopinfo.o
  AR      drivers/media/platform/marvell/built-in.a
  AR      drivers/media/platform/microchip/built-in.a
  CC      drivers/acpi/acpica/psparse.o
  AR      drivers/media/platform/mediatek/jpeg/built-in.a
  AR      drivers/media/platform/mediatek/mdp/built-in.a
  CC      drivers/ata/libata-pata-timings.o
  CC      drivers/acpi/acpica/psscope.o
  CC      drivers/usb/core/file.o
  AR      drivers/media/platform/mediatek/vcodec/built-in.a
  AR      drivers/media/platform/mediatek/vpu/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_dma_buf.o
  AR      drivers/media/platform/mediatek/mdp3/built-in.a
  AR      drivers/ptp/built-in.a
  AR      drivers/media/platform/mediatek/built-in.a
  CC [M]  drivers/ptp/ptp_clock.o
  CC      drivers/usb/storage/usual-tables.o
  AR      drivers/media/platform/nvidia/tegra-vde/built-in.a
  AR      drivers/media/platform/nvidia/built-in.a
  AR      drivers/media/platform/nxp/dw100/built-in.a
  CC [M]  drivers/net/ethernet/broadcom/cnic.o
  AR      drivers/media/platform/nxp/imx-jpeg/built-in.a
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.o
  CC      arch/x86/kernel/amd_nb.o
  AR      drivers/media/platform/nxp/imx8-isi/built-in.a
  CC      drivers/hwmon/hwmon.o
  AR      drivers/media/platform/nxp/built-in.a
  AR      drivers/power/reset/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_exec.o
  CC      drivers/power/supply/power_supply_core.o
  AR      drivers/media/platform/qcom/camss/built-in.a
  AR      drivers/media/platform/qcom/venus/built-in.a
  AR      drivers/media/platform/qcom/built-in.a
  CC      drivers/usb/serial/usb-serial.o
  AR      drivers/usb/misc/built-in.a
  AR      drivers/media/platform/renesas/rcar-vin/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_execlist.o
  AR      drivers/media/platform/renesas/rzg2l-cru/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_exec_queue.o
  AR      drivers/net/ethernet/cavium/common/built-in.a
  AR      drivers/net/ethernet/cavium/thunder/built-in.a
  AR      drivers/media/platform/renesas/vsp1/built-in.a
  AR      drivers/media/platform/renesas/built-in.a
  AR      drivers/net/ethernet/cavium/liquidio/built-in.a
  AR      drivers/net/ethernet/cavium/octeon/built-in.a
  AR      drivers/net/ethernet/cavium/built-in.a
  AR      drivers/media/platform/rockchip/rga/built-in.a
  CC      drivers/i2c/busses/i2c-designware-master.o
  AR      drivers/input/built-in.a
  CC      drivers/i2c/busses/i2c-designware-platdrv.o
  AR      drivers/media/platform/rockchip/rkisp1/built-in.a
  CC      lib/xarray.o
  AR      drivers/media/platform/rockchip/built-in.a
  AR      drivers/media/platform/samsung/exynos-gsc/built-in.a
  AR      drivers/media/platform/samsung/exynos4-is/built-in.a
  AR      drivers/media/platform/samsung/s3c-camif/built-in.a
  AR      drivers/media/platform/samsung/s5p-g2d/built-in.a
  AR      drivers/media/platform/samsung/s5p-jpeg/built-in.a
  AR      drivers/media/platform/samsung/s5p-mfc/built-in.a
  AR      drivers/media/platform/samsung/built-in.a
  AR      fs/pstore/built-in.a
  CC      drivers/acpi/acpica/pstree.o
  CC      fs/efivarfs/inode.o
  AR      drivers/media/platform/st/sti/bdisp/built-in.a
  AR      drivers/media/platform/st/sti/c8sectpfe/built-in.a
  AR      drivers/media/platform/st/sti/delta/built-in.a
  AR      drivers/media/platform/st/sti/hva/built-in.a
  AR      drivers/media/platform/st/stm32/built-in.a
  CC      drivers/acpi/acpica/psutils.o
  AR      drivers/media/platform/st/built-in.a
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  AR      drivers/media/platform/sunxi/sun4i-csi/built-in.a
  AR      drivers/media/platform/sunxi/sun6i-csi/built-in.a
  CC      kernel/kprobes.o
  CC [M]  drivers/gpu/drm/xe/xe_force_wake.o
  AR      drivers/media/platform/sunxi/sun6i-mipi-csi2/built-in.a
  AR      drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/built-in.a
  CC [M]  drivers/net/phy/bcm7xxx.o
  AR      drivers/media/platform/sunxi/sun8i-di/built-in.a
  AR      drivers/media/platform/sunxi/sun8i-rotate/built-in.a
  CC      net/ipv6/xfrm6_output.o
  AR      drivers/media/platform/sunxi/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvif/disp.o
  AR      drivers/media/platform/ti/am437x/built-in.a
  AR      drivers/media/platform/ti/cal/built-in.a
  AR      drivers/media/platform/ti/vpe/built-in.a
  AR      drivers/rtc/built-in.a
  AR      drivers/media/platform/ti/davinci/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_ggtt.o
  AR      drivers/usb/storage/built-in.a
  CC      drivers/power/supply/power_supply_sysfs.o
  AR      drivers/media/platform/ti/omap/built-in.a
  CC [M]  drivers/net/ethernet/broadcom/tg3.o
  AR      drivers/media/platform/ti/omap3isp/built-in.a
  AR      drivers/media/platform/ti/built-in.a
  CC      drivers/usb/core/buffer.o
  AR      drivers/media/platform/verisilicon/built-in.a
  AR      drivers/media/platform/via/built-in.a
  AR      drivers/media/platform/xilinx/built-in.a
  CC [M]  drivers/hwmon/acpi_power_meter.o
  AR      drivers/media/platform/built-in.a
  CC [M]  drivers/gpu/drm/drm_atomic.o
  CC      drivers/ata/ahci.o
  AR      drivers/media/pci/ttpci/built-in.a
  CC      drivers/ata/libahci.o
  AR      drivers/media/pci/b2c2/built-in.a
  AR      drivers/media/pci/pluto2/built-in.a
  AR      drivers/media/pci/dm1105/built-in.a
  AR      drivers/media/pci/pt1/built-in.a
  AR      drivers/media/pci/pt3/built-in.a
  CC      net/ipv4/ping.o
  CC      mm/hugetlb_vmemmap.o
  AR      drivers/media/pci/mantis/built-in.a
  AR      drivers/media/pci/ngene/built-in.a
  AR      drivers/media/pci/ddbridge/built-in.a
  CC      drivers/acpi/acpica/pswalk.o
  CC      arch/x86/kernel/kvm.o
  AR      drivers/media/pci/saa7146/built-in.a
  AR      drivers/media/pci/smipcie/built-in.a
  AR      drivers/media/pci/netup_unidvb/built-in.a
  AR      drivers/media/pci/intel/ipu3/built-in.a
  CC      drivers/acpi/acpica/psxface.o
  AR      drivers/media/pci/intel/built-in.a
  AR      drivers/media/pci/built-in.a
  CC [M]  drivers/ptp/ptp_chardev.o
  CC [M]  drivers/gpu/drm/ast/ast_dp501.o
  AR      drivers/media/usb/b2c2/built-in.a
  CC      fs/efivarfs/file.o
  AR      drivers/media/usb/dvb-usb/built-in.a
  AR      drivers/media/usb/dvb-usb-v2/built-in.a
  AR      drivers/scsi/built-in.a
  CC [M]  drivers/gpu/drm/ast/ast_dp.o
  AR      drivers/media/usb/s2255/built-in.a
  AR      drivers/media/usb/siano/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_gt.o
  AR      drivers/media/usb/ttusb-budget/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvif/driver.o
  AR      drivers/media/usb/ttusb-dec/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvif/event.o
  AR      drivers/media/usb/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvif/fifo.o
  AR      drivers/media/firewire/built-in.a
  AR      drivers/media/mmc/siano/built-in.a
  AR      drivers/media/mmc/built-in.a
  AR      drivers/media/spi/built-in.a
  AR      drivers/thermal/broadcom/built-in.a
  AR      drivers/thermal/samsung/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvif/head.o
  AR      drivers/media/test-drivers/built-in.a
  AR      drivers/media/built-in.a
  CC      drivers/thermal/intel/intel_tcc.o
  AR      drivers/thermal/st/built-in.a
  CC [M]  fs/netfs/buffered_read.o
  AR      drivers/thermal/qcom/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_gt_clock.o
  CC      drivers/power/supply/power_supply_leds.o
  AR      drivers/net/ethernet/cortina/built-in.a
  CC      drivers/usb/host/ehci-pci.o
  AR      drivers/net/ethernet/engleder/built-in.a
  CC      drivers/usb/host/ohci-hcd.o
  CC      drivers/usb/host/ohci-pci.o
  CC      drivers/thermal/intel/therm_throt.o
  CC      drivers/usb/core/sysfs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.o
  CC      drivers/i2c/busses/i2c-designware-baytrail.o
  CC [M]  drivers/hwmon/coretemp.o
  CC      drivers/acpi/acpica/rsaddr.o
  CC [M]  fs/netfs/io.o
  CC [M]  fs/fscache/cache.o
  AR      drivers/thermal/tegra/built-in.a
  CC [M]  drivers/net/phy/bcm87xx.o
  AR      drivers/thermal/mediatek/built-in.a
  CC [M]  fs/netfs/iterator.o
  CC      arch/x86/kernel/kvmclock.o
  CC [M]  drivers/gpu/drm/drm_atomic_uapi.o
  CC [M]  drivers/gpu/drm/i915/i915_switcheroo.o
  CC      drivers/usb/serial/generic.o
  CC      fs/efivarfs/super.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_debugfs.o
  CC      drivers/power/supply/power_supply_hwmon.o
  CC      drivers/usb/core/endpoint.o
  CC      mm/sparse.o
  CC      net/ipv6/xfrm6_protocol.o
  CC [M]  drivers/ptp/ptp_sysfs.o
  CC      drivers/acpi/acpica/rscalc.o
  CC      drivers/usb/serial/bus.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/mem.o
  CC      drivers/acpi/acpica/rscreate.o
  CC [M]  fs/netfs/main.o
  CC      drivers/usb/host/uhci-hcd.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/mmu.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_idle_sysfs.o
  LD [M]  drivers/gpu/drm/ast/ast.o
  CC      drivers/usb/host/xhci.o
  CC      drivers/usb/host/xhci-mem.o
  CC      drivers/usb/serial/console.o
  CC      drivers/ata/ata_piix.o
  CC [M]  drivers/i2c/busses/i2c-scmi.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  arch/x86/kvm/kvm-asm-offsets.s
  CC [M]  drivers/net/phy/bcm-phy-lib.o
  CC      drivers/acpi/acpica/rsdumpinfo.o
  CC      arch/x86/kernel/paravirt.o
  CC      lib/lockref.o
  AR      drivers/net/usb/built-in.a
  CC [M]  drivers/net/usb/pegasus.o
  CC      kernel/hung_task.o
  AR      drivers/power/supply/built-in.a
  AR      drivers/power/built-in.a
  AR      drivers/hwmon/built-in.a
  CC [M]  drivers/net/usb/rtl8150.o
  CC [M]  arch/x86/kvm/vmx/pmu_intel.o
  CC [M]  fs/fscache/cookie.o
  CC [M]  drivers/net/ipvlan/ipvlan_core.o
  CC [M]  fs/fscache/io.o
  CC [M]  drivers/thermal/intel/x86_pkg_temp_thermal.o
  CC [M]  fs/fscache/main.o
  CC [M]  drivers/net/ipvlan/ipvlan_main.o
  CC      fs/efivarfs/vars.o
  CC      lib/bcd.o
  CC      drivers/usb/core/devio.o
  CC      net/ipv6/netfilter.o
  CC      lib/sort.o
  CC      drivers/acpi/acpica/rsinfo.o
  CC      fs/btrfs/root-tree.o
  CC [M]  drivers/gpu/drm/drm_auth.o
  CC      net/ipv4/ip_tunnel_core.o
  AR      drivers/net/ethernet/ezchip/built-in.a
  CC [M]  fs/fscache/volume.o
  CC      lib/parser.o
  CC      drivers/thermal/thermal_core.o
  CC [M]  fs/smb/common/cifs_arc4.o
  CC [M]  drivers/gpu/drm/i915/i915_utils.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_mcr.o
  CC [M]  drivers/ptp/ptp_vclock.o
  CC [M]  drivers/gpu/drm/i915/intel_clock_gating.o
  CC      drivers/usb/serial/ftdi_sio.o
  CC      mm/sparse-vmemmap.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/outp.o
  CC [M]  drivers/gpu/drm/drm_blend.o
  CC      drivers/acpi/acpica/rsio.o
  CC [M]  fs/netfs/objects.o
  CC      arch/x86/kernel/pvclock.o
  CC [M]  drivers/i2c/busses/i2c-ccgx-ucsi.o
  CC [M]  arch/x86/kvm/vmx/vmcs12.o
  CC      lib/debug_locks.o
  CC      lib/random32.o
  CC [M]  fs/smb/common/cifs_md4.o
  AR      drivers/thermal/intel/built-in.a
  CC      drivers/usb/host/xhci-ext-caps.o
  CC [M]  drivers/gpu/drm/i915/intel_device_info.o
  AR      drivers/ata/built-in.a
  CC      drivers/acpi/glue.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/timer.o
  CC [M]  fs/fscache/proc.o
  AR      fs/efivarfs/built-in.a
  CC      kernel/watchdog.o
  CC      drivers/usb/host/xhci-ring.o
  CC [M]  drivers/net/phy/broadcom.o
  CC      drivers/usb/serial/pl2303.o
  CC      net/ipv6/fib6_rules.o
  CC      drivers/acpi/acpica/rsirq.o
  CC [M]  drivers/ptp/ptp_kvm_x86.o
  CC      lib/bust_spinlocks.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_bios.o
  CC [M]  fs/smb/client/trace.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_pagefault.o
  CC [M]  fs/smb/client/cifsfs.o
  CC [M]  fs/smb/client/cifs_debug.o
  CC      arch/x86/kernel/pcspeaker.o
  CC      mm/mmu_notifier.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_sysfs.o
  CC      net/ipv6/proc.o
  CC      net/ipv6/syncookies.o
  CC [M]  drivers/net/usb/r8152.o
  CC [M]  drivers/i2c/busses/i2c-i801.o
  LD [M]  fs/netfs/netfs.o
  CC [M]  arch/x86/kvm/vmx/hyperv.o
  CC      net/ipv6/mip6.o
  CC      drivers/acpi/acpica/rslist.o
  CC      net/ipv6/addrconf_core.o
  CC      drivers/usb/host/xhci-hub.o
  CC [M]  drivers/net/usb/asix_devices.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/vmm.o
  CC      mm/ksm.o
  CC [M]  drivers/gpu/drm/i915/intel_memory_region.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.o
  CC      fs/btrfs/dir-item.o
  LD [M]  fs/fscache/fscache.o
  CC [M]  drivers/ptp/ptp_kvm_common.o
  CC [M]  drivers/net/ipvlan/ipvlan_l3s.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.o
  CC      arch/x86/kernel/check.o
  CC      drivers/usb/gadget/udc/core.o
  CC      lib/kasprintf.o
  CC      lib/bitmap.o
  CC      net/ipv4/gre_offload.o
  CC      drivers/usb/gadget/udc/trace.o
  CC      kernel/watchdog_hld.o
  CC      drivers/thermal/thermal_sysfs.o
  AR      drivers/usb/gadget/function/built-in.a
  CC      drivers/thermal/thermal_trip.o
  CC [M]  drivers/net/phy/lxt.o
  CC      drivers/acpi/acpica/rsmemory.o
  CC [M]  drivers/gpu/drm/i915/intel_pcode.o
  CC      lib/scatterlist.o
  AR      drivers/usb/serial/built-in.a
  CC [M]  arch/x86/kvm/vmx/nested.o
  CC      drivers/acpi/acpica/rsmisc.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_topology.o
  CC      drivers/thermal/thermal_helpers.o
  CC [M]  drivers/gpu/drm/drm_bridge.o
  CC      drivers/acpi/acpica/rsserial.o
  CC [M]  drivers/gpu/drm/i915/intel_region_ttm.o
  CC      net/ipv6/exthdrs_core.o
  CC      mm/slub.o
  CC      lib/list_sort.o
  CC      arch/x86/kernel/uprobes.o
  CC      net/ipv6/ip6_checksum.o
  CC      drivers/thermal/thermal_hwmon.o
  CC      drivers/usb/core/notify.o
  CC      drivers/thermal/gov_fair_share.o
  LD [M]  drivers/ptp/ptp.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/user.o
  LD [M]  drivers/ptp/ptp_kvm.o
  CC      lib/uuid.o
  CC [M]  drivers/gpu/drm/i915/intel_runtime_pm.o
  CC      net/ipv6/ip6_icmp.o
  CC [M]  drivers/net/vxlan/vxlan_core.o
  HOSTCC  drivers/gpu/drm/xe/xe_gen_wa_oob
  CC      kernel/seccomp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/atombios_dp.o
  CC      kernel/relay.o
  CC [M]  drivers/net/vxlan/vxlan_multicast.o
  CC      drivers/net/loopback.o
  CC      kernel/utsname_sysctl.o
  CC      drivers/watchdog/watchdog_core.o
  CC      drivers/acpi/acpica/rsutils.o
  CC      drivers/i2c/i2c-core-base.o
  CC      drivers/watchdog/watchdog_dev.o
  CC      drivers/acpi/acpica/rsxface.o
  CC      drivers/usb/core/generic.o
  LD [M]  drivers/net/ipvlan/ipvlan.o
  CC      drivers/net/netconsole.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_ads.o
  CC [M]  drivers/net/phy/realtek.o
  CC [M]  drivers/i2c/busses/i2c-isch.o
  AR      drivers/net/ethernet/fungible/built-in.a
  CC [M]  drivers/i2c/busses/i2c-ismt.o
  CC      fs/btrfs/file-item.o
  CC      lib/iov_iter.o
  CC      drivers/thermal/gov_step_wise.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_ct.o
  CC      drivers/acpi/acpica/tbdata.o
  CC [M]  drivers/net/phy/smsc.o
  CC      drivers/thermal/gov_user_space.o
  CC      drivers/usb/core/quirks.o
  CC      net/ipv4/metrics.o
  CC      fs/btrfs/inode-item.o
  CC      kernel/delayacct.o
  CC      drivers/watchdog/softdog.o
  CC      drivers/i2c/i2c-core-smbus.o
  CC      drivers/i2c/i2c-core-acpi.o
  CC      drivers/acpi/acpica/tbfadt.o
  CC      drivers/usb/core/devices.o
  CC      arch/x86/kernel/perf_regs.o
  AR      drivers/usb/gadget/udc/built-in.a
  AR      drivers/usb/gadget/legacy/built-in.a
  CC      drivers/usb/gadget/usbstring.o
  CC [M]  drivers/gpu/drm/nouveau/nvif/userc361.o
  CC      net/ipv6/output_core.o
  CC      arch/x86/kernel/tracepoint.o
  CC      fs/btrfs/disk-io.o
  CC [M]  drivers/net/usb/asix_common.o
  CC      arch/x86/kernel/itmt.o
  CC [M]  fs/smb/client/connect.o
  AR      drivers/thermal/built-in.a
  CC [M]  drivers/md/persistent-data/dm-array.o
  CC      drivers/opp/core.o
  CC      drivers/md/md.o
  CC      drivers/opp/cpu.o
  CC [M]  drivers/md/persistent-data/dm-bitset.o
  CC [M]  drivers/gpu/drm/drm_cache.o
  CC [M]  drivers/i2c/busses/i2c-piix4.o
  CC [M]  drivers/gpu/drm/i915/intel_sbi.o
  CC      net/ipv6/protocol.o
  AR      drivers/watchdog/built-in.a
  CC [M]  drivers/gpu/drm/drm_client.o
  CC      drivers/acpi/acpica/tbfind.o
  CC      drivers/acpi/acpica/tbinstal.o
  CC [M]  drivers/gpu/drm/i915/intel_step.o
  CC      net/ipv4/netlink.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_afmt.o
  CC      net/ipv4/nexthop.o
  CC      drivers/cpufreq/cpufreq.o
  CC      drivers/usb/gadget/config.o
  CC      drivers/cpuidle/governors/menu.o
  CC      drivers/cpuidle/cpuidle.o
  LD [M]  drivers/net/phy/aquantia.o
  CC      drivers/net/virtio_net.o
  AR      drivers/net/phy/built-in.a
  CC      drivers/cpufreq/freq_table.o
  CC      drivers/i2c/i2c-core-slave.o
  CC [M]  drivers/net/vxlan/vxlan_vnifilter.o
  CC      arch/x86/kernel/umip.o
  CC      drivers/usb/core/phy.o
  CC      drivers/opp/debugfs.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/client.o
  CC [M]  fs/fuse/dev.o
  CC      drivers/acpi/acpica/tbprint.o
  CC      drivers/net/net_failover.o
  CC [M]  drivers/net/dummy.o
  CC      drivers/acpi/acpica/tbutils.o
  CC      net/ipv6/ip6_offload.o
  CC      drivers/usb/host/xhci-dbg.o
  CC      kernel/taskstats.o
  CC      drivers/usb/host/xhci-trace.o
  CC [M]  drivers/md/persistent-data/dm-block-manager.o
  CC      drivers/cpuidle/governors/haltpoll.o
  CC      net/ipv4/udp_tunnel_stub.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_debugfs.o
  CC [M]  drivers/gpu/drm/i915/intel_uncore.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/engine.o
  CC [M]  fs/fuse/dir.o
  AR      drivers/net/ethernet/huawei/built-in.a
  CC [M]  fs/smb/client/dir.o
  CC [M]  drivers/net/usb/ax88172a.o
  CC      drivers/cpufreq/cpufreq_performance.o
  CC      drivers/acpi/scan.o
  CC      fs/btrfs/transaction.o
  CC [M]  drivers/gpu/drm/drm_client_modeset.o
  CC      drivers/usb/gadget/epautoconf.o
  CC [M]  fs/smb/client/file.o
  CC      drivers/acpi/acpica/tbxface.o
  CC      drivers/usb/gadget/composite.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.o
  CC      drivers/md/md-bitmap.o
  CC      arch/x86/kernel/unwind_orc.o
  CC [M]  arch/x86/kvm/vmx/posted_intr.o
  CC [M]  drivers/gpu/drm/i915/intel_wakeref.o
  CC [M]  drivers/i2c/busses/i2c-designware-pcidrv.o
  CC      drivers/usb/core/port.o
  CC [M]  drivers/md/persistent-data/dm-space-map-common.o
  CC      drivers/acpi/resource.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/enum.o
  LD [M]  arch/x86/kvm/kvm.o
  CC      arch/x86/kernel/callthunks.o
  CC      drivers/usb/core/hcd-pci.o
  AR      drivers/opp/built-in.a
  CC [M]  drivers/md/persistent-data/dm-space-map-disk.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_hwconfig.o
  CC      net/ipv6/tcpv6_offload.o
  CC      drivers/acpi/acpica/tbxfload.o
  CC      drivers/acpi/acpica/tbxfroot.o
  AR      drivers/cpuidle/governors/built-in.a
  CC      drivers/cpuidle/driver.o
  CC      kernel/tsacct.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/event.o
  CC      drivers/cpufreq/cpufreq_ondemand.o
  CC      drivers/md/md-autodetect.o
  CC      arch/x86/kernel/mmconf-fam10h_64.o
  CC [M]  drivers/net/usb/ax88179_178a.o
  CC [M]  fs/fuse/file.o
  LD [M]  drivers/i2c/busses/i2c-designware-pci.o
  CC      arch/x86/kernel/vsmp_64.o
  CC      lib/clz_ctz.o
  AR      drivers/i2c/busses/built-in.a
  CC      drivers/acpi/acpica/utaddress.o
  CC      drivers/i2c/i2c-dev.o
  CC      net/ipv6/exthdrs_offload.o
  CC      lib/bsearch.o
  CC      net/ipv6/inet6_hashtables.o
  CC      net/ipv6/mcast_snoop.o
  CC      drivers/cpuidle/governor.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_log.o
  CC      drivers/acpi/acpica/utalloc.o
  CC      drivers/usb/gadget/functions.o
  CC      drivers/acpi/acpi_processor.o
  CC      drivers/acpi/processor_core.o
  CC      drivers/cpufreq/cpufreq_governor.o
  CC      drivers/md/dm-uevent.o
  CC      drivers/usb/core/usb-acpi.o
  CC      kernel/tracepoint.o
  CC      kernel/latencytop.o
  CC [M]  drivers/md/persistent-data/dm-space-map-metadata.o
  CC      mm/migrate.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/atombios_encoders.o
  CC [M]  drivers/usb/class/usbtmc.o
  CC [M]  drivers/net/vxlan/vxlan_mdb.o
  CC      mm/migrate_device.o
  CC      drivers/acpi/acpica/utascii.o
  AR      arch/x86/kernel/built-in.a
  CC      drivers/usb/gadget/configfs.o
  CC      drivers/cpuidle/sysfs.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/firmware.o
  CC      drivers/md/dm.o
  CC [M]  drivers/net/usb/cdc_ether.o
  CC      drivers/cpuidle/poll_state.o
  UPD     arch/x86/kvm/kvm-asm-offsets.h
  CC      lib/find_bit.o
  CC [M]  fs/fuse/inode.o
  CC      drivers/mmc/core/core.o
  CC      drivers/mmc/host/sdhci.o
  CC      drivers/usb/host/xhci-debugfs.o
  CC      drivers/usb/gadget/u_f.o
  CC      drivers/acpi/processor_pdc.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_pc.o
  CC [M]  net/ipv6/ip6_udp_tunnel.o
  AS [M]  arch/x86/kvm/vmx/vmenter.o
  CC      drivers/acpi/acpica/utbuffer.o
  CC      drivers/md/dm-table.o
  CC [M]  fs/fuse/control.o
  CC      drivers/cpufreq/cpufreq_governor_attr_set.o
  CC      lib/llist.o
  CC      drivers/mmc/core/bus.o
  CC [M]  drivers/i2c/i2c-smbus.o
  AR      drivers/usb/core/built-in.a
  CC      drivers/acpi/ec.o
  CC      drivers/acpi/dock.o
  CC      kernel/irq_work.o
  CC [M]  drivers/i2c/i2c-mux.o
  CC      lib/memweight.o
  CC [M]  drivers/md/persistent-data/dm-transaction-manager.o
  CC      drivers/acpi/pci_root.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_submit.o
  CC      lib/kfifo.o
  CC      net/ipv4/sysctl_net_ipv4.o
  CC [M]  drivers/md/persistent-data/dm-btree.o
  CC      drivers/cpuidle/cpuidle-haltpoll.o
  CC [M]  drivers/net/usb/cdc_eem.o
  CC      drivers/md/dm-target.o
  CC      drivers/mmc/core/host.o
  CC      drivers/acpi/acpica/utcksum.o
  CC      drivers/usb/host/xhci-pci.o
  CC [M]  fs/fuse/xattr.o
  CC [M]  drivers/md/persistent-data/dm-btree-remove.o
  CC      fs/btrfs/inode.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/gpuobj.o
  CC [M]  drivers/md/persistent-data/dm-btree-spine.o
  CC      drivers/cpufreq/acpi-cpufreq.o
  CC [M]  drivers/net/usb/smsc75xx.o
  CC [M]  fs/smb/client/inode.o
  AR      drivers/cpuidle/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine.o
  CC      drivers/acpi/acpica/utcopy.o
  CC      kernel/static_call.o
  CC [M]  drivers/net/usb/smsc95xx.o
  CC      mm/huge_memory.o
  CC      drivers/mmc/core/mmc.o
  LD [M]  arch/x86/kvm/kvm-intel.o
  CC      drivers/md/dm-linear.o
  CC [M]  fs/smb/client/link.o
  AR      arch/x86/built-in.a
  AR      drivers/i2c/built-in.a
  CC      fs/btrfs/file.o
  CC      fs/btrfs/defrag.o
  AR      net/ipv6/built-in.a
  CC [M]  drivers/gpu/drm/i915/vlv_sideband.o
  CC      fs/btrfs/extent_map.o
  AR      drivers/usb/gadget/built-in.a
  CC      drivers/md/dm-stripe.o
  CC [M]  drivers/gpu/drm/i915/vlv_suspend.o
  CC      drivers/md/dm-ioctl.o
  CC      lib/percpu-refcount.o
  CC [M]  drivers/net/macvlan.o
  AR      drivers/ufs/built-in.a
  CC [M]  drivers/gpu/drm/drm_color_mgmt.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_sa.o
  CC      drivers/acpi/acpica/utexcep.o
  CC [M]  fs/fuse/acl.o
  CC      kernel/static_call_inline.o
  CC [M]  fs/smb/client/misc.o
  CC [M]  drivers/gpu/drm/drm_connector.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/atombios_i2c.o
  CC      drivers/mmc/host/sdhci-pci-core.o
  CC [M]  drivers/net/mii.o
  CC [M]  drivers/net/mdio.o
  CC      mm/khugepaged.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/intr.o
  CC      drivers/acpi/pci_link.o
  CC [M]  drivers/gpu/drm/drm_crtc.o
  CC      drivers/cpufreq/intel_pstate.o
  CC [M]  fs/fuse/readdir.o
  CC      drivers/acpi/pci_irq.o
  LD [M]  drivers/md/persistent-data/dm-persistent-data.o
  CC      drivers/acpi/acpica/utdebug.o
  CC      drivers/md/dm-io.o
  CC [M]  fs/smb/client/netmisc.o
  CC [M]  fs/smb/client/smbencrypt.o
  CC      net/ipv4/proc.o
  LD [M]  drivers/net/vxlan/vxlan.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/ioctl.o
  AR      drivers/usb/host/built-in.a
  CC      drivers/md/dm-kcopyd.o
  AR      drivers/usb/built-in.a
  CC      lib/rhashtable.o
  CC      drivers/mmc/host/sdhci-pci-o2micro.o
  CC      drivers/acpi/acpi_lpss.o
  CC      lib/base64.o
  CC      kernel/user-return-notifier.o
  CC [M]  drivers/net/tun.o
  CC [M]  fs/fuse/ioctl.o
  CC [M]  drivers/gpu/drm/drm_displayid.o
  AR      drivers/leds/trigger/built-in.a
  CC [M]  drivers/leds/trigger/ledtrig-audio.o
  CC      drivers/acpi/acpica/utdecode.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.o
  CC      kernel/padata.o
  CC [M]  drivers/gpu/drm/i915/soc/intel_dram.o
  CC [M]  fs/overlayfs/super.o
  CC [M]  drivers/net/veth.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.o
  CC      drivers/acpi/acpi_apd.o
  CC      drivers/acpi/acpica/utdelete.o
  CC      drivers/mmc/core/mmc_ops.o
  CC [M]  drivers/gpu/drm/drm_drv.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/memory.o
  AR      drivers/leds/blink/built-in.a
  AR      drivers/leds/simple/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/mm.o
  CC      drivers/acpi/acpica/uterror.o
  CC      drivers/leds/led-core.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_fence.o
  CC      drivers/leds/led-class.o
  CC [M]  drivers/net/ethernet/intel/e1000/e1000_main.o
  CC [M]  drivers/net/usb/mcs7830.o
  CC [M]  drivers/net/usb/usbnet.o
  CC [M]  drivers/net/ethernet/intel/e1000e/82571.o
  CC [M]  drivers/net/ethernet/intel/igb/igb_main.o
  CC [M]  drivers/net/ethernet/intel/igb/igb_ethtool.o
  CC [M]  drivers/net/ethernet/intel/e1000/e1000_hw.o
  CC [M]  drivers/net/ethernet/intel/igb/e1000_82575.o
  CC [M]  drivers/net/ethernet/intel/e1000e/ich8lan.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_main.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/object.o
  LD [M]  fs/fuse/fuse.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/oproxy.o
  CC      kernel/jump_label.o
  CC      drivers/acpi/acpi_platform.o
  CC      drivers/mmc/core/sd.o
  CC [M]  fs/smb/client/transport.o
  CC [M]  drivers/net/ethernet/intel/igbvf/vf.o
  CC      drivers/acpi/acpica/uteval.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_main.o
  CC      fs/btrfs/sysfs.o
  CC [M]  drivers/net/usb/cdc_ncm.o
  CC      net/ipv4/syncookies.o
  CC [M]  drivers/net/usb/r8153_ecm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.o
  CC      kernel/context_tracking.o
  CC      drivers/leds/led-triggers.o
  CC      drivers/mmc/host/sdhci-pci-arasan.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.o
  CC      kernel/iomem.o
  CC      lib/refcount.o
  CC      lib/once.o
  CC [M]  drivers/gpu/drm/i915/soc/intel_gmch.o
  CC      drivers/acpi/acpica/utglobal.o
  CC      drivers/mmc/core/sd_ops.o
  CC      lib/rcuref.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/option.o
  CC [M]  drivers/gpu/drm/xe/xe_huc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_pll.o
  CC      mm/page_counter.o
  CC [M]  fs/smb/client/cached_dir.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/ramht.o
  AR      drivers/cpufreq/built-in.a
  CC      drivers/mmc/core/sdio.o
  CC      kernel/rseq.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/subdev.o
  CC      drivers/acpi/acpi_pnp.o
  CC      lib/usercopy.o
  CC      drivers/md/dm-sysfs.o
  CC [M]  fs/overlayfs/namei.o
  CC      lib/errseq.o
  CC      drivers/acpi/acpica/uthex.o
  CC [M]  drivers/net/ethernet/intel/igbvf/mbx.o
  AR      drivers/net/ethernet/i825xx/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_huc_debugfs.o
  CC      net/ipv4/esp4.o
  CC      drivers/mmc/host/sdhci-pci-dwc-mshc.o
  CC      drivers/mmc/host/sdhci-pci-gli.o
  CC      mm/memcontrol.o
  AR      drivers/leds/built-in.a
  CC      lib/bucket_locks.o
  CC      net/ipv4/esp4_offload.o
  CC [M]  drivers/gpu/drm/xe/xe_irq.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/core/uevent.o
  CC      lib/generic-radix-tree.o
  CC [M]  drivers/gpu/drm/drm_dumb_buffers.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_common.o
  CC      drivers/acpi/acpica/utids.o
  CC      drivers/acpi/power.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.o
  CC      mm/vmpressure.o
  CC [M]  drivers/net/ethernet/intel/igbvf/ethtool.o
  CC [M]  drivers/gpu/drm/i915/soc/intel_pch.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.o
  CC      fs/btrfs/accessors.o
  CC      net/ipv4/netfilter.o
  AR      drivers/firmware/arm_ffa/built-in.a
  AR      drivers/firmware/arm_scmi/built-in.a
  AR      drivers/firmware/broadcom/built-in.a
  CC      mm/swap_cgroup.o
  CC      lib/string_helpers.o
  CC      drivers/md/dm-stats.o
  AR      drivers/firmware/cirrus/built-in.a
  CC [M]  fs/smb/client/cifs_unicode.o
  AR      drivers/firmware/meson/built-in.a
  CC [M]  drivers/gpu/drm/drm_edid.o
  CC [M]  drivers/gpu/drm/xe/xe_lrc.o
  CC [M]  drivers/gpu/drm/drm_encoder.o
  CC [M]  drivers/gpu/drm/xe/xe_migrate.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_82599.o
  CC [M]  drivers/gpu/drm/xe/xe_mmio.o
  GZIP    kernel/config_data.gz
  CC      kernel/configs.o
  CC      drivers/firmware/efi/libstub/efi-stub-helper.o
  CC [M]  fs/overlayfs/util.o
  CC [M]  fs/smb/client/nterr.o
  CC      drivers/acpi/acpica/utinit.o
  CC      drivers/md/dm-rq.o
  LD [M]  drivers/net/usb/asix.o
  CC [M]  fs/overlayfs/inode.o
  CC [M]  fs/overlayfs/file.o
  CC      drivers/mmc/core/sdio_ops.o
  CC      drivers/mmc/core/sdio_bus.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/nvfw/fw.o
  CC      fs/btrfs/xattr.o
  AR      drivers/firmware/imx/built-in.a
  CC [M]  drivers/net/ethernet/intel/igb/e1000_mac.o
  CC      mm/hugetlb_cgroup.o
  CC      fs/btrfs/ordered-data.o
  CC [M]  fs/smb/client/cifsencrypt.o
  CC      drivers/acpi/event.o
  AR      kernel/built-in.a
  CC      drivers/md/dm-io-rewind.o
  CC [M]  fs/smb/client/readdir.o
  CC [M]  drivers/gpu/drm/xe/xe_mocs.o
  CC      net/ipv4/inet_diag.o
  CC      drivers/mmc/host/sdhci-acpi.o
  CC      net/ipv4/tcp_diag.o
  CC      drivers/acpi/acpica/utlock.o
  CC      drivers/acpi/acpica/utmath.o
  CC [M]  drivers/net/ethernet/intel/igbvf/netdev.o
  CC      drivers/firmware/efi/libstub/gop.o
  CC      lib/hexdump.o
  CC [M]  drivers/net/ethernet/intel/e1000e/80003es2lan.o
  CC [M]  drivers/net/ethernet/intel/e1000e/mac.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.o
  CC [M]  fs/overlayfs/dir.o
  CC      mm/kmemleak.o
  CC      drivers/mmc/host/cqhci-core.o
  CC [M]  drivers/gpu/drm/i915/i915_memcpy.o
  CC      lib/kstrtox.o
  CC      drivers/mmc/core/sdio_cis.o
  AR      drivers/firmware/psci/built-in.a
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.o
  AR      drivers/firmware/smccc/built-in.a
  CC [M]  drivers/gpu/drm/drm_file.o
  CC      drivers/firmware/efi/libstub/secureboot.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/nvfw/hs.o
  CC [M]  drivers/net/ethernet/intel/igb/e1000_nvm.o
  CC [M]  drivers/gpu/drm/i915/i915_mm.o
  CC      drivers/acpi/acpica/utmisc.o
  AR      drivers/firmware/tegra/built-in.a
  CC      fs/btrfs/extent_io.o
  CC      drivers/md/dm-builtin.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/nvfw/ls.o
  CC [M]  drivers/gpu/drm/i915/i915_sw_fence.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/nvfw/acr.o
  CC      drivers/firmware/efi/efi-bgrt.o
  CC [M]  drivers/gpu/drm/i915/i915_sw_fence_work.o
  CC      mm/page_isolation.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_82598.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.o
  AR      drivers/net/ethernet/microsoft/built-in.a
  CC      lib/debug_info.o
  CC [M]  drivers/net/ethernet/intel/e1000/e1000_ethtool.o
  AR      drivers/net/ethernet/litex/built-in.a
  AR      drivers/net/ethernet/microchip/built-in.a
  AR      drivers/net/ethernet/mscc/built-in.a
  CC      drivers/firmware/efi/libstub/tpm.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_phy.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/nvfw/flcn.o
  CC      drivers/acpi/acpica/utmutex.o
  CC      mm/early_ioremap.o
  CC [M]  drivers/gpu/drm/xe/xe_module.o
  AR      drivers/net/ethernet/neterion/built-in.a
  AR      drivers/net/ethernet/netronome/built-in.a
  AR      drivers/net/ethernet/ni/built-in.a
  CC      fs/btrfs/volumes.o
  CC [M]  drivers/md/dm-bufio.o
  CC [M]  drivers/net/ethernet/intel/igb/e1000_phy.o
  CC      drivers/acpi/acpica/utnonansi.o
  CC      drivers/mmc/core/sdio_io.o
  CC      mm/cma.o
  CC      drivers/firmware/efi/efi.o
  CC      drivers/acpi/evged.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.o
  CC      fs/btrfs/async-thread.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.o
  CC [M]  fs/overlayfs/readdir.o
  CC [M]  fs/overlayfs/copy_up.o
  CC      fs/open.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/base.o
  AR      drivers/net/ethernet/packetengines/built-in.a
  AR      drivers/net/ethernet/realtek/built-in.a
  CC [M]  drivers/net/ethernet/realtek/8139cp.o
  CC      fs/btrfs/ioctl.o
  AR      drivers/net/ethernet/renesas/built-in.a
  CC [M]  drivers/gpu/drm/drm_fourcc.o
  CC      fs/btrfs/locking.o
  CC      fs/btrfs/orphan.o
  CC      drivers/firmware/efi/libstub/file.o
  CC [M]  drivers/gpu/drm/drm_framebuffer.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/cmdq.o
  CC      drivers/acpi/acpica/utobject.o
  CC [M]  drivers/gpu/drm/xe/xe_pat.o
  CC [M]  drivers/mmc/host/sdhci-pltfm.o
  CC      drivers/firmware/efi/libstub/mem.o
  CC [M]  drivers/net/ethernet/intel/e1000e/manage.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/fw.o
  CC [M]  drivers/net/ethernet/intel/e1000e/nvm.o
  CC [M]  drivers/gpu/drm/i915/i915_syncmap.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.o
  CC      net/ipv4/udp_diag.o
  CC      fs/btrfs/export.o
  CC [M]  drivers/gpu/drm/i915/i915_user_extensions.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.o
  CC [M]  fs/smb/client/ioctl.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_mac.o
  CC      drivers/firmware/efi/libstub/random.o
  CC      drivers/firmware/efi/libstub/randomalloc.o
  CC      mm/secretmem.o
  CC      lib/iomap.o
  CC [M]  drivers/gpu/drm/drm_gem.o
  CC      drivers/mmc/core/sdio_irq.o
  CC      drivers/acpi/acpica/utosi.o
  LD [M]  drivers/net/ethernet/intel/igbvf/igbvf.o
  CC [M]  drivers/gpu/drm/i915/i915_ioc32.o
  CC [M]  drivers/net/ethernet/intel/ixgbevf/vf.o
  AR      drivers/net/ethernet/intel/built-in.a
  CC      drivers/firmware/efi/libstub/pci.o
  CC [M]  drivers/net/ethernet/intel/ixgbevf/mbx.o
  CC [M]  drivers/net/ethernet/intel/e100.o
  CC [M]  drivers/gpu/drm/xe/xe_pci.o
  AR      drivers/mmc/host/built-in.a
  CC [M]  drivers/net/ethernet/intel/e1000e/phy.o
  CC [M]  drivers/gpu/drm/drm_ioctl.o
  CC      drivers/firmware/efi/libstub/skip_spaces.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.o
  CC      mm/userfaultfd.o
  CC      fs/btrfs/tree-log.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_x540.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs_params.o
  CC [M]  drivers/net/ethernet/intel/e1000/e1000_param.o
  CC      drivers/firmware/efi/libstub/lib-cmdline.o
  CC [M]  drivers/net/ethernet/intel/e1000e/param.o
  CC      drivers/mmc/core/slot-gpio.o
  CC      drivers/acpi/acpica/utownerid.o
  CC [M]  fs/overlayfs/export.o
  CC      drivers/firmware/efi/vars.o
  CC      drivers/firmware/efi/reboot.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_x550.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_i225.o
  CC      net/ipv4/tcp_cubic.o
  CC      fs/btrfs/free-space-cache.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/msgq.o
  CC      lib/pci_iomap.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/qmgr.o
  CC [M]  drivers/net/ethernet/intel/ixgbevf/ethtool.o
  CC [M]  drivers/net/ethernet/intel/e1000e/ethtool.o
  CC [M]  drivers/md/dm-bio-prison-v1.o
  CC      fs/btrfs/zlib.o
  CC      mm/memremap.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/v1.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/gm200.o
  CC      lib/iomap_copy.o
  CC      drivers/firmware/efi/libstub/lib-ctype.o
  CC      drivers/firmware/efi/libstub/alignedmem.o
  CC      drivers/firmware/efi/libstub/relocate.o
  CC      drivers/acpi/acpica/utpredef.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.o
  CC [M]  drivers/net/ethernet/intel/igb/e1000_mbx.o
  CC      lib/devres.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_debugfs.o
  CC [M]  drivers/gpu/drm/i915/display/intel_pipe_crc.o
  CC      drivers/firmware/efi/libstub/printk.o
  CC      drivers/firmware/efi/libstub/vsprintf.o
  CC [M]  drivers/gpu/drm/i915/i915_pmu.o
  CC      drivers/mmc/core/regulator.o
  CC [M]  fs/smb/client/sess.o
  CC [M]  drivers/gpu/drm/drm_lease.o
  CC [M]  drivers/gpu/drm/drm_managed.o
  CC [M]  drivers/gpu/drm/xe/xe_pcode.o
  CC [M]  drivers/net/ethernet/realtek/8139too.o
  CC      fs/read_write.o
  CC      lib/check_signature.o
  LD [M]  fs/overlayfs/overlay.o
  CC      drivers/acpi/acpica/utresdecode.o
  CC [M]  drivers/net/ethernet/intel/igb/e1000_i210.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.o
  CC [M]  drivers/gpu/drm/i915/gt/gen2_engine_cs.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/gp102.o
  CC [M]  fs/smb/client/export.o
  CC      drivers/acpi/acpica/utresrc.o
  CC [M]  drivers/net/ethernet/intel/igb/igb_ptp.o
  CC [M]  drivers/md/dm-bio-prison-v2.o
  CC      drivers/firmware/efi/libstub/x86-stub.o
  CC [M]  drivers/net/ethernet/intel/igb/igb_hwmon.o
  CC [M]  drivers/md/dm-crypt.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_base.o
  CC      fs/file_table.o
  LD [M]  drivers/net/ethernet/intel/e1000/e1000.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/ga100.o
  CC      mm/hmm.o
  CC      fs/btrfs/lzo.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_nvm.o
  CC      fs/btrfs/zstd.o
  CC      lib/interval_tree.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/falcon/ga102.o
  CC      fs/btrfs/compression.o
  CC      net/ipv4/xfrm4_policy.o
  CC      drivers/mmc/core/debugfs.o
  CC      net/ipv4/xfrm4_state.o
  CC [M]  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.o
  CC      drivers/acpi/acpica/utstate.o
  CC [M]  drivers/gpu/drm/xe/xe_pm.o
  CC      lib/assoc_array.o
  CC      mm/memfd.o
  CC [M]  drivers/net/ethernet/intel/e1000e/netdev.o
  CC      drivers/mmc/core/block.o
  CC [M]  drivers/gpu/drm/xe/xe_preempt_fence.o
  CC [M]  drivers/gpu/drm/xe/xe_pt.o
  CC      drivers/firmware/efi/memattr.o
  CC      drivers/acpi/acpica/utstring.o
  CC [M]  drivers/net/ethernet/intel/e1000e/ptp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/base.o
  CC      mm/bootmem_info.o
  STUBCPY drivers/firmware/efi/libstub/alignedmem.stub.o
  STUBCPY drivers/firmware/efi/libstub/efi-stub-helper.stub.o
  CC      drivers/mmc/core/queue.o
  STUBCPY drivers/firmware/efi/libstub/file.stub.o
  STUBCPY drivers/firmware/efi/libstub/gop.stub.o
  STUBCPY drivers/firmware/efi/libstub/lib-cmdline.stub.o
  CC      fs/btrfs/delayed-ref.o
  STUBCPY drivers/firmware/efi/libstub/lib-ctype.stub.o
  STUBCPY drivers/firmware/efi/libstub/mem.stub.o
  STUBCPY drivers/firmware/efi/libstub/pci.stub.o
  STUBCPY drivers/firmware/efi/libstub/printk.stub.o
  CC      drivers/acpi/acpica/utstrsuppt.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_phy.o
  STUBCPY drivers/firmware/efi/libstub/random.stub.o
  STUBCPY drivers/firmware/efi/libstub/randomalloc.stub.o
  CC [M]  drivers/gpu/drm/i915/gt/gen6_engine_cs.o
  CC [M]  drivers/net/ethernet/intel/ixgbevf/ipsec.o
  STUBCPY drivers/firmware/efi/libstub/relocate.stub.o
  STUBCPY drivers/firmware/efi/libstub/secureboot.stub.o
  STUBCPY drivers/firmware/efi/libstub/skip_spaces.stub.o
  STUBCPY drivers/firmware/efi/libstub/tpm.stub.o
  STUBCPY drivers/firmware/efi/libstub/vsprintf.stub.o
  CC      fs/btrfs/relocation.o
  CC [M]  drivers/gpu/drm/i915/gt/gen6_ppgtt.o
  STUBCPY drivers/firmware/efi/libstub/x86-stub.stub.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.o
  AR      drivers/firmware/efi/libstub/lib.a
  CC      fs/super.o
  CC      lib/list_debug.o
  CC      lib/debugobjects.o
  CC      fs/btrfs/delayed-inode.o
  CC      drivers/firmware/efi/tpm.o
  CC      drivers/acpi/acpica/utstrtoul64.o
  CC [M]  drivers/gpu/drm/i915/gt/gen7_renderclear.o
  CC      drivers/acpi/acpica/utxface.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.o
  CC [M]  drivers/net/ethernet/realtek/r8169_main.o
  CC      net/ipv4/xfrm4_input.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vf_error.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.o
  CC [M]  drivers/gpu/drm/i915/gt/gen8_engine_cs.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.o
  CC      lib/bitrev.o
  CC      drivers/acpi/sysfs.o
  CC      drivers/acpi/acpica/utxfinit.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_diag.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_ethtool.o
  CC      drivers/firmware/efi/memmap.o
  AR      drivers/net/ethernet/sfc/built-in.a
  CC [M]  drivers/gpu/drm/i915/gt/gen8_ppgtt.o
  CC      fs/btrfs/scrub.o
  CC      fs/btrfs/backref.o
  LD [M]  drivers/net/ethernet/intel/igb/igb.o
  CC [M]  drivers/gpu/drm/xe/xe_pt_walk.o
  AR      mm/built-in.a
  CC [M]  drivers/net/ethernet/intel/igc/igc_ptp.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_dump.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_tsn.o
  CC      drivers/firmware/efi/esrt.o
  CC      net/ipv4/xfrm4_output.o
  CC      net/ipv4/xfrm4_protocol.o
  CC [M]  drivers/gpu/drm/xe/xe_query.o
  CC      lib/crc16.o
  CC      drivers/acpi/acpica/utxferror.o
  CC      fs/btrfs/ulist.o
  CC [M]  fs/smb/client/unc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.o
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.o
  CC      fs/btrfs/qgroup.o
  CC [M]  drivers/net/ethernet/intel/igc/igc_xdp.o
  CC [M]  fs/smb/client/winucase.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/gm200.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_breadcrumbs.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_context.o
  CC [M]  fs/smb/client/smb2ops.o
  CC      lib/crc-t10dif.o
  CC [M]  drivers/md/dm-thin.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_context_sseu.o
  CC [M]  drivers/md/dm-thin-metadata.o
  CC [M]  fs/smb/client/smb2maperror.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_engine_cs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.o
  HOSTCC  lib/gen_crc32table
  CC      drivers/acpi/acpica/utxfmutex.o
  CC [M]  net/ipv4/ip_tunnel.o
  CC      drivers/firmware/efi/efi-pstore.o
  CC      drivers/acpi/property.o
  CC      drivers/firmware/efi/cper.o
  CC      drivers/firmware/efi/cper_cxl.o
  CC [M]  drivers/gpu/drm/drm_mm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/gm20b.o
  CC [M]  net/ipv4/udp_tunnel_core.o
  CC [M]  net/ipv4/udp_tunnel_nic.o
  CC      drivers/firmware/efi/runtime-wrappers.o
  AR      drivers/crypto/stm32/built-in.a
  AR      drivers/crypto/xilinx/built-in.a
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o
  AR      drivers/crypto/hisilicon/trng/built-in.a
  AR      drivers/crypto/intel/keembay/built-in.a
  AR      drivers/crypto/hisilicon/built-in.a
  AR      drivers/crypto/intel/ixp4xx/built-in.a
  CC      drivers/firmware/efi/dev-path-parser.o
  AR      drivers/mmc/core/built-in.a
  AR      drivers/crypto/intel/built-in.a
  AR      drivers/mmc/built-in.a
  AR      drivers/crypto/built-in.a
  CC [M]  drivers/net/ethernet/realtek/r8169_firmware.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp102.o
  AR      drivers/firmware/xilinx/built-in.a
  CC      lib/libcrc32c.o
  CC      drivers/firmware/dmi_scan.o
  AR      drivers/acpi/acpica/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_range_fence.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_engine_heartbeat.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_engine_pm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp108.o
  CC      fs/btrfs/send.o
  CC      fs/btrfs/dev-replace.o
  CC      drivers/firmware/efi/apple-properties.o
  AR      net/ipv4/built-in.a
  CC      lib/xxhash.o
  CC      drivers/clocksource/acpi_pm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.o
  CC      drivers/clocksource/i8253.o
  CC [M]  drivers/gpu/drm/xe/xe_reg_sr.o
  CC [M]  fs/smb/client/smb2transport.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.o
  CC [M]  drivers/gpu/drm/drm_mode_config.o
  LD [M]  drivers/net/ethernet/intel/igc/igc.o
  CC [M]  drivers/net/ethernet/realtek/r8169_phy_config.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/gv100.o
  AR      drivers/net/ethernet/smsc/built-in.a
  CC [M]  drivers/net/ethernet/smsc/smsc9420.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp10b.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/tu102.o
  CC [M]  drivers/gpu/drm/drm_mode_object.o
  AR      drivers/net/ethernet/socionext/built-in.a
  AR      drivers/net/ethernet/vertexcom/built-in.a
  AR      drivers/net/ethernet/wangxun/built-in.a
  CC [M]  drivers/gpu/drm/i915/gt/intel_engine_user.o
  LD [M]  drivers/md/dm-bio-prison.o
  CC [M]  drivers/gpu/drm/drm_modes.o
  CC [M]  fs/smb/client/smb2misc.o
  CC [M]  drivers/gpu/drm/drm_modeset_lock.o
  CC      drivers/acpi/acpi_cmos_rtc.o
  CC      drivers/acpi/x86/apple.o
  CC [M]  drivers/gpu/drm/drm_plane.o
  CC      lib/genalloc.o
  CC [M]  drivers/gpu/drm/drm_prime.o
  AR      drivers/md/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/ga100.o
  AR      drivers/net/ethernet/xilinx/built-in.a
  CC      lib/percpu_counter.o
  AR      drivers/clocksource/built-in.a
  CC      drivers/firmware/dmi-sysfs.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_execlists_submission.o
  CC      drivers/hid/usbhid/hid-core.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_ggtt.o
  CC      drivers/firmware/efi/earlycon.o
  CC      drivers/hid/usbhid/hiddev.o
  LD [M]  drivers/net/ethernet/intel/ixgbevf/ixgbevf.o
  CC      drivers/hid/hid-core.o
  CC [M]  fs/smb/client/smb2pdu.o
  CC [M]  drivers/gpu/drm/xe/xe_reg_whitelist.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/acr/ga102.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.o
  CC      drivers/acpi/x86/utils.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.o
  CC      drivers/acpi/x86/s2idle.o
  CC      drivers/hid/hid-input.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.o
  CC [M]  fs/smb/client/smb2inode.o
  CC [M]  drivers/gpu/drm/xe/xe_rtp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.o
  LD [M]  drivers/net/ethernet/intel/ixgbe/ixgbe.o
  CC [M]  drivers/gpu/drm/drm_print.o
  CC [M]  drivers/gpu/drm/drm_property.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt.o
  LD [M]  drivers/net/ethernet/realtek/r8169.o
  CC      drivers/acpi/debugfs.o
  CC      drivers/hid/hid-quirks.o
  AR      drivers/net/ethernet/synopsys/built-in.a
  CC      lib/fault-inject.o
  CC [M]  fs/smb/client/smb2file.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.o
  CC      lib/syscall.o
  CC      drivers/firmware/efi/cper-x86.o
  CC [M]  drivers/gpu/drm/drm_syncobj.o
  CC      fs/char_dev.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.o
  AR      drivers/net/ethernet/pensando/built-in.a
  GEN     xe_wa_oob.c xe_wa_oob.h
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.o
  CC      drivers/acpi/acpi_lpat.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_clock_utils.o
  GEN     xe_wa_oob.c xe_wa_oob.h
  CC      fs/stat.o
  AR      drivers/staging/media/built-in.a
  AR      drivers/staging/built-in.a
  AR      drivers/platform/x86/amd/built-in.a
  CC      drivers/mailbox/mailbox.o
  CC      drivers/platform/x86/intel/pmc/core.o
  CC      drivers/mailbox/pcc.o
  LD [M]  net/ipv4/udp_tunnel.o
  AR      net/built-in.a
  LD [M]  drivers/md/dm-thin-pool.o
  AR      drivers/platform/surface/built-in.a
  CC      drivers/platform/x86/intel/pmc/spt.o
  CC      drivers/platform/x86/p2sb.o
  CC [M]  drivers/platform/x86/intel/pmt/class.o
  CC [M]  drivers/gpu/drm/drm_sysfs.o
  CC [M]  drivers/gpu/drm/drm_trace_points.o
  CC [M]  drivers/platform/x86/intel/pmt/telemetry.o
  CC      drivers/acpi/acpi_lpit.o
  CC      drivers/acpi/prmt.o
  CC      drivers/platform/x86/intel/turbo_max_3.o
  CC      drivers/platform/x86/pmc_atom.o
  CC      drivers/devfreq/devfreq.o
  CC [M]  drivers/devfreq/governor_simpleondemand.o
  CC      drivers/powercap/powercap_sys.o
  CC      lib/dynamic_debug.o
  CC [M]  drivers/gpu/drm/xe/xe_sa.o
  CC      drivers/powercap/intel_rapl_common.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/nv50.o
  CC [M]  drivers/gpu/drm/xe/xe_sched_job.o
  CC      lib/errname.o
  CC      drivers/firmware/dmi-id.o
  AR      drivers/firmware/efi/built-in.a
  CC      drivers/powercap/intel_rapl_msr.o
  CC      drivers/hid/hid-debug.o
  CC      lib/nlattr.o
  CC [M]  drivers/platform/x86/intel/vsec.o
  CC      lib/checksum.o
  CC      drivers/platform/x86/intel/pmc/cnp.o
  CC      drivers/acpi/acpi_pcc.o
  CC      drivers/acpi/ac.o
  AR      drivers/hid/usbhid/built-in.a
  CC      fs/btrfs/raid56.o
  CC      drivers/hid/hidraw.o
  AR      drivers/mailbox/built-in.a
  CC [M]  drivers/gpu/drm/drm_vblank.o
  CC      drivers/firmware/memmap.o
  CC [M]  fs/smb/client/cifsacl.o
  CC      lib/cpu_rmap.o
  LD [M]  drivers/net/ethernet/intel/e1000e/e1000e.o
  CC [M]  drivers/devfreq/governor_performance.o
  CC      fs/exec.o
  CC      drivers/hid/hid-generic.o
  CC [M]  drivers/platform/x86/intel/pmt/crashlog.o
  AR      drivers/net/ethernet/built-in.a
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.o
  AR      drivers/net/built-in.a
  AR      drivers/perf/built-in.a
  CC [M]  fs/smb/client/fs_context.o
  CC      drivers/ras/ras.o
  CC [M]  drivers/gpu/drm/drm_vblank_work.o
  CC      drivers/ras/debugfs.o
  CC [M]  fs/smb/client/dns_resolve.o
  CC      fs/btrfs/uuid-tree.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_debugfs.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_engines_debugfs.o
  CC      fs/btrfs/props.o
  CC [M]  drivers/gpu/drm/drm_vma_manager.o
  CC      drivers/platform/x86/intel/pmc/icl.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_irq.o
  CC      fs/btrfs/free-space-tree.o
  CC [M]  drivers/gpu/drm/drm_gpuva_mgr.o
  CC      fs/btrfs/tree-checker.o
  LD [M]  drivers/platform/x86/intel/pmt/pmt_class.o
  ASN.1   fs/smb/client/cifs_spnego_negtokeninit.asn1.[ch]
  CC      fs/pipe.o
  CC      lib/dynamic_queue_limits.o
  CC [M]  fs/smb/client/smb1ops.o
  CC      fs/namei.o
  CC      drivers/acpi/button.o
  CC      drivers/hid/hid-a4tech.o
  CC      drivers/platform/x86/intel/pmc/tgl.o
  CC      drivers/platform/x86/intel/pmc/adl.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_mcr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.o
  CC      drivers/hid/hid-apple.o
  AR      drivers/firmware/built-in.a
  CC      lib/glob.o
  CC      fs/btrfs/space-info.o
  CC      lib/strncpy_from_user.o
  CC [M]  drivers/gpu/drm/xe/xe_step.o
  CC [M]  fs/smb/client/cifssmb.o
  CC      drivers/hid/hid-belkin.o
  CC      drivers/hid/hid-cherry.o
  LD [M]  drivers/platform/x86/intel/pmt/pmt_telemetry.o
  LD [M]  drivers/platform/x86/intel/pmt/pmt_crashlog.o
  CC [M]  drivers/gpu/drm/drm_writeback.o
  CC [M]  drivers/platform/x86/intel/rst.o
  AR      drivers/powercap/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_sync.o
  CC [M]  drivers/platform/x86/wmi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.o
  CC [M]  fs/smb/client/cifs_spnego_negtokeninit.asn1.o
  CC      drivers/acpi/fan_core.o
  CC [M]  fs/smb/client/asn1.o
  CC      lib/strnlen_user.o
  CC [M]  drivers/gpu/drm/xe/xe_tile.o
  AR      drivers/hwtracing/intel_th/built-in.a
  CC      drivers/platform/x86/intel/pmc/mtl.o
  CC      drivers/platform/x86/intel/pmc/pltdrv.o
  AR      drivers/devfreq/built-in.a
  CC      drivers/android/binderfs.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_pm.o
  CC      drivers/android/binder.o
  CC [M]  drivers/gpu/drm/xe/xe_tile_sysfs.o
  AR      drivers/ras/built-in.a
  CC      lib/net_utils.o
  CC      drivers/acpi/fan_attr.o
  CC      drivers/acpi/processor_driver.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.o
  LD [M]  drivers/platform/x86/intel/intel_vsec.o
  CC [M]  drivers/platform/x86/wmi-bmof.o
  CC      fs/btrfs/block-rsv.o
  CC      drivers/android/binder_alloc.o
  CC [M]  drivers/gpu/drm/lib/drm_random.o
  CC [M]  drivers/gpu/drm/xe/xe_trace.o
  CC [M]  drivers/gpu/drm/drm_ioc32.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.o
  CC [M]  drivers/gpu/drm/drm_panel.o
  CC [M]  drivers/gpu/drm/drm_pci.o
  CC      fs/fcntl.o
  CC      drivers/hid/hid-chicony.o
  CC      fs/btrfs/delalloc-space.o
  LD [M]  drivers/platform/x86/intel/intel-rst.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_pm_irq.o
  CC      drivers/acpi/processor_thermal.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_sys_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_stolen_mgr.o
  CC      drivers/acpi/processor_idle.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_requests.o
  CC      fs/ioctl.o
  AR      drivers/platform/x86/intel/pmc/built-in.a
  AR      drivers/platform/x86/intel/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.o
  CC [M]  drivers/platform/x86/mxm-wmi.o
  CC      fs/btrfs/block-group.o
  CC [M]  drivers/gpu/drm/drm_debugfs.o
  CC      fs/readdir.o
  CC      drivers/hid/hid-cypress.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gm107.o
  CC      lib/sg_pool.o
  CC      drivers/acpi/processor_throttling.o
  CC [M]  drivers/platform/x86/intel_ips.o
  AR      drivers/platform/x86/built-in.a
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_sysfs.o
  CC      lib/stackdepot.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.o
  CC [M]  drivers/gpu/drm/drm_debugfs_crc.o
  CC      fs/btrfs/discard.o
  CC [M]  drivers/gpu/drm/drm_edid_load.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gtt.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_llc.o
  CC      drivers/acpi/processor_perflib.o
  CC      fs/select.o
  CC      lib/ucs2_string.o
  CC      lib/sbitmap.o
  CC      fs/btrfs/reflink.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.o
  CC [M]  drivers/gpu/drm/drm_panel_orientation_quirks.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gm20b.o
  AR      drivers/nvmem/layouts/built-in.a
  CC      drivers/nvmem/core.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bar/tu102.o
  CC [M]  drivers/gpu/drm/drm_buddy.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_lrc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.o
  CC [M]  drivers/mtd/chips/chipreg.o
  CC [M]  drivers/uio/uio.o
  CC      drivers/hid/hid-ezkey.o
  CC      fs/btrfs/subpage.o
  CC      fs/dcache.o
  CC      lib/group_cpus.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_vram_mgr.o
  CC [M]  drivers/vfio/vfio_main.o
  CC [M]  drivers/vfio/group.o
  CC [M]  drivers/vfio/pci/vfio_pci_core.o
  CC [M]  drivers/vfio/iova_bitmap.o
  CC [M]  drivers/vfio/pci/vfio_pci_intrs.o
  CC [M]  drivers/vfio/container.o
  CC [M]  lib/asn1_decoder.o
  CC [M]  drivers/gpu/drm/drm_gem_shmem_helper.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_migrate.o
  CC      fs/btrfs/tree-mod-log.o
  CC      drivers/acpi/container.o
  CC      fs/btrfs/extent-io-tree.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/bit.o
  CC [M]  drivers/vfio/virqfd.o
  CC      fs/inode.o
  GEN     lib/oid_registry_data.c
  CC [M]  drivers/gpu/drm/drm_suballoc.o
  CC [M]  drivers/gpu/drm/xe/xe_tuning.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.o
  AR      drivers/platform/built-in.a
  CC      drivers/acpi/thermal.o
  CC [M]  drivers/mtd/mtdcore.o
  CC [M]  drivers/vfio/vfio_iommu_type1.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/boost.o
  CC      fs/attr.o
  CC [M]  lib/oid_registry.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/conn.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_mocs.o
  CC [M]  drivers/mtd/mtdsuper.o
  CC      drivers/hid/hid-kensington.o
  CC      drivers/hid/hid-lg.o
  AR      lib/lib.a
  CC [M]  drivers/vfio/pci/vfio_pci_rdwr.o
  CC [M]  drivers/vfio/pci/vfio_pci_config.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_ppgtt.o
  CC [M]  drivers/mtd/mtdconcat.o
  CC      drivers/acpi/acpi_memhotplug.o
  AR      drivers/nvmem/built-in.a
  CC      drivers/acpi/ioapic.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_rc6.o
  CC [M]  drivers/bluetooth/btusb.o
  CC [M]  drivers/pps/pps.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_region_lmem.o
  CC [M]  drivers/pps/kapi.o
  GEN     lib/crc32table.h
  CC [M]  drivers/gpu/drm/drm_gem_ttm_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_uc.o
  CC [M]  drivers/bluetooth/btintel.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.o
  CC      lib/crc32.o
  CC [M]  drivers/bluetooth/btbcm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_rap.o
  CC [M]  drivers/gpu/drm/drm_atomic_helper.o
  CC [M]  drivers/pps/sysfs.o
  CC [M]  drivers/gpu/drm/drm_atomic_state_helper.o
  LD [M]  drivers/vfio/vfio.o
  CC [M]  drivers/vfio/pci/vfio_pci.o
  CC      drivers/acpi/battery.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/cstep.o
  CC [M]  drivers/gpu/drm/xe/xe_uc_debugfs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.o
  CC [M]  drivers/dca/dca-core.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.o
  CC [M]  drivers/mtd/mtdpart.o
  CC [M]  drivers/ssb/main.o
  CC      fs/bad_inode.o
  CC      fs/file.o
  CC      fs/filesystems.o
  CC      fs/btrfs/fs.o
  LD [M]  drivers/pps/pps_core.o
  AR      lib/built-in.a
  CC [M]  drivers/vhost/net.o
  CC [M]  drivers/vhost/vhost.o
  CC      drivers/acpi/hed.o
  CC      drivers/acpi/bgrt.o
  CC [M]  drivers/vhost/iotlb.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/dcb.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/disp.o
  CC [M]  drivers/mtd/mtdchar.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.o
  CC [M]  drivers/dca/dca-sysfs.o
  CC      fs/namespace.o
  CC      drivers/acpi/cppc_acpi.o
  CC      drivers/hid/hid-lg-g15.o
  CC      fs/seq_file.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_renderstate.o
  CC      fs/xattr.o
  CC      drivers/hid/hid-microsoft.o
  CC [M]  drivers/gpu/drm/xe/xe_uc_fw.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_reset.o
  LD [M]  drivers/vfio/pci/vfio-pci.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_mca.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.o
  LD [M]  drivers/vfio/pci/vfio-pci-core.o
  LD [M]  fs/smb/client/cifs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_lsdma.o
  CC      fs/btrfs/messages.o
  CC      drivers/hid/hid-monterey.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/dp.o
  CC      fs/btrfs/bio.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_ring.o
  CC [M]  drivers/gpu/drm/drm_bridge_connector.o
  CC      fs/libfs.o
  CC      drivers/acpi/spcr.o
  CC [M]  drivers/ssb/scan.o
  CC [M]  drivers/bluetooth/btrtl.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.o
  CC [M]  drivers/gpu/drm/xe/xe_vm.o
  LD [M]  drivers/vhost/vhost_iotlb.o
  CC [M]  drivers/gpu/drm/xe/xe_vm_madvise.o
  CC [M]  drivers/gpu/drm/drm_crtc_helper.o
  CC [M]  drivers/ssb/sprom.o
  CC      drivers/acpi/acpi_pad.o
  CC      fs/fs-writeback.o
  LD [M]  drivers/dca/dca.o
  CC [M]  drivers/ssb/pci.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/extdev.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_ring_submission.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_rps.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_sa_media.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_sseu.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_sseu_debugfs.o
  CC      fs/pnode.o
  CC      fs/btrfs/lru_cache.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.o
  CC [M]  drivers/gpu/drm/drm_damage_helper.o
  CC [M]  drivers/gpu/drm/drm_encoder_slave.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/fan.o
  LD [M]  drivers/mtd/mtd.o
  CC [M]  drivers/gpu/drm/xe/xe_wait_user_fence.o
  CC [M]  drivers/ssb/pcihost_wrapper.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/gpio.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/i2c.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/cik.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_timeline.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_wopcm.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_workarounds.o
  CC [M]  drivers/gpu/drm/drm_flip_work.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/iccsense.o
  AR      drivers/hid/built-in.a
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/image.o
  CC      fs/splice.o
  CC [M]  drivers/gpu/drm/xe/xe_wa.o
  CC [M]  drivers/ssb/driver_chipcommon.o
  CC      fs/sync.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/cik_ih.o
  CC      fs/btrfs/acl.o
  CC [M]  drivers/ssb/driver_chipcommon_pmu.o
  CC      fs/utimes.o
  CC [M]  drivers/acpi/acpi_video.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/dce_v8_0.o
  CC [M]  drivers/gpu/drm/xe/xe_wopcm.o
  AR      drivers/android/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_display.o
  CC [M]  drivers/gpu/drm/xe/display/xe_fb_pin.o
  CC      fs/d_path.o
  CC [M]  drivers/ssb/driver_pcicore.o
  CC [M]  drivers/gpu/drm/i915/gt/shmem_utils.o
  CC [M]  drivers/acpi/video_detect.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.o
  CC [M]  drivers/gpu/drm/i915/gt/sysfs_engines.o
  LD [M]  drivers/vhost/vhost_net.o
  CC [M]  drivers/gpu/drm/drm_format_helper.o
  CC [M]  drivers/gpu/drm/drm_gem_atomic_helper.o
  CC [M]  drivers/gpu/drm/drm_gem_framebuffer_helper.o
  CC [M]  drivers/gpu/drm/drm_kms_helper_common.o
  CC [M]  drivers/gpu/drm/drm_modeset_helper.o
  CC [M]  drivers/gpu/drm/drm_plane_helper.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/mxm.o
  CC [M]  drivers/gpu/drm/drm_probe_helper.o
  CC [M]  drivers/gpu/drm/drm_rect.o
  CC [M]  drivers/gpu/drm/drm_self_refresh_helper.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/npde.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.o
  CC [M]  drivers/gpu/drm/i915/gt/gen6_renderstate.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/cik_sdma.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/pcir.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/perf.o
  CC      fs/stack.o
  CC [M]  drivers/gpu/drm/xe/display/xe_hdcp_gsc.o
  CC [M]  drivers/gpu/drm/xe/display/xe_plane_initial.o
  CC [M]  drivers/gpu/drm/i915/gt/gen7_renderstate.o
  CC [M]  drivers/gpu/drm/i915/gt/gen8_renderstate.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.o
  CC      fs/fs_struct.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/vce_v2_0.o
  CC [M]  drivers/gpu/drm/i915/gt/gen9_renderstate.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/si.o
  AR      drivers/acpi/built-in.a
  CC [M]  drivers/gpu/drm/drm_simple_kms_helper.o
  LD [M]  drivers/ssb/ssb.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_busy.o
  AR      fs/btrfs/built-in.a
  CC [M]  drivers/gpu/drm/amd/amdgpu/si_ih.o
  CC [M]  drivers/gpu/drm/bridge/panel.o
  CC [M]  drivers/gpu/drm/drm_fbdev_generic.o
  CC [M]  drivers/gpu/drm/drm_fb_helper.o
  CC      fs/statfs.o
  LD [M]  drivers/gpu/drm/drm.o
  LD [M]  drivers/gpu/drm/drm_shmem_helper.o
  LD [M]  drivers/gpu/drm/drm_suballoc_helper.o
  LD [M]  drivers/gpu/drm/drm_ttm_helper.o
  CC [M]  drivers/gpu/drm/xe/display/xe_display_rps.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.o
  AR      drivers/gpu/drm/built-in.a
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_clflush.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/pmu.o
  CC [M]  drivers/gpu/drm/xe/display/ext/i915_irq.o
  CC [M]  drivers/gpu/drm/xe/display/ext/i915_utils.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/si_dma.o
  CC      fs/fs_pin.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/power_budget.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/ramcfg.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_create.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowacpi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_domain.o
  CC [M]  drivers/gpu/drm/xe/display/ext/intel_clock_gating.o
  CC      fs/nsfs.o
  CC [M]  drivers/gpu/drm/xe/i915-soc/intel_dram.o
  CC      fs/fs_types.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowpci.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o
  LD [M]  drivers/acpi/video.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_internal.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_object.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_lmem.o
  CC [M]  drivers/gpu/drm/xe/i915-soc/intel_pch.o
  CC      fs/fs_context.o
  CC      fs/fs_parser.o
  CC [M]  drivers/gpu/drm/xe/i915-display/icl_dsi.o
  CC      fs/fsopen.o
  CC      fs/init.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/dce_v6_0.o
  CC      fs/kernel_read_file.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/uvd_v3_1.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowramin.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowrom.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_mman.o
  CC      fs/mnt_idmapping.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/vi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/timing.o
  CC      fs/remap_range.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_pages.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v6_1.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/therm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/soc15.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/vmap.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/volt.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/vpstate.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_phys.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_pm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/xpio.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_atomic.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_atomic_plane.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/emu_soc.o
  CC      fs/buffer.o
  LD [M]  drivers/gpu/drm/drm_kms_helper.o
  CC      fs/mpage.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_region.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_shmem.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v7_0.o
  CC      fs/proc_namespace.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_audio.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_backlight.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/vega10_reg_init.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_shrinker.o
  CC      fs/direct-io.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_stolen.o
  CC      fs/eventpoll.o
  CC      fs/anon_inodes.o
  CC      fs/signalfd.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/M0203.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/M0205.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/M0209.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_throttle.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bios/P0260.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/base.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_ttm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/hwsq.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv04.o
  CC      fs/timerfd.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/vega20_reg_init.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v7_4.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v2_3.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.o
  CC      fs/eventfd.o
  CC      fs/userfaultfd.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_userptr.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_bw.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gem_wait.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nv.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/arct_reg_init.o
  CC [M]  drivers/gpu/drm/i915/gem/i915_gemfs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v7_2.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv31.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/hdp_v4_0.o
  CC [M]  drivers/gpu/drm/i915/i915_active.o
  CC [M]  drivers/gpu/drm/i915/i915_cmd_parser.o
  CC      fs/aio.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/hdp_v5_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv50.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/aldebaran_reg_init.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/g94.o
  CC      fs/locks.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_cdclk.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/aldebaran.o
  CC [M]  drivers/gpu/drm/i915/i915_deps.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/soc21.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_color.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_ww.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/smu_v13_0_10.o
  CC      fs/binfmt_script.o
  CC      fs/binfmt_elf.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v4_3.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/hdp_v6_0.o
  CC [M]  drivers/gpu/drm/i915/i915_query.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_combo_phy.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_connector.o
  CC      fs/compat_binfmt_elf.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v7_7.o
  CC [M]  drivers/gpu/drm/i915/i915_request.o
  CC [M]  drivers/gpu/drm/i915/i915_scheduler.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/bus/gf100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/hdp_v5_2.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC      fs/mbcache.o
  CC [M]  drivers/gpu/drm/i915/i915_ttm_buddy_manager.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/lsdma_v6_0.o
  CC      fs/posix_acl.o
  CC [M]  drivers/gpu/drm/i915/i915_vma.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_crtc.o
  CC      fs/coredump.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/nbio_v7_9.o
  CC [M]  drivers/gpu/drm/i915/i915_vma_resource.o
  CC      fs/drop_caches.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/df_v1_7.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_crtc_state_dump.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_cursor.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/df_v3_6.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/df_v4_3.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/nv04.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_cx0_phy.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.o
  CC      fs/sysctls.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_ddi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/nv40.o
  CC      fs/fhandle.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/nv50.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_ddi_buf_trans.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_debugfs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_1.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_device.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_capture.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_driver.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_irq.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/gt215.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_power.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_log.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_power_map.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_rc.o
  AR      fs/built-in.a
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_power_well.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_huc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_huc_debugfs.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_huc_fw.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_uc.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_uc_debugfs.o
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_trace.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dkl_phy.o
  CC [M]  drivers/gpu/drm/i915/gt/intel_gsc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/mcp77.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/gf100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v3_0_2.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.o
  CC [M]  drivers/gpu/drm/i915/i915_hwmon.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dmc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v3_0_1.o
  CC [M]  drivers/gpu/drm/i915/display/hsw_ips.o
  CC [M]  drivers/gpu/drm/i915/display/intel_atomic.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp.o
  CC [M]  drivers/gpu/drm/i915/display/intel_atomic_plane.o
  CC [M]  drivers/gpu/drm/i915/display/intel_audio.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_aux.o
  CC [M]  drivers/gpu/drm/i915/display/intel_bios.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_aux_backlight.o
  CC [M]  drivers/gpu/drm/i915/display/intel_bw.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_hdcp.o
  CC [M]  drivers/gpu/drm/i915/display/intel_cdclk.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_link_training.o
  CC [M]  drivers/gpu/drm/i915/display/intel_color.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.o
  CC [M]  drivers/gpu/drm/i915/display/intel_combo_phy.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/umc_v6_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/umc_v6_1.o
  CC [M]  drivers/gpu/drm/i915/display/intel_connector.o
  CC [M]  drivers/gpu/drm/i915/display/intel_crtc.o
  CC [M]  drivers/gpu/drm/i915/display/intel_crtc_state_dump.o
  CC [M]  drivers/gpu/drm/i915/display/intel_cursor.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/umc_v6_7.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/gk20a.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/umc_v8_7.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_driver.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/umc_v8_10.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_mst.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/iceland_ih.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_irq.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dpll.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_power.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dpll_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/gm20b.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_power_map.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_power_well.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_reset.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/tonga_ih.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dpt.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/cz_ih.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_rps.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_drrs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsb.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dmc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/pllnv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/vega10_ih.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/vega20_ih.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dpio_phy.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/clk/pllgt215.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/navi10_ih.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dpll.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/ih_v6_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dpll_mgr.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dpt.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.o
  CC [M]  drivers/gpu/drm/i915/display/intel_drrs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v3_1.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dsb.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v10_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_fb.o
  CC [M]  drivers/gpu/drm/i915/display/intel_fb_pin.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v11_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsi_dcs_backlight.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsi_vbt.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fb.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v11_0_8.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fbc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v12_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_fbc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v13_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_fdi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_fifo_underrun.o
  CC [M]  drivers/gpu/drm/i915/display/intel_frontbuffer.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/nv04.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/nv05.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fdi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_global_state.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fifo_underrun.o
  CC [M]  drivers/gpu/drm/i915/display/intel_hdcp.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_frontbuffer.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_global_state.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/dce_v10_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/dce_v11_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_hdcp_gsc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.o
  CC [M]  drivers/gpu/drm/i915/display/intel_hotplug.o
  CC [M]  drivers/gpu/drm/i915/display/intel_hotplug_irq.o
  CC [M]  drivers/gpu/drm/i915/display/intel_hti.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/nv10.o
  CC [M]  drivers/gpu/drm/i915/display/intel_load_detect.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_gmbus.o
  CC [M]  drivers/gpu/drm/i915/display/intel_lpe_audio.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.o
  CC [M]  drivers/gpu/drm/i915/display/intel_modeset_lock.o
  CC [M]  drivers/gpu/drm/i915/display/intel_modeset_verify.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/nv1a.o
  CC [M]  drivers/gpu/drm/i915/display/intel_modeset_setup.o
  CC [M]  drivers/gpu/drm/i915/display/intel_overlay.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/nv20.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hdcp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_pch_display.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hdmi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/nv50.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hotplug.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hotplug_irq.o
  CC [M]  drivers/gpu/drm/i915/display/intel_pch_refclk.o
  CC [M]  drivers/gpu/drm/i915/display/intel_plane_initial.o
  CC [M]  drivers/gpu/drm/i915/display/intel_pmdemand.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hti.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v9_4.o
  CC [M]  drivers/gpu/drm/i915/display/intel_psr.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_lspcon.o
  CC [M]  drivers/gpu/drm/i915/display/intel_quirks.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_modeset_lock.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/g84.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_modeset_setup.o
  CC [M]  drivers/gpu/drm/i915/display/intel_sprite.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.o
  CC [M]  drivers/gpu/drm/i915/display/intel_sprite_uapi.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_tc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/imu_v11_0.o
  CC [M]  drivers/gpu/drm/i915/display/intel_vblank.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_modeset_verify.o
  CC [M]  drivers/gpu/drm/i915/display/intel_vga.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/g98.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_panel.o
  CC [M]  drivers/gpu/drm/i915/display/intel_wm.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pipe_crc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.o
  CC [M]  drivers/gpu/drm/i915/display/i9xx_plane.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pmdemand.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pps.o
  CC [M]  drivers/gpu/drm/i915/display/i9xx_wm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gt215.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/imu_v11_0_3.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_psr.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_qp_tables.o
  CC [M]  drivers/gpu/drm/i915/display/skl_scaler.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_quirks.o
  CC [M]  drivers/gpu/drm/i915/display/skl_universal_plane.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.o
  CC [M]  drivers/gpu/drm/i915/display/skl_watermark.o
  CC [M]  drivers/gpu/drm/i915/display/intel_acpi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_opregion.o
  CC [M]  drivers/gpu/drm/i915/display/intel_fbdev.o
  CC [M]  drivers/gpu/drm/i915/display/dvo_ch7017.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/mcp89.o
  CC [M]  drivers/gpu/drm/i915/display/dvo_ch7xxx.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_snps_phy.o
  CC [M]  drivers/gpu/drm/i915/display/dvo_ivch.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.o
  CC [M]  drivers/gpu/drm/i915/display/dvo_ns2501.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_tc.o
  CC [M]  drivers/gpu/drm/i915/display/dvo_sil164.o
  CC [M]  drivers/gpu/drm/i915/display/dvo_tfp410.o
  CC [M]  drivers/gpu/drm/i915/display/g4x_dp.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vblank.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.o
  CC [M]  drivers/gpu/drm/i915/display/g4x_hdmi.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vdsc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gf100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v4_4.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vga.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vrr.o
  CC [M]  drivers/gpu/drm/i915/display/icl_dsi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_backlight.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.o
  CC [M]  drivers/gpu/drm/i915/display/intel_crt.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_wm.o
  CC [M]  drivers/gpu/drm/xe/i915-display/skl_scaler.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.o
  CC [M]  drivers/gpu/drm/xe/i915-display/skl_universal_plane.o
  CC [M]  drivers/gpu/drm/i915/display/intel_cx0_phy.o
  CC [M]  drivers/gpu/drm/i915/display/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_ddi_buf_trans.o
  CC [M]  drivers/gpu/drm/xe/i915-display/skl_watermark.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_device.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm200.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_acpi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_display_trace.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dkl_phy.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dp.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dp_aux.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_opregion.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dp_aux_backlight.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dp_hdcp.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dp_link_training.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gv100.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dp_mst.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/tu102.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fbdev.o
  CC [M]  drivers/gpu/drm/xe/xe_guc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dsi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dsi_dcs_backlight.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dsi_vbt.o
  CC [M]  drivers/gpu/drm/i915/display/intel_dvo.o
  CC [M]  drivers/gpu/drm/xe/xe_ring_ops.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/devinit/ga100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fault/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fault/user.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_klvs_abi.h
  HDRTEST drivers/gpu/drm/xe/abi/guc_errors_abi.h
  CC [M]  drivers/gpu/drm/i915/display/intel_gmbus.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fault/gp100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/mes_v10_1.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fault/gp10b.o
  CC [M]  drivers/gpu/drm/i915/display/intel_hdmi.o
  CC [M]  drivers/gpu/drm/i915/display/intel_lspcon.o
  CC [M]  drivers/gpu/drm/i915/display/intel_lvds.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
  CC [M]  drivers/gpu/drm/i915/display/intel_panel.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fault/gv100.o
  CC [M]  drivers/gpu/drm/i915/display/intel_pps.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_communication_mmio_abi.h
  HDRTEST drivers/gpu/drm/xe/abi/guc_actions_abi.h
  HDRTEST drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fault/tu102.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_messages_abi.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_vma_types.h
  CC [M]  drivers/gpu/drm/i915/display/intel_qp_tables.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/vlv_sideband_reg.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h
  CC [M]  drivers/gpu/drm/i915/display/intel_sdvo.o
  CC [M]  drivers/gpu/drm/i915/display/intel_snps_phy.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_pcode.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/mes_v11_0.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
  CC [M]  drivers/gpu/drm/i915/display/intel_tv.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.o
  CC [M]  drivers/gpu/drm/i915/display/intel_vdsc.o
  CC [M]  drivers/gpu/drm/i915/display/intel_vrr.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_reg_defs.h
  CC [M]  drivers/gpu/drm/i915/display/vlv_dsi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.o
  CC [M]  drivers/gpu/drm/i915/display/vlv_dsi_pll.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_trace.h
  CC [M]  drivers/gpu/drm/i915/i915_perf.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv1a.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv10.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_tee.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_huc.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_cmd.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_debugfs.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_reg.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_active_types.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv20.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv25.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_utils.h
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_irq.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_config.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_vma.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/vlv_sideband.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv30.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv35.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_mchbar_regs.h
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_pm.o
  CC [M]  drivers/gpu/drm/i915/pxp/intel_pxp_session.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_debugfs.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/soc/intel_pch.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/soc/intel_dram.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/soc/intel_gmch.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv36.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv40.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_vgpu.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_fixed.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_runtime_pm.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_pm_types.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_pci_config.h
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/gt/intel_rps.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.o
  CC [M]  drivers/gpu/drm/i915/i915_gpu_error.o
  HDRTEST drivers/gpu/drm/xe/compat-i915-headers/intel_clock_gating.h
  HDRTEST drivers/gpu/drm/xe/display/ext/i915_irq.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv41.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv44.o
  HDRTEST drivers/gpu/drm/xe/regs/xe_reg_defs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vce_v3_0.o
  HDRTEST drivers/gpu/drm/xe/regs/xe_guc_regs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vce_v4_0.o
  HDRTEST drivers/gpu/drm/xe/regs/xe_gt_regs.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_regs.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_gpu_commands.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_lrc_layout.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_engine_regs.h
  CC [M]  drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.o
  HDRTEST drivers/gpu/drm/xe/tests/xe_test.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_pci_test.h
  CC [M]  drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.o
  HDRTEST drivers/gpu/drm/xe/tests/xe_migrate_test.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_dma_buf_test.h
  CC [M]  drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.o
  HDRTEST drivers/gpu/drm/xe/tests/xe_bo_test.h
  HDRTEST drivers/gpu/drm/xe/xe_bb.h
  CC [M]  drivers/gpu/drm/i915/selftests/i915_random.o
  HDRTEST drivers/gpu/drm/xe/xe_bb_types.h
  CC [M]  drivers/gpu/drm/i915/selftests/i915_selftest.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv46.o
  HDRTEST drivers/gpu/drm/xe/xe_bo.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv47.o
  CC [M]  drivers/gpu/drm/i915/selftests/igt_atomic.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.o
  CC [M]  drivers/gpu/drm/i915/selftests/igt_flush_test.o
  CC [M]  drivers/gpu/drm/i915/selftests/igt_live_test.o
  CC [M]  drivers/gpu/drm/i915/selftests/igt_mmap.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv49.o
  CC [M]  drivers/gpu/drm/i915/selftests/igt_reset.o
  CC [M]  drivers/gpu/drm/i915/selftests/igt_spinner.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv4e.o
  CC [M]  drivers/gpu/drm/i915/selftests/librapl.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.o
  HDRTEST drivers/gpu/drm/xe/xe_bo_doc.h
  CC [M]  drivers/gpu/drm/i915/i915_vgpu.o
  HDRTEST drivers/gpu/drm/xe/xe_bo_evict.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dkl_phy_regs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_sw_ring.o
  HDRTEST drivers/gpu/drm/xe/xe_bo_types.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/g84.o
  HDRTEST drivers/gpu/drm/i915/display/intel_crtc_state_dump.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.o
  HDRTEST drivers/gpu/drm/i915/display/hsw_ips.h
  HDRTEST drivers/gpu/drm/i915/display/g4x_hdmi.h
  HDRTEST drivers/gpu/drm/i915/display/intel_hdcp_regs.h
  HDRTEST drivers/gpu/drm/i915/display/intel_overlay.h
  HDRTEST drivers/gpu/drm/i915/display/intel_display.h
  HDRTEST drivers/gpu/drm/i915/display/skl_watermark_regs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.o
  HDRTEST drivers/gpu/drm/i915/display/intel_dmc.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.o
  HDRTEST drivers/gpu/drm/i915/display/intel_vga.h
  HDRTEST drivers/gpu/drm/i915/display/intel_audio.h
  HDRTEST drivers/gpu/drm/i915/display/intel_lvds.h
  HDRTEST drivers/gpu/drm/i915/display/intel_modeset_setup.h
  HDRTEST drivers/gpu/drm/i915/display/intel_cdclk.h
  HDRTEST drivers/gpu/drm/i915/display/intel_display_limits.h
  HDRTEST drivers/gpu/drm/i915/display/intel_hotplug.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dkl_phy.h
  HDRTEST drivers/gpu/drm/i915/display/intel_atomic.h
  HDRTEST drivers/gpu/drm/i915/display/intel_display_driver.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gt215.o
  HDRTEST drivers/gpu/drm/i915/display/intel_dpll.h
  HDRTEST drivers/gpu/drm/i915/display/vlv_dsi_pll_regs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.o
  HDRTEST drivers/gpu/drm/i915/display/intel_dp_mst.h
  HDRTEST drivers/gpu/drm/i915/display/intel_fdi_regs.h
  HDRTEST drivers/gpu/drm/i915/display/g4x_dp.h
  HDRTEST drivers/gpu/drm/i915/display/intel_tc.h
  HDRTEST drivers/gpu/drm/i915/display/intel_frontbuffer.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dsi_vbt.h
  HDRTEST drivers/gpu/drm/i915/display/intel_psr.h
  HDRTEST drivers/gpu/drm/i915/display/intel_crt.h
  HDRTEST drivers/gpu/drm/i915/display/intel_opregion.h
  HDRTEST drivers/gpu/drm/i915/display/intel_snps_phy_regs.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/mcp77.o
  HDRTEST drivers/gpu/drm/i915/display/i9xx_wm.h
  HDRTEST drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/mcp89.o
  HDRTEST drivers/gpu/drm/i915/display/intel_global_state.h
  HDRTEST drivers/gpu/drm/xe/xe_debugfs.h
  HDRTEST drivers/gpu/drm/i915/display/intel_lpe_audio.h
  HDRTEST drivers/gpu/drm/i915/display/intel_drrs.h
  HDRTEST drivers/gpu/drm/xe/xe_devcoredump.h
  HDRTEST drivers/gpu/drm/i915/display/intel_display_rps.h
  HDRTEST drivers/gpu/drm/i915/display/intel_fbdev.h
  HDRTEST drivers/gpu/drm/xe/xe_devcoredump_types.h
  HDRTEST drivers/gpu/drm/i915/display/intel_pps_regs.h
  HDRTEST drivers/gpu/drm/i915/display/intel_hdmi.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.o
  HDRTEST drivers/gpu/drm/i915/display/intel_fdi.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.o
  HDRTEST drivers/gpu/drm/i915/display/intel_fb.h
  HDRTEST drivers/gpu/drm/i915/display/intel_qp_tables.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf108.o
  HDRTEST drivers/gpu/drm/i915/display/intel_dsb_regs.h
  HDRTEST drivers/gpu/drm/i915/display/intel_vdsc.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk104.o
  HDRTEST drivers/gpu/drm/i915/display/intel_snps_phy.h
  HDRTEST drivers/gpu/drm/xe/xe_device.h
  HDRTEST drivers/gpu/drm/i915/display/intel_display_core.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.o
  HDRTEST drivers/gpu/drm/i915/display/vlv_dsi_pll.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dvo_dev.h
  HDRTEST drivers/gpu/drm/i915/display/intel_hdcp.h
  HDRTEST drivers/gpu/drm/i915/display/intel_sdvo_regs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.o
  HDRTEST drivers/gpu/drm/i915/display/intel_pch_refclk.h
  HDRTEST drivers/gpu/drm/xe/xe_device_sysfs.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk20a.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.o
  HDRTEST drivers/gpu/drm/i915/display/intel_modeset_lock.h
  HDRTEST drivers/gpu/drm/xe/xe_device_types.h
  HDRTEST drivers/gpu/drm/i915/display/intel_display_trace.h
  HDRTEST drivers/gpu/drm/xe/xe_display.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gm107.o
  HDRTEST drivers/gpu/drm/i915/display/intel_display_power.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.o
  HDRTEST drivers/gpu/drm/xe/xe_dma_buf.h
  HDRTEST drivers/gpu/drm/xe/xe_drv.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dp_aux_regs.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gm200.o
  HDRTEST drivers/gpu/drm/xe/xe_exec.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/athub_v1_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/athub_v2_0.o
  HDRTEST drivers/gpu/drm/i915/display/i9xx_plane.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dp_aux_backlight.h
  HDRTEST drivers/gpu/drm/xe/xe_exec_queue.h
  HDRTEST drivers/gpu/drm/i915/display/intel_dpll_mgr.h
  HDRTEST drivers/gpu/drm/i915/display/vlv_dsi.h
  HDRTEST drivers/gpu/drm/i915/display/intel_plane_initial.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/athub_v2_1.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/athub_v3_0.o
  HDRTEST drivers/gpu/drm/i915/display/intel_display_device.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gm20b.o
  HDRTEST drivers/gpu/drm/i915/display/intel_fifo_underrun.h
  HDRTEST drivers/gpu/drm/i915/display/intel_cursor.h
  HDRTEST drivers/gpu/drm/i915/display/vlv_dsi_regs.h
  HDRTEST drivers/gpu/drm/i915/display/intel_cx0_phy.h
  HDRTEST drivers/gpu/drm/xe/xe_exec_queue_types.h
  HDRTEST drivers/gpu/drm/i915/display/skl_scaler.h
  HDRTEST drivers/gpu/drm/i915/display/intel_hti.h
  HDRTEST drivers/gpu/drm/i915/display/icl_dsi_regs.h
  HDRTEST drivers/gpu/drm/i915/display/intel_atomic_plane.h
  HDRTEST drivers/gpu/drm/i915/display/skl_watermark.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gp100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gp102.o
  HDRTEST drivers/gpu/drm/xe/xe_execlist.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/smuio_v9_0.o
  HDRTEST drivers/gpu/drm/i915/display/intel_fbc.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/smuio_v11_0.o
  HDRTEST drivers/gpu/drm/xe/xe_execlist_types.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gp10b.o
  HDRTEST drivers/gpu/drm/i915/display/intel_display_reg_defs.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/smuio_v11_0_6.o
  HDRTEST drivers/gpu/drm/xe/xe_force_wake.h
  HDRTEST drivers/gpu/drm/xe/xe_force_wake_types.h
  HDRTEST drivers/gpu/drm/xe/xe_ggtt.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gv100.o
  HDRTEST drivers/gpu/drm/xe/xe_ggtt_types.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/tu102.o
  HDRTEST drivers/gpu/drm/xe/xe_gt.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/ga100.o
  HDRTEST drivers/gpu/drm/i915/display/intel_acpi.h
  HDRTEST drivers/gpu/drm/i915/display/intel_connector.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_clock.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/fb/ga102.o
  HDRTEST drivers/gpu/drm/xe/xe_gt_debugfs.h
  HDRTEST drivers/gpu/drm/i915/selftests/mock_region.h
  HDRTEST drivers/gpu/drm/i915/selftests/i915_live_selftests.h
  HDRTEST drivers/gpu/drm/i915/selftests/igt_mmap.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega10_hwmgr.o
  HDRTEST drivers/gpu/drm/i915/selftests/igt_flush_test.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega10_powertune.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/mxm/nv50.o
  HDRTEST drivers/gpu/drm/i915/soc/intel_pch.h
  HDRTEST drivers/gpu/drm/i915/soc/intel_dram.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.o
  HDRTEST drivers/gpu/drm/i915/soc/intel_gmch.h
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.o
  HDRTEST drivers/gpu/drm/i915/vlv_sideband.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega10_thermal.o
  HDRTEST drivers/gpu/drm/i915/vlv_sideband_reg.h
  HDRTEST drivers/gpu/drm/i915/vlv_suspend.h
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu10_hwmgr.o
  LD [M]  drivers/gpu/drm/i915/i915.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/pp_psm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega12_processpptables.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega12_hwmgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/pcie.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega12_thermal.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/pp_overdriver.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/nv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu_helper.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/nv40.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/nv46.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega20_processpptables.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/nv4c.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega20_hwmgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega20_powertune.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega20_thermal.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/g92.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/g94.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/gf106.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/common_baco.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega10_baco.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/gk104.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega20_baco.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pci/gp100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega12_baco.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu9_baco.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/memx.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gt215.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gf119.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gk110.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/tonga_baco.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gk208.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gk20a.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm107.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/polaris_baco.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/fiji_baco.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm200.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/privring/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/privring/gf117.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/ci_baco.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_baco.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/amd_powerplay.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/privring/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/privring/gk20a.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/legacy_dpm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/privring/gm200.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/privring/gp10b.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/fannil.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/fanpwm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_smc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/fantog.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/si_dpm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/ic.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/temp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/nv40.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/si_smc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/nv50.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_pm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gt215.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm_internal.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gf119.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gm107.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gm200.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_crtc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_irq.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/timer/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_color.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv40.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv41.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/timer/gk20a.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_services.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/top/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/top/gk104.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/top/ga100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/vfn/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/vfn/uvfn.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/vfn/gv100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/vfn/tu102.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_psr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/vfn/ga100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_crc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/gpio.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/nv40.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/gf117.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/gk20a.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/basics/conversion.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/basics/fixpt31_32.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/subdev/volt/gm20b.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/falcon.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/xtensa.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/basics/vector.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/bsp/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dc_common.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gt215.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gm107.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gm200.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gp100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser_interface.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gp102.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser_helper.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table_helper.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/gv100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser_common.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/tu102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/ga100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table2.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table_helper2.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/ce/ga102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/cipher/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/dce60/command_table_helper_dce60.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/device/acpi.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/device/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/dce80/command_table_helper_dce80.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/device/pci.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/dce110/command_table_helper_dce110.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/dce112/command_table_helper_dce112.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/bios/dce112/command_table_helper2_dce112.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/device/user.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dce_calcs.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/custom_float.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/bw_fixed.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_rq_dlg_helpers.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/chan.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/conn.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dml1_display_rq_dlg_calc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn10/dcn10_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/hdmi.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/dcn20_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/head.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/ior.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/vga.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/nv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_vba.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_rq_dlg_calc_20.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/nv50.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_rq_dlg_calc_20v2.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_rq_dlg_calc_21.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/dcn30_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/g94.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gt200.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/mcp77.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gt215.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/mcp89.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_rq_dlg_calc_31.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gf119.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_rq_dlg_calc_314.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_rq_dlg_calc_32.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gk110.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gm107.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gm200.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gp100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/dcn31_fpu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn321/dcn321_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gp102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn301/dcn301_fpu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn302/dcn302_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/tu102.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn303/dcn303_fpu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/dcn314_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/ga102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/udisp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dsc/rc_calc_fpu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/uoutp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/disp/uhead.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/nv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dcn_calcs.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/nv50.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/gf119.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dcn_calc_math.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dcn_calc_auto.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/gv100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/user.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/usernv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/usernv50.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/usergf100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dce60/dce60_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/usergf119.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dce100/dce_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dce110/dce110_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dce112/dce112_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/dma/usergv100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dce120/dce120_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn10/rv1_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn10/rv1_clk_mgr_vbios_smu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/cgrp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn10/rv2_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn20/dcn20_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn201/dcn201_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/chid.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/runq.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/nv04.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/nv10.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/nv17.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/nv40.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/nv50.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn30/dcn30_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn30/dcn30_clk_mgr_smu_msg.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn301/vg_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/g98.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk110.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk208.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn301/dcn301_smu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk20a.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gm107.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gm200.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn31/dcn31_smu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn31/dcn31_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gp100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn314/dcn314_smu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn314/dcn314_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/gv100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn315/dcn315_smu.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn315/dcn315_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn316/dcn316_smu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/tu102.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn316/dcn316_clk_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga100.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn32/dcn32_clk_mgr_smu_msg.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_audio.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/ucgrp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_link_encoder.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_stream_encoder.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/fifo/uchan.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_hwseq.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv04.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_mem_input.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_clock_source.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_scl_filters.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv10.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv15.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv17.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv20.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv25.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_transform.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv2a.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_opp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_dmcu.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv30.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv34.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_abm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv35.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv40.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv44.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/nv50.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_ipp.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/g84.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_i2c.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gt200.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_i2c_hw.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/mcp79.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_i2c_sw.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_abm.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gt215.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_abm_lcd.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_panel_cntl.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_hw_lock_mgr.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/mcp89.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_outbox.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/gpio_base.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf108.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf110.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/gpio_service.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf117.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/hw_factory.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf119.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk104.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110b.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk208.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/hw_gpio.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/hw_hpd.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk20a.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gm107.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/hw_ddc.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gm200.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/hw_generic.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/hw_translate.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gm20b.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce60/hw_translate_dce60.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gp100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gp102.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gp104.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce60/hw_factory_dce60.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce80/hw_translate_dce80.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce80/hw_factory_dce80.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gp107.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce110/hw_translate_dce110.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gp108.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce110/hw_factory_dce110.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gp10b.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce120/hw_translate_dce120.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dce120/hw_factory_dce120.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dcn10/hw_translate_dcn10.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dcn10/hw_factory_dcn10.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/gpio/dcn20/hw_translate_dcn20.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/gv100.o
  CC [M]  drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../displ



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Intel-xe] ✓ CI.Hooks: success for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
                   ` (5 preceding siblings ...)
  (?)
@ 2023-08-16 10:02 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16 10:02 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : success

== Summary ==

run-parts: executing /workspace/ci/hooks/00-showenv
+ pwd
+ ls -la
/workspace
total 496
drwxrwxr-x 10 1003 1003   4096 Aug 16 10:02 .
drwxr-xr-x  1 root root   4096 Aug 16 10:02 ..
-rw-rw-r--  1 1003 1003 389691 Aug 16 10:01 build.log
-rw-rw-r--  1 1003 1003   1346 Aug 16 09:56 checkpatch.log
drwxrwxr-x  5 1003 1003   4096 Aug 16 09:55 ci
drwxrwxr-x  9 1003 1003   4096 Aug 16 09:55 docker
drwxrwxr-x  8 1003 1003   4096 Aug 16 09:55 .git
-rw-rw-r--  1 1003 1003    208 Aug 16 09:56 git_apply.log
drwxrwxr-x  3 1003 1003   4096 Aug 16 09:55 .github
-rw-rw-r--  1 1003 1003    233 Aug 16 09:55 .groovylintrc.json
-rw-rw-r--  1 1003 1003     78 Aug 16 10:02 hooks.log
drwxrwxr-x 31 1003 1003   4096 Aug 16 10:01 kernel
-rw-rw-r--  1 1003 1003  16834 Aug 16 09:56 kernel.mbox
-rw-rw-r--  1 1003 1003  26091 Aug 16 09:58 kunit.log
-rw-rw-r--  1 1003 1003     48 Aug 16 09:56 parent.tag
drwxrwxr-x 45 1003 1003   4096 Aug 16 09:55 pipelines
-rw-rw-r--  1 1003 1003    793 Aug 16 09:55 README.adoc
drwxrwxr-x  3 1003 1003   4096 Aug 16 09:55 scripts
drwxrwxr-x  2 1003 1003   4096 Aug 16 09:55 .vscode
+ uname -a
Linux 7bc0f5025626 5.4.0-149-generic #166-Ubuntu SMP Tue Apr 18 16:51:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
+ export
+ grep -Ei '(^|\W)CI_'
declare -x CI_KERNEL_BUILD_DIR="/workspace/kernel/build64"
declare -x CI_KERNEL_IMAGES_DIR="/workspace/kernel/archive/boot"
declare -x CI_KERNEL_MODULES_DIR="/workspace/kernel/archive"
declare -x CI_KERNEL_SRC_DIR="/workspace/kernel"
declare -x CI_SRC_DIR="/workspace/kernel"
declare -x CI_TOOLS_SRC_DIR="/workspace/ci"
declare -x CI_WORKSPACE_DIR="/workspace"
+ '[' -n /workspace ']'
+ git_args='-C /workspace/kernel'
+ git_log_args=
+ git --no-pager -C /workspace/kernel log --format=oneline --abbrev-commit
0c030b258 Documentation/gpu: VM_BIND locking document
9829aba16 drm/xe/dg2: Remove Wa_15010599737
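The `00-showenv` hook above dumps only CI-related variables with `export | grep -Ei '(^|\W)CI_'`. A minimal standalone sketch of that filter pattern (the variable names below are examples, not part of this CI's configuration):

```shell
#!/bin/sh
# Export two variables; only names containing CI_ should survive the filter.
export CI_DEMO_DIR=/tmp/demo
export UNRELATED=1
# Same filter the hook uses: CI_ preceded by start-of-line or a non-word
# character, matched case-insensitively.
export | grep -Ei '(^|\W)CI_'
```

Running this prints the `CI_DEMO_DIR` export line and drops `UNRELATED`, mirroring the `declare -x CI_*` lines seen in the hook log.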
run-parts: executing /workspace/ci/hooks/10-build-W1
+ SRC_DIR=/workspace/kernel
+ RESTORE_DISPLAY_CONFIG=0
+ '[' -n /workspace/kernel/build64 ']'
+ BUILD_DIR=/workspace/kernel/build64
+ cd /workspace/kernel
+ grep -q -e '^CONFIG_DRM_XE_DISPLAY=[yY]' /workspace/kernel/build64/.config
+ RESTORE_DISPLAY_CONFIG=1
+ trap cleanup EXIT
+ ./scripts/config --file /workspace/kernel/build64/.config --disable CONFIG_DRM_XE_DISPLAY
++ nproc
+ make -j48 O=/workspace/kernel/build64 modules_prepare
make[1]: Entering directory '/workspace/kernel/build64'
  SYNC    include/config/auto.conf.cmd
  GEN     Makefile
  GEN     Makefile
  UPD     include/generated/compile.h
  UPD     include/config/kernel.release
  UPD     include/generated/utsrelease.h
  DESCEND objtool
  CALL    ../scripts/checksyscalls.sh
  HOSTCC  /workspace/kernel/build64/tools/objtool/fixdep.o
  HOSTLD  /workspace/kernel/build64/tools/objtool/fixdep-in.o
  LINK    /workspace/kernel/build64/tools/objtool/fixdep
  INSTALL libsubcmd_headers
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/exec-cmd.o
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/help.o
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/pager.o
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/parse-options.o
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/run-command.o
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/sigchain.o
  CC      /workspace/kernel/build64/tools/objtool/libsubcmd/subcmd-config.o
  LD      /workspace/kernel/build64/tools/objtool/libsubcmd/libsubcmd-in.o
  AR      /workspace/kernel/build64/tools/objtool/libsubcmd/libsubcmd.a
  CC      /workspace/kernel/build64/tools/objtool/weak.o
  CC      /workspace/kernel/build64/tools/objtool/check.o
  CC      /workspace/kernel/build64/tools/objtool/special.o
  CC      /workspace/kernel/build64/tools/objtool/builtin-check.o
  CC      /workspace/kernel/build64/tools/objtool/elf.o
  CC      /workspace/kernel/build64/tools/objtool/objtool.o
  CC      /workspace/kernel/build64/tools/objtool/orc_gen.o
  CC      /workspace/kernel/build64/tools/objtool/orc_dump.o
  CC      /workspace/kernel/build64/tools/objtool/libstring.o
  CC      /workspace/kernel/build64/tools/objtool/libctype.o
  CC      /workspace/kernel/build64/tools/objtool/str_error_r.o
  CC      /workspace/kernel/build64/tools/objtool/librbtree.o
  CC      /workspace/kernel/build64/tools/objtool/arch/x86/special.o
  CC      /workspace/kernel/build64/tools/objtool/arch/x86/decode.o
  LD      /workspace/kernel/build64/tools/objtool/arch/x86/objtool-in.o
  LD      /workspace/kernel/build64/tools/objtool/objtool-in.o
  LINK    /workspace/kernel/build64/tools/objtool/objtool
make[1]: Leaving directory '/workspace/kernel/build64'
++ nproc
+ make -j48 O=/workspace/kernel/build64 M=drivers/gpu/drm/xe W=1
make[1]: Entering directory '/workspace/kernel/build64'
  CC [M]  drivers/gpu/drm/xe/xe_bb.o
  CC [M]  drivers/gpu/drm/xe/xe_bo.o
  CC [M]  drivers/gpu/drm/xe/xe_bo_evict.o
  CC [M]  drivers/gpu/drm/xe/xe_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_devcoredump.o
  CC [M]  drivers/gpu/drm/xe/xe_device.o
  CC [M]  drivers/gpu/drm/xe/xe_device_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_dma_buf.o
  CC [M]  drivers/gpu/drm/xe/xe_exec.o
  CC [M]  drivers/gpu/drm/xe/xe_execlist.o
  CC [M]  drivers/gpu/drm/xe/xe_exec_queue.o
  CC [M]  drivers/gpu/drm/xe/xe_force_wake.o
  CC [M]  drivers/gpu/drm/xe/xe_ggtt.o
  CC [M]  drivers/gpu/drm/xe/xe_gt.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_clock.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_idle_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_mcr.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_pagefault.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_topology.o
  HOSTCC  drivers/gpu/drm/xe/xe_gen_wa_oob
  CC [M]  drivers/gpu/drm/xe/xe_guc_ads.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_ct.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_hwconfig.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_log.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_pc.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_submit.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_fence.o
  CC [M]  drivers/gpu/drm/xe/xe_huc.o
  CC [M]  drivers/gpu/drm/xe/xe_huc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_irq.o
  CC [M]  drivers/gpu/drm/xe/xe_lrc.o
  CC [M]  drivers/gpu/drm/xe/xe_migrate.o
  CC [M]  drivers/gpu/drm/xe/xe_mmio.o
  CC [M]  drivers/gpu/drm/xe/xe_mocs.o
  CC [M]  drivers/gpu/drm/xe/xe_module.o
  CC [M]  drivers/gpu/drm/xe/xe_pat.o
  CC [M]  drivers/gpu/drm/xe/xe_pci.o
  CC [M]  drivers/gpu/drm/xe/xe_pcode.o
  CC [M]  drivers/gpu/drm/xe/xe_pm.o
  CC [M]  drivers/gpu/drm/xe/xe_preempt_fence.o
  CC [M]  drivers/gpu/drm/xe/xe_pt_walk.o
  CC [M]  drivers/gpu/drm/xe/xe_pt.o
  CC [M]  drivers/gpu/drm/xe/xe_query.o
  CC [M]  drivers/gpu/drm/xe/xe_range_fence.o
  CC [M]  drivers/gpu/drm/xe/xe_reg_sr.o
  CC [M]  drivers/gpu/drm/xe/xe_reg_whitelist.o
  CC [M]  drivers/gpu/drm/xe/xe_rtp.o
  CC [M]  drivers/gpu/drm/xe/xe_sa.o
  CC [M]  drivers/gpu/drm/xe/xe_sched_job.o
  CC [M]  drivers/gpu/drm/xe/xe_step.o
  CC [M]  drivers/gpu/drm/xe/xe_sync.o
  CC [M]  drivers/gpu/drm/xe/xe_tile.o
  CC [M]  drivers/gpu/drm/xe/xe_tile_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_trace.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_sys_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_stolen_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_vram_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_tuning.o
  CC [M]  drivers/gpu/drm/xe/xe_uc.o
  CC [M]  drivers/gpu/drm/xe/xe_uc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_uc_fw.o
  CC [M]  drivers/gpu/drm/xe/xe_vm_madvise.o
  CC [M]  drivers/gpu/drm/xe/xe_wait_user_fence.o
  CC [M]  drivers/gpu/drm/xe/xe_wopcm.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_klvs_abi.h
  HDRTEST drivers/gpu/drm/xe/abi/guc_errors_abi.h
  CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
  CC [M]  drivers/gpu/drm/xe/tests/xe_dma_buf_test.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_migrate_test.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_communication_mmio_abi.h
  HDRTEST drivers/gpu/drm/xe/abi/guc_actions_abi.h
  CC [M]  drivers/gpu/drm/xe/tests/xe_pci_test.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
  CC [M]  drivers/gpu/drm/xe/tests/xe_rtp_test.o
  HDRTEST drivers/gpu/drm/xe/abi/guc_messages_abi.h
  CC [M]  drivers/gpu/drm/xe/tests/xe_wa_test.o
  HDRTEST drivers/gpu/drm/xe/regs/xe_reg_defs.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_guc_regs.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_gt_regs.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_regs.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_gpu_commands.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_lrc_layout.h
  HDRTEST drivers/gpu/drm/xe/regs/xe_engine_regs.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_test.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_pci_test.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_migrate_test.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_dma_buf_test.h
  HDRTEST drivers/gpu/drm/xe/tests/xe_bo_test.h
  HDRTEST drivers/gpu/drm/xe/xe_bb.h
  HDRTEST drivers/gpu/drm/xe/xe_bb_types.h
  HDRTEST drivers/gpu/drm/xe/xe_bo.h
  HDRTEST drivers/gpu/drm/xe/xe_bo_doc.h
  HDRTEST drivers/gpu/drm/xe/xe_bo_evict.h
  HDRTEST drivers/gpu/drm/xe/xe_bo_types.h
  HDRTEST drivers/gpu/drm/xe/xe_debugfs.h
  HDRTEST drivers/gpu/drm/xe/xe_devcoredump.h
  HDRTEST drivers/gpu/drm/xe/xe_devcoredump_types.h
  HDRTEST drivers/gpu/drm/xe/xe_device.h
  HDRTEST drivers/gpu/drm/xe/xe_device_sysfs.h
  HDRTEST drivers/gpu/drm/xe/xe_device_types.h
  HDRTEST drivers/gpu/drm/xe/xe_dma_buf.h
  HDRTEST drivers/gpu/drm/xe/xe_drv.h
  HDRTEST drivers/gpu/drm/xe/xe_exec.h
  HDRTEST drivers/gpu/drm/xe/xe_exec_queue.h
  HDRTEST drivers/gpu/drm/xe/xe_exec_queue_types.h
  HDRTEST drivers/gpu/drm/xe/xe_execlist.h
  HDRTEST drivers/gpu/drm/xe/xe_execlist_types.h
  HDRTEST drivers/gpu/drm/xe/xe_force_wake.h
  HDRTEST drivers/gpu/drm/xe/xe_force_wake_types.h
  HDRTEST drivers/gpu/drm/xe/xe_ggtt.h
  HDRTEST drivers/gpu/drm/xe/xe_ggtt_types.h
  HDRTEST drivers/gpu/drm/xe/xe_gt.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_clock.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_debugfs.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_idle_sysfs.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_idle_sysfs_types.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_mcr.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_pagefault.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_printk.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_sysfs.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_sysfs_types.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_topology.h
  HDRTEST drivers/gpu/drm/xe/xe_gt_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_ads.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_ads_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_ct.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_ct_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_debugfs.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_fwif.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_hwconfig.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_log.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_log_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_pc.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_pc_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_submit.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_submit_types.h
  HDRTEST drivers/gpu/drm/xe/xe_guc_types.h
  HDRTEST drivers/gpu/drm/xe/xe_huc.h
  HDRTEST drivers/gpu/drm/xe/xe_huc_debugfs.h
  HDRTEST drivers/gpu/drm/xe/xe_huc_types.h
  HDRTEST drivers/gpu/drm/xe/xe_hw_engine.h
  HDRTEST drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h
  HDRTEST drivers/gpu/drm/xe/xe_hw_engine_types.h
  HDRTEST drivers/gpu/drm/xe/xe_hw_fence.h
  HDRTEST drivers/gpu/drm/xe/xe_hw_fence_types.h
  HDRTEST drivers/gpu/drm/xe/xe_irq.h
  HDRTEST drivers/gpu/drm/xe/xe_lrc.h
  HDRTEST drivers/gpu/drm/xe/xe_lrc_types.h
  HDRTEST drivers/gpu/drm/xe/xe_macros.h
  HDRTEST drivers/gpu/drm/xe/xe_map.h
  HDRTEST drivers/gpu/drm/xe/xe_migrate.h
  HDRTEST drivers/gpu/drm/xe/xe_migrate_doc.h
  HDRTEST drivers/gpu/drm/xe/xe_mmio.h
  HDRTEST drivers/gpu/drm/xe/xe_mocs.h
  HDRTEST drivers/gpu/drm/xe/xe_module.h
  HDRTEST drivers/gpu/drm/xe/xe_pat.h
  HDRTEST drivers/gpu/drm/xe/xe_pci.h
  HDRTEST drivers/gpu/drm/xe/xe_pci_types.h
  HDRTEST drivers/gpu/drm/xe/xe_pcode.h
  HDRTEST drivers/gpu/drm/xe/xe_pcode_api.h
  HDRTEST drivers/gpu/drm/xe/xe_platform_types.h
  HDRTEST drivers/gpu/drm/xe/xe_pm.h
  HDRTEST drivers/gpu/drm/xe/xe_preempt_fence.h
  HDRTEST drivers/gpu/drm/xe/xe_preempt_fence_types.h
  HDRTEST drivers/gpu/drm/xe/xe_pt.h
  HDRTEST drivers/gpu/drm/xe/xe_pt_types.h
  HDRTEST drivers/gpu/drm/xe/xe_pt_walk.h
  HDRTEST drivers/gpu/drm/xe/xe_query.h
  HDRTEST drivers/gpu/drm/xe/xe_range_fence.h
  HDRTEST drivers/gpu/drm/xe/xe_reg_sr.h
  HDRTEST drivers/gpu/drm/xe/xe_reg_sr_types.h
  HDRTEST drivers/gpu/drm/xe/xe_reg_whitelist.h
  HDRTEST drivers/gpu/drm/xe/xe_res_cursor.h
  HDRTEST drivers/gpu/drm/xe/xe_ring_ops.h
  HDRTEST drivers/gpu/drm/xe/xe_ring_ops_types.h
  HDRTEST drivers/gpu/drm/xe/xe_rtp.h
  HDRTEST drivers/gpu/drm/xe/xe_rtp_types.h
  HDRTEST drivers/gpu/drm/xe/xe_sa.h
  HDRTEST drivers/gpu/drm/xe/xe_sa_types.h
  HDRTEST drivers/gpu/drm/xe/xe_sched_job.h
  HDRTEST drivers/gpu/drm/xe/xe_sched_job_types.h
  HDRTEST drivers/gpu/drm/xe/xe_step.h
  HDRTEST drivers/gpu/drm/xe/xe_step_types.h
  HDRTEST drivers/gpu/drm/xe/xe_sync.h
  HDRTEST drivers/gpu/drm/xe/xe_sync_types.h
  HDRTEST drivers/gpu/drm/xe/xe_tile.h
  HDRTEST drivers/gpu/drm/xe/xe_tile_sysfs.h
  HDRTEST drivers/gpu/drm/xe/xe_tile_sysfs_types.h
  HDRTEST drivers/gpu/drm/xe/xe_trace.h
  HDRTEST drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
  HDRTEST drivers/gpu/drm/xe/xe_ttm_sys_mgr.h
  HDRTEST drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
  HDRTEST drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
  HDRTEST drivers/gpu/drm/xe/xe_tuning.h
  HDRTEST drivers/gpu/drm/xe/xe_uc.h
  HDRTEST drivers/gpu/drm/xe/xe_uc_debugfs.h
  HDRTEST drivers/gpu/drm/xe/xe_uc_fw.h
  HDRTEST drivers/gpu/drm/xe/xe_uc_fw_abi.h
  HDRTEST drivers/gpu/drm/xe/xe_uc_fw_types.h
  HDRTEST drivers/gpu/drm/xe/xe_uc_types.h
  HDRTEST drivers/gpu/drm/xe/xe_vm.h
  HDRTEST drivers/gpu/drm/xe/xe_vm_doc.h
  HDRTEST drivers/gpu/drm/xe/xe_vm_madvise.h
  HDRTEST drivers/gpu/drm/xe/xe_vm_types.h
  HDRTEST drivers/gpu/drm/xe/xe_wa.h
  HDRTEST drivers/gpu/drm/xe/xe_wait_user_fence.h
  HDRTEST drivers/gpu/drm/xe/xe_wopcm.h
  HDRTEST drivers/gpu/drm/xe/xe_wopcm_types.h
  GEN     xe_wa_oob.c xe_wa_oob.h
  GEN     xe_wa_oob.c xe_wa_oob.h
  CC [M]  drivers/gpu/drm/xe/xe_guc.o
  CC [M]  drivers/gpu/drm/xe/xe_ring_ops.o
  CC [M]  drivers/gpu/drm/xe/xe_vm.o
  CC [M]  drivers/gpu/drm/xe/xe_wa.o
  LD [M]  drivers/gpu/drm/xe/xe.o
  MODPOST drivers/gpu/drm/xe/Module.symvers
  CC [M]  drivers/gpu/drm/xe/xe.mod.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.mod.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_dma_buf_test.mod.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_migrate_test.mod.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_pci_test.mod.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_rtp_test.mod.o
  CC [M]  drivers/gpu/drm/xe/tests/xe_wa_test.mod.o
  LD [M]  drivers/gpu/drm/xe/tests/xe_bo_test.ko
  LD [M]  drivers/gpu/drm/xe/tests/xe_pci_test.ko
  LD [M]  drivers/gpu/drm/xe/tests/xe_rtp_test.ko
  LD [M]  drivers/gpu/drm/xe/tests/xe_dma_buf_test.ko
  LD [M]  drivers/gpu/drm/xe/tests/xe_migrate_test.ko
  LD [M]  drivers/gpu/drm/xe/xe.ko
  LD [M]  drivers/gpu/drm/xe/tests/xe_wa_test.ko
make[1]: Leaving directory '/workspace/kernel/build64'
+ cleanup
+ '[' 1 -eq 1 ']'
+ ./scripts/config --file /workspace/kernel/build64/.config --enable CONFIG_DRM_XE_DISPLAY
run-parts: executing /workspace/ci/hooks/20-kernel-doc
+ SRC_DIR=/workspace/kernel
+ cd /workspace/kernel
+ find drivers/gpu/drm/xe/ -name '*.[ch]' -not -path 'drivers/gpu/drm/xe/display/*'
+ xargs ./scripts/kernel-doc -Werror -none include/uapi/drm/xe_drm.h
All hooks done

* [Intel-xe] ✗ CI.checksparse: warning for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
                   ` (6 preceding siblings ...)
  (?)
@ 2023-08-16 10:02 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16 10:02 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : warning

== Summary ==

+ trap cleanup EXIT
+ KERNEL=/kernel
+ MT=/root/linux/maintainer-tools
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools /root/linux/maintainer-tools
Cloning into '/root/linux/maintainer-tools'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ make -C /root/linux/maintainer-tools
make: Entering directory '/root/linux/maintainer-tools'
cc -O2 -g -Wextra -o remap-log remap-log.c
make: Leaving directory '/root/linux/maintainer-tools'
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ /root/linux/maintainer-tools/dim sparse --fast 9829aba16e62fcfba150f72d5d492fd778e0150e
/root/linux/maintainer-tools/dim: line 50: /root/.dimrc: No such file or directory
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel

* [Intel-xe] ✓ CI.BAT: success for Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
                   ` (7 preceding siblings ...)
  (?)
@ 2023-08-16 10:26 ` Patchwork
  -1 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2023-08-16 10:26 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Documentation/gpu: VM_BIND locking document
URL   : https://patchwork.freedesktop.org/series/122507/
State : success

== Summary ==

CI Bug Log - changes from xe-323-9829aba16e62fcfba150f72d5d492fd778e0150e_BAT -> xe-pw-122507v1_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (3 -> 2)
------------------------------

  Missing    (1): bat-atsm-2 

Known issues
------------

  Here are the changes found in xe-pw-122507v1_BAT that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@kms_flip@basic-flip-vs-wf_vblank:
    - bat-adlp-7:         [FAIL][1] ([Intel XE#480]) -> [PASS][2] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-323-9829aba16e62fcfba150f72d5d492fd778e0150e/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-122507v1/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-nv12:
    - bat-adlp-7:         [FAIL][3] ([Intel XE#400]) -> [PASS][4] +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-323-9829aba16e62fcfba150f72d5d492fd778e0150e/bat-adlp-7/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-nv12.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-122507v1/bat-adlp-7/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-nv12.html

  
  [Intel XE#400]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/400
  [Intel XE#480]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/480


Build changes
-------------

  * IGT: IGT_7436 -> IGT_7437
  * Linux: xe-323-9829aba16e62fcfba150f72d5d492fd778e0150e -> xe-pw-122507v1

  IGT_7436: 81e08c6d648e949df161a4f39118ed3eb1e354e9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  IGT_7437: 7437
  xe-323-9829aba16e62fcfba150f72d5d492fd778e0150e: 9829aba16e62fcfba150f72d5d492fd778e0150e
  xe-pw-122507v1: 122507v1

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
  (?)
@ 2023-08-17  2:05   ` kernel test robot
  -1 siblings, 0 replies; 45+ messages in thread
From: kernel test robot @ 2023-08-17  2:05 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Matthew Brost, Thomas Hellström, Francois Dugast,
	linux-kernel, Oak Zeng, Rodrigo Vivi, Danilo Krummrich,
	dri-devel, oe-kbuild-all

Hi Thomas,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm/drm-next drm-tip/drm-tip linus/master v6.5-rc6 next-20230816]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/Documentation-gpu-VM_BIND-locking-document/20230816-171911
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230816091547.2982-1-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v2] Documentation/gpu: VM_BIND locking document
reproduce: (https://download.01.org/0day-ci/archive/20230817/202308170916.TGY7kBpM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308170916.TGY7kBpM-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Documentation/gpu/drm-vm-bind-locking.rst: WARNING: document isn't included in any toctree

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
  (?)
@ 2023-08-31 19:30   ` Rodrigo Vivi
  -1 siblings, 0 replies; 45+ messages in thread
From: Rodrigo Vivi @ 2023-08-31 19:30 UTC (permalink / raw)
  To: Thomas Hellström, Danilo Krummrich
  Cc: intel-xe, Matthew Brost, Danilo Krummrich, Joonas Lahtinen,
	Oak Zeng, Daniel Vetter, Maarten Lankhorst, Francois Dugast,
	dri-devel, linux-kernel

On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
> Add the first version of the VM_BIND locking document which is
> intended to be part of the xe driver upstreaming agreement.
> 
> The document describes and discusses the locking used during exec
> functions, eviction and for userptr gpu-vmas. The intention is to use the
> same nomenclature as drm-vm-bind-async.rst.
> 
> v2:
> - s/gvm/gpu_vm/g (Rodrigo Vivi)
> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>   (Rodrigo Vivi)
> - Adjust commit message accordingly.
> - Add SPDX license header.
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Cc: Danilo Krummrich <dakr@redhat.com>

> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>  1 file changed, 351 insertions(+)
>  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> new file mode 100644
> index 000000000000..b813961a9ec2
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> @@ -0,0 +1,351 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +===============
> +VM_BIND locking
> +===============
> +
> +This document attempts to describe what's needed to get VM_BIND locking right,
> +including the userptr mmu_notifier locking. It also discusses some
> +optimizations to get rid of the looping through all userptr mappings and
> +external / shared object mappings that is needed in the simplest
> +implementation. Finally, it discusses some implications for faulting gpu_vms.
> +
> +Nomenclature
> +============
> +
> +* ``Context``: GPU execution context.
> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> +  meta-data. Typically one per client (DRM file-private), or one per
> +  context.
> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
> +  associated meta-data. The backing storage of a gpu_vma can either be
> +  a gem buffer object or anonymous pages that are also mapped into the
> +  CPU address space of the process.
> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
> +  which is anonymous pages as described above.
> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
> +  of the backing store resident and making sure the gpu_vma's
> +  page-table entries point to that backing store.
> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
> +  and which tracks GPU activity. When the GPU activity is finished,
> +  the dma_fence signals.
> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
> +  to track GPU activity in the form of multiple dma_fences on a
> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
> +  of dma_fences and a lock that needs to be held when adding
> +  additional dma_fences to the dma_resv. The lock is of a type that
> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
> +* ``local object``: A GEM object which is local to a single gpu_vm. Local gem
> +  objects share the gpu_vm's dma_resv.
> +* ``shared object``: AKA external object: A GEM object which may be shared
> +  by multiple gpu_vms and whose backing storage may be shared with
> +  other drivers.
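> +
> +The deadlock-safe locking of multiple dma_resvs mentioned in the
> +``dma_resv`` entry above is done with a ww_acquire transaction. As a
> +rough sketch (``for_each_obj_to_lock()`` and
> +``unlock_all_locked_objs()`` are made-up helpers here;
> +``ww_acquire_init()``, ``dma_resv_lock()`` and
> +``dma_resv_lock_slow()`` are the real interfaces), locking a set of
> +objects could look like:
> +
> +.. code-block:: C
> +
> +   struct ww_acquire_ctx ctx;
> +   struct gem_object *obj, *contended = NULL;
> +   int err;
> +
> +   ww_acquire_init(&ctx, &reservation_ww_class);
> +   retry:
> +   if (contended)
> +		/* Sleep on the contended lock, then retry everything. */
> +		dma_resv_lock_slow(contended->resv, &ctx);
> +
> +   for_each_obj_to_lock(&obj) {
> +		if (obj == contended)
> +			continue; /* Already locked by dma_resv_lock_slow(). */
> +
> +		err = dma_resv_lock(obj->resv, &ctx);
> +		if (err == -EDEADLK) {
> +			/* Roll back all locks held so far, then retry. */
> +			unlock_all_locked_objs();
> +			contended = obj;
> +			goto retry;
> +		}
> +   }
> +   ww_acquire_done(&ctx);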
> +
> +
> +Introducing the locks
> +=====================
> +
> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
> +dma_resv object and hence the dma_resv lock. So even with a huge
> +number of local GEM objects, only one lock is needed to make the exec
> +sequence atomic.
> +
> +The following locks and locking orders are used:
> +
> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
> +  and can also with some simplification protect the gpu_vm's list of
> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
> +  mmap_lock.
> +* The ``userptr_seqlock``. This lock is taken in read mode for each
> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
> +  notifier invalidation. This is not a real seqlock but described in
> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
> +  'lock' a lot like a seqcount, however this allows multiple
> +  write-sides to hold it at once...". The read side critical section
> +  is enclosed by ``mmu_interval_read_begin() /
> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
> +  sleeping uninterruptibly if the write side is held.
> +  The write side is held by the core mm while calling mmu interval
> +  invalidation notifiers.
> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
> +  rebinding, and also the residency of all the gpu_vm's local GEM objects.
> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
> +  mode during exec and write mode during a mmu notifier invalidation. In
> +  the absence of a separate page-table lock, this lock can serve
> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
> +  this below. The userptr notifier lock is per gpu_vm.
> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
> +
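> +As a sketch, the read side of the userptr_seqlock pairs with the
> +userptr_notifier_lock like this (the pin / commit helpers are made-up
> +names; ``mmu_interval_read_begin()`` and ``mmu_interval_read_retry()``
> +are the real interfaces from ``include/linux/mmu_notifier.h``):
> +
> +.. code-block:: C
> +
> +   again:
> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_notifier);
> +   pin_new_pages(&gpu_vma);
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   if (mmu_interval_read_retry(&gpu_vma->userptr_notifier, seq)) {
> +		up_read(&gpu_vm->userptr_notifier_lock);
> +		unpin_pages(&gpu_vma);
> +		goto again;
> +   }
> +   commit_page_table_entries(&gpu_vma);
> +   up_read(&gpu_vm->userptr_notifier_lock);
> +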
> +There are certain optimizations described below that require
> +additional locks. More on that later.
> +
> +With the locks introduced above, an exec function for a gpu_vm with
> +only local objects would look like the following:
> +
> +.. code-block:: C
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   dma_resv_unlock(&gpu_vm->resv);
> +
> +Eviction of one of these local objects will then be something like the
> +following:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
> +is always locked while evicting, due to the above equality.
> +
> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction,
> +since the eviction blit or copy will wait for GPU idle. Any attempt by
> +the GPU to access freed memory through the gpu_vma will be preceded by
> +a new exec function, which will make sure the gpu_vma is
> +revalidated. The eviction code holding the object's dma_resv while
> +revalidating will ensure a new exec function may not race with the eviction.
> +
> +Introducing external (or shared) buffer objects
> +===============================================
> +
> +Since shared buffer objects may be shared by multiple gpu_vms, they
> +can't share their reservation object with a single gpu_vm, but will
> +rather have a reservation object of their own. The shared objects
> +bound to a gpu_vm using one or many gpu_vmas are therefore typically
> +put on a per-gpu_vm list which is protected by the gpu_vm lock. One
> +could in theory also protect it with
> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> +the current locking helpers, that is usually not done. Also see
> +below for userptr gpu_vmas.
> +
> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> +object, but we can no longer be certain that we hold the gpu_vm's
> +dma_resv of all the object's gpu_vmas. We can only be certain that we
> +hold the object's private dma_resv. We can trylock the dma_resvs for
> +the affected gpu_vms, but that might be unnecessarily complex. If we
> +have a ww_acquire context at hand at eviction time we can also perform
> +sleeping locks of those dma_resvs but that could cause expensive
> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
> +which is inspected on the next exec function, when the gpu_vm's
> +dma_resv and the object's dma_resv is held, and the invalidated
> +gpu_vmas could then be put on the gpu_vm's list of invalidated
> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
> +protected by the object's dma_resv.
> +
> +The exec function would then look something like the following:
> +
> +.. code-block:: C
> +
> +   read_lock(&gpu_vm->lock);
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   // Shared object list is protected by the gpu_vm->lock.
> +   for_each_shared_obj(gpu_vm, &obj) {
> +		dma_resv_lock(&obj->resv);
> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
> +   }
> +
> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +   dma_resv_unlock_all_resv_locks();
> +
> +   read_unlock(&gpu_vm->lock);
> +
> +And the corresponding shared-object aware eviction would look like:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma) {
> +           if (object_is_vm_local(obj))
> +                   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +           else
> +                   mark_gpu_vma_for_revalidation(&gpu_vma);
> +   }
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Yet another option is to put the gpu_vmas to be invalidated on a separate
> +gpu_vm list protected by a lower level lock that can be taken both at eviction
> +time and at transfer-to-revalidate-list time. The details are not covered by
> +this document, but this is, for reference, implemented in the Intel xe
> +driver.
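> +
> +A minimal sketch of that last option could look like the following,
> +where the ``invalidated_lock`` spinlock and the list member names are
> +illustrative assumptions of this sketch rather than actual driver
> +code:
> +
> +.. code-block:: C
> +
> +   // At eviction time, holding only the object's dma_resv:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
> +   spin_unlock(&gpu_vm->invalidated_lock);
> +
> +   // At exec time, holding the gpu_vm's and the object's dma_resv,
> +   // transfer to the revalidate list:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_splice_tail_init(&gpu_vm->invalidated_list,
> +                         &gpu_vm->revalidate_list);
> +   spin_unlock(&gpu_vm->invalidated_lock);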
> +
> +Introducing userptr gpu_vmas
> +============================
> +
> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
> +GPU virtual address range, directly maps a CPU mm range of anonymous-
> +or file page-cache pages.
> +A very simple approach would be to just pin the pages using
> +pin_user_pages() at bind time and unpin them at unbind time, but this
> +creates a Denial-Of-Service vector since a single user-space process
> +would be able to pin down all of system memory, which is not
> +desirable. (For special use-cases and with proper accounting pinning might
> +still be a desirable feature, though). What we need to do in the general case is
> +to obtain a reference to the desired pages, make sure we are notified
> +using an MMU notifier just before the CPU mm unmaps the pages, dirty
> +them if they are not mapped read-only to the GPU, and then drop the reference.
> +When we are notified by the MMU notifier that the CPU mm is about to drop the
> +pages, we need to stop GPU access to the pages and make sure that
> +before the next time the GPU tries to access
> +whatever is now present in the CPU mm range, we unmap the old pages
> +from the GPU page tables and repeat the process of obtaining new page
> +references. Note that when the core mm decides to launder pages, we get such
> +an unmap MMU notification and can mark the pages dirty again before the
> +next GPU access. We also get similar MMU notifications for NUMA accounting,
> +which the GPU driver doesn't really need to care about, but so far
> +it has proven difficult to exclude certain notifications.
> +
> +Using an MMU notifier for device DMA (and other methods) is described in
> +`this document
> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
> +
> +Now the method of obtaining struct page references using
> +get_user_pages() unfortunately can't be used under a dma_resv lock
> +since that would violate the locking order of the dma_resv lock vs the
> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
> +is the first time we strictly need the gpu_vm->lock. While it was
> +previously used also to protect the list of the gpu_vm's shared objects,
> +we could in theory have used the gpu_vm->resv for that.
> +
> +The MMU interval seqlock for a userptr gpu_vma is used in the following
> +way:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   retry:
> +
> +   // Note: mmu_interval_read_begin() blocks until there is no
> +   // invalidation notifier running anymore.
> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
> +   if (seq != gpu_vma->saved_seq) {
> +           obtain_new_page_pointers(&gpu_vma);
> +           dma_resv_lock(&gpu_vm->resv);
> +           put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +           dma_resv_unlock(&gpu_vm->resv);
> +           gpu_vma->saved_seq = seq;
> +   }
> +
> +   // The usual revalidation goes here.
> +
> +   // Final userptr sequence validation may not happen before the
> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
> +   // of the MMU invalidation notifier. Hence the
> +   // userptr_notifier_lock that will make them appear atomic.
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
> +           up_read(&gpu_vm->userptr_notifier_lock);
> +           goto retry;
> +   }
> +
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +
> +   for_each_shared_obj(gpu_vm, &obj)
> +           add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock_all_resv_locks();
> +   up_read(&gpu_vm->userptr_notifier_lock);
> +   up_read(&gpu_vm->lock);
> +
> +The code between ``mmu_interval_read_begin()`` and the
> +``mmu_interval_read_retry()`` marks the read side critical section of
> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
> +gpu_vma list is looped through, and the check is done for *all* of its
> +userptr gpu_vmas, although we only show a single one here.
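> +
> +Written out for the whole list, the final validation step would look
> +something like the following, where ``for_each_userptr_gpu_vma()`` is
> +an illustrative helper assumed by this sketch:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   for_each_userptr_gpu_vma(gpu_vm, &gpu_vma) {
> +           if (mmu_interval_read_retry(&gpu_vma->userptr_interval,
> +                                       gpu_vma->saved_seq)) {
> +                   up_read(&gpu_vm->userptr_notifier_lock);
> +                   goto retry;
> +           }
> +   }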
> +
> +The userptr gpu_vma MMU invalidation notifier might be called from
> +reclaim context and, again to avoid locking order violations, we can't
> +take any dma_resv lock nor the gpu_vm->lock from within it.
> +
> +.. code-block:: C
> +
> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
> +  {
> +          // Make sure the exec function either sees the new sequence
> +          // and backs off or we wait for the dma-fence:
> +
> +          down_write(&gpu_vm->userptr_notifier_lock);
> +          mmu_interval_set_seq(userptr_interval, cur_seq);
> +          up_write(&gpu_vm->userptr_notifier_lock);
> +
> +          dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
> +                                false, MAX_SCHEDULE_TIMEOUT);
> +          return true;
> +  }
> +
> +When this invalidation notifier returns, the GPU can no longer be
> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
> +before a new GPU submission can succeed.
> +
> +Optimizing gpu_vma iteration
> +----------------------------
> +
> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
> +on each exec function may be very costly. There is a scheme to avoid
> +this and only iterate through the userptr gpu_vmas that actually saw an
> +invalidation notifier call since the last exec.

The document so far looks good to me.
I'd like to hear from Danilo if this aligns with nouveau locking
or if he has any further thoughts on this in general.

> +
> +TODO: describe that scheme here. It's implemented in the xe driver.
> +
> +Locking for page-table updates at bind- and unbind time
> +=======================================================
> +
> +TODO.
> +
> +Recoverable page-fault implications
> +===================================
> +
> +TODO.

We should probably add the TODO note somewhere else and keep the doc itself clean?
or the plan is to update before we push this patch?

> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-08-31 19:30   ` Rodrigo Vivi
  0 siblings, 0 replies; 45+ messages in thread
From: Rodrigo Vivi @ 2023-08-31 19:30 UTC (permalink / raw)
  To: Thomas Hellström, Danilo Krummrich
  Cc: Matthew Brost, Francois Dugast, linux-kernel, Oak Zeng,
	Danilo Krummrich, dri-devel, intel-xe

On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
> Add the first version of the VM_BIND locking document which is
> intended to be part of the xe driver upstreaming agreement.
> 
> The document describes and discuss the locking used during exec-
> functions, evicton and for userptr gpu-vmas. Intention is to be using the
> same nomenclature as the drm-vm-bind-async.rst.
> 
> v2:
> - s/gvm/gpu_vm/g (Rodrigo Vivi)
> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>   (Rodrigo Vivi)
> - Adjust commit message accordingly.
> - Add SPDX license header.
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Cc: Danilo Krummrich <dakr@redhat.com>

> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>  1 file changed, 351 insertions(+)
>  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> new file mode 100644
> index 000000000000..b813961a9ec2
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> @@ -0,0 +1,351 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +===============
> +VM_BIND locking
> +===============
> +
> +This document attempts to describe what's needed to get VM_BIND locking right,
> +including the userptr mmu_notifier locking. It also discusses some
> +optimizations to get rid of the looping through of all userptr mappings and
> +external / shared object mappings that is needed in the simplest
> +implementation, as well as some implications for faulting gpu_vms.
> +
> +Nomenclature
> +============
> +
> +* ``Context``: GPU execution context.
> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> +  meta-data. Typically one per client (DRM file-private), or one per
> +  context.
> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
> +  associated meta-data. The backing storage of a gpu_vma can either be
> +  a gem buffer object or anonymous pages mapped also into the CPU
> +  address space for the process.
> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
> +  which is anonymous pages as described above.
> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
> +  of the backing store resident and making sure the gpu_vma's
> +  page-table entries point to that backing store.
> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
> +  and which tracks GPU activity. When the GPU activity is finished,
> +  the dma_fence signals.
> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
> +  to track GPU activity in the form of multiple dma_fences on a
> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
> +  of dma_fences and a lock that needs to be held when adding
> +  additional dma_fences to the dma_resv. The lock is of a type that
> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
> +* ``local object``: A GEM object which is local to a gpu_vm. Local GEM
> +  objects share the gpu_vm's dma_resv.
> +* ``shared object``: AKA external object: A GEM object which may be shared
> +  by multiple gpu_vms and whose backing storage may be shared with
> +  other drivers.
> +
> +
> +Introducing the locks
> +=====================
> +
> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
> +dma_resv object and hence the dma_resv lock. So even with a huge
> +number of local GEM objects, only one lock is needed to make the exec
> +sequence atomic.
> +
> +The following locks and locking orders are used:
> +
> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
> +  and can also with some simplification protect the gpu_vm's list of
> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
> +  mmap_lock.
> +* The ``userptr_seqlock``. This lock is taken in read mode for each
> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
> +  notifier invalidation. This is not a real seqlock but described in
> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
> +  'lock' a lot like a seqcount, however this allows multiple
> +  write-sides to hold it at once...". The read side critical section
> +  is enclosed by ``mmu_interval_read_begin() /
> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
> +  sleeping uninterruptibly if the write side is held.
> +  The write side is held by the core mm while calling mmu interval
> +  invalidation notifiers.
> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
> +  rebinding, and also the residency of all the gpu_vm's local GEM objects.
> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
> +  mode during exec and write mode during a mmu notifier invalidation. In
> +  the absence of a separate page-table lock, this lock can serve
> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
> +  this below. The userptr notifier lock is per gpu_vm.
> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
> +
> +There are certain optimizations described below that require
> +additional locks. More on that later.
> +
> +With only local objects, a skeleton exec function would then look like:
> +
> +.. code-block:: C
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
> +           revalidate_gpu_vma(&gpu_vma);
> +           remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   dma_resv_unlock(&gpu_vm->resv);
> +
> +Eviction of one of these local objects will then be something like the
> +following:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
> +           put_gpu_vma_on_revalidate_list(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
> +is always locked while evicting, due to the above equality.
> +
> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
> +Since the eviction blit or copy will wait for GPU idle, any attempt by
> +the GPU to access freed memory through the gpu_vma will be preceded by
> +a new exec function, which will make sure the gpu_vma is
> +revalidated. The eviction code holding the object's dma_resv while
> +revalidating will ensure a new exec function may not race with the eviction.
> +
> +Introducing external (or shared) buffer objects
> +===============================================
> +
> +Since shared buffer objects may be shared by multiple gpu_vms, they
> +can't share their reservation object with a single gpu_vm, but will rather
> +have a reservation object of their own. The shared objects bound to a
> +gpu_vm using one or many
> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> +protected by the gpu_vm lock. One could in theory protect it also with
> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> +the current locking helpers, that is typically not done. Also see
> +below for userptr gpu_vmas.
> +
> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> +object, but we can no longer be certain that we hold the gpu_vm's
> +dma_resv of all the object's gpu_vmas. We can only be certain that we
> +hold the object's private dma_resv. We can trylock the dma_resvs of
> +the affected gpu_vms, but that might be unnecessarily complex. If we
> +have a ww_acquire context at hand at eviction time we can also take
> +sleeping locks on those dma_resvs, but that could cause expensive
> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
> +that is inspected on the next exec function, when both the gpu_vm's
> +dma_resv and the object's dma_resv are held. The invalidated
> +gpu_vmas can then be put on the gpu_vm's list of invalidated
> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
> +protected by the object's dma_resv.
> +
> +The exec function would then look something like the following:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   // Shared object list is protected by the gpu_vm->lock.
> +   for_each_shared_obj(gpu_vm, &obj) {
> +           dma_resv_lock(&obj->resv);
> +           move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
> +   }
> +
> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
> +           revalidate_gpu_vma(&gpu_vma);
> +           remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   for_each_shared_obj(gpu_vm, &obj)
> +           add_dma_fence(job_dma_fence, &obj->resv);
> +   dma_resv_unlock_all_resv_locks();
> +
> +   up_read(&gpu_vm->lock);
> +
> +And the corresponding shared-object aware eviction would look like:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma) {
> +           if (object_is_vm_local(obj))
> +                   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +           else
> +                   mark_gpu_vma_for_revalidation(&gpu_vma);
> +   }
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Yet another option is to put the gpu_vmas to be invalidated on a separate
> +gpu_vm list protected by a lower level lock that can be taken both at eviction
> +time and at transfer-to-revalidate-list time. The details are not covered by
> +this document, but this is, for reference, implemented in the Intel xe
> +driver.
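> +
> +A minimal sketch of that last option could look like the following,
> +where the ``invalidated_lock`` spinlock and the list member names are
> +illustrative assumptions of this sketch rather than actual driver
> +code:
> +
> +.. code-block:: C
> +
> +   // At eviction time, holding only the object's dma_resv:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
> +   spin_unlock(&gpu_vm->invalidated_lock);
> +
> +   // At exec time, holding the gpu_vm's and the object's dma_resv,
> +   // transfer to the revalidate list:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_splice_tail_init(&gpu_vm->invalidated_list,
> +                         &gpu_vm->revalidate_list);
> +   spin_unlock(&gpu_vm->invalidated_lock);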
> +
> +Introducing userptr gpu_vmas
> +============================
> +
> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
> +GPU virtual address range, directly maps a CPU mm range of anonymous-
> +or file page-cache pages.
> +A very simple approach would be to just pin the pages using
> +pin_user_pages() at bind time and unpin them at unbind time, but this
> +creates a Denial-Of-Service vector since a single user-space process
> +would be able to pin down all of system memory, which is not
> +desirable. (For special use-cases and with proper accounting pinning might
> +still be a desirable feature, though). What we need to do in the general case is
> +to obtain a reference to the desired pages, make sure we are notified
> +using an MMU notifier just before the CPU mm unmaps the pages, dirty
> +them if they are not mapped read-only to the GPU, and then drop the reference.
> +When we are notified by the MMU notifier that the CPU mm is about to drop the
> +pages, we need to stop GPU access to the pages and make sure that
> +before the next time the GPU tries to access
> +whatever is now present in the CPU mm range, we unmap the old pages
> +from the GPU page tables and repeat the process of obtaining new page
> +references. Note that when the core mm decides to launder pages, we get such
> +an unmap MMU notification and can mark the pages dirty again before the
> +next GPU access. We also get similar MMU notifications for NUMA accounting,
> +which the GPU driver doesn't really need to care about, but so far
> +it has proven difficult to exclude certain notifications.
> +
> +Using an MMU notifier for device DMA (and other methods) is described in
> +`this document
> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
> +
> +Now the method of obtaining struct page references using
> +get_user_pages() unfortunately can't be used under a dma_resv lock
> +since that would violate the locking order of the dma_resv lock vs the
> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
> +is the first time we strictly need the gpu_vm->lock. While it was
> +previously used also to protect the list of the gpu_vm's shared objects,
> +we could in theory have used the gpu_vm->resv for that.
> +
> +The MMU interval seqlock for a userptr gpu_vma is used in the following
> +way:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   retry:
> +
> +   // Note: mmu_interval_read_begin() blocks until there is no
> +   // invalidation notifier running anymore.
> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
> +   if (seq != gpu_vma->saved_seq) {
> +           obtain_new_page_pointers(&gpu_vma);
> +           dma_resv_lock(&gpu_vm->resv);
> +           put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +           dma_resv_unlock(&gpu_vm->resv);
> +           gpu_vma->saved_seq = seq;
> +   }
> +
> +   // The usual revalidation goes here.
> +
> +   // Final userptr sequence validation may not happen before the
> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
> +   // of the MMU invalidation notifier. Hence the
> +   // userptr_notifier_lock that will make them appear atomic.
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
> +           up_read(&gpu_vm->userptr_notifier_lock);
> +           goto retry;
> +   }
> +
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +
> +   for_each_shared_obj(gpu_vm, &obj)
> +           add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock_all_resv_locks();
> +   up_read(&gpu_vm->userptr_notifier_lock);
> +   up_read(&gpu_vm->lock);
> +
> +The code between ``mmu_interval_read_begin()`` and the
> +``mmu_interval_read_retry()`` marks the read side critical section of
> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
> +gpu_vma list is looped through, and the check is done for *all* of its
> +userptr gpu_vmas, although we only show a single one here.
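> +
> +Written out for the whole list, the final validation step would look
> +something like the following, where ``for_each_userptr_gpu_vma()`` is
> +an illustrative helper assumed by this sketch:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   for_each_userptr_gpu_vma(gpu_vm, &gpu_vma) {
> +           if (mmu_interval_read_retry(&gpu_vma->userptr_interval,
> +                                       gpu_vma->saved_seq)) {
> +                   up_read(&gpu_vm->userptr_notifier_lock);
> +                   goto retry;
> +           }
> +   }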
> +
> +The userptr gpu_vma MMU invalidation notifier might be called from
> +reclaim context and, again to avoid locking order violations, we can't
> +take any dma_resv lock nor the gpu_vm->lock from within it.
> +
> +.. code-block:: C
> +
> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
> +  {
> +          // Make sure the exec function either sees the new sequence
> +          // and backs off or we wait for the dma-fence:
> +
> +          down_write(&gpu_vm->userptr_notifier_lock);
> +          mmu_interval_set_seq(userptr_interval, cur_seq);
> +          up_write(&gpu_vm->userptr_notifier_lock);
> +
> +          dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
> +                                false, MAX_SCHEDULE_TIMEOUT);
> +          return true;
> +  }
> +
> +When this invalidation notifier returns, the GPU can no longer be
> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
> +before a new GPU submission can succeed.
> +
> +Optimizing gpu_vma iteration
> +----------------------------
> +
> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
> +on each exec function may be very costly. There is a scheme to avoid
> +this and only iterate through the userptr gpu_vmas that actually saw an
> +invalidation notifier call since the last exec.

The document so far looks good to me.
I'd like to hear from Danilo if this aligns with nouveau locking
or if he has any further thoughts on this in general.

> +
> +TODO: describe that scheme here. It's implemented in the xe driver.
> +
> +Locking for page-table updates at bind- and unbind time
> +=======================================================
> +
> +TODO.
> +
> +Recoverable page-fault implications
> +===================================
> +
> +TODO.

We should probably add the TODO note somewhere else and keep the doc itself clean?
or the plan is to update before we push this patch?

> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Intel-xe] [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-08-31 19:30   ` Rodrigo Vivi
  0 siblings, 0 replies; 45+ messages in thread
From: Rodrigo Vivi @ 2023-08-31 19:30 UTC (permalink / raw)
  To: Thomas Hellström, Danilo Krummrich
  Cc: Francois Dugast, Joonas Lahtinen, linux-kernel, Danilo Krummrich,
	dri-devel, Daniel Vetter, intel-xe

On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
> Add the first version of the VM_BIND locking document which is
> intended to be part of the xe driver upstreaming agreement.
> 
> The document describes and discuss the locking used during exec-
> functions, evicton and for userptr gpu-vmas. Intention is to be using the
> same nomenclature as the drm-vm-bind-async.rst.
> 
> v2:
> - s/gvm/gpu_vm/g (Rodrigo Vivi)
> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>   (Rodrigo Vivi)
> - Adjust commit message accordingly.
> - Add SPDX license header.
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Cc: Danilo Krummrich <dakr@redhat.com>

> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>  1 file changed, 351 insertions(+)
>  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> new file mode 100644
> index 000000000000..b813961a9ec2
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> @@ -0,0 +1,351 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +===============
> +VM_BIND locking
> +===============
> +
> +This document attempts to describe what's needed to get VM_BIND locking right,
> +including the userptr mmu_notifier locking and it will also discuss some
> +optimizations to get rid of the looping through of all userptr mappings and
> +external / shared object mappings that is needed in the simplest
> +implementation. It will also discuss some implications for faulting gpu_vms.
> +
> +Nomenclature
> +============
> +
> +* ``Context``: GPU execution context.
> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> +  meta-data. Typically one per client (DRM file-private), or one per
> +  context.
> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
> +  associated meta-data. The backing storage of a gpu_vma can either be
> +  a gem buffer object or anonymous pages mapped also into the CPU
> +  address space for the process.
> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
> +  which is anonymous pages as described above.
> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
> +  of the backing store resident and making sure the gpu_vma's
> +  page-table entries point to that backing store.
> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
> +  and which tracks GPU activity. When the GPU activity is finished,
> +  the dma_fence signals.
> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
> +  to track GPU activity in the form of multiple dma_fences on a
> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
> +  of dma_fences and a lock that needs to be held when adding
> +  additional dma_fences to the dma_resv. The lock is of a type that
> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
> +  objects also share the gpu_vm's dma_resv.
> +* ``shared object``: AKA external object: A GEM object which may be shared
> +  by multiple gpu_vms and whose backing storage may be shared with
> +  other drivers.
> +
> +
> +Introducing the locks
> +=====================
> +
> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
> +dma_resv object and hence the dma_resv lock. So even with a huge
> +number of local GEM objects, only one lock is needed to make the exec
> +sequence atomic.
> +
> +The following locks and locking orders are used:
> +
> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
> +  and can also with some simplification protect the gpu_vm's list of
> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
> +  mmap_lock.
> +* The ``userptr_seqlock``. This lock is taken in read mode for each
> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
> +  notifier invalidation. This is not a real seqlock but described in
> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
> +  'lock' a lot like a seqcount, however this allows multiple
> +  write-sides to hold it at once...". The read side critical section
> +  is enclosed by ``mmu_interval_read_begin() /
> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
> +  sleeping uninterruptibly if the write side is held.
> +  The write side is held by the core mm while calling mmu interval
> +  invalidation notifiers.
> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
> +  mode during exec and write mode during a mmu notifier invalidation. In
> +  the absence of a separate page-table lock, this lock can serve
> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
> +  this below. The userptr notifier lock is per gpu_vm.
> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
> +
> +There are certain optimizations described below that require
> +additional locks. More on that later.
> +
> +.. code-block:: C
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job));
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   dma_resv_unlock(&gpu_vm->resv);
> +
> +Eviction of one of these local objects will then be something like the
> +following:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(&obj->resv, job_dma_fence);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
> +is always locked while evicting, due to the above equality.
> +
> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
> +Since the eviction blit or copy will wait for GPU idle, any attempt by
> +the GPU to access freed memory through a gpu_vma will be preceded by
> +a new exec function, which will make sure the gpu_vma is
> +revalidated. The eviction code holding the object's dma_resv while
> +revalidating will ensure that a new exec function cannot race with the eviction.
> +
> +Introducing external (or shared) buffer objects
> +===============================================
> +
> +Since shared buffer objects may be shared by multiple gpu_vms, they
> +can't share their reservation object with a single gpu_vm, but will rather
> +have a reservation object of their own. The shared objects bound to a
> +gpu_vm using one or many gpu_vmas are therefore typically put on a
> +per-gpu_vm list which is protected by the gpu_vm lock. One could in
> +theory also protect this list with the ``gpu_vm->resv``, but since the
> +list of dma_resvs to take is typically built before the ``gpu_vm->resv``
> +is locked, due to a limitation in the current locking helpers, that is
> +typically not done. Also see below for userptr gpu_vmas.
> +
> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> +object, but we can no longer be certain that we hold the gpu_vm's
> +dma_resv of all the object's gpu_vmas. We can only be certain that we
> +hold the object's private dma_resv. We can trylock the dma_resvs of
> +the affected gpu_vms, but that might be unnecessarily complex. If we
> +have a ww_acquire context at hand at eviction time we can also perform
> +sleeping locks of those dma_resvs, but that could cause expensive
> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
> +which is inspected on the next exec function, when the gpu_vm's
> +dma_resv and the object's dma_resv are held, and the invalidated
> +gpu_vmas could then be put on the gpu_vm's list of invalidated
> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
> +protected by the object's dma_resv.
> +
> +The exec function would then look something like the following:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   // Shared object list is protected by the gpu_vm->lock.
> +   for_each_shared_obj(gpu_vm, &obj) {
> +		dma_resv_lock(&obj->resv);
> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
> +   }
> +
> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +   dma_resv_unlock_all_resv_locks();
> +
> +   up_read(&gpu_vm->lock);
> +
> +And the corresponding shared-object aware eviction would look like:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
> +		if (object_is_vm_local(obj))
> +		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +		else
> +		             mark_gpu_vma_for_revalidation(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Yet another option is to put the gpu_vmas to be invalidated on a separate
> +gpu_vm list protected by a lower-level lock that can be taken both at
> +eviction time and at transfer-to-revalidate-list time. The details are not
> +covered by this document, but this is, for reference, implemented in the
> +Intel xe driver.
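> +
> +As an illustration only, with the lower-level lock and the list and
> +member names below being assumptions of this document rather than
> +actual driver API, such a scheme could look along the lines of:
> +
> +.. code-block:: C
> +
> +   // At eviction time, with only the object's dma_resv held:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
> +   spin_unlock(&gpu_vm->invalidated_lock);
> +
> +   // At exec time, with the gpu_vm's dma_resv held, transfer all
> +   // invalidated gpu_vmas to the revalidation list, which is
> +   // protected by the gpu_vm's dma_resv:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_splice_init(&gpu_vm->invalidated_list, &gpu_vm->revalidate_list);
> +   spin_unlock(&gpu_vm->invalidated_lock);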
> +
> +Introducing userptr gpu_vmas
> +============================
> +
> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
> +GPU virtual address range, directly maps a CPU mm range of anonymous-
> +or file page-cache pages.
> +A very simple approach would be to just pin the pages using
> +pin_user_pages() at bind time and unpin them at unbind time, but this
> +creates a Denial-Of-Service vector since a single user-space process
> +would be able to pin down all of system memory, which is not
> +desirable. (For special use-cases and with proper accounting pinning might
> +still be a desirable feature, though). What we need to do in the general case is
> +to obtain a reference to the desired pages, make sure we are notified
> +using an MMU notifier just before the CPU mm unmaps the pages, dirty
> +them if they are not mapped read-only to the GPU, and then drop the reference.
> +When we are notified by the MMU notifier that the CPU mm is about to drop the
> +pages, we need to stop GPU access to the pages and make sure that,
> +before the next time the GPU tries to access
> +whatever is now present in the CPU mm range, we unmap the old pages
> +from the GPU page tables and repeat the process of obtaining new page
> +references. Note that when the core mm decides to launder pages, we get such
> +an unmap MMU notification and can mark the pages dirty again before the
> +next GPU access. We also get similar MMU notifications for NUMA accounting,
> +which the GPU driver doesn't really need to care about, but so far
> +it has proven difficult to exclude certain notifications.
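> +
> +For reference, a rough sketch of how such a notifier might be
> +registered for the CPU mm range of a userptr gpu_vma, using the core
> +mm's mmu_interval_notifier API. The gpu_vma members shown are
> +assumptions of this document, and gpu_vma_userptr_invalidate() is the
> +invalidation notifier shown further below, there with its arguments
> +simplified:
> +
> +.. code-block:: C
> +
> +   static const struct mmu_interval_notifier_ops gpu_vma_userptr_ops = {
> +	   .invalidate = gpu_vma_userptr_invalidate,
> +   };
> +
> +   err = mmu_interval_notifier_insert(&gpu_vma->userptr_interval,
> +				      current->mm,
> +				      gpu_vma->userptr_start,
> +				      gpu_vma->userptr_end -
> +				      gpu_vma->userptr_start,
> +				      &gpu_vma_userptr_ops);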
> +
> +Using an MMU notifier for device DMA (and other methods) is described in
> +`this document
> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
> +
> +Now the method of obtaining struct page references using
> +get_user_pages() unfortunately can't be used under a dma_resv lock
> +since that would violate the locking order of the dma_resv lock vs the
> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
> +is the first time we strictly need the gpu_vm->lock. While it was
> +previously used also to protect the list of the gpu_vm's shared objects,
> +we could in theory have used the gpu_vm->resv for that.
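> +
> +A rough sketch of how the page pointers might then be obtained
> +outside of any dma_resv lock, here using the hmm_range_fault()
> +helper. The gpu_vma members used are assumptions of this document,
> +not actual driver API:
> +
> +.. code-block:: C
> +
> +   // Hypothetical implementation of obtain_new_page_pointers(),
> +   // called with the gpu_vm->lock held, but no dma_resv locks.
> +   static int obtain_new_page_pointers(struct gpu_vma *gpu_vma)
> +   {
> +	   struct hmm_range range = {
> +		   .notifier = &gpu_vma->userptr_interval,
> +		   .start = gpu_vma->userptr_start,
> +		   .end = gpu_vma->userptr_end,
> +		   .hmm_pfns = gpu_vma->pfns,
> +		   .default_flags = HMM_PFN_REQ_FAULT,
> +	   };
> +	   int err;
> +
> +	   range.notifier_seq = mmu_interval_read_begin(range.notifier);
> +	   mmap_read_lock(gpu_vma->userptr_mm);
> +	   err = hmm_range_fault(&range);
> +	   mmap_read_unlock(gpu_vma->userptr_mm);
> +
> +	   return err;
> +   }
> +
> +Note that hmm_range_fault() may return -EBUSY if an invalidation
> +happens while faulting in the pages, in which case the sequence check
> +in the exec function will trigger a retry anyway.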
> +
> +The MMU interval seqlock for a userptr gpu_vma is used in the following
> +way:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   retry:
> +
> +   // Note: mmu_interval_read_begin() blocks until there is no
> +   // invalidation notifier running anymore.
> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
> +   if (seq != gpu_vma->saved_seq) {
> +           obtain_new_page_pointers(&gpu_vma);
> +	   dma_resv_lock(&gpu_vm->resv);
> +	   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +	   dma_resv_unlock(&gpu_vm->resv);
> +	   gpu_vma->saved_seq = seq;
> +   }
> +
> +   // The usual revalidation goes here.
> +
> +   // Final userptr sequence validation may not happen before the
> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
> +   // of the MMU invalidation notifier. Hence the
> +   // userptr_notifier_lock that will make them appear atomic.
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
> +          up_read(&gpu_vm->userptr_notifier_lock);
> +	  goto retry;
> +   }
> +
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock_all_resv_locks();
> +   up_read(&gpu_vm->userptr_notifier_lock);
> +   up_read(&gpu_vm->lock);
> +
> +The code between ``mmu_interval_read_begin()`` and the
> +``mmu_interval_read_retry()`` marks the read side critical section of
> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
> +gpu_vma list is looped through, and the check is done for *all* of its
> +userptr gpu_vmas, although we only show a single one here.
> +
> +The userptr gpu_vma MMU invalidation notifier might be called from
> +reclaim context and, again to avoid locking order violations, we can't
> +take any dma_resv lock nor the gpu_vm->lock from within it.
> +
> +.. code-block:: C
> +
> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
> +  {
> +          // Make sure the exec function either sees the new sequence
> +	  // and backs off or we wait for the dma-fence:
> +
> +          down_write(&gpu_vm->userptr_notifier_lock);
> +	  mmu_interval_set_seq(userptr_interval, cur_seq);
> +	  up_write(&gpu_vm->userptr_notifier_lock);
> +
> +	  dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
> +		                false, MAX_SCHEDULE_TIMEOUT);
> +	  return true;
> +  }
> +
> +When this invalidation notifier returns, the GPU can no longer
> +access the old pages of the userptr gpu_vma and needs to redo the page-binding
> +before a new GPU submission can succeed.
> +
> +Optimizing gpu_vma iteration
> +----------------------------
> +
> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
> +on each exec function may be very costly. There is a scheme to avoid
> +this and only iterate through the userptr gpu_vmas that actually saw an
> +invalidation notifier call since the last exec.

The document so far looks good to me.
I'd like to hear from Danilo if this aligns with nouveau locking
or if he has any further thoughts on this in general.

> +
> +TODO: describe that scheme here. It's implemented in the xe driver.
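> +
> +One possible sketch of such a scheme (all names here are illustrative
> +only; the details differ in the xe driver): the invalidation notifier,
> +which already holds the gpu_vm's userptr_notifier_lock in write mode,
> +moves the gpu_vma to a list of invalidated userptr gpu_vmas, and the
> +exec function then only needs to iterate that list:
> +
> +.. code-block:: C
> +
> +   // In gpu_vma_userptr_invalidate(), with the
> +   // userptr_notifier_lock held in write mode:
> +   list_move_tail(&gpu_vma->userptr_link,
> +                  &gpu_vm->invalidated_userptr_list);
> +
> +   // In the exec function, with the gpu_vm->lock held, only the
> +   // gpu_vmas on the invalidated list need new page pointers and
> +   // a sequence number re-check:
> +   list_for_each_entry(gpu_vma, &gpu_vm->invalidated_userptr_list,
> +		       userptr_link)
> +	   obtain_new_page_pointers(&gpu_vma);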
> +
> +Locking for page-table updates at bind- and unbind time
> +=======================================================
> +
> +TODO.
> +
> +Recoverable page-fault implications
> +===================================
> +
> +TODO.

We should probably add the TODO notes somewhere else and keep the doc itself clean?
Or is the plan to update the doc before we push this patch?

> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-08-16  9:15 ` Thomas Hellström
  (?)
@ 2023-09-05 19:50   ` Danilo Krummrich
  -1 siblings, 0 replies; 45+ messages in thread
From: Danilo Krummrich @ 2023-09-05 19:50 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Matthew Brost, Francois Dugast, linux-kernel, Oak Zeng,
	dri-devel, Rodrigo Vivi, intel-xe

On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
> Add the first version of the VM_BIND locking document which is
> intended to be part of the xe driver upstreaming agreement.
> 
> The document describes and discuss the locking used during exec-
> functions, evicton and for userptr gpu-vmas. Intention is to be using the
> same nomenclature as the drm-vm-bind-async.rst.
> 
> v2:
> - s/gvm/gpu_vm/g (Rodrigo Vivi)
> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>   (Rodrigo Vivi)
> - Adjust commit message accordingly.
> - Add SPDX license header.
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>  1 file changed, 351 insertions(+)
>  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> new file mode 100644
> index 000000000000..b813961a9ec2
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> @@ -0,0 +1,351 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +===============
> +VM_BIND locking
> +===============
> +
> +This document attempts to describe what's needed to get VM_BIND locking right,
> +including the userptr mmu_notifier locking and it will also discuss some
> +optimizations to get rid of the looping through of all userptr mappings and
> +external / shared object mappings that is needed in the simplest
> +implementation. It will also discuss some implications for faulting gpu_vms.
> +
> +Nomenclature
> +============
> +
> +* ``Context``: GPU execution context.
> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> +  meta-data. Typically one per client (DRM file-private), or one per
> +  context.
> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with

The same nomenclature was used within the VM_BIND async document as well. I
wonder if it would make sense to align the naming with the GPUVA manager, such
that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result in better
function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.

However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
this is close enough anyway.

> +  associated meta-data. The backing storage of a gpu_vma can either be
> +  a gem buffer object or anonymous pages mapped also into the CPU
> +  address space for the process.
> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
> +  which is anonymous pages as described above.
> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
> +  of the backing store resident and making sure the gpu_vma's
> +  page-table entries point to that backing store.
> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
> +  and which tracks GPU activity. When the GPU activity is finished,
> +  the dma_fence signals.
> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
> +  to track GPU activity in the form of multiple dma_fences on a
> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
> +  of dma_fences and a lock that needs to be held when adding
> +  additional dma_fences to the dma_resv. The lock is of a type that
> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
> +  objects also share the gpu_vm's dma_resv.
> +* ``shared object``: AKA external object: A GEM object which may be shared
> +  by multiple gpu_vms and whose backing storage may be shared with
> +  other drivers.
> +
> +
> +Introducing the locks
> +=====================
> +
> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
> +dma_resv object and hence the dma_resv lock. So even with a huge
> +number of local GEM objects, only one lock is needed to make the exec
> +sequence atomic.
> +
> +The following locks and locking orders are used:
> +
> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
> +  and can also with some simplification protect the gpu_vm's list of
> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
> +  mmap_lock.
> +* The ``userptr_seqlock``. This lock is taken in read mode for each
> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
> +  notifier invalidation. This is not a real seqlock but described in
> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
> +  'lock' a lot like a seqcount, however this allows multiple
> +  write-sides to hold it at once...". The read side critical section
> +  is enclosed by ``mmu_interval_read_begin() /
> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
> +  sleeping uninterruptibly if the write side is held.
> +  The write side is held by the core mm while calling mmu interval
> +  invalidation notifiers.
> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
> +  mode during exec and write mode during a mmu notifier invalidation. In
> +  the absence of a separate page-table lock, this lock can serve
> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
> +  this below. The userptr notifier lock is per gpu_vm.
> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
> +
> +There are certain optimizations described below that require
> +additional locks. More on that later.
> +
> +.. code-block:: C
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job));
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   dma_resv_unlock(&gpu_vm->resv);
> +
> +Eviction of one of these local objects will then be something like the
> +following:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(&obj->resv, job_dma_fence);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
> +is always locked while evicting, due to the above equality.
> +
> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction,
> +Since the eviction blit or copy will wait for GPU idle, any attempt by
> +the GPU to access freed memory through the gpu_vma will be preceded by
> +a new exec function, which will make sure the gpu_vma is
> +revalidated. The eviction code holding the object's dma_resv while
> +revalidating will ensure a new exec function may not race with the eviction.
> +
> +Introducing external (or shared) buffer objects
> +===============================================
> +
> +Since shared buffer objects may be shared by multiple gpu_vm's they
> +can't share their reservation object with a single gpu_vm, but will rather
> +have a reservation object of their own. The shared objects bound to a
> +gpu_vm using one or many
> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> +protected by the gpu_vm lock. One could in theory protect it also with
> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> +the current locking helpers, that is typically not done. Also see
> +below for userptr gpu_vmas.
> +
> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> +object, but we can no longer be certain that we hold the gpu_vm's
> +dma_resv of all the object's gpu_vmas. We can only be certain that we

I need to think a bit more about locking of extobj and evicted object tracking
in the case of processing 'drm_gpuva_ops' directly through callbacks within the
fence signalling critical path as mentioned in [1].

In order to support that, we'd need to protect extobjs with a separate lock,
and while iterating extobjs to acquire the dma-resv lock drop the lock within
the loop before we actually acquire the dma-resv lock. Maple tree supports that
already and this can be fully done within the GPUVA manager; no need for the
driver to care about that.

While, as already mentioned, I'd really love to support that, I noticed that we
have a similar issue with tracking evicted objects. There are (similar) ways to
deal with that, however, it drastically increases complexity.

Hence, I'd like to reconsider whether it's worth supporting it in the first
place. Most of the arguments in order to support it are for decreasing
complexity. However, if it increases complexity elsewhere, it's probably not
worth it. The only argument left would be for synchronous bind jobs which could
be injected at any point of time without the need to be queued up in the
scheduler to preserve ordering. However, I'm not yet sure how important this
would be. For Xe it doesn't really seem to be a concern I guess?

[1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c

> [snip]


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Intel-xe] [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-05 19:50   ` Danilo Krummrich
  0 siblings, 0 replies; 45+ messages in thread
From: Danilo Krummrich @ 2023-09-05 19:50 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Francois Dugast, Joonas Lahtinen, linux-kernel, dri-devel,
	Daniel Vetter, Rodrigo Vivi, intel-xe

On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
> Add the first version of the VM_BIND locking document which is
> intended to be part of the xe driver upstreaming agreement.
> 
> The document describes and discuss the locking used during exec-
> functions, evicton and for userptr gpu-vmas. Intention is to be using the
> same nomenclature as the drm-vm-bind-async.rst.
> 
> v2:
> - s/gvm/gpu_vm/g (Rodrigo Vivi)
> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>   (Rodrigo Vivi)
> - Adjust commit message accordingly.
> - Add SPDX license header.
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>  1 file changed, 351 insertions(+)
>  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> new file mode 100644
> index 000000000000..b813961a9ec2
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> @@ -0,0 +1,351 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +===============
> +VM_BIND locking
> +===============
> +
> +This document attempts to describe what's needed to get VM_BIND locking right,
> +including the userptr mmu_notifier locking and it will also discuss some
> +optimizations to get rid of the looping through of all userptr mappings and
> +external / shared object mappings that is needed in the simplest
> +implementation. It will also discuss some implications for faulting gpu_vms.
> +
> +Nomenclature
> +============
> +
> +* ``Context``: GPU execution context.
> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> +  meta-data. Typically one per client (DRM file-private), or one per
> +  context.
> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with

The same nomenclature was used within the VM_BIND async document as well. I
wonder if it would make sense to align the naming with the GPUVA manager, such
that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result in better
function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.

However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
this is close enough anyway.

> +  associated meta-data. The backing storage of a gpu_vma can either be
> +  a gem buffer object or anonymous pages mapped also into the CPU
> +  address space for the process.
> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
> +  which is anonymous pages as described above.
> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
> +  of the backing store resident and making sure the gpu_vma's
> +  page-table entries point to that backing store.
> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
> +  and which tracks GPU activity. When the GPU activity is finished,
> +  the dma_fence signals.
> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
> +  to track GPU activity in the form of multiple dma_fences on a
> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
> +  of dma_fences and a lock that needs to be held when adding
> +  additional dma_fences to the dma_resv. The lock is of a type that
> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
> +* ``local object``: A GEM object which is local to a gpu_vm. Local GEM
> +  objects share the gpu_vm's dma_resv.
> +* ``shared object``: AKA external object: A GEM object which may be shared
> +  by multiple gpu_vms and whose backing storage may be shared with
> +  other drivers.
> +
> +
> +Introducing the locks
> +=====================
> +
> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
> +dma_resv object and hence the dma_resv lock. So even with a huge
> +number of local GEM objects, only one lock is needed to make the exec
> +sequence atomic.
> +
> +The following locks and locking orders are used:
> +
> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
> +  and can also with some simplification protect the gpu_vm's list of
> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
> +  mmap_lock.
> +* The ``userptr_seqlock``. This lock is taken in read mode for each
> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
> +  notifier invalidation. This is not a real seqlock but is described in
> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
> +  'lock' a lot like a seqcount, however this allows multiple
> +  write-sides to hold it at once...". The read side critical section
> +  is enclosed by ``mmu_interval_read_begin() /
> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
> +  sleeping uninterruptibly if the write side is held.
> +  The write side is held by the core mm while calling mmu interval
> +  invalidation notifiers.
> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
> +  rebinding, and also the residency of all the gpu_vm's local GEM objects.
> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
> +  mode during exec and write mode during an mmu notifier invalidation. In
> +  the absence of a separate page-table lock, this lock can serve
> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
> +  this below. The userptr notifier lock is per gpu_vm.
> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
> +
> +There are certain optimizations described below that require
> +additional locks. More on that later.
> +
> +With VM_BIND and only local objects, the exec function would look
> +something like the following:
> +
> +.. code-block:: C
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   dma_resv_unlock(&gpu_vm->resv);
> +
> +Eviction of one of these local objects will then be something like the
> +following:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
> +is always locked while evicting, due to the above equality.
> +
> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
> +Since the eviction blit or copy will wait for GPU idle, any attempt by
> +the GPU to access freed memory through the gpu_vma will be preceded by
> +a new exec function, which will make sure the gpu_vma is
> +revalidated. The eviction code holding the object's dma_resv while
> +revalidating will ensure a new exec function may not race with the eviction.
> +
> +Introducing external (or shared) buffer objects
> +===============================================
> +
> +Since shared buffer objects may be shared by multiple gpu_vms, they
> +can't share their reservation object with a single gpu_vm, but will
> +rather have a reservation object of their own. The shared objects
> +bound to a gpu_vm using one or many gpu_vmas are therefore typically
> +put on a per-gpu_vm list which is protected by the gpu_vm lock. One
> +could in theory also protect it with
> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> +the current locking helpers, that is typically not done. Also see
> +below for userptr gpu_vmas.
> +
> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> +object, but we can no longer be certain that we hold the gpu_vm's
> +dma_resv of all the object's gpu_vmas. We can only be certain that we

I need to think a bit more about locking of extobj and evicted object tracking
in the case of processing 'drm_gpuva_ops' directly through callbacks within the
fence signalling critical path as mentioned in [1].

In order to support that, we'd need to protect extobjs with a separate lock,
and while iterating extobjs to acquire the dma-resv lock drop the lock within
the loop before we actually acquire the dma-resv lock. Maple tree supports that
already and this can be fully done within the GPUVA manager; no need for the
driver to care about that.

While, as already mentioned, I'd really love to support that, I noticed that we
have a similar issue with tracking evicted objects. There are (similar) ways to
deal with that, however, it drastically increases complexity.

Hence, I'd like to reconsider whether it's worth supporting it in the first
place. Most of the arguments in order to support it are for decreasing
complexity. However, if it increases complexity elsewhere, it's probably not
worth it. The only argument left would be for synchronous bind jobs which could
be injected at any point of time without the need to be queued up in the
scheduler to preserve ordering. However, I'm not yet sure how important this
would be. For Xe it doesn't really seem to be a concern I guess?

[1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c

> +hold the object's private dma_resv. We can trylock the dma_resvs for
> +the affected gpu_vms, but that might be unnecessarily complex. If we
> +have a ww_acquire context at hand at eviction time we can also perform
> +sleeping locks of those dma_resvs but that could cause expensive
> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
> +which is inspected on the next exec function, when the gpu_vm's
> +dma_resv and the object's dma_resv are held, and the invalidated
> +gpu_vmas could then be put on the gpu_vm's list of invalidated
> +gpu_vmas. That bool would then, although being per-gpu_vma, formally
> +be protected by the object's dma_resv.
> +
> +The exec function would then look something like the following:
> +
> +.. code-block:: C
> +
> +   read_lock(&gpu_vm->lock);
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   // Shared object list is protected by the gpu_vm->lock.
> +   for_each_shared_obj(gpu_vm, &obj) {
> +		dma_resv_lock(&obj->resv);
> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
> +   }
> +
> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +   dma_resv_unlock_all_resv_locks();
> +
> +   read_unlock(&gpu_vm->lock);
> +
> +And the corresponding shared-object aware eviction would look like:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
> +		if (object_is_vm_local(obj))
> +		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +		else
> +		             mark_gpu_vma_for_revalidation(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Yet another option is to put the gpu_vmas to be invalidated on a separate
> +gpu_vm list protected by a lower level lock that can be taken both at eviction
> +time and at transfer-to-revalidate list time. The details are not
> +covered in this document, but this is, for reference, implemented in
> +the Intel xe driver.
> +
> +Introducing userptr gpu_vmas
> +============================
> +
> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
> +GPU virtual address range, directly maps a CPU mm range of anonymous-
> +or file page-cache pages.
> +A very simple approach would be to just pin the pages using
> +pin_user_pages() at bind time and unpin them at unbind time, but this
> +creates a Denial-Of-Service vector since a single user-space process
> +would be able to pin down all of system memory, which is not
> +desirable. (For special use-cases and with proper accounting pinning might
> +still be a desirable feature, though). What we need to do in the general case is
> +to obtain a reference to the desired pages, make sure we are notified
> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
> +them if they are not mapped read-only to the GPU, and then drop the reference.
> +When we are notified by the MMU notifier that the CPU mm is about to drop the
> +pages, we need to stop GPU access to the pages
> +and make sure that before the next time the GPU tries to access
> +whatever is now present in the CPU mm range, we unmap the old pages
> +from the GPU page tables and repeat the process of obtaining new page
> +references. Note that when the core mm decides to launder pages, we get such
> +an unmap MMU notification and can mark the pages dirty again before the
> +next GPU access. We also get similar MMU notifications for NUMA accounting
> +which the GPU driver doesn't really need to care about, but so far
> +it's proven difficult to exclude certain notifications.
> +
> +Using an MMU notifier for device DMA (and other methods) is described in
> +`this document
> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
> +
> +Now the method of obtaining struct page references using
> +get_user_pages() unfortunately can't be used under a dma_resv lock
> +since that would violate the locking order of the dma_resv lock vs the
> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
> +is the first time we strictly need the gpu_vm->lock. While it was
> +previously used also to protect the list of the gpu_vm's shared objects,
> +we could in theory have used the gpu_vm->resv for that.
> +
> +The MMU interval seqlock for a userptr gpu_vma is used in the following
> +way:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   retry:
> +
> +   // Note: mmu_interval_read_begin() blocks until there is no
> +   // invalidation notifier running anymore.
> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
> +   if (seq != gpu_vma->saved_seq) {
> +           obtain_new_page_pointers(&gpu_vma);
> +	   dma_resv_lock(&gpu_vm->resv);
> +	   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +	   dma_resv_unlock(&gpu_vm->resv);
> +	   gpu_vma->saved_seq = seq;
> +   }
> +
> +   // The usual revalidation goes here.
> +
> +   // Final userptr sequence validation may not happen before the
> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
> +   // of the MMU invalidation notifier. Hence the
> +   // userptr_notifier_lock that will make them appear atomic.
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
> +          up_read(&gpu_vm->userptr_notifier_lock);
> +	  goto retry;
> +   }
> +
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock_all_resv_locks();
> +   up_read(&gpu_vm->userptr_notifier_lock);
> +   up_read(&gpu_vm->lock);
> +
> +The code between ``mmu_interval_read_begin()`` and the
> +``mmu_interval_read_retry()`` marks the read side critical section of
> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
> +gpu_vma list is looped through, and the check is done for *all* of its
> +userptr gpu_vmas, although we only show a single one here.
> +
> +The userptr gpu_vma MMU invalidation notifier might be called from
> +reclaim context and, again to avoid locking order violations, we can't
> +take any dma_resv lock nor the gpu_vm->lock from within it.
> +
> +.. code-block:: C
> +
> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
> +  {
> +          // Make sure the exec function either sees the new sequence
> +	  // and backs off or we wait for the dma-fence:
> +
> +          down_write(&gpu_vm->userptr_notifier_lock);
> +	  mmu_interval_set_seq(userptr_interval, cur_seq);
> +	  up_write(&gpu_vm->userptr_notifier_lock);
> +
> +	  dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
> +		                false, MAX_SCHEDULE_TIMEOUT);
> +	  return true;
> +  }
> +
> +When this invalidation notifier returns, the GPU can no longer be
> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
> +before a new GPU submission can succeed.
> +
> +Optimizing gpu_vma iteration
> +----------------------------
> +
> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
> +on each exec function may be very costly. There is a scheme to avoid
> +this and only iterate through the userptr gpu_vmas that actually saw an
> +invalidation notifier call since the last exec.
> +
> +TODO: describe that scheme here. It's implemented in the xe driver.
> +
> +Locking for page-table updates at bind- and unbind time
> +=======================================================
> +
> +TODO.
> +
> +Recoverable page-fault implications
> +===================================
> +
> +TODO.
> -- 
> 2.41.0
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-05 19:50   ` Danilo Krummrich
  0 siblings, 0 replies; 45+ messages in thread
From: Danilo Krummrich @ 2023-09-05 19:50 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Rodrigo Vivi, Matthew Brost, Joonas Lahtinen, Oak Zeng,
	Daniel Vetter, Maarten Lankhorst, Francois Dugast, dri-devel,
	linux-kernel

On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
> Add the first version of the VM_BIND locking document which is
> intended to be part of the xe driver upstreaming agreement.
> 
> The document describes and discuss the locking used during exec-
> functions, evicton and for userptr gpu-vmas. Intention is to be using the
> same nomenclature as the drm-vm-bind-async.rst.
> 
> v2:
> - s/gvm/gpu_vm/g (Rodrigo Vivi)
> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>   (Rodrigo Vivi)
> - Adjust commit message accordingly.
> - Add SPDX license header.
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>  1 file changed, 351 insertions(+)
>  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> new file mode 100644
> index 000000000000..b813961a9ec2
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> @@ -0,0 +1,351 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +===============
> +VM_BIND locking
> +===============
> +
> +This document attempts to describe what's needed to get VM_BIND locking right,
> +including the userptr mmu_notifier locking and it will also discuss some
> +optimizations to get rid of the looping through of all userptr mappings and
> +external / shared object mappings that is needed in the simplest
> +implementation. It will also discuss some implications for faulting gpu_vms.
> +
> +Nomenclature
> +============
> +
> +* ``Context``: GPU execution context.
> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> +  meta-data. Typically one per client (DRM file-private), or one per
> +  context.
> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with

The same nomenclature was used within the VM_BIND async document as well. I
wonder if it would make sense to align the naming with the GPUVA manager, such
that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result into better
function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.

However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
this is close enough anyway.

> +  associated meta-data. The backing storage of a gpu_vma can either be
> +  a gem buffer object or anonymous pages mapped also into the CPU
> +  address space for the process.
> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
> +  which is anonymous pages as described above.
> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
> +  of the backing store resident and making sure the gpu_vma's
> +  page-table entries point to that backing store.
> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
> +  and which tracks GPU activity. When the GPU activity is finished,
> +  the dma_fence signals.
> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
> +  to track GPU activity in the form of multiple dma_fences on a
> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
> +  of dma_fences and a lock that needs to be held when adding
> +  additional dma_fences to the dma_resv. The lock is of a type that
> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
> +  objects also share the gpu_vm's dma_resv.
> +* ``shared object``: AKA external object: A GEM object which may be shared
> +  by multiple gpu_vms and whose backing storage may be shared with
> +  other drivers.
> +
> +
> +Introducing the locks
> +=====================
> +
> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
> +dma_resv object and hence the dma_resv lock. So even with a huge
> +number of local GEM objects, only one lock is needed to make the exec
> +sequence atomic.
> +
> +The following locks and locking orders are used:
> +
> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
> +  and can also with some simplification protect the gpu_vm's list of
> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
> +  mmap_lock.
> +* The ``userptr_seqlock``. This lock is taken in read mode for each
> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
> +  notifier invalidation. This is not a real seqlock but described in
> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
> +  'lock' a lot like a seqcount, however this allows multiple
> +  write-sides to hold it at once...". The read side critical section
> +  is enclosed by ``mmu_interval_read_begin() /
> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
> +  sleeping uninterruptibly if the write side is held.
> +  The write side is held by the core mm while calling mmu interval
> +  invalidation notifiers.
> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
> +  mode during exec and write mode during a mmu notifier invalidation. In
> +  the absence of a separate page-table lock, this lock can serve
> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
> +  this below. The userptr notifier lock is per gpu_vm.
> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
> +
> +There are certain optimizations described below that require
> +additional locks. More on that later.
> +
> +.. code-block:: C
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job));
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   dma_resv_unlock(&gpu_vm->resv);
> +
> +Eviction of one of these local objects will then be something like the
> +following:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(&obj->resv, job_dma_fence);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
> +is always locked while evicting, due to the above equality.
> +
> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction,
> +Since the eviction blit or copy will wait for GPU idle, any attempt by
> +the GPU to access freed memory through the gpu_vma will be preceded by
> +a new exec function, which will make sure the gpu_vma is
> +revalidated. The eviction code holding the object's dma_resv while
> +revalidating will ensure a new exec function may not race with the eviction.
> +
> +Introducing external (or shared) buffer objects
> +===============================================
> +
> +Since shared buffer objects may be shared by multiple gpu_vm's they
> +can't share their reservation object with a single gpu_vm, but will rather
> +have a reservation object of their own. The shared objects bound to a
> +gpu_vm using one or many
> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> +protected by the gpu_vm lock. One could in theory protect it also with
> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> +the current locking helpers, that is typically not done. Also see
> +below for userptr gpu_vmas.
> +
> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> +object, but we can no longer be certain that we hold the gpu_vm's
> +dma_resv of all the object's gpu_vmas. We can only be certain that we

I need to think a bit more about locking of extobj and evicted object tracking
in the case of processing 'drm_gpuva_ops' directly through callbacks within the
fence signalling critical path as mentioend in [1].

In order to support that, we'd need to protect extobjs with a separate lock,
and while iterating extobjs to acquire the dma-resv lock drop the lock within
the loop before we actually acquire the dma-resv lock. Maple tree supports that
already and this can be fully done within the GPUVA manager; no need for the
driver to care about that.

While, as already mentioned, I'd really love to support that, I noticed that we
have a similar issue with tracking evicted objects. There are (similar) ways to
deal with that, however, it drastically increases complexity.

Hence, I'd like to reconsider whether it's worth supporting it in the first
place. Most of the arguments in order to support it are for decreasing
complexity. However, if it increases complexity elsewhere, it's probably not
worth. The only argument left would be for synchronous bind jobs which could
be injected at any point of time without the need to be queued up in the
scheduler to preserve ordering. However, I'm not yet sure how important this
would be. For Xe it doesn't really seem to be a concern I guess?

[1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c

> +hold the object's private dma_resv. We can trylock the dma_resvs for
> +the affected gpu_vm's but that might be unnecessarily complex. If we
> +have a ww_acquire context at hand at eviction time we can also perform
> +sleeping locks of those dma_resvs but that could cause expensive
> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
> +which is inspected on the next exec function, when the gpu_vm's
> +dma_resv and the object's dma_resv is held, and the invalidated
> +gpu_vmas could then be put on the gpu_vm's list of invalidated
> +gpu_vmas. That bool would then, although being per-gpu_vma formally be
> +protected by the object's dma_resv.
> +
> +The exec function would then look something like the following:
> +
> +.. code-block:: C
> +
> +   read_lock(&gpu_vm->lock);
> +
> +   dma_resv_lock(&gpu_vm->resv);
> +
> +   // Shared object list is protected by the gpu_vm->lock.
> +   for_each_shared_obj(gpu_vm, &obj) {
> +		dma_resv_lock(&obj->resv);
> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
> +   }
> +
> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
> +		revalidate_gpu_vma(&gpu_vma);
> +		remove_from_revalidate_list(&gpu_vma);
> +   }
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   job_dma_fence = gpu_submit(&gpu_job));
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +   dma_resv_unlock_all_resv_locks();
> +
> +   read_unlock(&gpu_vm->lock);
> +
> +And the corresponding shared-object aware eviction would look like:
> +
> +.. code-block:: C
> +
> +   obj = get_object_from_lru();
> +
> +   dma_resv_lock(obj->resv);
> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
> +		if (object_is_vm_local(obj))
> +		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +		else
> +		             mark_gpu_vma_for_revalidation(&gpu_vma);
> +
> +   add_dependencies(&eviction_job, &obj->resv);
> +   job_dma_fence = gpu_submit(&eviction_job);
> +   add_dma_fence(&obj->resv, job_dma_fence);
> +
> +   dma_resv_unlock(&obj->resv);
> +   put_object(obj);
> +
> +Yet another option is to put the gpu_vmas to be invalidated on a separate
> +gpu_vm list protected by a lower level lock that can be taken both at eviction
> +time and at transfer-to-revalidate list time. The details are not in
> +this document, but this for reference implemented in the Intel xe
> +driver.
> +
> +Introducing userptr gpu_vmas
> +============================
> +
> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
> +GPU virtual address range, directly maps a CPU mm range of anonymous-
> +or file page-cache pages.
> +A very simple approach would be to just pin the pages using
> +pin_user_pages() at bind time and unpin them at unbind time, but this
> +creates a Denial-Of-Service vector since a single user-space process
> +would be able to pin down all of system memory, which is not
> +desirable. (For special use-cases and with proper accounting pinning might
> +still be a desirable feature, though). What we need to do in the general case is
> +to obtain a reference to the desired pages, make sure we are notified
> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
> +them if they are not mapped read-only to the GPU, and then drop the reference.
> +When we are notified by the MMU notifier that CPU mm is about to drop the
> +pages, we need to stop GPU access to the pages,
> +GPU page-table and make sure that before the next time the GPU tries to access
> +whatever is now present in the CPU mm range, we unmap the old pages
> +from the GPU page tables and repeat the process of obtaining new page
> +references. Note that when the core mm decides to laundry pages, we get such
> +an unmap MMU notification and can mark the pages dirty again before the
> +next GPU access. We also get similar MMU notifications for NUMA accounting
> +which the GPU driver doesn't really need to care about, but so far
> +it's proven difficult to exclude certain notifications.
> +
> +Using an MMU notifier for device DMA (and other methods) is described in
> +`this document
> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
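> +
> +As a rough sketch, registering for those notifications on a userptr
> +gpu_vma could use the mmu interval notifier API (the ops struct and
> +the surrounding gpu_vma fields shown here are illustrative; the
> +invalidation callback itself is discussed further below):
> +
> +.. code-block:: C
> +
> +   static const struct mmu_interval_notifier_ops gpu_vma_userptr_notifier_ops = {
> +          .invalidate = gpu_vma_userptr_invalidate,
> +   };
> +
> +   // At userptr gpu_vma bind time:
> +   err = mmu_interval_notifier_insert(&gpu_vma->userptr_interval,
> +                                      current->mm, userptr_start,
> +                                      userptr_size,
> +                                      &gpu_vma_userptr_notifier_ops);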
> +
> +Now the method of obtaining struct page references using
> +get_user_pages() unfortunately can't be used under a dma_resv lock
> +since that would violate the locking order of the dma_resv lock vs the
> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
> +is the first time we strictly need the gpu_vm->lock. While it was
> +previously used also to protect the list of the gpu_vm's shared objects,
> +we could in theory have used the gpu_vm->resv for that.
> +
> +The MMU interval seqlock for a userptr gpu_vma is used in the following
> +way:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->lock);
> +
> +   retry:
> +
> +   // Note: mmu_interval_read_begin() blocks until there is no
> +   // invalidation notifier running anymore.
> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
> +   if (seq != gpu_vma->saved_seq) {
> +          obtain_new_page_pointers(&gpu_vma);
> +          dma_resv_lock(&gpu_vm->resv);
> +          put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
> +          dma_resv_unlock(&gpu_vm->resv);
> +          gpu_vma->saved_seq = seq;
> +   }
> +
> +   // The usual revalidation goes here.
> +
> +   // Final userptr sequence validation may not happen before the
> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
> +   // of the MMU invalidation notifier. Hence the
> +   // userptr_notifier_lock that will make them appear atomic.
> +
> +   add_dependencies(&gpu_job, &gpu_vm->resv);
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
> +          up_read(&gpu_vm->userptr_notifier_lock);
> +          goto retry;
> +   }
> +
> +   job_dma_fence = gpu_submit(&gpu_job);
> +
> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
> +
> +   for_each_shared_obj(gpu_vm, &obj)
> +          add_dma_fence(job_dma_fence, &obj->resv);
> +
> +   dma_resv_unlock_all_resv_locks();
> +   up_read(&gpu_vm->userptr_notifier_lock);
> +   up_read(&gpu_vm->lock);
> +
> +The code between ``mmu_interval_read_begin()`` and the
> +``mmu_interval_read_retry()`` marks the read side critical section of
> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
> +gpu_vma list is looped through, and the check is done for *all* of its
> +userptr gpu_vmas, although we only show a single one here.
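> +
> +Expanded to the full list, the final sequence validation under the
> +notifier lock would then, as a sketch (the list iteration helper is
> +illustrative), look along the lines of:
> +
> +.. code-block:: C
> +
> +   down_read(&gpu_vm->userptr_notifier_lock);
> +   for_each_userptr_gpu_vma(gpu_vm, &gpu_vma) {
> +          if (mmu_interval_read_retry(&gpu_vma->userptr_interval,
> +                                      gpu_vma->saved_seq)) {
> +                 up_read(&gpu_vm->userptr_notifier_lock);
> +                 goto retry;
> +          }
> +   }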
> +
> +The userptr gpu_vma MMU invalidation notifier might be called from
> +reclaim context and, again to avoid locking order violations, we can't
> +take any dma_resv lock nor the gpu_vm->lock from within it.
> +
> +.. code-block:: C
> +
> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
> +  {
> +          // Make sure the exec function either sees the new sequence
> +          // and backs off or we wait for the dma-fence:
> +
> +          down_write(&gpu_vm->userptr_notifier_lock);
> +          mmu_interval_set_seq(userptr_interval, cur_seq);
> +          up_write(&gpu_vm->userptr_notifier_lock);
> +
> +          dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
> +                                false, MAX_SCHEDULE_TIMEOUT);
> +          return true;
> +  }
> +
> +When this invalidation notifier returns, the GPU can no longer be
> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
> +before a new GPU submission can succeed.
> +
> +Optimizing gpu_vma iteration
> +----------------------------
> +
> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
> +on each exec function may be very costly. There is a scheme to avoid
> +this and only iterate through the userptr gpu_vmas that actually saw an
> +invalidation notifier call since the last exec.
> +
> +TODO: describe that scheme here. It's implemented in the xe driver.
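> +
> +Pending that full description, a rough outline of one possible scheme
> +(all names here are illustrative): let the invalidation notifier,
> +which may take a spinlock, move the gpu_vma to a separate
> +per-gpu_vm list of invalidated userptr gpu_vmas:
> +
> +.. code-block:: C
> +
> +   // In gpu_vma_userptr_invalidate(), with the userptr_notifier_lock
> +   // held in write mode:
> +   spin_lock(&gpu_vm->invalidated_lock);
> +   list_move_tail(&gpu_vma->invalidated_link,
> +                  &gpu_vm->invalidated_userptrs);
> +   spin_unlock(&gpu_vm->invalidated_lock);
> +
> +The exec function then only needs to regrab page pointers for, and
> +validate the sequence numbers of, the gpu_vmas on that list rather
> +than looping over all userptr gpu_vmas of the gpu_vm.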
> +
> +Locking for page-table updates at bind- and unbind time
> +=======================================================
> +
> +TODO.
> +
> +Recoverable page-fault implications
> +===================================
> +
> +TODO.
> -- 
> 2.41.0
> 



* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-05 19:50   ` [Intel-xe] " Danilo Krummrich
@ 2023-09-06  7:06     ` Thomas Hellström
  -1 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06  7:06 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: intel-xe, Rodrigo Vivi, Matthew Brost, Joonas Lahtinen, Oak Zeng,
	Daniel Vetter, Maarten Lankhorst, Francois Dugast, dri-devel,
	linux-kernel

Hi, Danilo,

Thanks for taking a look. Comments inline.

On 9/5/23 21:50, Danilo Krummrich wrote:
> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>> Add the first version of the VM_BIND locking document which is
>> intended to be part of the xe driver upstreaming agreement.
>>
>> The document describes and discuss the locking used during exec-
>> functions, evicton and for userptr gpu-vmas. Intention is to be using the
>> same nomenclature as the drm-vm-bind-async.rst.
>>
>> v2:
>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>    (Rodrigo Vivi)
>> - Adjust commit message accordingly.
>> - Add SPDX license header.
>>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>>   1 file changed, 351 insertions(+)
>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>
>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
>> new file mode 100644
>> index 000000000000..b813961a9ec2
>> --- /dev/null
>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>> @@ -0,0 +1,351 @@
>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>> +
>> +===============
>> +VM_BIND locking
>> +===============
>> +
>> +This document attempts to describe what's needed to get VM_BIND locking right,
>> +including the userptr mmu_notifier locking and it will also discuss some
>> +optimizations to get rid of the looping through of all userptr mappings and
>> +external / shared object mappings that is needed in the simplest
>> +implementation. It will also discuss some implications for faulting gpu_vms.
>> +
>> +Nomenclature
>> +============
>> +
>> +* ``Context``: GPU execution context.
>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>> +  meta-data. Typically one per client (DRM file-private), or one per
>> +  context.
>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
> The same nomenclature was used within the VM_BIND async document as well. I
> wonder if it would make sense to align the naming with the GPUVA manager, such
> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result into better
> function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>
> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
> this is close enough anyway.

I don't have a strong opinion about the naming here, and aligning with 
the GPUVA manager makes sense, although the "drm_" prefix, which makes 
sense for the function- and struct names, may not fit a more generic 
document like this. What about gpuva and gpuvm?


>
>> +  associated meta-data. The backing storage of a gpu_vma can either be
>> +  a gem buffer object or anonymous pages mapped also into the CPU
>> +  address space for the process.
>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
>> +  which is anonymous pages as described above.
>> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
>> +  of the backing store resident and making sure the gpu_vma's
>> +  page-table entries point to that backing store.
>> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
>> +  and which tracks GPU activity. When the GPU activity is finished,
>> +  the dma_fence signals.
>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
>> +  to track GPU activity in the form of multiple dma_fences on a
>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
>> +  of dma_fences and a lock that needs to be held when adding
>> +  additional dma_fences to the dma_resv. The lock is of a type that
>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
>> +* ``exec function``: An exec function is a function that revalidates all
>> +  affected gpu_vmas, submits a GPU command batch and registers the
>> +  dma_fence representing the GPU command's activity with all affected
>> +  dma_resvs. For completeness, although not covered by this document,
>> +  it's worth mentioning that an exec function may also be the
>> +  revalidation worker that is used by some drivers in compute /
>> +  long-running mode.
>> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
>> +  objects also share the gpu_vm's dma_resv.
>> +* ``shared object``: AKA external object: A GEM object which may be shared
>> +  by multiple gpu_vms and whose backing storage may be shared with
>> +  other drivers.
>> +
>> +
>> +Introducing the locks
>> +=====================
>> +
>> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
>> +dma_resv object and hence the dma_resv lock. So even with a huge
>> +number of local GEM objects, only one lock is needed to make the exec
>> +sequence atomic.
>> +
>> +The following locks and locking orders are used:
>> +
>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
>> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
>> +  and can also with some simplification protect the gpu_vm's list of
>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
>> +  mmap_lock.
>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
>> +  notifier invalidation. This is not a real seqlock but described in
>> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
>> +  'lock' a lot like a seqcount, however this allows multiple
>> +  write-sides to hold it at once...". The read side critical section
>> +  is enclosed by ``mmu_interval_read_begin() /
>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>> +  sleeping uninterruptibly if the write side is held.
>> +  The write side is held by the core mm while calling mmu interval
>> +  invalidation notifiers.
>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
>> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
>> +  mode during exec and write mode during a mmu notifier invalidation. In
>> +  the absence of a separate page-table lock, this lock can serve
>> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
>> +  this below. The userptr notifier lock is per gpu_vm.
>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
>> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
>> +
>> +There are certain optimizations described below that require
>> +additional locks. More on that later.
>> +
>> +.. code-block:: C
>> +
>> +   dma_resv_lock(&gpu_vm->resv);
>> +
>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>> +		revalidate_gpu_vma(&gpu_vma);
>> +		remove_from_revalidate_list(&gpu_vma);
>> +   }
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   job_dma_fence = gpu_submit(&gpu_job);
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +   dma_resv_unlock(&gpu_vm->resv);
>> +
>> +Eviction of one of these local objects will then be something like the
>> +following:
>> +
>> +.. code-block:: C
>> +
>> +   obj = get_object_from_lru();
>> +
>> +   dma_resv_lock(obj->resv);
>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
>> +
>> +   add_dependencies(&eviction_job, &obj->resv);
>> +   job_dma_fence = gpu_submit(&eviction_job);
>> +   add_dma_fence(&obj->resv, job_dma_fence);
>> +
>> +   dma_resv_unlock(&obj->resv);
>> +   put_object(obj);
>> +
>> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
>> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
>> +is always locked while evicting, due to the above equality.
>> +
>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
>> +Since the eviction blit or copy will wait for GPU idle, any attempt by
>> +the GPU to access freed memory through the gpu_vma will be preceded by
>> +a new exec function, which will make sure the gpu_vma is
>> +revalidated. The eviction code holding the object's dma_resv while
>> +revalidating will ensure a new exec function may not race with the eviction.
>> +
>> +Introducing external (or shared) buffer objects
>> +===============================================
>> +
>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>> +can't share their reservation object with a single gpu_vm, but will rather
>> +have a reservation object of their own. The shared objects bound to a
>> +gpu_vm using one or many
>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>> +protected by the gpu_vm lock. One could in theory protect it also with
>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>> +the current locking helpers, that is typically not done. Also see
>> +below for userptr gpu_vmas.
>> +
>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>> +object, but we can no longer be certain that we hold the gpu_vm's
>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
> I need to think a bit more about locking of extobj and evicted object tracking
> in the case of processing 'drm_gpuva_ops' directly through callbacks within the
> fence signalling critical path as mentioend in [1].
>
> In order to support that, we'd need to protect extobjs with a separate lock,
> and while iterating extobjs to acquire the dma-resv lock drop the lock within
> the loop before we actually acquire the dma-resv lock. Maple tree supports that
> already and this can be fully done within the GPUVA manager; no need for the
> driver to care about that.

So do I understand correctly that this is because you want to update the 
gpuvm state while operations are progressing asynchronously?

If so, I wonder whether that could really be done? For example, to 
allocate enough memory for page-tables etc., you need to know the details 
of the operations at IOCTL execution time, and to know the details you 
need to know the state from the previous operation?

>
> While, as already mentioned, I'd really love to support that, I noticed that we
> have a similar issue with tracking evicted objects. There are (similar) ways to
> deal with that, however, it drastically increases complexity.
>
> Hence, I'd like to reconsider whether it's worth supporting it in the first
> place. Most of the arguments in order to support it are for decreasing
> complexity. However, if it increases complexity elsewhere, it's probably not
> worth. The only argument left would be for synchronous bind jobs which could
> be injected at any point of time without the need to be queued up in the
> scheduler to preserve ordering. However, I'm not yet sure how important this
> would be. For Xe it doesn't really seem to be a concern I guess?
Xe supports that functionality via separate bind queues. If you queue 
most of the operations using one queue, you can inject synchronous bind 
jobs using another. Ideally they execute separately, but they are not 
guaranteed to do that.
>
> [1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>
>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>> +the affected gpu_vm's but that might be unnecessarily complex. If we
>> +have a ww_acquire context at hand at eviction time we can also perform
>> +sleeping locks of those dma_resvs but that could cause expensive
>> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
>> +which is inspected on the next exec function, when the gpu_vm's
>> +dma_resv and the object's dma_resv is held, and the invalidated
>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>> +gpu_vmas. That bool would then, although being per-gpu_vma formally be
>> +protected by the object's dma_resv.
>> +
>> +The exec function would then look something like the following:
>> +
>> +.. code-block:: C
>> +
>> +   read_lock(&gpu_vm->lock);
>> +
>> +   dma_resv_lock(&gpu_vm->resv);
>> +
>> +   // Shared object list is protected by the gpu_vm->lock.
>> +   for_each_shared_obj(gpu_vm, &obj) {
>> +		dma_resv_lock(&obj->resv);
>> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>> +   }
>> +
>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>> +		revalidate_gpu_vma(&gpu_vma);
>> +		remove_from_revalidate_list(&gpu_vma);
>> +   }
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   job_dma_fence = gpu_submit(&gpu_job);
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +   for_each_shared_obj(gpu_vm, &obj)
>> +          add_dma_fence(job_dma_fence, &obj->resv);
>> +   dma_resv_unlock_all_resv_locks();
>> +
>> +   read_unlock(&gpu_vm->lock);
>> +
>> +And the corresponding shared-object aware eviction would look like:
>> +
>> +.. code-block:: C
>> +
>> +   obj = get_object_from_lru();
>> +
>> +   dma_resv_lock(obj->resv);
>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>> +		if (object_is_vm_local(obj))
>> +		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>> +		else
>> +		             mark_gpu_vma_for_revalidation(&gpu_vma);
>> +
>> +   add_dependencies(&eviction_job, &obj->resv);
>> +   job_dma_fence = gpu_submit(&eviction_job);
>> +   add_dma_fence(&obj->resv, job_dma_fence);
>> +
>> +   dma_resv_unlock(&obj->resv);
>> +   put_object(obj);
>> +


* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06  7:06     ` Thomas Hellström
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06  7:06 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Matthew Brost, Francois Dugast, linux-kernel, Oak Zeng,
	dri-devel, Rodrigo Vivi, intel-xe

Hi, Danilo,

Thanks for taking a look. Comments inline.

On 9/5/23 21:50, Danilo Krummrich wrote:
> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>> Add the first version of the VM_BIND locking document which is
>> intended to be part of the xe driver upstreaming agreement.
>>
>> The document describes and discuss the locking used during exec-
>> functions, evicton and for userptr gpu-vmas. Intention is to be using the
>> same nomenclature as the drm-vm-bind-async.rst.
>>
>> v2:
>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>    (Rodrigo Vivi)
>> - Adjust commit message accordingly.
>> - Add SPDX license header.
>>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>>   1 file changed, 351 insertions(+)
>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>
>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
>> new file mode 100644
>> index 000000000000..b813961a9ec2
>> --- /dev/null
>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>> @@ -0,0 +1,351 @@
>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>> +
>> +===============
>> +VM_BIND locking
>> +===============
>> +
>> +This document attempts to describe what's needed to get VM_BIND locking right,
>> +including the userptr mmu_notifier locking and it will also discuss some
>> +optimizations to get rid of the looping through of all userptr mappings and
>> +external / shared object mappings that is needed in the simplest
>> +implementation. It will also discuss some implications for faulting gpu_vms.
>> +
>> +Nomenclature
>> +============
>> +
>> +* ``Context``: GPU execution context.
>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>> +  meta-data. Typically one per client (DRM file-private), or one per
>> +  context.
>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
> The same nomenclature was used within the VM_BIND async document as well. I
> wonder if it would make sense to align the naming with the GPUVA manager, such
> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result into better
> function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>
> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
> this is close enough anyway.

I don't have a strong opinion about the naming here and aligning with 
the GPUVA manager make sense, although perhaps the "drm_" prefix which 
makes sense for the function- and struct names may not make sense in a 
more generic document like this. What about gpuva and gpuvm?


>
>> +  associated meta-data. The backing storage of a gpu_vma can either be
>> +  a gem buffer object or anonymous pages mapped also into the CPU
>> +  address space for the process.
>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
>> +  which is anonymous pages as described above.
>> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
>> +  of the backing store resident and making sure the gpu_vma's
>> +  page-table entries point to that backing store.
>> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
>> +  and which tracks GPU activity. When the GPU activity is finished,
>> +  the dma_fence signals.
>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
>> +  to track GPU activity in the form of multiple dma_fences on a
>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
>> +  of dma_fences and a lock that needs to be held when adding
>> +  additional dma_fences to the dma_resv. The lock is of a type that
>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
>> +* ``exec function``: An exec function is a function that revalidates all
>> +  affected gpu_vmas, submits a GPU command batch and registers the
>> +  dma_fence representing the GPU command's activity with all affected
>> +  dma_resvs. For completeness, although not covered by this document,
>> +  it's worth mentioning that an exec function may also be the
>> +  revalidation worker that is used by some drivers in compute /
>> +  long-running mode.
>> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
>> +  objects also share the gpu_vm's dma_resv.
>> +* ``shared object``: AKA external object: A GEM object which may be shared
>> +  by multiple gpu_vms and whose backing storage may be shared with
>> +  other drivers.
>> +
>> +
>> +Introducing the locks
>> +=====================
>> +
>> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
>> +dma_resv object and hence the dma_resv lock. So even with a huge
>> +number of local GEM objects, only one lock is needed to make the exec
>> +sequence atomic.
>> +
>> +The following locks and locking orders are used:
>> +
>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
>> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
>> +  and can also with some simplification protect the gpu_vm's list of
>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
>> +  mmap_lock.
>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
>> +  notifier invalidation. This is not a real seqlock but described in
>> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
>> +  'lock' a lot like a seqcount, however this allows multiple
>> +  write-sides to hold it at once...". The read side critical section
>> +  is enclosed by ``mmu_interval_read_begin() /
>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>> +  sleeping uninterruptibly if the write side is held.
>> +  The write side is held by the core mm while calling mmu interval
>> +  invalidation notifiers.
>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
>> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
>> +  mode during exec and write mode during a mmu notifier invalidation. In
>> +  the absence of a separate page-table lock, this lock can serve
>> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
>> +  this below. The userptr notifier lock is per gpu_vm.
>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
>> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
>> +
>> +There are certain optimizations described below that require
>> +additional locks. More on that later.
>> +
>> +.. code-block:: C
>> +
>> +   dma_resv_lock(&gpu_vm->resv);
>> +
>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>> +		revalidate_gpu_vma(&gpu_vma);
>> +		remove_from_revalidate_list(&gpu_vma);
>> +   }
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   job_dma_fence = gpu_submit(&gpu_job);
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +   dma_resv_unlock(&gpu_vm->resv);
>> +
>> +Eviction of one of these local objects will then be something like the
>> +following:
>> +
>> +.. code-block:: C
>> +
>> +   obj = get_object_from_lru();
>> +
>> +   dma_resv_lock(&obj->resv);
>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
>> +
>> +   add_dependencies(&eviction_job, &obj->resv);
>> +   job_dma_fence = gpu_submit(&eviction_job);
>> +   add_dma_fence(job_dma_fence, &obj->resv);
>> +
>> +   dma_resv_unlock(&obj->resv);
>> +   put_object(obj);
>> +
>> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
>> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
>> +is always locked while evicting, due to the above equality.
>> +
>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
>> +Since the eviction blit or copy will wait for GPU idle, any attempt by
>> +the GPU to access freed memory through the gpu_vma will be preceded by
>> +a new exec function, which will make sure the gpu_vma is
>> +revalidated. The eviction code holding the object's dma_resv while
>> +revalidating will ensure a new exec function may not race with the eviction.
>> +
>> +Introducing external (or shared) buffer objects
>> +===============================================
>> +
>> +Since shared buffer objects may be shared by multiple gpu_vms, they
>> +can't share their reservation object with a single gpu_vm, but will
>> +rather have a reservation object of their own. The shared objects
>> +bound to a gpu_vm using one or many gpu_vmas are therefore typically
>> +put on a per-gpu_vm list which is protected by the gpu_vm lock. One
>> +could in theory protect it also with
>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>> +the current locking helpers, that is typically not done. Also see
>> +below for userptr gpu_vmas.
>> +
>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>> +object, but we can no longer be certain that we hold the gpu_vm's
>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
> I need to think a bit more about locking of extobj and evicted object tracking
> in the case of processing 'drm_gpuva_ops' directly through callbacks within the
> fence signalling critical path as mentioned in [1].
>
> In order to support that, we'd need to protect extobjs with a separate lock,
> and while iterating extobjs to acquire the dma-resv lock drop the lock within
> the loop before we actually acquire the dma-resv lock. Maple tree supports that
> already and this can be fully done within the GPUVA manager; no need for the
> driver to care about that.

So do I understand correctly that this is because you want to update
the gpuvm state while operations are progressing asynchronously?

If so, I wonder whether that could really be done? For example, to
allocate enough memory for page-tables etc., you need to know the details
of the operations at IOCTL execution time, and to know the details you
need to know the state from the previous operation?

>
> While, as already mentioned, I'd really love to support that, I noticed that we
> have a similar issue with tracking evicted objects. There are (similar) ways to
> deal with that, however, it drastically increases complexity.
>
> Hence, I'd like to reconsider whether it's worth supporting it in the first
> place. Most of the arguments in order to support it are for decreasing
> complexity. However, if it increases complexity elsewhere, it's probably not
> worth it. The only argument left would be for synchronous bind jobs which could
> be injected at any point of time without the need to be queued up in the
> scheduler to preserve ordering. However, I'm not yet sure how important this
> would be. For Xe it doesn't really seem to be a concern I guess?
Xe supports that functionality via separate bind queues. If you queue 
most of the operations using one queue, you can inject synchronous bind 
jobs using another. Ideally they execute separately, but they are not 
guaranteed to do that.
>
> [1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>
>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>> +the affected gpu_vms but that might be unnecessarily complex. If we
>> +have a ww_acquire context at hand at eviction time we can also perform
>> +sleeping locks of those dma_resvs but that could cause expensive
>> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
>> +which is inspected on the next exec function, when the gpu_vm's
>> +dma_resv and the object's dma_resv are held, and the invalidated
>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
>> +protected by the object's dma_resv.
>> +
>> +The exec function would then look something like the following:
>> +
>> +.. code-block:: C
>> +
>> +   read_lock(&gpu_vm->lock);
>> +
>> +   dma_resv_lock(&gpu_vm->resv);
>> +
>> +   // Shared object list is protected by the gpu_vm->lock.
>> +   for_each_shared_obj(gpu_vm, &obj) {
>> +		dma_resv_lock(&obj->resv);
>> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>> +   }
>> +
>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>> +		revalidate_gpu_vma(&gpu_vma);
>> +		remove_from_revalidate_list(&gpu_vma);
>> +   }
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   job_dma_fence = gpu_submit(&gpu_job);
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +   for_each_shared_obj(gpu_vm, &obj)
>> +          add_dma_fence(job_dma_fence, &obj->resv);
>> +   dma_resv_unlock_all_resv_locks();
>> +
>> +   read_unlock(&gpu_vm->lock);
>> +
>> +And the corresponding shared-object aware eviction would look like:
>> +
>> +.. code-block:: C
>> +
>> +   obj = get_object_from_lru();
>> +
>> +   dma_resv_lock(&obj->resv);
>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>> +		if (object_is_vm_local(obj))
>> +		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>> +		else
>> +		             mark_gpu_vma_for_revalidation(&gpu_vma);
>> +
>> +   add_dependencies(&eviction_job, &obj->resv);
>> +   job_dma_fence = gpu_submit(&eviction_job);
>> +   add_dma_fence(job_dma_fence, &obj->resv);
>> +
>> +   dma_resv_unlock(&obj->resv);
>> +   put_object(obj);
>> +
>> +Yet another option is to put the gpu_vmas to be invalidated on a separate
>> +gpu_vm list protected by a lower level lock that can be taken both at eviction
>> +time and at transfer-to-revalidate list time. The details are not in
>> +this document, but this is, for reference, implemented in the Intel xe
>> +driver.
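
As a rough illustration of that last option (all names are made up and a pthread mutex stands in for the lower level lock; the actual xe implementation differs), the separate invalidated list could work like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the "separate list under a lower level lock"
 * option. A mutex stands in for the lower level (spin)lock; none of
 * these names match the actual xe implementation.
 */
struct gpu_vma {
	struct gpu_vma *next;
};

struct gpu_vm {
	pthread_mutex_t invalidated_lock; /* the lower level lock */
	struct gpu_vma *invalidated;      /* filled at eviction time */
	struct gpu_vma *revalidate;       /* protected by gpu_vm->resv */
};

/* Eviction path: may run without the gpu_vm's dma_resv held. */
void mark_invalidated(struct gpu_vm *vm, struct gpu_vma *vma)
{
	pthread_mutex_lock(&vm->invalidated_lock);
	vma->next = vm->invalidated;
	vm->invalidated = vma;
	pthread_mutex_unlock(&vm->invalidated_lock);
}

/*
 * Exec path, called with gpu_vm->resv held: detach the invalidated
 * list under the lower level lock and move it to the revalidate list.
 */
void transfer_to_revalidate(struct gpu_vm *vm)
{
	struct gpu_vma *vma;

	pthread_mutex_lock(&vm->invalidated_lock);
	vma = vm->invalidated;
	vm->invalidated = NULL;
	pthread_mutex_unlock(&vm->invalidated_lock);

	while (vma) {
		struct gpu_vma *next = vma->next;

		vma->next = vm->revalidate;
		vm->revalidate = vma;
		vma = next;
	}
}
```

The point of the lower level lock is that the eviction side never needs the gpu_vm's dma_resv, while the exec side still publishes the result under ``gpu_vm->resv``.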
>> +
>> +Introducing userptr gpu_vmas
>> +============================
>> +
>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
>> +GPU virtual address range, directly maps a CPU mm range of anonymous-
>> +or file page-cache pages.
>> +A very simple approach would be to just pin the pages using
>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>> +creates a Denial-Of-Service vector since a single user-space process
>> +would be able to pin down all of system memory, which is not
>> +desirable. (For special use-cases and with proper accounting, pinning might
>> +still be a desirable feature, though). What we need to do in the general case is
>> +to obtain a reference to the desired pages, make sure we are notified
>> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
>> +them if they are not mapped read-only to the GPU, and then drop the reference.
>> +When we are notified by the MMU notifier that the CPU mm is about to drop the
>> +pages, we need to stop GPU access to the pages
>> +and make sure that before the next time the GPU tries to access
>> +whatever is now present in the CPU mm range, we unmap the old pages
>> +from the GPU page tables and repeat the process of obtaining new page
>> +references. Note that when the core mm decides to launder pages, we get such
>> +an unmap MMU notification and can mark the pages dirty again before the
>> +next GPU access. We also get similar MMU notifications for NUMA accounting
>> +which the GPU driver doesn't really need to care about, but so far
>> +it's proven difficult to exclude certain notifications.
>> +
>> +Using a MMU notifier for device DMA (and other methods) is described in
>> +`this document
>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
>> +
>> +Now the method of obtaining struct page references using
>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>> +since that would violate the locking order of the dma_resv lock vs the
>> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
>> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
>> +is the first time we strictly need the gpu_vm->lock. While it was
>> +previously used also to protect the list of the gpu_vm's shared objects,
>> +we could in theory have used the gpu_vm->resv for that.
>> +
>> +The MMU interval seqlock for a userptr gpu_vma is used in the following
>> +way:
>> +
>> +.. code-block:: C
>> +
>> +   down_read(&gpu_vm->lock);
>> +
>> +   retry:
>> +
>> +   // Note: mmu_interval_read_begin() blocks until there is no
>> +   // invalidation notifier running anymore.
>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>> +   if (seq != gpu_vma->saved_seq) {
>> +           obtain_new_page_pointers(&gpu_vma);
>> +	   dma_resv_lock(&gpu_vm->resv);
>> +	   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>> +	   dma_resv_unlock(&gpu_vm->resv);
>> +	   gpu_vma->saved_seq = seq;
>> +   }
>> +
>> +   // The usual revalidation goes here.
>> +
>> +   // Final userptr sequence validation may not happen before the
>> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
>> +   // of the MMU invalidation notifier. Hence the
>> +   // userptr_notifier_lock that will make them appear atomic.
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   down_read(&gpu_vm->userptr_notifier_lock);
>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
>> +          up_read(&gpu_vm->userptr_notifier_lock);
>> +	  goto retry;
>> +   }
>> +
>> +   job_dma_fence = gpu_submit(&gpu_job);
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +
>> +   for_each_shared_obj(gpu_vm, &obj)
>> +          add_dma_fence(job_dma_fence, &obj->resv);
>> +
>> +   dma_resv_unlock_all_resv_locks();
>> +   up_read(&gpu_vm->userptr_notifier_lock);
>> +   up_read(&gpu_vm->lock);
>> +
>> +The code between ``mmu_interval_read_begin()`` and the
>> +``mmu_interval_read_retry()`` marks the read side critical section of
>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>> +gpu_vma list is looped through, and the check is done for *all* of its
>> +userptr gpu_vmas, although we only show a single one here.
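
For readers unfamiliar with the collision-retry pattern, here is a loose userspace model of the read and write sides. It is illustrative only: unlike the real ``mmu_interval_read_begin()``, this read side does not block while an invalidation is in progress, and all names are made up; see ``mm/mmu_notifier.c`` for the real semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Loose userspace model of the collision-retry "userptr_seqlock".
 * The reader samples a sequence, does its work, and must retry if a
 * writer (the invalidation notifier) bumped the sequence meanwhile.
 */
static atomic_ulong interval_seq;

/* Read side: sample the current sequence. */
unsigned long interval_read_begin(void)
{
	return atomic_load(&interval_seq);
}

/* Read side: true if a writer ran since interval_read_begin(). */
bool interval_read_retry(unsigned long seq)
{
	return atomic_load(&interval_seq) != seq;
}

/* Write side: the invalidation notifier bumps the sequence. */
void interval_invalidate(void)
{
	atomic_fetch_add(&interval_seq, 1);
}
```

In the exec function above, a retry loops back to ``mmu_interval_read_begin()`` and re-obtains the page pointers before trying again.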
>> +
>> +The userptr gpu_vma MMU invalidation notifier might be called from
>> +reclaim context and, again to avoid locking order violations, we can't
>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>> +
>> +.. code-block:: C
>> +
>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>> +  {
>> +          // Make sure the exec function either sees the new sequence
>> +	  // and backs off or we wait for the dma-fence:
>> +
>> +          down_write(&gpu_vm->userptr_notifier_lock);
>> +	  mmu_interval_set_seq(userptr_interval, cur_seq);
>> +	  up_write(&gpu_vm->userptr_notifier_lock);
>> +
>> +	  dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>> +		                false, MAX_SCHEDULE_TIMEOUT);
>> +	  return true;
>> +  }
>> +
>> +When this invalidation notifier returns, the GPU can no longer be
>> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
>> +before a new GPU submission can succeed.
>> +
>> +Optimizing gpu_vma iteration
>> +----------------------------
>> +
>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check their validity
>> +on each exec function may be very costly. There is a scheme to avoid
>> +this and only iterate through the userptr gpu_vmas that actually saw an
>> +invalidation notifier call since the last exec.
>> +
>> +TODO: describe that scheme here. It's implemented in the xe driver.
>> +
>> +Locking for page-table updates at bind- and unbind time
>> +=======================================================
>> +
>> +TODO.
>> +
>> +Recoverable page-fault implications
>> +===================================
>> +
>> +TODO.
>> -- 
>> 2.41.0
>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Intel-xe] [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06  7:06     ` Thomas Hellström
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06  7:06 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Francois Dugast, Joonas Lahtinen, linux-kernel, dri-devel,
	Daniel Vetter, Rodrigo Vivi, intel-xe

Hi, Danilo,

Thanks for taking a look. Comments inline.

On 9/5/23 21:50, Danilo Krummrich wrote:
> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>> Add the first version of the VM_BIND locking document which is
>> intended to be part of the xe driver upstreaming agreement.
>>
>> The document describes and discusses the locking used during exec
>> functions, eviction and for userptr gpu-vmas. The intention is to use the
>> same nomenclature as the drm-vm-bind-async.rst.
>>
>> v2:
>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>    (Rodrigo Vivi)
>> - Adjust commit message accordingly.
>> - Add SPDX license header.
>>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>>   1 file changed, 351 insertions(+)
>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>
>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
>> new file mode 100644
>> index 000000000000..b813961a9ec2
>> --- /dev/null
>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>> @@ -0,0 +1,351 @@
>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>> +
>> +===============
>> +VM_BIND locking
>> +===============
>> +
>> +This document attempts to describe what's needed to get VM_BIND locking right,
>> +including the userptr mmu_notifier locking and it will also discuss some
>> +optimizations to get rid of the looping through of all userptr mappings and
>> +external / shared object mappings that is needed in the simplest
>> +implementation. It will also discuss some implications for faulting gpu_vms.
>> +
>> +Nomenclature
>> +============
>> +
>> +* ``Context``: GPU execution context.
>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>> +  meta-data. Typically one per client (DRM file-private), or one per
>> +  context.
>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
> The same nomenclature was used within the VM_BIND async document as well. I
> wonder if it would make sense to align the naming with the GPUVA manager, such
> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result in better
> function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>
> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
> this is close enough anyway.

I don't have a strong opinion about the naming here, and aligning with 
the GPUVA manager makes sense, although perhaps the "drm_" prefix, which 
makes sense for the function- and struct names, may not make sense in a 
more generic document like this. What about gpuva and gpuvm?


>
>> +  associated meta-data. The backing storage of a gpu_vma can either be
>> +  a gem buffer object or anonymous pages mapped also into the CPU
>> +  address space for the process.
>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
>> +  which is anonymous pages as described above.
>> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
>> +  of the backing store resident and making sure the gpu_vma's
>> +  page-table entries point to that backing store.
>> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
>> +  and which tracks GPU activity. When the GPU activity is finished,
>> +  the dma_fence signals.
>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
>> +  to track GPU activity in the form of multiple dma_fences on a
>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
>> +  of dma_fences and a lock that needs to be held when adding
>> +  additional dma_fences to the dma_resv. The lock is of a type that
>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
>> +* ``exec function``: An exec function is a function that revalidates all
>> +  affected gpu_vmas, submits a GPU command batch and registers the
>> +  dma_fence representing the GPU command's activity with all affected
>> +  dma_resvs. For completeness, although not covered by this document,
>> +  it's worth mentioning that an exec function may also be the
>> +  revalidation worker that is used by some drivers in compute /
>> +  long-running mode.
>> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
>> +  objects also share the gpu_vm's dma_resv.
>> +* ``shared object``: AKA external object: A GEM object which may be shared
>> +  by multiple gpu_vms and whose backing storage may be shared with
>> +  other drivers.
>> +
>> +
>> +Introducing the locks
>> +=====================
>> +
>> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
>> +dma_resv object and hence the dma_resv lock. So even with a huge
>> +number of local GEM objects, only one lock is needed to make the exec
>> +sequence atomic.
>> +
>> +The following locks and locking orders are used:
>> +
>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
>> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
>> +  and can also with some simplification protect the gpu_vm's list of
>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
>> +  mmap_lock.
>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
>> +  notifier invalidation. This is not a real seqlock but described in
>> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
>> +  'lock' a lot like a seqcount, however this allows multiple
>> +  write-sides to hold it at once...". The read side critical section
>> +  is enclosed by ``mmu_interval_read_begin() /
>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>> +  sleeping uninterruptibly if the write side is held.
>> +  The write side is held by the core mm while calling mmu interval
>> +  invalidation notifiers.
>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
>> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
>> +  mode during exec and write mode during a mmu notifier invalidation. In
>> +  the absence of a separate page-table lock, this lock can serve
>> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
>> +  this below. The userptr notifier lock is per gpu_vm.
>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
>> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
>> +
>> +There are certain optimizations described below that require
>> +additional locks. More on that later.
>> +
>> +.. code-block:: C
>> +
>> +   dma_resv_lock(&gpu_vm->resv);
>> +
>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>> +		revalidate_gpu_vma(&gpu_vma);
>> +		remove_from_revalidate_list(&gpu_vma);
>> +   }
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   job_dma_fence = gpu_submit(&gpu_job));
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +   dma_resv_unlock(&gpu_vm->resv);
>> +
>> +Eviction of one of these local objects will then be something like the
>> +following:
>> +
>> +.. code-block:: C
>> +
>> +   obj = get_object_from_lru();
>> +
>> +   dma_resv_lock(obj->resv);
>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>> +		put_gpu_vma_on_revalidate_list(&gpu_vma);
>> +
>> +   add_dependencies(&eviction_job, &obj->resv);
>> +   job_dma_fence = gpu_submit(&eviction_job);
>> +   add_dma_fence(&obj->resv, job_dma_fence);
>> +
>> +   dma_resv_unlock(&obj->resv);
>> +   put_object(obj);
>> +
>> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
>> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
>> +is always locked while evicting, due to the above equality.
>> +
>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction,
>> +Since the eviction blit or copy will wait for GPU idle, any attempt by
>> +the GPU to access freed memory through the gpu_vma will be preceded by
>> +a new exec function, which will make sure the gpu_vma is
>> +revalidated. The eviction code holding the object's dma_resv while
>> +revalidating will ensure a new exec function may not race with the eviction.
>> +
>> +Introducing external (or shared) buffer objects
>> +===============================================
>> +
>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>> +can't share their reservation object with a single gpu_vm, but will rather
>> +have a reservation object of their own. The shared objects bound to a
>> +gpu_vm using one or many
>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>> +protected by the gpu_vm lock. One could in theory protect it also with
>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>> +the current locking helpers, that is typically not done. Also see
>> +below for userptr gpu_vmas.
>> +
>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>> +object, but we can no longer be certain that we hold the gpu_vm's
>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
> I need to think a bit more about locking of extobj and evicted object tracking
> in the case of processing 'drm_gpuva_ops' directly through callbacks within the
> fence signalling critical path as mentioend in [1].
>
> In order to support that, we'd need to protect extobjs with a separate lock,
> and while iterating extobjs to acquire the dma-resv lock drop the lock within
> the loop before we actually acquire the dma-resv lock. Maple tree supports that
> already and this can be fully done within the GPUVA manager; no need for the
> driver to care about that.

So do I understand correctly that this because you want to update the 
gpuvm state while operations are progressing asynchronously?

If so, I wonder whether that could really be done? For example to 
allocate enough memory for page-tables etc, you need to know the details 
of the operations at IOCTL execution time, and to know the details you 
need to know the state from the previous operation?

>
> While, as already mentioned, I'd really love to support that, I noticed that we
> have a similar issue with tracking evicted objects. There are (similar) ways to
> deal with that, however, it drastically increases complexity.
>
> Hence, I'd like to reconsider whether it's worth supporting it in the first
> place. Most of the arguments in order to support it are for decreasing
> complexity. However, if it increases complexity elsewhere, it's probably not
> worth. The only argument left would be for synchronous bind jobs which could
> be injected at any point of time without the need to be queued up in the
> scheduler to preserve ordering. However, I'm not yet sure how important this
> would be. For Xe it doesn't really seem to be a concern I guess?
Xe supports that functionality via separate bind queues. If you queue 
most of the operations using one queue, you can inject synchronous bind 
jobs using another. Ideally they execute separately, but they are not 
guaranteed to do that.
>
> [1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>
>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>> +the affected gpu_vm's but that might be unnecessarily complex. If we
>> +have a ww_acquire context at hand at eviction time we can also perform
>> +sleeping locks of those dma_resvs but that could cause expensive
>> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
>> +which is inspected on the next exec function, when the gpu_vm's
>> +dma_resv and the object's dma_resv is held, and the invalidated
>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>> +gpu_vmas. That bool would then, although being per-gpu_vma formally be
>> +protected by the object's dma_resv.
>> +
>> +The exec function would then look something like the following:
>> +
>> +.. code-block:: C
>> +
>> +   read_lock(&gpu_vm->lock);
>> +
>> +   dma_resv_lock(&gpu_vm->resv);
>> +
>> +   // Shared object list is protected by the gpu_vm->lock.
>> +   for_each_shared_obj(gpu_vm, &obj) {
>> +		dma_resv_lock(&obj->resv);
>> +		move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>> +   }
>> +
>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>> +		revalidate_gpu_vma(&gpu_vma);
>> +		remove_from_revalidate_list(&gpu_vma);
>> +   }
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   job_dma_fence = gpu_submit(&gpu_job));
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +   for_each_shared_obj(gpu_vm, &obj)
>> +          add_dma_fence(job_dma_fence, &obj->resv);
>> +   dma_resv_unlock_all_resv_locks();
>> +
>> +   read_unlock(&gpu_vm->lock);
>> +
>> +And the corresponding shared-object aware eviction would look like:
>> +
>> +.. code-block:: C
>> +
>> +   obj = get_object_from_lru();
>> +
>> +   dma_resv_lock(obj->resv);
>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>> +		if (object_is_vm_local(obj))
>> +		             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>> +		else
>> +		             mark_gpu_vma_for_revalidation(&gpu_vma);
>> +
>> +   add_dependencies(&eviction_job, &obj->resv);
>> +   job_dma_fence = gpu_submit(&eviction_job);
>> +   add_dma_fence(&obj->resv, job_dma_fence);
>> +
>> +   dma_resv_unlock(&obj->resv);
>> +   put_object(obj);
>> +
>> +Yet another option is to put the gpu_vmas to be invalidated on a separate
>> +gpu_vm list protected by a lower level lock that can be taken both at eviction
>> +time and at transfer-to-revalidate list time. The details are not in
>> +this document, but this for reference implemented in the Intel xe
>> +driver.
>> +
>> +Introducing userptr gpu_vmas
>> +============================
>> +
>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
>> +GPU virtual address range, directly maps a CPU mm range of anonymous-
>> +or file page-cache pages.
>> +A very simple approach would be to just pin the pages using
>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>> +creates a Denial-Of-Service vector since a single user-space process
>> +would be able to pin down all of system memory, which is not
>> +desirable. (For special use-cases and with proper accounting pinning might
>> +still be a desirable feature, though). What we need to do in the general case is
>> +to obtain a reference to the desired pages, make sure we are notified
>> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
>> +them if they are not mapped read-only to the GPU, and then drop the reference.
>> +When we are notified by the MMU notifier that CPU mm is about to drop the
>> +pages, we need to stop GPU access to the pages,
>> +GPU page-table and make sure that before the next time the GPU tries to access
>> +whatever is now present in the CPU mm range, we unmap the old pages
>> +from the GPU page tables and repeat the process of obtaining new page
>> +references. Note that when the core mm decides to laundry pages, we get such
>> +an unmap MMU notification and can mark the pages dirty again before the
>> +next GPU access. We also get similar MMU notifications for NUMA accounting
>> +which the GPU driver doesn't really need to care about, but so far
>> +it's proven difficult to exclude certain notifications.
>> +
>> +Using a MMU notifier for device DMA (and other methods) is described in
>> +`this document
>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
>> +
>> +Now the method of obtaining struct page references using
>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>> +since that would violate the locking order of the dma_resv lock vs the
>> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
>> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
>> +is the first time we strictly need the gpu_vm->lock. While it was
>> +previously used also to protect the list of the gpu_vm's shared objects,
>> +we could in theory have used the gpu_vm->resv for that.
>> +
>> +The MMU interval seqlock for a userptr gpu_vma is used in the following
>> +way:
>> +
>> +.. code-block:: C
>> +
>> +   down_read(&gpu_vm->lock);
>> +
>> +   retry:
>> +
>> +   // Note: mmu_interval_read_begin() blocks until there is no
>> +   // invalidation notifier running anymore.
>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>> +   if (seq != gpu_vma->saved_seq) {
>> +           obtain_new_page_pointers(&gpu_vma);
>> +	   dma_resv_lock(&gpu_vm->resv);
>> +	   put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>> +	   dma_resv_unlock(&gpu_vm->resv);
>> +	   gpu_vma->saved_seq = seq;
>> +   }
>> +
>> +   // The usual revalidation goes here.
>> +
>> +   // Final userptr sequence validation may not happen before the
>> +   // submission dma_fence is added to the gpu_vm's resv, from the POW
>> +   // of the MMU invalidation notifier. Hence the
>> +   // userptr_notifier_lock that will make them appear atomic.
>> +
>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>> +   down_read(&gpu_vm->userptr_notifier_lock);
>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
>> +          up_read(&gpu_vm->userptr_notifier_lock);
>> +	  goto retry;
>> +   }
>> +
>> +   job_dma_fence = gpu_submit(&gpu_job);
>> +
>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>> +
>> +   for_each_shared_obj(gpu_vm, &obj)
>> +          add_dma_fence(job_dma_fence, &obj->resv);
>> +
>> +   dma_resv_unlock_all_resv_locks();
>> +   up_read(&gpu_vm->userptr_notifier_lock);
>> +   up_read(&gpu_vm->lock);
>> +
>> +The code between ``mmu_interval_read_begin()`` and the
>> +``mmu_interval_read_retry()`` marks the read side critical section of
>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>> +gpu_vma list is looped through, and the check is done for *all* of its
>> +userptr gpu_vmas, although we only show a single one here.
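For illustration, the loop over the whole userptr list could be sketched like the following pure pseudocode in the style of the snippets above; the iterator and helper names are invented here, not taken from any actual driver:

```c
// Hypothetical sketch only: the retry/validate steps from the snippet
// above, applied to every userptr gpu_vma on the gpu_vm's userptr list.
retry:
        for_each_userptr_gpu_vma(gpu_vm, &gpu_vma) {
                seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
                if (seq != gpu_vma->saved_seq) {
                        obtain_new_page_pointers(&gpu_vma);
                        dma_resv_lock(&gpu_vm->resv);
                        put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
                        dma_resv_unlock(&gpu_vm->resv);
                        gpu_vma->saved_seq = seq;
                }
        }

        // ... revalidation and dependency setup as in the snippet above ...

        down_read(&gpu_vm->userptr_notifier_lock);
        for_each_userptr_gpu_vma(gpu_vm, &gpu_vma) {
                if (mmu_interval_read_retry(&gpu_vma->userptr_interval,
                                            gpu_vma->saved_seq)) {
                        up_read(&gpu_vm->userptr_notifier_lock);
                        goto retry;
                }
        }
```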
>> +
>> +The userptr gpu_vma MMU invalidation notifier might be called from
>> +reclaim context and, again to avoid locking order violations, we can't
>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>> +
>> +.. code-block:: C
>> +
>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>> +  {
>> +          // Make sure the exec function either sees the new sequence
>> +	  // and backs off or we wait for the dma-fence:
>> +
>> +          down_write(&gpu_vm->userptr_notifier_lock);
>> +	  mmu_interval_set_seq(userptr_interval, cur_seq);
>> +	  up_write(&gpu_vm->userptr_notifier_lock);
>> +
>> +	  dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>> +		                false, MAX_SCHEDULE_TIMEOUT);
>> +	  return true;
>> +  }
>> +
>> +When this invalidation notifier returns, the GPU is no longer
>> +accessing the old pages of the userptr gpu_vma, and the driver needs to redo the page-binding
>> +before a new GPU submission can succeed.
>> +
>> +Optimizing gpu_vma iteration
>> +----------------------------
>> +
>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
>> +on each exec function may be very costly. There is a scheme to avoid
>> +this and only iterate through the userptr gpu_vmas that actually saw an
>> +invalidation notifier call since the last exec.
>> +
>> +TODO: describe that scheme here. It's implemented in the xe driver.
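Pending that description, a rough guess at the shape of such a scheme (names invented here; the key constraint is that the notifier may run in reclaim context and can therefore only take a spinlock):

```c
// Hypothetical sketch: in the invalidation notifier, after taking the
// notifier lock in write mode, move the gpu_vma to an "invalidated"
// list under a reclaim-safe spinlock:
spin_lock(&gpu_vm->userptr_invalidated_lock);
list_move_tail(&gpu_vma->userptr_link, &gpu_vm->userptr_invalidated_list);
spin_unlock(&gpu_vm->userptr_invalidated_lock);

// The exec function then only walks (and revalidates) the gpu_vmas on
// userptr_invalidated_list instead of the full userptr list.
```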
>> +
>> +Locking for page-table updates at bind- and unbind time
>> +=======================================================
>> +
>> +TODO.
>> +
>> +Recoverable page-fault implications
>> +===================================
>> +
>> +TODO.
>> -- 
>> 2.41.0
>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06  7:06     ` Thomas Hellström
  (?)
@ 2023-09-06  8:00       ` Danilo Krummrich
  -1 siblings, 0 replies; 45+ messages in thread
From: Danilo Krummrich @ 2023-09-06  8:00 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Matthew Brost, Francois Dugast, linux-kernel, Oak Zeng,
	dri-devel, Rodrigo Vivi, intel-xe

On 9/6/23 09:06, Thomas Hellström wrote:
> Hi, Danilo,
> 
> Thanks for taking a look. Comments inline.
> 
> On 9/5/23 21:50, Danilo Krummrich wrote:
>> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>>> Add the first version of the VM_BIND locking document which is
>>> intended to be part of the xe driver upstreaming agreement.
>>>
>>> The document describes and discusses the locking used during exec-
>>> functions, eviction and for userptr gpu-vmas. The intention is to use the
>>> same nomenclature as the drm-vm-bind-async.rst.
>>>
>>> v2:
>>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>>    (Rodrigo Vivi)
>>> - Adjust commit message accordingly.
>>> - Add SPDX license header.
>>>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>>>   1 file changed, 351 insertions(+)
>>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>>
>>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
>>> new file mode 100644
>>> index 000000000000..b813961a9ec2
>>> --- /dev/null
>>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>>> @@ -0,0 +1,351 @@
>>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>>> +
>>> +===============
>>> +VM_BIND locking
>>> +===============
>>> +
>>> +This document attempts to describe what's needed to get VM_BIND locking right,
>>> +including the userptr mmu_notifier locking and it will also discuss some
>>> +optimizations to get rid of the looping through of all userptr mappings and
>>> +external / shared object mappings that is needed in the simplest
>>> +implementation. It will also discuss some implications for faulting gpu_vms.
>>> +
>>> +Nomenclature
>>> +============
>>> +
>>> +* ``Context``: GPU execution context.
>>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>>> +  meta-data. Typically one per client (DRM file-private), or one per
>>> +  context.
>>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
>> The same nomenclature was used within the VM_BIND async document as well. I
>> wonder if it would make sense to align the naming with the GPUVA manager, such
>> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result in better
>> function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
>> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>>
>> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
>> this is close enough anyway.
> 
> I don't have a strong opinion about the naming here, and aligning with the GPUVA manager makes sense, although perhaps the "drm_" prefix, which makes sense for the function- and struct names, may not make sense in a more generic document like this. What about gpuva and gpuvm?

Oh, I think the document is fine as it is. This was more like me thinking out loud
about renaming things in the GPUVA manager accordingly.

> 
> 
>>
>>> +  associated meta-data. The backing storage of a gpu_vma can either be
>>> +  a gem buffer object or anonymous pages mapped also into the CPU
>>> +  address space for the process.
>>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
>>> +  which is anonymous pages as described above.
>>> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
>>> +  of the backing store resident and making sure the gpu_vma's
>>> +  page-table entries point to that backing store.
>>> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
>>> +  and which tracks GPU activity. When the GPU activity is finished,
>>> +  the dma_fence signals.
>>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
>>> +  to track GPU activity in the form of multiple dma_fences on a
>>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
>>> +  of dma_fences and a lock that needs to be held when adding
>>> +  additional dma_fences to the dma_resv. The lock is of a type that
>>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
>>> +* ``exec function``: An exec function is a function that revalidates all
>>> +  affected gpu_vmas, submits a GPU command batch and registers the
>>> +  dma_fence representing the GPU command's activity with all affected
>>> +  dma_resvs. For completeness, although not covered by this document,
>>> +  it's worth mentioning that an exec function may also be the
>>> +  revalidation worker that is used by some drivers in compute /
>>> +  long-running mode.
>>> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
>>> +  objects also share the gpu_vm's dma_resv.
>>> +* ``shared object``: AKA external object: A GEM object which may be shared
>>> +  by multiple gpu_vms and whose backing storage may be shared with
>>> +  other drivers.
>>> +
>>> +
>>> +Introducing the locks
>>> +=====================
>>> +
>>> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
>>> +dma_resv object and hence the dma_resv lock. So even with a huge
>>> +number of local GEM objects, only one lock is needed to make the exec
>>> +sequence atomic.
>>> +
>>> +The following locks and locking orders are used:
>>> +
>>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
>>> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
>>> +  and can also with some simplification protect the gpu_vm's list of
>>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
>>> +  mmap_lock.
>>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
>>> +  notifier invalidation. This is not a real seqlock but described in
>>> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
>>> +  'lock' a lot like a seqcount, however this allows multiple
>>> +  write-sides to hold it at once...". The read side critical section
>>> +  is enclosed by ``mmu_interval_read_begin() /
>>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>>> +  sleeping uninterruptibly if the write side is held.
>>> +  The write side is held by the core mm while calling mmu interval
>>> +  invalidation notifiers.
>>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
>>> +  rebinding, and also the residency of all the gpu_vm's local GEM objects.
>>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
>>> +  mode during exec and write mode during a mmu notifier invalidation. In
>>> +  the absence of a separate page-table lock, this lock can serve
>>> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
>>> +  this below. The userptr notifier lock is per gpu_vm.
>>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
>>> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
>>> +
>>> +There are certain optimizations described below that require
>>> +additional locks. More on that later.
>>> +
>>> +.. code-block:: C
>>> +
>>> +   dma_resv_lock(&gpu_vm->resv);
>>> +
>>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>>> +        revalidate_gpu_vma(&gpu_vma);
>>> +        remove_from_revalidate_list(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +   dma_resv_unlock(&gpu_vm->resv);
>>> +
>>> +Eviction of one of these local objects will then be something like the
>>> +following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   obj = get_object_from_lru();
>>> +
>>> +   dma_resv_lock(obj->resv);
>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>>> +        put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +
>>> +   add_dependencies(&eviction_job, &obj->resv);
>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>> +
>>> +   dma_resv_unlock(&obj->resv);
>>> +   put_object(obj);
>>> +
>>> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
>>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
>>> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
>>> +is always locked while evicting, due to the above equality.
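To make the equality concrete, local-object creation could be sketched roughly like this (the helper and member names are invented for illustration):

```c
// Hypothetical sketch: a local GEM object does not get its own
// reservation object but points at the gpu_vm's, so locking
// obj->resv and &gpu_vm->resv is the same operation.
struct gpu_gem_object *gpu_vm_create_local_obj(struct gpu_vm *gpu_vm,
					       size_t size)
{
	struct gpu_gem_object *obj = alloc_gem_object(size);

	obj->resv = &gpu_vm->resv; /* share the vm's dma_resv */
	return obj;
}
```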
>>> +
>>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
>>> +Since the eviction blit or copy will wait for GPU idle, any attempt by
>>> +the GPU to access freed memory through the gpu_vma will be preceded by
>>> +a new exec function, which will make sure the gpu_vma is
>>> +revalidated. The eviction code holding the object's dma_resv while
>>> +revalidating will ensure a new exec function may not race with the eviction.
>>> +
>>> +Introducing external (or shared) buffer objects
>>> +===============================================
>>> +
>>> +Since shared buffer objects may be shared by multiple gpu_vms, they
>>> +can't share their reservation object with a single gpu_vm, but will rather
>>> +have a reservation object of their own. The shared objects bound to a
>>> +gpu_vm using one or many
>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>> +protected by the gpu_vm lock. One could in theory protect it also with
>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>> +the current locking helpers, that is typically not done. Also see
>>> +below for userptr gpu_vmas.
>>> +
>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>> I need to think a bit more about locking of extobj and evicted object tracking
>> in the case of processing 'drm_gpuva_ops' directly through callbacks within the
>> fence signalling critical path as mentioned in [1].
>>
>> In order to support that, we'd need to protect extobjs with a separate lock,
>> and while iterating extobjs to acquire the dma-resv lock drop the lock within
>> the loop before we actually acquire the dma-resv lock. Maple tree supports that
>> already and this can be fully done within the GPUVA manager; no need for the
>> driver to care about that.
> 
> So do I understand correctly that this is because you want to update the gpuvm state while operations are progressing asynchronously?
> 
> If so, I wonder whether that could really be done? For example to allocate enough memory for page-tables etc, you need to know the details of the operations at IOCTL execution time, and to know the details you need to know the state from the previous operation?

Right, sync and async bind can't run fully concurrently, but you could "inject" a
sync one between two async ones such that the sync one executes directly from the
IOCTL while async execution is stalled meanwhile. This would be possible because
the actual drm_gpuva_ops would be calculated within the async execution path rather
than in the IOCTL. But yes, page-table management must be designed to support that.

> 
>>
>> While, as already mentioned, I'd really love to support that, I noticed that we
>> have a similar issue with tracking evicted objects. There are (similar) ways to
>> deal with that, however, it drastically increases complexity.
>>
>> Hence, I'd like to reconsider whether it's worth supporting it in the first
>> place. Most of the arguments in order to support it are for decreasing
>> complexity. However, if it increases complexity elsewhere, it's probably not
>> worth. The only argument left would be for synchronous bind jobs which could
>> be injected at any point of time without the need to be queued up in the
>> scheduler to preserve ordering. However, I'm not yet sure how important this
>> would be. For Xe it doesn't really seem to be a concern I guess?
> Xe supports that functionality via separate bind queues. If you queue most of the operations using one queue, you can inject synchronous bind jobs using another. Ideally they execute separately, but they are not guaranteed to do that.

Ok, but the separate bind queue would still work in the same asynchronous way, as
in the job is submitted to some kind of worker and the IOCTL just blocks until
completion, right?

>>
>> [1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>>
>>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>>> +the affected gpu_vms, but that might be unnecessarily complex. If we
>>> +have a ww_acquire context at hand at eviction time we can also perform
>>> +sleeping locks of those dma_resvs but that could cause expensive
>>> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
>>> +which is inspected on the next exec function, when the gpu_vm's
>>> +dma_resv and the object's dma_resv is held, and the invalidated
>>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>>> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
>>> +protected by the object's dma_resv.
>>> +
>>> +The exec function would then look something like the following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   read_lock(&gpu_vm->lock);
>>> +
>>> +   dma_resv_lock(&gpu_vm->resv);
>>> +
>>> +   // Shared object list is protected by the gpu_vm->lock.
>>> +   for_each_shared_obj(gpu_vm, &obj) {
>>> +        dma_resv_lock(&obj->resv);
>>> +        move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>>> +   }
>>> +
>>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>>> +        revalidate_gpu_vma(&gpu_vma);
>>> +        remove_from_revalidate_list(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +   for_each_shared_obj(gpu_vm, &obj)
>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>> +   dma_resv_unlock_all_resv_locks();
>>> +
>>> +   read_unlock(&gpu_vm->lock);
>>> +
>>> +And the corresponding shared-object aware eviction would look like:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   obj = get_object_from_lru();
>>> +
>>> +   dma_resv_lock(obj->resv);
>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>>> +        if (object_is_vm_local(obj))
>>> +                     put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +        else
>>> +                     mark_gpu_vma_for_revalidation(&gpu_vma);
>>> +
>>> +   add_dependencies(&eviction_job, &obj->resv);
>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>> +
>>> +   dma_resv_unlock(&obj->resv);
>>> +   put_object(obj);
>>> +
>>> +Yet another option is to put the gpu_vmas to be invalidated on a separate
>>> +gpu_vm list protected by a lower level lock that can be taken both at eviction
>>> +time and at transfer-to-revalidate list time. The details are not in
>>> +this document, but this is, for reference, implemented in the Intel xe
>>> +driver.
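A rough, hedged sketch of that option (all names invented; the point is that the lower-level spinlock is safe to take where the dma_resv locks are not):

```c
// At eviction time, holding only obj->resv:
spin_lock(&gpu_vm->invalidated_lock);
list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
spin_unlock(&gpu_vm->invalidated_lock);

// In the exec function, with gpu_vm->resv (and the object dma_resvs)
// held, transfer the entries to the revalidation list:
spin_lock(&gpu_vm->invalidated_lock);
while (!list_empty(&gpu_vm->invalidated_list)) {
	gpu_vma = list_first_entry(&gpu_vm->invalidated_list,
				   typeof(*gpu_vma), invalidated_link);
	list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->revalidate_list);
}
spin_unlock(&gpu_vm->invalidated_lock);
```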
>>> +
>>> +Introducing userptr gpu_vmas
>>> +============================
>>> +
>>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
>>> +GPU virtual address range, directly maps a CPU mm range of anonymous-
>>> +or file page-cache pages.
>>> +A very simple approach would be to just pin the pages using
>>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>>> +creates a Denial-Of-Service vector since a single user-space process
>>> +would be able to pin down all of system memory, which is not
>>> +desirable. (For special use-cases and with proper accounting, pinning might
>>> +still be a desirable feature, though). What we need to do in the general case is
>>> +to obtain a reference to the desired pages, make sure we are notified
>>> +using an MMU notifier just before the CPU mm unmaps the pages, dirty
>>> +them if they are not mapped read-only to the GPU, and then drop the reference.
>>> +When we are notified by the MMU notifier that CPU mm is about to drop the
>>> +pages, we need to stop GPU access to the pages and make sure that
>>> +before the next time the GPU tries to access
>>> +whatever is now present in the CPU mm range, we unmap the old pages
>>> +from the GPU page tables and repeat the process of obtaining new page
>>> +references. Note that when the core mm decides to laundry pages, we get such
>>> +an unmap MMU notification and can mark the pages dirty again before the
>>> +next GPU access. We also get similar MMU notifications for NUMA accounting
>>> +which the GPU driver doesn't really need to care about, but so far
>>> +it's proven difficult to exclude certain notifications.
>>> +
>>> +Using an MMU notifier for device DMA (and other methods) is described in
>>> +`this document
>>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
>>> +
>>> +Now the method of obtaining struct page references using
>>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>>> +since that would violate the locking order of the dma_resv lock vs the
>>> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
>>> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
>>> +is the first time we strictly need the gpu_vm->lock. While it was
>>> +previously used also to protect the list of the gpu_vm's shared objects,
>>> +we could in theory have used the gpu_vm->resv for that.
>>> +
>>> +The MMU interval seqlock for a userptr gpu_vma is used in the following
>>> +way:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   down_read(&gpu_vm->lock);
>>> +
>>> +   retry:
>>> +
>>> +   // Note: mmu_interval_read_begin() blocks until there is no
>>> +   // invalidation notifier running anymore.
>>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>>> +   if (seq != gpu_vma->saved_seq) {
>>> +           obtain_new_page_pointers(&gpu_vma);
>>> +           dma_resv_lock(&gpu_vm->resv);
>>> +           put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +           dma_resv_unlock(&gpu_vm->resv);
>>> +           gpu_vma->saved_seq = seq;
>>> +   }
>>> +
>>> +   // The usual revalidation goes here.
>>> +
>>> +   // Final userptr sequence validation may not happen before the
>>> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
>>> +   // of the MMU invalidation notifier. Hence the
>>> +   // userptr_notifier_lock that will make them appear atomic.
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   down_read(&gpu_vm->userptr_notifier_lock);
>>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
>>> +          up_read(&gpu_vm->userptr_notifier_lock);
>>> +          goto retry;
>>> +   }
>>> +
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +
>>> +   for_each_shared_obj(gpu_vm, &obj)
>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>> +
>>> +   dma_resv_unlock_all_resv_locks();
>>> +   up_read(&gpu_vm->userptr_notifier_lock);
>>> +   up_read(&gpu_vm->lock);
>>> +
>>> +The code between ``mmu_interval_read_begin()`` and the
>>> +``mmu_interval_read_retry()`` marks the read side critical section of
>>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>>> +gpu_vma list is looped through, and the check is done for *all* of its
>>> +userptr gpu_vmas, although we only show a single one here.
>>> +
>>> +The userptr gpu_vma MMU invalidation notifier might be called from
>>> +reclaim context and, again to avoid locking order violations, we can't
>>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>>> +
>>> +.. code-block:: C
>>> +
>>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>>> +  {
>>> +          // Make sure the exec function either sees the new sequence
>>> +          // and backs off or we wait for the dma-fence:
>>> +
>>> +          down_write(&gpu_vm->userptr_notifier_lock);
>>> +          mmu_interval_set_seq(userptr_interval, cur_seq);
>>> +          up_write(&gpu_vm->userptr_notifier_lock);
>>> +
>>> +          dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>>> +                                false, MAX_SCHEDULE_TIMEOUT);
>>> +          return true;
>>> +  }
>>> +
>>> +When this invalidation notifier returns, the GPU is no longer
>>> +accessing the old pages of the userptr gpu_vma, and the driver needs to redo the page-binding
>>> +before a new GPU submission can succeed.
>>> +
>>> +Optimizing gpu_vma iteration
>>> +----------------------------
>>> +
>>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
>>> +on each exec function may be very costly. There is a scheme to avoid
>>> +this and only iterate through the userptr gpu_vmas that actually saw an
>>> +invalidation notifier call since the last exec.
>>> +
>>> +TODO: describe that scheme here. It's implemented in the xe driver.
>>> +
>>> +Locking for page-table updates at bind- and unbind time
>>> +=======================================================
>>> +
>>> +TODO.
>>> +
>>> +Recoverable page-fault implications
>>> +===================================
>>> +
>>> +TODO.
>>> -- 
>>> 2.41.0
>>>
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Intel-xe] [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06  8:00       ` Danilo Krummrich
  0 siblings, 0 replies; 45+ messages in thread
From: Danilo Krummrich @ 2023-09-06  8:00 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Francois Dugast, Joonas Lahtinen, linux-kernel, dri-devel,
	Daniel Vetter, Rodrigo Vivi, intel-xe

On 9/6/23 09:06, Thomas Hellström wrote:
> Hi, Danilo,
> 
> Thanks for taking a look. Comments inline.
> 
> On 9/5/23 21:50, Danilo Krummrich wrote:
>> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>>> Add the first version of the VM_BIND locking document which is
>>> intended to be part of the xe driver upstreaming agreement.
>>>
>>> The document describes and discuss the locking used during exec-
>>> functions, evicton and for userptr gpu-vmas. Intention is to be using the
>>> same nomenclature as the drm-vm-bind-async.rst.
>>>
>>> v2:
>>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>>    (Rodrigo Vivi)
>>> - Adjust commit message accordingly.
>>> - Add SPDX license header.
>>>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>>>   1 file changed, 351 insertions(+)
>>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>>
>>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
>>> new file mode 100644
>>> index 000000000000..b813961a9ec2
>>> --- /dev/null
>>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>>> @@ -0,0 +1,351 @@
>>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>>> +
>>> +===============
>>> +VM_BIND locking
>>> +===============
>>> +
>>> +This document attempts to describe what's needed to get VM_BIND locking right,
>>> +including the userptr mmu_notifier locking and it will also discuss some
>>> +optimizations to get rid of the looping through of all userptr mappings and
>>> +external / shared object mappings that is needed in the simplest
>>> +implementation. It will also discuss some implications for faulting gpu_vms.
>>> +
>>> +Nomenclature
>>> +============
>>> +
>>> +* ``Context``: GPU execution context.
>>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>>> +  meta-data. Typically one per client (DRM file-private), or one per
>>> +  context.
>>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
>> The same nomenclature was used within the VM_BIND async document as well. I
>> wonder if it would make sense to align the naming with the GPUVA manager, such
>> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result into better
>> function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
>> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>>
>> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
>> this is close enough anyway.
> 
> I don't have a strong opinion about the naming here and aligning with the GPUVA manager make sense, although perhaps the "drm_" prefix which makes sense for the function- and struct names may not make sense in a more generic document like this. What about gpuva and gpuvm?

Oh, I think the document is fine as it is. This was more like me thinking loud
about renaming things in the GPUVA manager accordingly.

> 
> 
>>
>>> +  associated meta-data. The backing storage of a gpu_vma can either be
>>> +  a gem buffer object or anonymous pages mapped also into the CPU
>>> +  address space for the process.
>>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
>>> +  which is anonymous pages as described above.
>>> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
>>> +  of the backing store resident and making sure the gpu_vma's
>>> +  page-table entries point to that backing store.
>>> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
>>> +  and which tracks GPU activity. When the GPU activity is finished,
>>> +  the dma_fence signals.
>>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
>>> +  to track GPU activity in the form of multiple dma_fences on a
>>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
>>> +  of dma_fences and a lock that needs to be held when adding
>>> +  additional dma_fences to the dma_resv. The lock is of a type that
>>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
>>> +* ``exec function``: An exec function is a function that revalidates all
>>> +  affected gpu_vmas, submits a GPU command batch and registers the
>>> +  dma_fence representing the GPU command's activity with all affected
>>> +  dma_resvs. For completeness, although not covered by this document,
>>> +  it's worth mentioning that an exec function may also be the
>>> +  revalidation worker that is used by some drivers in compute /
>>> +  long-running mode.
>>> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
>>> +  objects also share the gpu_vm's dma_resv.
>>> +* ``shared object``: AKA external object: A GEM object which may be shared
>>> +  by multiple gpu_vms and whose backing storage may be shared with
>>> +  other drivers.
>>> +
>>> +
>>> +Introducing the locks
>>> +=====================
>>> +
>>> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
>>> +dma_resv object and hence the dma_resv lock. So even with a huge
>>> +number of local GEM objects, only one lock is needed to make the exec
>>> +sequence atomic.
>>> +
>>> +The following locks and locking orders are used:
>>> +
>>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
>>> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
>>> +  and can also with some simplification protect the gpu_vm's list of
>>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
>>> +  mmap_lock.
>>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
>>> +  notifier invalidation. This is not a real seqlock but described in
>>> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
>>> +  'lock' a lot like a seqcount, however this allows multiple
>>> +  write-sides to hold it at once...". The read side critical section
>>> +  is enclosed by ``mmu_interval_read_begin() /
>>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>>> +  sleeping uninterruptibly if the write side is held.
>>> +  The write side is held by the core mm while calling mmu interval
>>> +  invalidation notifiers.
>>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
>>> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
>>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
>>> +  mode during exec and write mode during a mmu notifier invalidation. In
>>> +  the absence of a separate page-table lock, this lock can serve
>>> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
>>> +  this below. The userptr notifier lock is per gpu_vm.
>>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
>>> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
>>> +
>>> +There are certain optimizations described below that require
>>> +additional locks. More on that later.
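
Collected in one place, the locking scheme above could sit in a gpu_vm
structure roughly as follows (a sketch with illustrative field names and
lock types, not taken from any particular driver):

.. code-block:: C

   struct gpu_vm {
           /* Protects VA partitioning, the extobj list and the
            * userptr gpu_vma list. */
           struct rw_semaphore lock;
           /* Protects the rebind list and local-object residency.
            * Shared with all local GEM objects. */
           struct dma_resv resv;
           /* Makes exec vs userptr invalidation appear atomic. */
           struct rw_semaphore userptr_notifier_lock;
           /* Optional; the resv may double as page-table lock. */
           spinlock_t page_table_lock;
   };
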
>>> +
>>> +With these locks, a simple exec function, considering only local GEM
>>> +objects, could then look like the following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   dma_resv_lock(&gpu_vm->resv);
>>> +
>>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>>> +        revalidate_gpu_vma(&gpu_vma);
>>> +        remove_from_revalidate_list(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +   dma_resv_unlock(&gpu_vm->resv);
>>> +
>>> +Eviction of one of these local objects will then be something like the
>>> +following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   obj = get_object_from_lru();
>>> +
>>> +   dma_resv_lock(obj->resv);
>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>>> +        put_gpu_vma_on_revalidate_list(&gpu_vma);
>>> +
>>> +   add_dependencies(&eviction_job, &obj->resv);
>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>> +   add_dma_fence(job_dma_fence, &obj->resv);
>>> +
>>> +   dma_resv_unlock(&obj->resv);
>>> +   put_object(obj);
>>> +
>>> +Note that since the object is local to the gpu_vm, it shares the gpu_vm's
>>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas
>>> +are put on the gpu_vm's revalidation list, which is protected by
>>> +``gpu_vm->resv``; due to the above equality, that lock is always held
>>> +while evicting.
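
The sharing can be set up at object-creation time by simply pointing the
object's reservation-object pointer at the gpu_vm's (a sketch; in DRM,
``drm_gem_object::resv`` otherwise ends up pointing at the object's own
embedded reservation object):

.. code-block:: C

   /* Local object: share the gpu_vm's dma_resv. This must be done
    * before the object is initialized / published. */
   obj->resv = &gpu_vm->resv;
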
>>> +
>>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
>>> +Since the eviction blit or copy will wait for GPU idle, any attempt by
>>> +the GPU to access freed memory through the gpu_vma will be preceded by
>>> +a new exec function, which will make sure the gpu_vma is
>>> +revalidated. The eviction code holding the object's dma_resv while
>>> +revalidating will ensure a new exec function may not race with the eviction.
>>> +
>>> +Introducing external (or shared) buffer objects
>>> +===============================================
>>> +
>>> +Since shared buffer objects may be shared by multiple gpu_vms, they
>>> +can't share their reservation object with a single gpu_vm, but will
>>> +rather have a reservation object of their own. The shared objects
>>> +bound to a gpu_vm using one or more gpu_vmas are therefore typically
>>> +put on a per-gpu_vm list which is protected by the ``gpu_vm->lock``.
>>> +One could in theory protect it also with
>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>> +the current locking helpers, that is typically not done. Also see
>>> +below for userptr gpu_vmas.
>>> +
>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>> I need to think a bit more about locking of extobj and evicted object tracking
>> in the case of processing 'drm_gpuva_ops' directly through callbacks within the
>> fence signalling critical path as mentioned in [1].
>>
>> In order to support that, we'd need to protect extobjs with a separate lock,
>> and while iterating extobjs to acquire the dma-resv lock drop the lock within
>> the loop before we actually acquire the dma-resv lock. Maple tree supports that
>> already and this can be fully done within the GPUVA manager; no need for the
>> driver to care about that.
> 
> So do I understand correctly that this because you want to update the gpuvm state while operations are progressing asynchronously?
> 
> If so, I wonder whether that could really be done? For example to allocate enough memory for page-tables etc, you need to know the details of the operations at IOCTL execution time, and to know the details you need to know the state from the previous operation?

Right, sync and async bind can't run fully concurrently, but you could "inject" a
sync one between two async ones such that the sync one executes from the IOCTL
directly while async execution is stalled meanwhile. This would be possible because
the actual drm_gpuva_ops would be calculated within the async execution path rather
than in the IOCTL. But yes, page-table management must be designed to support that.

> 
>>
>> While, as already mentioned, I'd really love to support that, I noticed that we
>> have a similar issue with tracking evicted objects. There are (similar) ways to
>> deal with that, however, it drastically increases complexity.
>>
>> Hence, I'd like to reconsider whether it's worth supporting it in the first
>> place. Most of the arguments in order to support it are for decreasing
>> complexity. However, if it increases complexity elsewhere, it's probably not
>> worth it. The only argument left would be for synchronous bind jobs which could
>> be injected at any point of time without the need to be queued up in the
>> scheduler to preserve ordering. However, I'm not yet sure how important this
>> would be. For Xe it doesn't really seem to be a concern I guess?
> Xe supports that functionality via separate bind queues. If you queue most of the operations using one queue, you can inject synchronous bind jobs using another. Ideally they execute separately, but they are not guaranteed to do that.

Ok, but the separate bind queue would still work in the same asynchronous way, as
in the job is submitted to some kind of worker and the IOCTL just blocks until
completion, right?

>>
>> [1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>>
>>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>>> +the affected gpu_vms but that might be unnecessarily complex. If we
>>> +have a ww_acquire context at hand at eviction time we can also perform
>>> +sleeping locks of those dma_resvs but that could cause expensive
>>> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
>>> +which is inspected on the next exec function, when the gpu_vm's
>>> +dma_resv and the object's dma_resv is held, and the invalidated
>>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>>> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
>>> +protected by the object's dma_resv.
>>> +
>>> +The exec function would then look something like the following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   down_read(&gpu_vm->lock);
>>> +
>>> +   dma_resv_lock(&gpu_vm->resv);
>>> +
>>> +   // Shared object list is protected by the gpu_vm->lock.
>>> +   for_each_shared_obj(gpu_vm, &obj) {
>>> +        dma_resv_lock(&obj->resv);
>>> +        move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>>> +   }
>>> +
>>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>>> +        revalidate_gpu_vma(&gpu_vma);
>>> +        remove_from_revalidate_list(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +   for_each_shared_obj(gpu_vm, &obj)
>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>> +   dma_resv_unlock_all_resv_locks();
>>> +
>>> +   up_read(&gpu_vm->lock);
>>> +
>>> +And the corresponding shared-object aware eviction would look like:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   obj = get_object_from_lru();
>>> +
>>> +   dma_resv_lock(obj->resv);
>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma) {
>>> +        if (object_is_vm_local(obj))
>>> +                put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +        else
>>> +                mark_gpu_vma_for_revalidation(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&eviction_job, &obj->resv);
>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>> +   add_dma_fence(job_dma_fence, &obj->resv);
>>> +
>>> +   dma_resv_unlock(&obj->resv);
>>> +   put_object(obj);
>>> +
>>> +Yet another option is to put the gpu_vmas to be invalidated on a separate
>>> +gpu_vm list protected by a lower-level lock that can be taken both at
>>> +eviction time and at transfer-to-revalidate-list time. The details are
>>> +not in this document, but this is, for reference, implemented in the
>>> +Intel xe driver.
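
One possible shape of that lower-level-lock variant (illustrative names,
not the actual xe code):

.. code-block:: C

   /* Eviction side: only the object's dma_resv is held, so use a
    * spinlock to move the gpu_vma to the invalidated list. */
   spin_lock(&gpu_vm->invalidated_lock);
   list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
   spin_unlock(&gpu_vm->invalidated_lock);

   /* Exec side: with the gpu_vm->resv held, transfer the entries to
    * the revalidation list. */
   spin_lock(&gpu_vm->invalidated_lock);
   list_splice_tail_init(&gpu_vm->invalidated_list,
                         &gpu_vm->revalidate_list);
   spin_unlock(&gpu_vm->invalidated_lock);
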
>>> +
>>> +Introducing userptr gpu_vmas
>>> +============================
>>> +
>>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
>>> +GPU virtual address range, directly maps a CPU mm range of anonymous
>>> +or file page-cache pages.
>>> +A very simple approach would be to just pin the pages using
>>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>>> +creates a Denial-Of-Service vector since a single user-space process
>>> +would be able to pin down all of system memory, which is not
>>> +desirable. (For special use-cases and with proper accounting pinning might
>>> +still be a desirable feature, though). What we need to do in the general case is
>>> +to obtain a reference to the desired pages, make sure we are notified
>>> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
>>> +them if they are not mapped read-only to the GPU, and then drop the reference.
>>> +When we are notified by the MMU notifier that the CPU mm is about to drop
>>> +the pages, we need to stop GPU access to the pages and make sure that,
>>> +before the next time the GPU tries to access whatever is now present
>>> +in the CPU mm range, we unmap the old pages from the GPU page tables
>>> +and repeat the process of obtaining new page references. Note that
>>> +when the core mm decides to launder pages, we get such an unmap MMU
>>> +notification and can mark the pages dirty again before the
>>> +next GPU access. We also get similar MMU notifications for NUMA accounting
>>> +which the GPU driver doesn't really need to care about, but so far
>>> +it's proven difficult to exclude certain notifications.
>>> +
>>> +Using a MMU notifier for device DMA (and other methods) is described in
>>> +`this document
>>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
>>> +
>>> +Now the method of obtaining struct page references using
>>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>>> +since that would violate the locking order of the dma_resv lock vs the
>>> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
>>> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
>>> +is the first time we strictly need the ``gpu_vm->lock``. While it was
>>> +previously used also to protect the list of the gpu_vm's shared objects,
>>> +we could in theory have used the ``gpu_vm->resv`` for that.
>>> +
>>> +The MMU interval seqlock for a userptr gpu_vma is used in the following
>>> +way:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   down_read(&gpu_vm->lock);
>>> +
>>> +   retry:
>>> +
>>> +   // Note: mmu_interval_read_begin() blocks until there is no
>>> +   // invalidation notifier running anymore.
>>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>>> +   if (seq != gpu_vma->saved_seq) {
>>> +           obtain_new_page_pointers(&gpu_vma);
>>> +           dma_resv_lock(&gpu_vm->resv);
>>> +           put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +           dma_resv_unlock(&gpu_vm->resv);
>>> +           gpu_vma->saved_seq = seq;
>>> +   }
>>> +
>>> +   // The usual revalidation goes here.
>>> +
>>> +   // Final userptr sequence validation may not happen before the
>>> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
>>> +   // of the MMU invalidation notifier. Hence the
>>> +   // userptr_notifier_lock that will make them appear atomic.
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   down_read(&gpu_vm->userptr_notifier_lock);
>>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
>>> +           up_read(&gpu_vm->userptr_notifier_lock);
>>> +           goto retry;
>>> +   }
>>> +
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +
>>> +   for_each_shared_obj(gpu_vm, &obj)
>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>> +
>>> +   dma_resv_unlock_all_resv_locks();
>>> +   up_read(&gpu_vm->userptr_notifier_lock);
>>> +   up_read(&gpu_vm->lock);
>>> +
>>> +The code between ``mmu_interval_read_begin()`` and the
>>> +``mmu_interval_read_retry()`` marks the read side critical section of
>>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>>> +gpu_vma list is looped through, and the check is done for *all* of its
>>> +userptr gpu_vmas, although we only show a single one here.
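
Looped over the whole userptr list, the final sequence check could look
roughly like the following (a sketch; ``for_each_userptr_gpu_vma()`` is
an assumed helper):

.. code-block:: C

   bool need_retry = false;

   down_read(&gpu_vm->userptr_notifier_lock);
   for_each_userptr_gpu_vma(gpu_vm, &gpu_vma)
           need_retry |= mmu_interval_read_retry(&gpu_vma->userptr_interval,
                                                 gpu_vma->saved_seq);
   if (need_retry) {
           up_read(&gpu_vm->userptr_notifier_lock);
           goto retry;
   }
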
>>> +
>>> +The userptr gpu_vma MMU invalidation notifier might be called from
>>> +reclaim context and, again to avoid locking order violations, we can't
>>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>>> +
>>> +.. code-block:: C
>>> +
>>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>>> +  {
>>> +          // Make sure the exec function either sees the new sequence
>>> +          // and backs off or we wait for the dma-fence:
>>> +
>>> +          down_write(&gpu_vm->userptr_notifier_lock);
>>> +          mmu_interval_set_seq(userptr_interval, cur_seq);
>>> +          up_write(&gpu_vm->userptr_notifier_lock);
>>> +
>>> +          dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>>> +                                false, MAX_SCHEDULE_TIMEOUT);
>>> +          return true;
>>> +  }
>>> +
>>> +When this invalidation notifier returns, the GPU can no longer be
>>> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
>>> +before a new GPU submission can succeed.
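
For completeness, the notifier is registered with the core mm's interval
notifier API at userptr gpu_vma bind time, roughly as follows (a sketch;
the real ``invalidate`` callback also receives the ``mmu_notifier_range``,
and the start/length fields here are assumptions):

.. code-block:: C

   static const struct mmu_interval_notifier_ops gpu_vma_userptr_ops = {
           .invalidate = gpu_vma_userptr_invalidate,
   };

   err = mmu_interval_notifier_insert(&gpu_vma->userptr_interval,
                                      current->mm,
                                      gpu_vma->userptr_start,
                                      gpu_vma->userptr_length,
                                      &gpu_vma_userptr_ops);
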
>>> +
>>> +Optimizing gpu_vma iteration
>>> +----------------------------
>>> +
>>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
>>> +on each exec function may be very costly. There is a scheme to avoid
>>> +this and only iterate through the userptr gpu_vmas that actually saw an
>>> +invalidation notifier call since the last exec.
>>> +
>>> +TODO: describe that scheme here. It's implemented in the xe driver.
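
A rough sketch of what such a scheme could look like (illustrative names;
the actual xe implementation differs in detail): the invalidation notifier
moves the gpu_vma to a separate "invalidated" list under a spinlock that
is safe to take from reclaim context, and the exec function then only
re-checks the gpu_vmas on that list:

.. code-block:: C

   /* In gpu_vma_userptr_invalidate(): */
   spin_lock(&gpu_vm->userptr_invalidated_lock);
   list_move_tail(&gpu_vma->userptr_invalidated_link,
                  &gpu_vm->userptr_invalidated_list);
   spin_unlock(&gpu_vm->userptr_invalidated_lock);

   /* In the exec function, only the entries on the invalidated list
    * need new page pointers and a final sequence re-check. */
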
>>> +
>>> +Locking for page-table updates at bind- and unbind time
>>> +=======================================================
>>> +
>>> +TODO.
>>> +
>>> +Recoverable page-fault implications
>>> +===================================
>>> +
>>> +TODO.
>>> -- 
>>> 2.41.0
>>>
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06  8:00       ` Danilo Krummrich
  0 siblings, 0 replies; 45+ messages in thread
From: Danilo Krummrich @ 2023-09-06  8:00 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Rodrigo Vivi, Matthew Brost, Joonas Lahtinen, Oak Zeng,
	Daniel Vetter, Maarten Lankhorst, Francois Dugast, dri-devel,
	linux-kernel

On 9/6/23 09:06, Thomas Hellström wrote:
> Hi, Danilo,
> 
> Thanks for taking a look. Comments inline.
> 
> On 9/5/23 21:50, Danilo Krummrich wrote:
>> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>>> Add the first version of the VM_BIND locking document which is
>>> intended to be part of the xe driver upstreaming agreement.
>>>
>>> The document describes and discuss the locking used during exec-
>>> functions, evicton and for userptr gpu-vmas. Intention is to be using the
>>> same nomenclature as the drm-vm-bind-async.rst.
>>>
>>> v2:
>>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>>    (Rodrigo Vivi)
>>> - Adjust commit message accordingly.
>>> - Add SPDX license header.
>>>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 ++++++++++++++++++++++
>>>   1 file changed, 351 insertions(+)
>>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>>
>>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
>>> new file mode 100644
>>> index 000000000000..b813961a9ec2
>>> --- /dev/null
>>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>>> @@ -0,0 +1,351 @@
>>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>>> +
>>> +===============
>>> +VM_BIND locking
>>> +===============
>>> +
>>> +This document attempts to describe what's needed to get VM_BIND locking right,
>>> +including the userptr mmu_notifier locking and it will also discuss some
>>> +optimizations to get rid of the looping through of all userptr mappings and
>>> +external / shared object mappings that is needed in the simplest
>>> +implementation. It will also discuss some implications for faulting gpu_vms.
>>> +
>>> +Nomenclature
>>> +============
>>> +
>>> +* ``Context``: GPU execution context.
>>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>>> +  meta-data. Typically one per client (DRM file-private), or one per
>>> +  context.
>>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
>> The same nomenclature was used within the VM_BIND async document as well. I
>> wonder if it would make sense to align the naming with the GPUVA manager, such
>> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result into better
>> function names, such as drm_gpuvm_resv_lock() or drm_gpuvm_prepare_objects() and
>> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>>
>> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but I think
>> this is close enough anyway.
> 
> I don't have a strong opinion about the naming here and aligning with the GPUVA manager make sense, although perhaps the "drm_" prefix which makes sense for the function- and struct names may not make sense in a more generic document like this. What about gpuva and gpuvm?

Oh, I think the document is fine as it is. This was more like me thinking loud
about renaming things in the GPUVA manager accordingly.

> 
> 
>>
>>> +  associated meta-data. The backing storage of a gpu_vma can either be
>>> +  a gem buffer object or anonymous pages mapped also into the CPU
>>> +  address space for the process.
>>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing store of
>>> +  which is anonymous pages as described above.
>>> +* ``revalidating``: Revalidating a gpu_vma means making the latest version
>>> +  of the backing store resident and making sure the gpu_vma's
>>> +  page-table entries point to that backing store.
>>> +* ``dma_fence``: A struct dma_fence that is similar to a struct completion
>>> +  and which tracks GPU activity. When the GPU activity is finished,
>>> +  the dma_fence signals.
>>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is used
>>> +  to track GPU activity in the form of multiple dma_fences on a
>>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / list
>>> +  of dma_fences and a lock that needs to be held when adding
>>> +  additional dma_fences to the dma_resv. The lock is of a type that
>>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary order.
>>> +* ``exec function``: An exec function is a function that revalidates all
>>> +  affected gpu_vmas, submits a GPU command batch and registers the
>>> +  dma_fence representing the GPU command's activity with all affected
>>> +  dma_resvs. For completeness, although not covered by this document,
>>> +  it's worth mentioning that an exec function may also be the
>>> +  revalidation worker that is used by some drivers in compute /
>>> +  long-running mode.
>>> +* ``local object``: A GEM object which is local to a gpu_vm. Shared gem
>>> +  objects also share the gpu_vm's dma_resv.
>>> +* ``shared object``: AKA external object: A GEM object which may be shared
>>> +  by multiple gpu_vms and whose backing storage may be shared with
>>> +  other drivers.
>>> +
>>> +
>>> +Introducing the locks
>>> +=====================
>>> +
>>> +One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
>>> +dma_resv object and hence the dma_resv lock. So even with a huge
>>> +number of local GEM objects, only one lock is needed to make the exec
>>> +sequence atomic.
>>> +
>>> +The following locks and locking orders are used:
>>> +
>>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the gpu_vm is
>>> +  partitioned into gpu_vmas, protects the gpu_vm's list of external objects,
>>> +  and can also with some simplification protect the gpu_vm's list of
>>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond to the
>>> +  mmap_lock.
>>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
>>> +  notifier invalidation. This is not a real seqlock but described in
>>> +  ``mm/mmu_notifier.c` as a "Collision-retry read-side/write-side
>>> +  'lock' a lot like a seqcount, however this allows multiple
>>> +  write-sides to hold it at once...". The read side critical section
>>> +  is enclosed by ``mmu_interval_read_begin() /
>>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>>> +  sleeping uninterruptibly if the write side is held.
>>> +  The write side is held by the core mm while calling mmu interval
>>> +  invalidation notifiers.
>>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
>>> +  rebinding, and also the residency of all the gpu_vm's local GEM object.
>>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is taken in read
>>> +  mode during exec and write mode during a mmu notifier invalidation. In
>>> +  the absence of a separate page-table lock, this lock can serve
>>> +  together with the gpu_vm's dma_resv lock as a page-table lock. More on
>>> +  this below. The userptr notifier lock is per gpu_vm.
>>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's page-table updates. For
>>> +  simplicity the gpu_vm's dma_resv lock can be reused as page-table lock.
>>> +
>>> +There are certain optimizations described below that require
>>> +additional locks. More on that later.
>>> +
>>> +.. code-block:: C
>>> +
>>> +   dma_resv_lock(&gpu_vm->resv);
>>> +
>>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>>> +        revalidate_gpu_vma(&gpu_vma);
>>> +        remove_from_revalidate_list(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   job_dma_fence = gpu_submit(&gpu_job));
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +   dma_resv_unlock(&gpu_vm->resv);
>>> +
>>> +Eviction of one of these local objects will then be something like the
>>> +following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   obj = get_object_from_lru();
>>> +
>>> +   dma_resv_lock(obj->resv);
>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>>> +        put_gpu_vma_on_revalidate_list(&gpu_vma);
>>> +
>>> +   add_dependencies(&eviction_job, &obj->resv);
>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>> +
>>> +   dma_resv_unlock(&obj->resv);
>>> +   put_object(obj);
>>> +
>>> +Note that since the object is local to the gpu_vm, it will share the gpu_vm's
>>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. Invalidated gpu_vmas are put
>>> +on the gpu_vm's revalidation list, which is protected by ``gpu_vm->resv``, which
>>> +is always locked while evicting, due to the above equality.
>>> +
>>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction,
>>> +Since the eviction blit or copy will wait for GPU idle, any attempt by
>>> +the GPU to access freed memory through the gpu_vma will be preceded by
>>> +a new exec function, which will make sure the gpu_vma is
>>> +revalidated. The eviction code holding the object's dma_resv while
>>> +revalidating will ensure a new exec function may not race with the eviction.
>>> +
>>> +Introducing external (or shared) buffer objects
>>> +===============================================
>>> +
>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>>> +can't share their reservation object with a single gpu_vm, but will rather
>>> +have a reservation object of their own. The shared objects bound to a
>>> +gpu_vm using one or many
>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>> +protected by the gpu_vm lock. One could in theory protect it also with
>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is typically
>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>> +the current locking helpers, that is typically not done. Also see
>>> +below for userptr gpu_vmas.
>>> +
>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>> I need to think a bit more about locking of extobj and evicted object tracking
>> in the case of processing 'drm_gpuva_ops' directly through callbacks within the
>> fence signalling critical path as mentioend in [1].
>>
>> In order to support that, we'd need to protect extobjs with a separate lock,
>> and while iterating extobjs to acquire the dma-resv lock drop the lock within
>> the loop before we actually acquire the dma-resv lock. Maple tree supports that
>> already and this can be fully done within the GPUVA manager; no need for the
>> driver to care about that.
> 
> So do I understand correctly that this because you want to update the gpuvm state while operations are progressing asynchronously?
> 
> If so, I wonder whether that could really be done? For example to allocate enough memory for page-tables etc, you need to know the details of the operations at IOCTL execution time, and to know the details you need to know the state from the previous operation?

Right, sync and async bind can't run fully concurrently, but you could "inject" a
sync one between two async ones such that the sync ones executed from the IOCTL
directly while async execution is stalled meanwhile. This would be possible because
the actual drm_gpuva_ops would be calculated within the async execution path rather
than in the IOCTL. But yes, page-table management must be desinged to support that.

> 
>>
>> While, as already mentioned, I'd really love to support that, I noticed that we
>> have a similar issue with tracking evicted objects. There are (similar) ways to
>> deal with that, however, it drastically increases complexity.
>>
>> Hence, I'd like to reconsider whether it's worth supporting it in the first
>> place. Most of the arguments in order to support it are for decreasing
>> complexity. However, if it increases complexity elsewhere, it's probably not
>> worth. The only argument left would be for synchronous bind jobs which could
>> be injected at any point of time without the need to be queued up in the
>> scheduler to preserve ordering. However, I'm not yet sure how important this
>> would be. For Xe it doesn't really seem to be a concern I guess?
> Xe supports that functionality via separate bind queues. If you queue most of the operations using one queue, you can inject synchronous bind jobs using another. Ideally they execute separately, but they are not guaranteed to do that.

Ok, but the separate bind queue would still work in the same asynchronous way, as
in the job is submitted to some kind of worker and the IOCTL just blocks until
completion, right?

>>
>> [1] https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>>
>>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>>> +the affected gpu_vm's but that might be unnecessarily complex. If we
>>> +have a ww_acquire context at hand at eviction time we can also perform
>>> +sleeping locks of those dma_resvs but that could cause expensive
>>> +rollbacks. One option is to just mark the invalidated gpu_vmas with a bool
>>> +which is inspected on the next exec function, when the gpu_vm's
>>> +dma_resv and the object's dma_resv is held, and the invalidated
>>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>>> +gpu_vmas. That bool would then, although being per-gpu_vma formally be
>>> +protected by the object's dma_resv.
>>> +
>>> +The exec function would then look something like the following:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   read_lock(&gpu_vm->lock);
>>> +
>>> +   dma_resv_lock(&gpu_vm->resv);
>>> +
>>> +   // Shared object list is protected by the gpu_vm->lock.
>>> +   for_each_shared_obj(gpu_vm, &obj) {
>>> +        dma_resv_lock(&obj->resv);
>>> +        move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>>> +   }
>>> +
>>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>>> +        revalidate_gpu_vma(&gpu_vma);
>>> +        remove_from_revalidate_list(&gpu_vma);
>>> +   }
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   job_dma_fence = gpu_submit(&gpu_job));
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +   for_each_shared_obj(gpu_vm, &obj)
>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>> +   dma_resv_unlock_all_resv_locks();
>>> +
>>> +   read_unlock(&gpu_vm->lock);
>>> +
>>> +And the corresponding shared-object aware eviction would look like:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   obj = get_object_from_lru();
>>> +
>>> +   dma_resv_lock(obj->resv);
>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>>> +        if (object_is_vm_local(obj))
>>> +                     put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +        else
>>> +                     mark_gpu_vma_for_revalidation(&gpu_vma);
>>> +
>>> +   add_dependencies(&eviction_job, &obj->resv);
>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>> +
>>> +   dma_resv_unlock(&obj->resv);
>>> +   put_object(obj);
>>> +
>>> +Yet another option is to put the gpu_vmas to be invalidated on a separate
>>> +gpu_vm list protected by a lower level lock that can be taken both at eviction
>>> +time and at transfer-to-revalidate list time. The details are not in
>>> +this document, but this for reference implemented in the Intel xe
>>> +driver.
>>> +
>>> +Introducing userptr gpu_vmas
>>> +============================
>>> +
>>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
>>> +GPU virtual address range, directly maps a CPU mm range of anonymous-
>>> +or file page-cache pages.
>>> +A very simple approach would be to just pin the pages using
>>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>>> +creates a Denial-Of-Service vector since a single user-space process
>>> +would be able to pin down all of system memory, which is not
>>> +desirable. (For special use-cases and with proper accounting pinning might
>>> +still be a desirable feature, though). What we need to do in the general case is
>>> +to obtain a reference to the desired pages, make sure we are notified
>>> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
>>> +them if they are not mapped read-only to the GPU, and then drop the reference.
>>> +When we are notified by the MMU notifier that CPU mm is about to drop the
>>> +pages, we need to stop GPU access to the pages,
>>> +GPU page-table and make sure that before the next time the GPU tries to access
>>> +whatever is now present in the CPU mm range, we unmap the old pages
>>> +from the GPU page tables and repeat the process of obtaining new page
>>> +references. Note that when the core mm decides to laundry pages, we get such
>>> +an unmap MMU notification and can mark the pages dirty again before the
>>> +next GPU access. We also get similar MMU notifications for NUMA accounting
>>> +which the GPU driver doesn't really need to care about, but so far
>>> +it's proven difficult to exclude certain notifications.
>>> +
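The obligations described above (obtain a reference at bind, and on an unmap notification stop GPU access, dirty writable pages, and drop the reference) can be sketched as a tiny single-threaded state machine. This is illustrative userspace C, not kernel code; all names are invented for this example, with get_user_pages() and set_page_dirty_lock() being the kernel facilities the comments allude to:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel state of one userptr page. */
struct userptr_page {
	bool referenced;   /* we hold a page reference */
	bool gpu_mapped;   /* present in the GPU page tables */
	bool gpu_writable; /* mapped read-write to the GPU */
	bool dirty;        /* page must be laundered by the core mm */
};

/* Bind: obtain a page reference (get_user_pages() in the kernel)
 * and map the page into the GPU page tables. */
static void userptr_bind(struct userptr_page *p, bool writable)
{
	p->referenced = true;
	p->gpu_mapped = true;
	p->gpu_writable = writable;
}

/* Unmap notification: stop GPU access, dirty the page if the GPU
 * may have written to it (set_page_dirty_lock() in the kernel),
 * and drop the page reference. */
static void userptr_notifier_unmap(struct userptr_page *p)
{
	p->gpu_mapped = false;
	if (p->gpu_writable)
		p->dirty = true;
	p->referenced = false;
}

/* Returns 0 when a bind + unmap cycle leaves the expected state. */
static int userptr_lifecycle_selftest(void)
{
	struct userptr_page p = { 0 };

	userptr_bind(&p, true);
	userptr_notifier_unmap(&p);
	if (p.gpu_mapped || p.referenced || !p.dirty)
		return -1;
	return 0;
}
```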
>>> +Using a MMU notifier for device DMA (and other methods) is described in
>>> +`this document
>>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_.
>>> +
>>> +Now the method of obtaining struct page references using
>>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>>> +since that would violate the locking order of the dma_resv lock vs the
>>> +mmap_lock that is grabbed when resolving a CPU pagefault. This means the gpu_vm's
>>> +list of userptr gpu_vmas needs to be protected by an outer lock, and this
>>> +is the first time we strictly need the gpu_vm->lock. While it was
>>> +previously used also to protect the list of the gpu_vm's shared objects,
>>> +we could in theory have used the gpu_vm->resv for that.
>>> +
>>> +The MMU interval seqlock for a userptr gpu_vma is used in the following
>>> +way:
>>> +
>>> +.. code-block:: C
>>> +
>>> +   down_read(&gpu_vm->lock);
>>> +
>>> +   retry:
>>> +
>>> +   // Note: mmu_interval_read_begin() blocks until there is no
>>> +   // invalidation notifier running anymore.
>>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>>> +   if (seq != gpu_vma->saved_seq) {
>>> +       obtain_new_page_pointers(&gpu_vma);
>>> +       dma_resv_lock(&gpu_vm->resv);
>>> +       put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>> +       dma_resv_unlock(&gpu_vm->resv);
>>> +       gpu_vma->saved_seq = seq;
>>> +   }
>>> +
>>> +   // The usual revalidation goes here.
>>> +
>>> +   // Final userptr sequence validation may not happen before the
>>> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
>>> +   // of the MMU invalidation notifier. Hence the
>>> +   // userptr_notifier_lock that will make them appear atomic.
>>> +
>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>> +   down_read(&gpu_vm->userptr_notifier_lock);
>>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
>>> +       up_read(&gpu_vm->userptr_notifier_lock);
>>> +       goto retry;
>>> +   }
>>> +
>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>> +
>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>> +
>>> +   for_each_shared_obj(gpu_vm, &obj)
>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>> +
>>> +   dma_resv_unlock_all_resv_locks();
>>> +   up_read(&gpu_vm->userptr_notifier_lock);
>>> +   up_read(&gpu_vm->lock);
>>> +
>>> +The code between ``mmu_interval_read_begin()`` and the
>>> +``mmu_interval_read_retry()`` marks the read side critical section of
>>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>>> +gpu_vma list is looped through, and the check is done for *all* of its
>>> +userptr gpu_vmas, although we only show a single one here.
>>> +
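That loop over the whole userptr list can be modeled in a few lines. The sketch below is illustrative, single-threaded userspace C in which a plain counter stands in for the mmu_interval_read_begin()/mmu_interval_read_retry() sequence check; every name here is invented for the example:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VMAS 4

/* Illustrative model of one userptr gpu_vma's sequence state. */
struct model_vma {
	unsigned long notifier_seq; /* bumped by the invalidation notifier */
	unsigned long saved_seq;    /* snapshot taken when (re)validating */
	bool valid;
};

static struct model_vma model_vmas[NUM_VMAS];

/* Invalidation side: bump the sequence so the exec side notices. */
static void model_invalidate(struct model_vma *vma)
{
	vma->notifier_seq++;
	vma->valid = false;
}

/* Exec side: revalidate stale vmas, then re-check *all* sequence
 * numbers; a single stale vma forces a full retry, mirroring the
 * loop over the gpu_vm's whole userptr list. Returns retries taken. */
static int model_validate_all(void)
{
	int retries = 0;
	bool again;

	do {
		again = false;
		for (int i = 0; i < NUM_VMAS; i++) {
			if (model_vmas[i].notifier_seq != model_vmas[i].saved_seq) {
				/* obtain_new_page_pointers() would go here */
				model_vmas[i].saved_seq = model_vmas[i].notifier_seq;
				model_vmas[i].valid = true;
			}
		}
		/* Final check; atomic vs the notifier in the real code. */
		for (int i = 0; i < NUM_VMAS; i++) {
			if (model_vmas[i].notifier_seq != model_vmas[i].saved_seq) {
				again = true;
				retries++;
				break;
			}
		}
	} while (again);
	return retries;
}

/* One invalidation followed by one exec-style validation pass. */
static int model_selftest(void)
{
	model_invalidate(&model_vmas[1]);
	if (model_validate_all() != 0)
		return -1;
	return model_vmas[1].valid ? 0 : -1;
}
```

In the single-threaded model no retry can ever fire; in the kernel the retry path is what resolves races with a concurrently running notifier.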
>>> +The userptr gpu_vma MMU invalidation notifier might be called from
>>> +reclaim context and, again to avoid locking order violations, we can't
>>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>>> +
>>> +.. code-block:: C
>>> +
>>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>>> +  {
>>> +      // Make sure the exec function either sees the new sequence
>>> +      // and backs off or we wait for the dma-fence:
>>> +
>>> +      down_write(&gpu_vm->userptr_notifier_lock);
>>> +      mmu_interval_set_seq(userptr_interval, cur_seq);
>>> +      up_write(&gpu_vm->userptr_notifier_lock);
>>> +
>>> +      dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>>> +                        false, MAX_SCHEDULE_TIMEOUT);
>>> +      return true;
>>> +  }
>>> +
>>> +When this invalidation notifier returns, the GPU can no longer be
>>> +accessing the old pages of the userptr gpu_vma and needs to redo the page-binding
>>> +before a new GPU submission can succeed.
>>> +
>>> +Optimizing gpu_vma iteration
>>> +----------------------------
>>> +
>>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the validity
>>> +on each exec function may be very costly. There is a scheme to avoid
>>> +this and only iterate through the userptr gpu_vmas that actually saw an
>>> +invalidation notifier call since the last exec.
>>> +
>>> +TODO: describe that scheme here. It's implemented in the xe driver.
>>> +
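One plausible shape of such a scheme — an assumption based on the description above, not a statement about the actual xe code — is for the notifier to move invalidated vmas onto a separate list so that exec only walks those. An illustrative userspace C sketch, with all names invented and the low-level spinlock that would protect the list move omitted:

```c
#include <assert.h>
#include <stddef.h>

struct uvma {
	struct uvma *next; /* linkage on exactly one of the two lists */
	int valid;
};

struct uvm {
	struct uvma *userptr_list;     /* userptr vmas with valid pages */
	struct uvma *invalidated_list; /* vmas hit by a notifier since last exec */
};

/* Notifier side: unlink the vma and put it on the invalidated list.
 * In the kernel this move would happen under a low-level spinlock
 * that may be taken from the notifier; omitted in this sketch. */
static void uvm_notifier_invalidate(struct uvm *vm, struct uvma *vma)
{
	struct uvma **pp;

	for (pp = &vm->userptr_list; *pp; pp = &(*pp)->next) {
		if (*pp == vma) {
			*pp = vma->next;
			vma->next = vm->invalidated_list;
			vm->invalidated_list = vma;
			vma->valid = 0;
			return;
		}
	}
}

/* Exec side: only the invalidated list is walked; vmas that saw no
 * notifier call since the last exec are never revisited.
 * Returns the number of vmas revalidated. */
static int uvm_revalidate(struct uvm *vm)
{
	int n = 0;

	while (vm->invalidated_list) {
		struct uvma *vma = vm->invalidated_list;

		vm->invalidated_list = vma->next;
		vma->valid = 1;               /* obtain new page pointers here */
		vma->next = vm->userptr_list; /* back on the main list */
		vm->userptr_list = vma;
		n++;
	}
	return n;
}

/* Invalidate one of two vmas; only that one is revisited by exec. */
static int uvm_selftest(void)
{
	static struct uvma a, b;
	struct uvm vm = { .userptr_list = &a, .invalidated_list = NULL };

	a.next = &b;
	b.next = NULL;
	uvm_notifier_invalidate(&vm, &b);
	if (uvm_revalidate(&vm) != 1 || !b.valid || vm.invalidated_list)
		return -1;
	return uvm_revalidate(&vm); /* second exec: nothing to do -> 0 */
}
```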
>>> +Locking for page-table updates at bind- and unbind time
>>> +=======================================================
>>> +
>>> +TODO.
>>> +
>>> +Recoverable page-fault implications
>>> +===================================
>>> +
>>> +TODO.
>>> -- 
>>> 2.41.0
>>>
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06  8:00       ` [Intel-xe] " Danilo Krummrich
@ 2023-09-06  8:32         ` Thomas Hellström
  -1 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06  8:32 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Matthew Brost, Francois Dugast, linux-kernel, Oak Zeng,
	dri-devel, Rodrigo Vivi, intel-xe

[-- Attachment #1: Type: text/plain, Size: 24233 bytes --]


On 9/6/23 10:00, Danilo Krummrich wrote:
> On 9/6/23 09:06, Thomas Hellström wrote:
>> Hi, Danilo,
>>
>> Thanks for taking a look. Comments inline.
>>
>> On 9/5/23 21:50, Danilo Krummrich wrote:
>>> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>>>> Add the first version of the VM_BIND locking document which is
>>>> intended to be part of the xe driver upstreaming agreement.
>>>>
>>>> The document describes and discuss the locking used during exec-
>>>> functions, evicton and for userptr gpu-vmas. Intention is to be 
>>>> using the
>>>> same nomenclature as the drm-vm-bind-async.rst.
>>>>
>>>> v2:
>>>>
>>>>
>>>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>>>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>>>    (Rodrigo Vivi)
>>>> - Adjust commit message accordingly.
>>>> - Add SPDX license header.
>>>>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> ---
>>>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 
>>>> ++++++++++++++++++++++
>>>>   1 file changed, 351 insertions(+)
>>>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>>>
>>>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst 
>>>> b/Documentation/gpu/drm-vm-bind-locking.rst
>>>> new file mode 100644
>>>> index 000000000000..b813961a9ec2
>>>> --- /dev/null
>>>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>>>> @@ -0,0 +1,351 @@
>>>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>>>> +
>>>> +===============
>>>> +VM_BIND locking
>>>> +===============
>>>> +
>>>> +This document attempts to describe what's needed to get VM_BIND 
>>>> locking right,
>>>> +including the userptr mmu_notifier locking and it will also 
>>>> discuss some
>>>> +optimizations to get rid of the looping through of all userptr 
>>>> mappings and
>>>> +external / shared object mappings that is needed in the simplest
>>>> +implementation. It will also discuss some implications for 
>>>> faulting gpu_vms.
>>>> +
>>>> +Nomenclature
>>>> +============
>>>> +
>>>> +* ``Context``: GPU execution context.
>>>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>>>> +  meta-data. Typically one per client (DRM file-private), or one per
>>>> +  context.
>>>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm 
>>>> with
>>> The same nomenclature was used within the VM_BIND async document as 
>>> well. I
>>> wonder if it would make sense to align the naming with the GPUVA 
>>> manager, such
>>> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result 
>>> into better
>>> function names, such as drm_gpuvm_resv_lock() or 
>>> drm_gpuvm_prepare_objects() and
>>> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>>>
>>> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but 
>>> I think
>>> this is close enough anyway.
>>
>> I don't have a strong opinion about the naming here and aligning with 
>> the GPUVA manager makes sense, although perhaps the "drm_" prefix 
>> which makes sense for the function- and struct names may not make 
>> sense in a more generic document like this. What about gpuva and gpuvm?
>
> Oh, I think the document is fine as it is. This was more like me 
> thinking loud
> about renaming things in the GPUVA manager accordingly.
>
>>
>>
>>>
>>>> +  associated meta-data. The backing storage of a gpu_vma can 
>>>> either be
>>>> +  a gem buffer object or anonymous pages mapped also into the CPU
>>>> +  address space for the process.
>>>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing 
>>>> store of
>>>> +  which is anonymous pages as described above.
>>>> +* ``revalidating``: Revalidating a gpu_vma means making the latest 
>>>> version
>>>> +  of the backing store resident and making sure the gpu_vma's
>>>> +  page-table entries point to that backing store.
>>>> +* ``dma_fence``: A struct dma_fence that is similar to a struct 
>>>> completion
>>>> +  and which tracks GPU activity. When the GPU activity is finished,
>>>> +  the dma_fence signals.
>>>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is 
>>>> used
>>>> +  to track GPU activity in the form of multiple dma_fences on a
>>>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / 
>>>> list
>>>> +  of dma_fences and a lock that needs to be held when adding
>>>> +  additional dma_fences to the dma_resv. The lock is of a type that
>>>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary 
>>>> order.
>>>> +* ``exec function``: An exec function is a function that 
>>>> revalidates all
>>>> +  affected gpu_vmas, submits a GPU command batch and registers the
>>>> +  dma_fence representing the GPU command's activity with all affected
>>>> +  dma_resvs. For completeness, although not covered by this document,
>>>> +  it's worth mentioning that an exec function may also be the
>>>> +  revalidation worker that is used by some drivers in compute /
>>>> +  long-running mode.
>>>> +* ``local object``: A GEM object which is local to a gpu_vm. 
>>>> Shared gem
>>>> +  objects also share the gpu_vm's dma_resv.
>>>> +* ``shared object``: AKA external object: A GEM object which may 
>>>> be shared
>>>> +  by multiple gpu_vms and whose backing storage may be shared with
>>>> +  other drivers.
>>>> +
>>>> +
>>>> +Introducing the locks
>>>> +=====================
>>>> +
>>>> +One of the benefits of VM_BIND is that local GEM objects share the 
>>>> gpu_vm's
>>>> +dma_resv object and hence the dma_resv lock. So even with a huge
>>>> +number of local GEM objects, only one lock is needed to make the exec
>>>> +sequence atomic.
>>>> +
>>>> +The following locks and locking orders are used:
>>>> +
>>>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the 
>>>> gpu_vm is
>>>> +  partitioned into gpu_vmas, protects the gpu_vm's list of 
>>>> external objects,
>>>> +  and can also with some simplification protect the gpu_vm's list of
>>>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond 
>>>> to the
>>>> +  mmap_lock.
>>>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>>>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode 
>>>> during mmu
>>>> +  notifier invalidation. This is not a real seqlock but described in
>>>> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
>>>> +  'lock' a lot like a seqcount, however this allows multiple
>>>> +  write-sides to hold it at once...". The read side critical section
>>>> +  is enclosed by ``mmu_interval_read_begin() /
>>>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>>>> +  sleeping uninterruptibly if the write side is held.
>>>> +  The write side is held by the core mm while calling mmu interval
>>>> +  invalidation notifiers.
>>>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of 
>>>> gpu_vmas needing
>>>> +  rebinding, and also the residency of all the gpu_vm's local GEM 
>>>> object.
>>>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is 
>>>> taken in read
>>>> +  mode during exec and write mode during a mmu notifier 
>>>> invalidation. In
>>>> +  the absence of a separate page-table lock, this lock can serve
>>>> +  together with the gpu_vm's dma_resv lock as a page-table lock. 
>>>> More on
>>>> +  this below. The userptr notifier lock is per gpu_vm.
>>>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's 
>>>> page-table updates. For
>>>> +  simplicity the gpu_vm's dma_resv lock can be reused as 
>>>> page-table lock.
>>>> +
>>>> +There are certain optimizations described below that require
>>>> +additional locks. More on that later.
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   dma_resv_lock(&gpu_vm->resv);
>>>> +
>>>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>>>> +        revalidate_gpu_vma(&gpu_vma);
>>>> +        remove_from_revalidate_list(&gpu_vma);
>>>> +   }
>>>> +
>>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>>> +
>>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>>> +   dma_resv_unlock(&gpu_vm->resv);
>>>> +
>>>> +Eviction of one of these local objects will then be something like 
>>>> the
>>>> +following:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   obj = get_object_from_lru();
>>>> +
>>>> +   dma_resv_lock(obj->resv);
>>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>>>> +        put_gpu_vma_on_revalidate_list(&gpu_vma);
>>>> +
>>>> +   add_dependencies(&eviction_job, &obj->resv);
>>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>>> +
>>>> +   dma_resv_unlock(&obj->resv);
>>>> +   put_object(obj);
>>>> +
>>>> +Note that since the object is local to the gpu_vm, it will share 
>>>> the gpu_vm's
>>>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. 
>>>> Invalidated gpu_vmas are put
>>>> +on the gpu_vm's revalidation list, which is protected by 
>>>> ``gpu_vm->resv``, which
>>>> +is always locked while evicting, due to the above equality.
>>>> +
>>>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before eviction.
>>>> +Since the eviction blit or copy will wait for GPU idle, any 
>>>> attempt by
>>>> +the GPU to access freed memory through the gpu_vma will be 
>>>> preceded by
>>>> +a new exec function, which will make sure the gpu_vma is
>>>> +revalidated. The eviction code holding the object's dma_resv while
>>>> +revalidating will ensure a new exec function may not race with the 
>>>> eviction.
>>>> +
>>>> +Introducing external (or shared) buffer objects
>>>> +===============================================
>>>> +
>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>>>> +can't share their reservation object with a single gpu_vm, but 
>>>> will rather
>>>> +have a reservation object of their own. The shared objects bound to a
>>>> +gpu_vm using one or many
>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>>> +protected by the gpu_vm lock. One could in theory protect it also 
>>>> with
>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is 
>>>> typically
>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>>> +the current locking helpers, that is typically not done. Also see
>>>> +below for userptr gpu_vmas.
>>>> +
>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>>> I need to think a bit more about locking of extobj and evicted 
>>> object tracking
>>> in the case of processing 'drm_gpuva_ops' directly through callbacks 
>>> within the
>>> fence signalling critical path as mentioend in [1].
>>>
>>> In order to support that, we'd need to protect extobjs with a 
>>> separate lock,
>>> and while iterating extobjs to acquire the dma-resv lock drop the 
>>> lock within
>>> the loop before we actually acquire the dma-resv lock. Maple tree 
>>> supports that
>>> already and this can be fully done within the GPUVA manager; no need 
>>> for the
>>> driver to care about that.
>>
>> So do I understand correctly that this is because you want to update the 
>> gpuvm state while operations are progressing asynchronously?
>>
>> If so, I wonder whether that could really be done? For example to 
>> allocate enough memory for page-tables etc, you need to know the 
>> details of the operations at IOCTL execution time, and to know the 
>> details you need to know the state from the previous operation?
>
>
> Right, sync and async bind can't run fully concurrently, but you could 
> "inject" a
> sync one between two async ones such that the sync ones executed from 
> the IOCTL
> directly while async execution is stalled meanwhile. This would be 
> possible because
> the actual drm_gpuva_ops would be calculated within the async 
> execution path rather
> than in the IOCTL. But yes, page-table management must be designed to 
> support that.

OK, well one of the main motivations for Xe is to be able to pipeline 
interleaving binds and execs if needed, like so:

- Bind vmas for scene 1.
- Submit scene 1.
- Unbind vmas for scene 1.
- Bind vmas for scene 2.
- Submit scene 2.
- Unbind vmas for scene 2.

And being able to *submit* all of the above while the async binding of 
vmas for scene (step 1) has not yet completed.
I can't really see how this could be done, while obeying dma-fence 
rules, unless state is updated synchronously while submitting?

So unless I'm misunderstanding what you are trying to do, I don't see Xe 
wanting to side-step the current approach, but OTOH protecting part of 
the state with additional locks probably won't be a problem as long as 
that is optional.

>
>>
>>>
>>> While, as already mentioned, I'd really love to support that, I 
>>> noticed that we
>>> have a similar issue with tracking evicted objects. There are 
>>> (similar) ways to
>>> deal with that, however, it drastically increases complexity.
>>>
>>> Hence, I'd like to reconsider whether it's worth supporting it in 
>>> the first
>>> place. Most of the arguments in order to support it are for decreasing
>>> complexity. However, if it increases complexity elsewhere, it's 
>>> probably not
>>> worth. The only argument left would be for synchronous bind jobs 
>>> which could
>>> be injected at any point of time without the need to be queued up in 
>>> the
>>> scheduler to preserve ordering. However, I'm not yet sure how 
>>> important this
>>> would be. For Xe it doesn't really seem to be a concern I guess?
>> Xe supports that functionality via separate bind queues. If you queue 
>> most of the operations using one queue, you can inject synchronous 
>> bind jobs using another. Ideally they execute separately, but they 
>> are not guaranteed to do that.
>
> Ok, but the separate bind queue would still work in the same 
> asynchronous way, as
> in the job is submitted to some kind of worker and the IOCTL just 
> blocks until
> completion, right?

The job is only submitted to a worker if there are unsatisfied 
dependencies, like that bind queue is busy with something else, or a GPU 
job is wiping the BO content for security reasons, or an in-fence, or 
somebody else having queued a job to the same page-table range *). 
Otherwise the page-table is updated immediately using CPU writes.

But yes, the IOCTL blocks until completion if the job is synchronous.

/Thomas


>
>
>
>>>
>>> [1] 
>>> https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>>>
>>>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>>>> +the affected gpu_vm's but that might be unnecessarily complex. If we
>>>> +have a ww_acquire context at hand at eviction time we can also 
>>>> perform
>>>> +sleeping locks of those dma_resvs but that could cause expensive
>>>> +rollbacks. One option is to just mark the invalidated gpu_vmas 
>>>> with a bool
>>>> +which is inspected on the next exec function, when the gpu_vm's
>>>> +dma_resv and the object's dma_resv is held, and the invalidated
>>>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>>>> +gpu_vmas. That bool would then, although being per-gpu_vma, formally be
>>>> +protected by the object's dma_resv.
>>>> +
>>>> +The exec function would then look something like the following:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   read_lock(&gpu_vm->lock);
>>>> +
>>>> +   dma_resv_lock(&gpu_vm->resv);
>>>> +
>>>> +   // Shared object list is protected by the gpu_vm->lock.
>>>> +   for_each_shared_obj(gpu_vm, &obj) {
>>>> +        dma_resv_lock(&obj->resv);
>>>> +        move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>>>> +   }
>>>> +
>>>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>>>> +        revalidate_gpu_vma(&gpu_vma);
>>>> +        remove_from_revalidate_list(&gpu_vma);
>>>> +   }
>>>> +
>>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>>> +
>>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>>> +   for_each_shared_obj(gpu_vm, &obj)
>>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>>> +   dma_resv_unlock_all_resv_locks();
>>>> +
>>>> +   read_unlock(&gpu_vm->lock);
>>>> +
>>>> +And the corresponding shared-object aware eviction would look like:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   obj = get_object_from_lru();
>>>> +
>>>> +   dma_resv_lock(obj->resv);
>>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma);
>>>> +        if (object_is_vm_local(obj))
>>>> +             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>>> +        else
>>>> +             mark_gpu_vma_for_revalidation(&gpu_vma);
>>>> +
>>>> +   add_dependencies(&eviction_job, &obj->resv);
>>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>>> +
>>>> +   dma_resv_unlock(&obj->resv);
>>>> +   put_object(obj);
>>>> +
>>>> +Yet another option is to put the gpu_vmas to be invalidated on a 
>>>> separate
>>>> +gpu_vm list protected by a lower level lock that can be taken both 
>>>> at eviction
>>>> +time and at transfer-to-revalidate list time. The details are not in
>>>> +this document, but this is, for reference, implemented in the Intel xe
>>>> +driver.
>>>> +
>>>> +Introducing userptr gpu_vmas
>>>> +============================
>>>> +
>>>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer 
>>>> object to a
>>>> +GPU virtual address range, directly maps a CPU mm range of anonymous-
>>>> +or file page-cache pages.
>>>> +A very simple approach would be to just pin the pages using
>>>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>>>> +creates a Denial-Of-Service vector since a single user-space process
>>>> +would be able to pin down all of system memory, which is not
>>>> +desirable. (For special use-cases and with proper accounting 
>>>> pinning might
>>>> +still be a desirable feature, though). What we need to do in the 
>>>> general case is
>>>> +to obtain a reference to the desired pages, make sure we are notified
>>>> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
>>>> +them if they are not mapped read-only to the GPU, and then drop 
>>>> the reference.
>>>> +When we are notified by the MMU notifier that CPU mm is about to 
>>>> drop the
>>>> +pages, we need to stop GPU access to the pages by invalidating their
>>>> +GPU page-table entries, and make sure that before the next time the GPU tries to access
>>>> +whatever is now present in the CPU mm range, we unmap the old pages
>>>> +from the GPU page tables and repeat the process of obtaining new page
>>>> +references. Note that when the core mm decides to launder pages, we get such
>>>> +an unmap MMU notification and can mark the pages dirty again 
>>>> before the
>>>> +next GPU access. We also get similar MMU notifications for NUMA 
>>>> accounting
>>>> +which the GPU driver doesn't really need to care about, but so far
>>>> +it's proven difficult to exclude certain notifications.
>>>> +
>>>> +Using a MMU notifier for device DMA (and other methods) is 
>>>> described in
>>>> +`this document
>>>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_. 
>>>>
>>>> +
>>>> +Now the method of obtaining struct page references using
>>>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>>>> +since that would violate the locking order of the dma_resv lock vs 
>>>> the
>>>> +mmap_lock that is grabbed when resolving a CPU pagefault. This 
>>>> means the gpu_vm's
>>>> +list of userptr gpu_vmas needs to be protected by an outer lock, 
>>>> and this
>>>> +is the first time we strictly need the gpu_vm->lock. While it was
>>>> +previously used also to protect the list of the gpu_vm's shared 
>>>> objects,
>>>> +we could in theory have used the gpu_vm->resv for that.
>>>> +
>>>> +The MMU interval seqlock for a userptr gpu_vma is used in the 
>>>> following
>>>> +way:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   down_read(&gpu_vm->lock);
>>>> +
>>>> +   retry:
>>>> +
>>>> +   // Note: mmu_interval_read_begin() blocks until there is no
>>>> +   // invalidation notifier running anymore.
>>>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>>>> +   if (seq != gpu_vma->saved_seq) {
>>>> +       obtain_new_page_pointers(&gpu_vma);
>>>> +       dma_resv_lock(&gpu_vm->resv);
>>>> +       put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>>> +       dma_resv_unlock(&gpu_vm->resv);
>>>> +       gpu_vma->saved_seq = seq;
>>>> +   }
>>>> +
>>>> +   // The usual revalidation goes here.
>>>> +
>>>> +   // Final userptr sequence validation may not happen before the
>>>> +   // submission dma_fence is added to the gpu_vm's resv, from the POV
>>>> +   // of the MMU invalidation notifier. Hence the
>>>> +   // userptr_notifier_lock that will make them appear atomic.
>>>> +
>>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>>> +   down_read(&gpu_vm->userptr_notifier_lock);
>>>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
>>>> +       up_read(&gpu_vm->userptr_notifier_lock);
>>>> +       goto retry;
>>>> +   }
>>>> +
>>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>>> +
>>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>>> +
>>>> +   for_each_shared_obj(gpu_vm, &obj)
>>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>>> +
>>>> +   dma_resv_unlock_all_resv_locks();
>>>> +   up_read(&gpu_vm->userptr_notifier_lock);
>>>> +   up_read(&gpu_vm->lock);
>>>> +
>>>> +The code between ``mmu_interval_read_begin()`` and the
>>>> +``mmu_interval_read_retry()`` marks the read side critical section of
>>>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>>>> +gpu_vma list is looped through, and the check is done for *all* of 
>>>> its
>>>> +userptr gpu_vmas, although we only show a single one here.
>>>> +
>>>> +The userptr gpu_vma MMU invalidation notifier might be called from
>>>> +reclaim context and, again to avoid locking order violations, we 
>>>> can't
>>>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>>>> +  {
>>>> +      // Make sure the exec function either sees the new sequence
>>>> +      // and backs off or we wait for the dma-fence:
>>>> +
>>>> +      down_write(&gpu_vm->userptr_notifier_lock);
>>>> +      mmu_interval_set_seq(userptr_interval, cur_seq);
>>>> +      up_write(&gpu_vm->userptr_notifier_lock);
>>>> +
>>>> +      dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>>>> +                        false, MAX_SCHEDULE_TIMEOUT);
>>>> +      return true;
>>>> +  }
>>>> +
>>>> +When this invalidation notifier returns, the GPU can no longer be
>>>> +accessing the old pages of the userptr gpu_vma and needs to redo 
>>>> the page-binding
>>>> +before a new GPU submission can succeed.
>>>> +
>>>> +Optimizing gpu_vma iteration
>>>> +----------------------------
>>>> +
>>>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the 
>>>> validity
>>>> +on each exec function may be very costly. There is a scheme to avoid
>>>> +this and only iterate through the userptr gpu_vmas that actually 
>>>> saw an
>>>> +invalidation notifier call since the last exec.
>>>> +
>>>> +TODO: describe that scheme here. It's implemented in the xe driver.
>>>> +
>>>> +Locking for page-table updates at bind- and unbind time
>>>> +=======================================================
>>>> +
>>>> +TODO.
>>>> +
>>>> +Recoverable page-fault implications
>>>> +===================================
>>>> +
>>>> +TODO.
>>>> -- 
>>>> 2.41.0
>>>>
>>
>

[-- Attachment #2: Type: text/html, Size: 38418 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Intel-xe] [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06  8:32         ` Thomas Hellström
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06  8:32 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Francois Dugast, Joonas Lahtinen, linux-kernel, dri-devel,
	Daniel Vetter, Rodrigo Vivi, intel-xe

[-- Attachment #1: Type: text/plain, Size: 24233 bytes --]


On 9/6/23 10:00, Danilo Krummrich wrote:
> On 9/6/23 09:06, Thomas Hellström wrote:
>> Hi, Danilo,
>>
>> Thanks for taking a look. Comments inline.
>>
>> On 9/5/23 21:50, Danilo Krummrich wrote:
>>> On Wed, Aug 16, 2023 at 11:15:47AM +0200, Thomas Hellström wrote:
>>>> Add the first version of the VM_BIND locking document which is
>>>> intended to be part of the xe driver upstreaming agreement.
>>>>
>>>> The document describes and discuss the locking used during exec-
>>>> functions, evicton and for userptr gpu-vmas. Intention is to be 
>>>> using the
>>>> same nomenclature as the drm-vm-bind-async.rst.
>>>>
>>>> v2:
>>>>
>>>>
>>>> - s/gvm/gpu_vm/g (Rodrigo Vivi)
>>>> - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
>>>>    (Rodrigo Vivi)
>>>> - Adjust commit message accordingly.
>>>> - Add SPDX license header.
>>>>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> ---
>>>>   Documentation/gpu/drm-vm-bind-locking.rst | 351 
>>>> ++++++++++++++++++++++
>>>>   1 file changed, 351 insertions(+)
>>>>   create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
>>>>
>>>> diff --git a/Documentation/gpu/drm-vm-bind-locking.rst 
>>>> b/Documentation/gpu/drm-vm-bind-locking.rst
>>>> new file mode 100644
>>>> index 000000000000..b813961a9ec2
>>>> --- /dev/null
>>>> +++ b/Documentation/gpu/drm-vm-bind-locking.rst
>>>> @@ -0,0 +1,351 @@
>>>> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>>>> +
>>>> +===============
>>>> +VM_BIND locking
>>>> +===============
>>>> +
>>>> +This document attempts to describe what's needed to get VM_BIND 
>>>> locking right,
>>>> +including the userptr mmu_notifier locking and it will also 
>>>> discuss some
>>>> +optimizations to get rid of the looping through of all userptr 
>>>> mappings and
>>>> +external / shared object mappings that is needed in the simplest
>>>> +implementation. It will also discuss some implications for 
>>>> faulting gpu_vms.
>>>> +
>>>> +Nomenclature
>>>> +============
>>>> +
>>>> +* ``Context``: GPU execution context.
>>>> +* ``gpu_vm``: Abstraction of a virtual GPU address space with
>>>>
>>>>
>>>> +  meta-data. Typically one per client (DRM file-private), or one per
>>>> +  context.
>>>> +* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm 
>>>> with
>>> The same nomenclature was used within the VM_BIND async document as 
>>> well. I
>>> wonder if it would make sense to align the naming with the GPUVA 
>>> manager, such
>>> that ('drm_gpuva_manager' -> 'drm_gpuvm'). This would also result 
>>> into better
>>> function names, such as drm_gpuvm_resv_lock() or 
>>> drm_gpuvm_prepare_objects() and
>>> potentially way better naming for the VM_BO abstraction 'drm_gpuvm_bo'.
>>>
>>> However, I'd like to keep 'drm_gpuva' rather than 'drm_gpu_vma', but 
>>> I think
>>> this is close enough anyway.
>>
>> I don't have a strong opinion about the naming here and aligning with 
>> the GPUVA manager make sense, although perhaps the "drm_" prefix 
>> which makes sense for the function- and struct names may not make 
>> sense in a more generic document like this. What about gpuva and gpuvm?
>
> Oh, I think the document is fine as it is. This was more like me 
> thinking loud
> about renaming things in the GPUVA manager accordingly.
>
>>
>>
>>>
>>>> +  associated meta-data. The backing storage of a gpu_vma can 
>>>> either be
>>>> +  a gem buffer object or anonymous pages mapped also into the CPU
>>>> +  address space for the process.
>>>> +* ``userptr gpu_vma or just userptr``: A gpu_vma, the backing 
>>>> store of
>>>> +  which is anonymous pages as described above.
>>>> +* ``revalidating``: Revalidating a gpu_vma means making the latest 
>>>> version
>>>> +  of the backing store resident and making sure the gpu_vma's
>>>> +  page-table entries point to that backing store.
>>>> +* ``dma_fence``: A struct dma_fence that is similar to a struct 
>>>> completion
>>>> +  and which tracks GPU activity. When the GPU activity is finished,
>>>> +  the dma_fence signals.
>>>> +* ``dma_resv``: A struct dma_resv (AKA reservation object) that is 
>>>> used
>>>> +  to track GPU activity in the form of multiple dma_fences on a
>>>> +  gpu_vm or a gem buffer object. The dma_resv contains an array / 
>>>> list
>>>> +  of dma_fences and a lock that needs to be held when adding
>>>> +  additional dma_fences to the dma_resv. The lock is of a type that
>>>> +  allows deadlock-safe locking of multiple dma_resvs in arbitrary
>>>> +  order.
>>>> +* ``exec function``: An exec function is a function that 
>>>> revalidates all
>>>> +  affected gpu_vmas, submits a GPU command batch and registers the
>>>> +  dma_fence representing the GPU command's activity with all affected
>>>> +  dma_resvs. For completeness, although not covered by this document,
>>>> +  it's worth mentioning that an exec function may also be the
>>>> +  revalidation worker that is used by some drivers in compute /
>>>> +  long-running mode.
>>>> +* ``local object``: A GEM object which is local to a gpu_vm. Local
>>>> +  GEM objects share the gpu_vm's dma_resv.
>>>> +* ``shared object``: AKA external object: A GEM object which may 
>>>> be shared
>>>> +  by multiple gpu_vms and whose backing storage may be shared with
>>>> +  other drivers.
>>>> +
>>>> +
>>>> +Introducing the locks
>>>> +=====================
>>>> +
>>>> +One of the benefits of VM_BIND is that local GEM objects share the 
>>>> gpu_vm's
>>>> +dma_resv object and hence the dma_resv lock. So even with a huge
>>>> +number of local GEM objects, only one lock is needed to make the exec
>>>> +sequence atomic.
>>>> +
>>>> +The following locks and locking orders are used:
>>>> +
>>>> +* The ``gpu_vm->lock`` (optionally an rwsem). Protects how the 
>>>> gpu_vm is
>>>> +  partitioned into gpu_vmas, protects the gpu_vm's list of 
>>>> external objects,
>>>> +  and can also with some simplification protect the gpu_vm's list of
>>>> +  userptr gpu_vmas. With the CPU mm analogy this would correspond 
>>>> to the
>>>> +  mmap_lock.
>>>> +* The ``userptr_seqlock``. This lock is taken in read mode for each
>>>> +  userptr gpu_vma on the gpu_vm's userptr list, and in write mode 
>>>> during mmu
>>>> +  notifier invalidation. This is not a real seqlock but described in
>>>> +  ``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
>>>> +  'lock' a lot like a seqcount, however this allows multiple
>>>> +  write-sides to hold it at once...". The read side critical section
>>>> +  is enclosed by ``mmu_interval_read_begin() /
>>>> +  mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
>>>> +  sleeping uninterruptibly if the write side is held.
>>>> +  The write side is held by the core mm while calling mmu interval
>>>> +  invalidation notifiers.
>>>> +* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of 
>>>> gpu_vmas needing
>>>> +  rebinding, and also the residency of all the gpu_vm's local GEM
>>>> +  objects.
>>>> +* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is 
>>>> taken in read
>>>> +  mode during exec and write mode during a mmu notifier 
>>>> invalidation. In
>>>> +  the absence of a separate page-table lock, this lock can serve
>>>> +  together with the gpu_vm's dma_resv lock as a page-table lock. 
>>>> More on
>>>> +  this below. The userptr notifier lock is per gpu_vm.
>>>> +* The ``gpu_vm->page_table_lock``. Protects the gpu_vm's 
>>>> page-table updates. For
>>>> +  simplicity the gpu_vm's dma_resv lock can be reused as 
>>>> page-table lock.
>>>> +
>>>> +There are certain optimizations described below that require
>>>> +additional locks. More on that later.
>>>> +
>>>> +With these locks, a simple exec function of a gpu_vm with only
>>>> +local objects would look like the following:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   dma_resv_lock(&gpu_vm->resv);
>>>> +
>>>> +   for_each_gpu_vma_on_revalidate_list(gpu_vm, &gpu_vma) {
>>>> +        revalidate_gpu_vma(&gpu_vma);
>>>> +        remove_from_revalidate_list(&gpu_vma);
>>>> +   }
>>>> +
>>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>>> +
>>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>>> +   dma_resv_unlock(&gpu_vm->resv);
>>>> +
>>>> +Eviction of one of these local objects will then be something like 
>>>> the
>>>> +following:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   obj = get_object_from_lru();
>>>> +
>>>> +   dma_resv_lock(obj->resv);
>>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>>>> +        put_gpu_vma_on_revalidate_list(&gpu_vma);
>>>> +
>>>> +   add_dependencies(&eviction_job, &obj->resv);
>>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>>> +
>>>> +   dma_resv_unlock(&obj->resv);
>>>> +   put_object(obj);
>>>> +
>>>> +Note that since the object is local to the gpu_vm, it will share 
>>>> the gpu_vm's
>>>> +``dma_resv`` lock so that ``obj->resv == gpu_vm->resv``. 
>>>> Invalidated gpu_vmas are put
>>>> +on the gpu_vm's revalidation list, which is protected by 
>>>> ``gpu_vm->resv``, which
>>>> +is always locked while evicting, due to the above equality.
>>>> +
>>>> +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before
>>>> +eviction. Since the eviction blit or copy will wait for GPU idle, any
>>>> +attempt by the GPU to access freed memory through the gpu_vma will be
>>>> +preceded by a new exec function, which will make sure the gpu_vma is
>>>> +revalidated. The eviction code holding the object's dma_resv while
>>>> +revalidating will ensure a new exec function may not race with the
>>>> +eviction.
>>>> +
>>>> +Introducing external (or shared) buffer objects
>>>> +===============================================
>>>> +
>>>> +Since shared buffer objects may be shared by multiple gpu_vms they
>>>> +can't share their reservation object with a single gpu_vm, but 
>>>> will rather
>>>> +have a reservation object of their own. The shared objects bound to a
>>>> +gpu_vm using one or many
>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>>> +protected by the gpu_vm lock. One could in theory protect it also 
>>>> with
>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is 
>>>> typically
>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>>> +the current locking helpers, that is typically not done. Also see
>>>> +below for userptr gpu_vmas.
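>>>> +
>>>> +As a reference only, deadlock-safe locking of the gpu_vm's dma_resv
>>>> +together with all the external objects' dma_resvs can be sketched
>>>> +using a ww_acquire context as below. This is an illustration, not a
>>>> +recipe: the -EDEADLK backoff-and-retry handling that a real
>>>> +implementation needs is omitted, and the iteration helper is
>>>> +hypothetical:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   struct ww_acquire_ctx ctx;
>>>> +
>>>> +   // The list of external objects is built under the gpu_vm->lock
>>>> +   // before this point.
>>>> +   ww_acquire_init(&ctx, &reservation_ww_class);
>>>> +   dma_resv_lock(&gpu_vm->resv, &ctx);
>>>> +   for_each_shared_obj(gpu_vm, &obj)
>>>> +           dma_resv_lock(&obj->resv, &ctx);
>>>> +   ww_acquire_done(&ctx);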
>>>> +
>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>>> I need to think a bit more about locking of extobj and evicted 
>>> object tracking
>>> in the case of processing 'drm_gpuva_ops' directly through callbacks 
>>> within the
>>> fence signalling critical path as mentioned in [1].
>>>
>>> In order to support that, we'd need to protect extobjs with a 
>>> separate lock,
>>> and while iterating extobjs to acquire the dma-resv lock drop the 
>>> lock within
>>> the loop before we actually acquire the dma-resv lock. Maple tree 
>>> supports that
>>> already and this can be fully done within the GPUVA manager; no need 
>>> for the
>>> driver to care about that.
>>
>> So do I understand correctly that this is because you want to update
>> the gpuvm state while operations are progressing asynchronously?
>>
>> If so, I wonder whether that could really be done? For example to 
>> allocate enough memory for page-tables etc, you need to know the 
>> details of the operations at IOCTL execution time, and to know the 
>> details you need to know the state from the previous operation?
>
>
> Right, sync and async bind can't run fully concurrently, but you could 
> "inject" a
> sync one between two async ones such that the sync ones executed from 
> the IOCTL
> directly while async execution is stalled meanwhile. This would be 
> possible because
> the actual drm_gpuva_ops would be calculated within the async 
> execution path rather
> than in the IOCTL. But yes, page-table management must be designed to
> support that.

OK, well one of the main motivations for Xe is to be able to pipeline 
interleaving binds and execs if needed, like so:

- Bind vmas for scene 1.
- Submit scene 1.
- Unbind vmas for scene 1.
- Bind vmas for scene 2.
- Submit scene 2.
- Unbind vmas for scene 2.

And being able to *submit* all of the above while the async binding of 
vmas for scene (step 1) has not yet completed.
I can't really see how this could be done, while obeying dma-fence 
rules, unless state is updated synchronously while submitting?

So unless I'm misunderstanding what you are trying to do, I don't see Xe 
wanting to side-step the current approach, but OTOH protecting part of 
the state with additional locks probably won't be a problem as long as 
that is optional.

>
>>
>>>
>>> While, as already mentioned, I'd really love to support that, I 
>>> noticed that we
>>> have a similar issue with tracking evicted objects. There are 
>>> (similar) ways to
>>> deal with that, however, it drastically increases complexity.
>>>
>>> Hence, I'd like to reconsider whether it's worth supporting it in 
>>> the first
>>> place. Most of the arguments in order to support it are for decreasing
>>> complexity. However, if it increases complexity elsewhere, it's 
>>> probably not
>>> worth. The only argument left would be for synchronous bind jobs 
>>> which could
>>> be injected at any point of time without the need to be queued up in 
>>> the
>>> scheduler to preserve ordering. However, I'm not yet sure how 
>>> important this
>>> would be. For Xe it doesn't really seem to be a concern I guess?
>> Xe supports that functionality via separate bind queues. If you queue 
>> most of the operations using one queue, you can inject synchronous 
>> bind jobs using another. Ideally they execute separately, but they 
>> are not guaranteed to do that.
>
> Ok, but the separate bind queue would still work in the same 
> asynchronous way, as
> in the job is submitted to some kind of worker and the IOCTL just 
> blocks until
> completion, right?

The job is only submitted to a worker if there are unsatisfied
dependencies: the bind queue is busy with something else, a GPU job is
wiping the BO content for security reasons, there is an in-fence, or
somebody else has queued a job to the same page-table range *).
Otherwise the page-table is updated immediately using CPU writes.
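
Roughly, with hypothetical helper names just to illustrate the
decision, the submission path looks something like:

    if (bind_job_has_unresolved_dependencies(bind_job))
            /* Async path: the page-table update runs from the bind
             * queue worker once all dependencies have signalled. */
            queue_to_bind_queue(bind_job);
    else
            /* Immediate path: update the page-table directly with
             * CPU writes from the IOCTL. */
            bind_job_update_page_tables_cpu(bind_job);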

But yes, the IOCTL blocks until completion if the job is synchronous.

/Thomas


>
>
>
>>>
>>> [1] 
>>> https://lore.kernel.org/dri-devel/202308221050.kTj8uFMA-lkp@intel.com/T/#m7f3b5a7ff70723332adeea32671578cb95c62f7c
>>>
>>>> +hold the object's private dma_resv. We can trylock the dma_resvs for
>>>> +the affected gpu_vm's but that might be unnecessarily complex. If we
>>>> +have a ww_acquire context at hand at eviction time we can also 
>>>> perform
>>>> +sleeping locks of those dma_resvs but that could cause expensive
>>>> +rollbacks. One option is to just mark the invalidated gpu_vmas 
>>>> with a bool
>>>> +which is inspected on the next exec function, when the gpu_vm's
>>>> +dma_resv and the object's dma_resv is held, and the invalidated
>>>> +gpu_vmas could then be put on the gpu_vm's list of invalidated
>>>> +gpu_vmas. That bool would then, although being per-gpu_vma 
>>>> formally be
>>>> +protected by the object's dma_resv.
>>>> +
>>>> +The exec function would then look something like the following:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   read_lock(&gpu_vm->lock);
>>>> +
>>>> +   dma_resv_lock(&gpu_vm->resv);
>>>> +
>>>> +   // Shared object list is protected by the gpu_vm->lock.
>>>> +   for_each_shared_obj(gpu_vm, &obj) {
>>>> +        dma_resv_lock(&obj->resv);
>>>> +        move_marked_gpu_vmas_to_revalidate_gpu_vma_list(obj, &gpu_vm);
>>>> +   }
>>>> +
>>>> +   for_each_gpu_vma_to_revalidate(gpu_vm, &gpu_vma) {
>>>> +        revalidate_gpu_vma(&gpu_vma);
>>>> +        remove_from_revalidate_list(&gpu_vma);
>>>> +   }
>>>> +
>>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>>> +
>>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>>> +   for_each_shared_obj(gpu_vm, &obj)
>>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>>> +   dma_resv_unlock_all_resv_locks();
>>>> +
>>>> +   read_unlock(&gpu_vm->lock);
>>>> +
>>>> +And the corresponding shared-object aware eviction would look like:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   obj = get_object_from_lru();
>>>> +
>>>> +   dma_resv_lock(obj->resv);
>>>> +   for_each_gpu_vma_of_obj(obj, &gpu_vma)
>>>> +        if (object_is_vm_local(obj))
>>>> +             put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>>> +        else
>>>> +             mark_gpu_vma_for_revalidation(&gpu_vma);
>>>> +
>>>> +   add_dependencies(&eviction_job, &obj->resv);
>>>> +   job_dma_fence = gpu_submit(&eviction_job);
>>>> +   add_dma_fence(&obj->resv, job_dma_fence);
>>>> +
>>>> +   dma_resv_unlock(&obj->resv);
>>>> +   put_object(obj);
>>>> +
>>>> +Yet another option is to put the gpu_vmas to be invalidated on a 
>>>> separate
>>>> +gpu_vm list protected by a lower level lock that can be taken both 
>>>> at eviction
>>>> +time and at transfer-to-revalidate list time. The details are not in
>>>> +this document, but this is, for reference, implemented in the Intel
>>>> +xe driver.
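>>>> +
>>>> +A rough sketch of that scheme, with a hypothetical spinlock
>>>> +protecting the gpu_vm's list of invalidated gpu_vmas, could look
>>>> +like the following. The spinlock can be taken both from the eviction
>>>> +path, where only the object's dma_resv is held, and from the exec
>>>> +path, where the gpu_vm's dma_resv is held:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   // Eviction path, object's dma_resv held:
>>>> +   spin_lock(&gpu_vm->invalidated_lock);
>>>> +   list_move_tail(&gpu_vma->invalidated_link, &gpu_vm->invalidated_list);
>>>> +   spin_unlock(&gpu_vm->invalidated_lock);
>>>> +
>>>> +   // Exec path, gpu_vm's dma_resv held. Drop the spinlock while
>>>> +   // processing each entry since the transfer may sleep:
>>>> +   spin_lock(&gpu_vm->invalidated_lock);
>>>> +   while (!list_empty(&gpu_vm->invalidated_list)) {
>>>> +           gpu_vma = first_invalidated_gpu_vma(gpu_vm);
>>>> +           spin_unlock(&gpu_vm->invalidated_lock);
>>>> +           put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>>> +           spin_lock(&gpu_vm->invalidated_lock);
>>>> +   }
>>>> +   spin_unlock(&gpu_vm->invalidated_lock);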
>>>> +
>>>> +Introducing userptr gpu_vmas
>>>> +============================
>>>> +
>>>> +A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer 
>>>> object to a
>>>> +GPU virtual address range, directly maps a CPU mm range of anonymous-
>>>> +or file page-cache pages.
>>>> +A very simple approach would be to just pin the pages using
>>>> +pin_user_pages() at bind time and unpin them at unbind time, but this
>>>> +creates a Denial-Of-Service vector since a single user-space process
>>>> +would be able to pin down all of system memory, which is not
>>>> +desirable. (For special use-cases and with proper accounting 
>>>> pinning might
>>>> +still be a desirable feature, though). What we need to do in the 
>>>> general case is
>>>> +to obtain a reference to the desired pages, make sure we are notified
>>>> +using a MMU notifier just before the CPU mm unmaps the pages, dirty
>>>> +them if they are not mapped read-only to the GPU, and then drop 
>>>> the reference.
>>>> +When we are notified by the MMU notifier that CPU mm is about to drop
>>>> +the pages, we need to stop GPU access to the pages and make sure that
>>>> +before the next time the GPU tries to access whatever is now present
>>>> +in the CPU mm range, we unmap the old pages from the GPU page tables
>>>> +and repeat the process of obtaining new page
>>>> +references. Note that when the core mm decides to laundry pages, 
>>>> we get such
>>>> +an unmap MMU notification and can mark the pages dirty again 
>>>> before the
>>>> +next GPU access. We also get similar MMU notifications for NUMA 
>>>> accounting
>>>> +which the GPU driver doesn't really need to care about, but so far
>>>> +it's proven difficult to exclude certain notifications.
>>>> +
>>>> +Using a MMU notifier for device DMA (and other methods) is 
>>>> described in
>>>> +`this document
>>>> +<https://docs.kernel.org/core-api/pin_user_pages.html#case-3-mmu-notifier-registration-with-or-without-page-faulting-hardware>`_. 
>>>>
>>>> +
>>>> +Now the method of obtaining struct page references using
>>>> +get_user_pages() unfortunately can't be used under a dma_resv lock
>>>> +since that would violate the locking order of the dma_resv lock vs 
>>>> the
>>>> +mmap_lock that is grabbed when resolving a CPU pagefault. This 
>>>> means the gpu_vm's
>>>> +list of userptr gpu_vmas needs to be protected by an outer lock, 
>>>> and this
>>>> +is the first time we strictly need the gpu_vm->lock. While it was
>>>> +previously used also to protect the list of the gpu_vm's shared 
>>>> objects,
>>>> +we could in theory have used the gpu_vm->resv for that.
>>>> +
>>>> +The MMU interval seqlock for a userptr gpu_vma is used in the 
>>>> following
>>>> +way:
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +   down_read(&gpu_vm->lock);
>>>> +
>>>> +   retry:
>>>> +
>>>> +   // Note: mmu_interval_read_begin() blocks until there is no
>>>> +   // invalidation notifier running anymore.
>>>> +   seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
>>>> +   if (seq != gpu_vma->saved_seq) {
>>>> +        obtain_new_page_pointers(&gpu_vma);
>>>> +        dma_resv_lock(&gpu_vm->resv);
>>>> +        put_gpu_vma_on_revalidate_list(&gpu_vma, &gpu_vm);
>>>> +        dma_resv_unlock(&gpu_vm->resv);
>>>> +        gpu_vma->saved_seq = seq;
>>>> +   }
>>>> +
>>>> +   // The usual revalidation goes here.
>>>> +
>>>> +   // Final userptr sequence validation may not happen before the
>>>> +   // submission dma_fence is added to the gpu_vm's resv, from the 
>>>> POW
>>>> +   // of the MMU invalidation notifier. Hence the
>>>> +   // userptr_notifier_lock that will make them appear atomic.
>>>> +
>>>> +   add_dependencies(&gpu_job, &gpu_vm->resv);
>>>> +   down_read(&gpu_vm->userptr_notifier_lock);
>>>> +   if (mmu_interval_read_retry(&gpu_vma->userptr_interval,
>>>> +                               gpu_vma->saved_seq)) {
>>>> +        up_read(&gpu_vm->userptr_notifier_lock);
>>>> +        goto retry;
>>>> +   }
>>>> +
>>>> +   job_dma_fence = gpu_submit(&gpu_job);
>>>> +
>>>> +   add_dma_fence(job_dma_fence, &gpu_vm->resv);
>>>> +
>>>> +   for_each_shared_obj(gpu_vm, &obj)
>>>> +          add_dma_fence(job_dma_fence, &obj->resv);
>>>> +
>>>> +   dma_resv_unlock_all_resv_locks();
>>>> +   up_read(&gpu_vm->userptr_notifier_lock);
>>>> +   up_read(&gpu_vm->lock);
>>>> +
>>>> +The code between ``mmu_interval_read_begin()`` and the
>>>> +``mmu_interval_read_retry()`` marks the read side critical section of
>>>> +what we call the ``userptr_seqlock``. In reality the gpu_vm's userptr
>>>> +gpu_vma list is looped through, and the check is done for *all* of 
>>>> its
>>>> +userptr gpu_vmas, although we only show a single one here.
>>>> +
>>>> +The userptr gpu_vma MMU invalidation notifier might be called from
>>>> +reclaim context and, again to avoid locking order violations, we can't
>>>> +take any dma_resv lock nor the gpu_vm->lock from within it.
>>>> +
>>>> +.. code-block:: C
>>>> +
>>>> +  bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
>>>> +  {
>>>> +      // Make sure the exec function either sees the new sequence
>>>> +      // and backs off or we wait for the dma-fence:
>>>> +
>>>> +      down_write(&gpu_vm->userptr_notifier_lock);
>>>> +      mmu_interval_set_seq(userptr_interval, cur_seq);
>>>> +      up_write(&gpu_vm->userptr_notifier_lock);
>>>> +
>>>> +      dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
>>>> +                        false, MAX_SCHEDULE_TIMEOUT);
>>>> +      return true;
>>>> +  }
>>>> +
>>>> +When this invalidation notifier returns, the GPU can no longer be
>>>> +accessing the old pages of the userptr gpu_vma and needs to redo 
>>>> the page-binding
>>>> +before a new GPU submission can succeed.
>>>> +
>>>> +Optimizing gpu_vma iteration
>>>> +----------------------------
>>>> +
>>>> +Iterating through all of a gpu_vm's userptr gpu_vmas to check the 
>>>> validity
>>>> +on each exec function may be very costly. There is a scheme to avoid
>>>> +this and only iterate through the userptr gpu_vmas that actually 
>>>> saw an
>>>> +invalidation notifier call since the last exec.
>>>> +
>>>> +TODO: describe that scheme here. It's implemented in the xe driver.
>>>> +
>>>> +Locking for page-table updates at bind- and unbind time
>>>> +=======================================================
>>>> +
>>>> +TODO.
>>>> +
>>>> +Recoverable page-fault implications
>>>> +===================================
>>>> +
>>>> +TODO.
>>>> -- 
>>>> 2.41.0
>>>>
>>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06  8:32         ` [Intel-xe] " Thomas Hellström
  (?)
@ 2023-09-06 11:09           ` Boris Brezillon
  -1 siblings, 0 replies; 45+ messages in thread
From: Boris Brezillon @ 2023-09-06 11:09 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Danilo Krummrich, Matthew Brost, Francois Dugast, linux-kernel,
	Oak Zeng, dri-devel, Rodrigo Vivi, intel-xe

On Wed, 6 Sep 2023 10:32:24 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:


> >>>> +Introducing external (or shared) buffer objects
> >>>> +===============================================
> >>>> +
> >>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
> >>>> +can't share their reservation object with a single gpu_vm, but 
> >>>> will rather
> >>>> +have a reservation object of their own. The shared objects bound to a
> >>>> +gpu_vm using one or many
> >>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> >>>> +protected by the gpu_vm lock. One could in theory protect it also 
> >>>> with
> >>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is 
> >>>> typically
> >>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> >>>> +the current locking helpers, that is typically not done. Also see
> >>>> +below for userptr gpu_vmas.
> >>>> +
> >>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> >>>> +object, but we can no longer be certain that we hold the gpu_vm's
> >>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we  
> >>> I need to think a bit more about locking of extobj and evicted 
> >>> object tracking
> >>> in the case of processing 'drm_gpuva_ops' directly through callbacks 
> >>> within the
> >>> fence signalling critical path as mentioend in [1].
> >>>
> >>> In order to support that, we'd need to protect extobjs with a 
> >>> separate lock,
> >>> and while iterating extobjs to acquire the dma-resv lock drop the 
> >>> lock within
> >>> the loop before we actually acquire the dma-resv lock. Maple tree 
> >>> supports that
> >>> already and this can be fully done within the GPUVA manager; no need 
> >>> for the
> >>> driver to care about that.  
> >>
> >> So do I understand correctly that this because you want to update the 
> >> gpuvm state while operations are progressing asynchronously?
> >>
> >> If so, I wonder whether that could really be done? For example to 
> >> allocate enough memory for page-tables etc, you need to know the 
> >> details of the operations at IOCTL execution time, and to know the 
> >> details you need to know the state from the previous operation?  
> >
> >
> > Right, sync and async bind can't run fully concurrently, but you could 
> > "inject" a
> > sync one between two async ones such that the sync ones executed from 
> > the IOCTL
> > directly while async execution is stalled meanwhile. This would be 
> > possible because
> > the actual drm_gpuva_ops would be calculated within the async 
> > execution path rather
> > than in the IOCTL. But yes, page-table management must be designed to
> > support that.

FWIW, the panthor driver is designed this way (note that I'm not
supporting GEM eviction yet, so there might be subtleties I missed).

> 
> OK, well one of the main motivations for Xe is to be able to pipeline 
> interleaving binds and execs if needed, like so:
> 
> - Bind vmas for scene 1.
> - Submit scene 1.
> - Unbind vmas for scene 1.
> - Bind vmas for scene 2.
> - Submit scene 2.
> - Unbind vmas for scene 2.
> 
> And being able to *submit* all of the above while the async binding of 
> vmas for scene (step 1) has not yet completed.
> I can't really see how this could be done, while obeying dma-fence 
> rules, unless state is updated synchronously while submitting?

The idea in this case is to detect when a GPU job dependency is a
VM_BIND out-fence, turn drm_sched_fence->parent into an
xxx_vm_bind_job_fence object that's holding the GEM that's about to be
mapped (AFAICT, we don't need to do anything for unmap operations), and
then add our GPU job fence to this BO. This should not only guarantee
that the GEMs we depend on are mapped before the GPU job is executed
(the fence wait does that), but also that such yet-to-be-mapped GEMs
won't be evicted just after they've been mapped and before the GPU had
a chance to execute (unless I'm missing something, adding our GPU job
fence to the BO being targeted by a pending VM_BIND(async,map) operation
solves this problem).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06 11:09           ` Boris Brezillon
  0 siblings, 0 replies; 45+ messages in thread
From: Boris Brezillon @ 2023-09-06 11:09 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Matthew Brost, Francois Dugast, linux-kernel, dri-devel,
	Danilo Krummrich, Oak Zeng, Rodrigo Vivi, intel-xe

On Wed, 6 Sep 2023 10:32:24 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:


> >>>> +Introducing external (or shared) buffer objects
> >>>> +===============================================
> >>>> +
> >>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
> >>>> +can't share their reservation object with a single gpu_vm, but 
> >>>> will rather
> >>>> +have a reservation object of their own. The shared objects bound to a
> >>>> +gpu_vm using one or many
> >>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> >>>> +protected by the gpu_vm lock. One could in theory protect it also 
> >>>> with
> >>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is 
> >>>> typically
> >>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> >>>> +the current locking helpers, that is typically not done. Also see
> >>>> +below for userptr gpu_vmas.
> >>>> +
> >>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> >>>> +object, but we can no longer be certain that we hold the gpu_vm's
> >>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we  
> >>> I need to think a bit more about locking of extobj and evicted 
> >>> object tracking
> >>> in the case of processing 'drm_gpuva_ops' directly through callbacks 
> >>> within the
> >>> fence signalling critical path as mentioend in [1].
> >>>
> >>> In order to support that, we'd need to protect extobjs with a 
> >>> separate lock,
> >>> and while iterating extobjs to acquire the dma-resv lock drop the 
> >>> lock within
> >>> the loop before we actually acquire the dma-resv lock. Maple tree 
> >>> supports that
> >>> already and this can be fully done within the GPUVA manager; no need 
> >>> for the
> >>> driver to care about that.  
> >>
> >> So do I understand correctly that this because you want to update the 
> >> gpuvm state while operations are progressing asynchronously?
> >>
> >> If so, I wonder whether that could really be done? For example to 
> >> allocate enough memory for page-tables etc, you need to know the 
> >> details of the operations at IOCTL execution time, and to know the 
> >> details you need to know the state from the previous operation?  
> >
> >
> > Right, sync and async bind can't run fully concurrently, but you could 
> > "inject" a
> > sync one between two async ones such that the sync ones executed from 
> > the IOCTL
> > directly while async execution is stalled meanwhile. This would be 
> > possible because
> > the actual drm_gpuva_ops would be calculated within the async 
> > execution path rather
> > than in the IOCTL. But yes, page-table management must be desinged to 
> > support that.

FWIW, the panthor driver is designed this way (note that I'm not
supporting GEM eviction yet, so there might be subtleties I missed).

> 
> OK, well one of the main motivations for Xe is to be able to pipeline 
> interleaving binds and execs if needed, like so:
> 
> - Bind vmas for scene 1.
> - Submit scene 1.
> - Unbind vmas for scene 1.
> - Bind vmas for scene 2.
> - Submit scene 2.
> - Unbind vmas for scene 2.
> 
> And being able to *submit* all of the above while the async binding of 
> vmas for scene (step 1) has not yet completed.
> I can't really see how this could be done, while obeying dma-fence 
> rules, unless state is updated synchronously while submitting?

The idea in this case is to detect when a GPU job dependency is a
VM_BIND out-fence, turn drm_sched_fence->parent into an
xxx_vm_bind_job_fence object that's holding the GEM that's about to be
mapped (AFAICT, we don't need to do anything for unmap operations), and
then add our GPU job fence to this BO. This should not only guarantee
that the GEMs we depend on are mapped before the GPU job is executed
(the fence wait does that), but also that such yet-to-be-mapped GEMs
won't be evicted just after they've been mapped and before the GPU had
a chance to execute (unless I'm missing something, adding our GPU job
fence to the BO being targeted by a pending VM_BIND(async,map) operation
solves this problem).

^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06 11:09           ` Boris Brezillon
  (?)
@ 2023-09-06 11:57             ` Thomas Hellström
  -1 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06 11:57 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Danilo Krummrich, Matthew Brost, Francois Dugast, linux-kernel,
	Oak Zeng, dri-devel, Rodrigo Vivi, intel-xe

Hi, Boris

On 9/6/23 13:09, Boris Brezillon wrote:
> On Wed, 6 Sep 2023 10:32:24 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>
>>>>>> +Introducing external (or shared) buffer objects
>>>>>> +===============================================
>>>>>> +
>>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>>>>>> +can't share their reservation object with a single gpu_vm, but
>>>>>> will rather
>>>>>> +have a reservation object of their own. The shared objects bound to a
>>>>>> +gpu_vm using one or many
>>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>>>>> +protected by the gpu_vm lock. One could in theory protect it also
>>>>>> with
>>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
>>>>>> typically
>>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>>>>> +the current locking helpers, that is typically not done. Also see
>>>>>> +below for userptr gpu_vmas.
>>>>>> +
>>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>>>>> I need to think a bit more about locking of extobj and evicted
>>>>> object tracking
>>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
>>>>> within the
>>>>> fence signalling critical path as mentioned in [1].
>>>>>
>>>>> In order to support that, we'd need to protect extobjs with a
>>>>> separate lock,
>>>>> and while iterating extobjs to acquire the dma-resv lock drop the
>>>>> lock within
>>>>> the loop before we actually acquire the dma-resv lock. Maple tree
>>>>> supports that
>>>>> already and this can be fully done within the GPUVA manager; no need
>>>>> for the
>>>>> driver to care about that.
>>>> So do I understand correctly that this because you want to update the
>>>> gpuvm state while operations are progressing asynchronously?
>>>>
>>>> If so, I wonder whether that could really be done? For example to
>>>> allocate enough memory for page-tables etc, you need to know the
>>>> details of the operations at IOCTL execution time, and to know the
>>>> details you need to know the state from the previous operation?
>>>
>>> Right, sync and async bind can't run fully concurrently, but you could
>>> "inject" a
>>> sync one between two async ones such that the sync ones executed from
>>> the IOCTL
>>> directly while async execution is stalled meanwhile. This would be
>>> possible because
>>> the actual drm_gpuva_ops would be calculated within the async
>>> execution path rather
>>> than in the IOCTL. But yes, page-table management must be designed to
>>> support that.
> FWIW, the panthor driver is designed this way (note that I'm not
> supporting GEM eviction yet, so there might be subtleties I missed).

The problem is that once you've published your VM_BIND out-fence, any 
code path required to signal that fence may not allocate memory nor 
grab any locks that allow allocating memory while held, including 
dma_resv locks, and that means all required page-table memory needs to 
be allocated synchronously in the IOCTL, and all evicted bos need to be 
made resident in the IOCTL, and at least in the xe driver the amount of 
memory we need to allocate depends on the vm state, so we can't really 
update the vm state asynchronously either.

But as long as any async binding work required for signalling the 
VM_BIND out-fence is properly annotated with 
dma_fence_begin_signalling() and dma_fence_end_signalling() and there 
aren't any lockdep splats, things should be good. It would trigger on 
both memory allocation and attempts to grab a dma_resv lock.
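
As a toy illustration of what those annotations catch, here is a userspace model. All names here are invented; the real mechanism is dma_fence_begin_signalling()/dma_fence_end_signalling() plus lockdep in the kernel, not this. Any allocation attempted inside the marked signalling section is flagged:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy userspace model of what the annotations buy you: any allocation
 * attempted inside the marked fence-signalling critical section is
 * flagged, much like lockdep would splat in the kernel.  Names are
 * illustrative only; the real annotations are
 * dma_fence_begin_signalling()/dma_fence_end_signalling().
 */
static int in_signalling_section;
static int violations;

static void fence_begin_signalling(void) { in_signalling_section++; }
static void fence_end_signalling(void) { in_signalling_section--; }

static void *checked_alloc(size_t sz)
{
	if (in_signalling_section) {
		violations++;	/* allocation may block on reclaim */
		return NULL;	/* forbidden in the signalling path */
	}
	return malloc(sz);
}
```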


>
>> OK, well one of the main motivations for Xe is to be able to pipeline
>> interleaving binds and execs if needed, like so:
>>
>> - Bind vmas for scene 1.
>> - Submit scene 1.
>> - Unbind vmas for scene 1.
>> - Bind vmas for scene 2.
>> - Submit scene 2.
>> - Unbind vmas for scene 2.
>>
>> And being able to *submit* all of the above while the async binding of
>> vmas for scene (step 1) has not yet completed.
>> I can't really see how this could be done, while obeying dma-fence
>> rules, unless state is updated synchronously while submitting?
> The idea in this case is to detect when a GPU job dependency is a
> VM_BIND out-fence, turn drm_sched_fence->parent into an
> xxx_vm_bind_job_fence object that's holding the GEM that's about to be
> mapped (AFAICT, we don't need to do anything for unmap operations), and
> then add our GPU job fence to this BO. This should not only guarantee
> that the GEMs we depend on are mapped before the GPU job is executed
> (the fence wait does that), but also that such yet-to-be-mapped GEMs
> won't be evicted just after they've been mapped and before the GPU had
> a chance to execute (unless I'm missing something, adding our GPU job
> fence to the BO being targeted by a pending VM_BIND(async,map) operation
> solves this problem).

Yes, we're essentially doing the same. The issue here is that when we, 
for example, *submit* "Bind vmas for scene 2",
we need to know how much page-table memory to allocate, and what BOs to 
make resident to be able to publish the out-fence. That means we need to 
know what the VM state would look like at the end of "Unbind vmas for 
scene 1". If the VM state is updated at submission time, that's all ok 
but if it's updated at execution time, we'd have to guess what resources 
to pre-allocate.

/Thomas




^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06 11:57             ` Thomas Hellström
  (?)
@ 2023-09-06 13:00               ` Boris Brezillon
  -1 siblings, 0 replies; 45+ messages in thread
From: Boris Brezillon @ 2023-09-06 13:00 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Danilo Krummrich, Matthew Brost, Francois Dugast, linux-kernel,
	Oak Zeng, dri-devel, Rodrigo Vivi, intel-xe

On Wed, 6 Sep 2023 13:57:03 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> Hi, Boris
> 
> On 9/6/23 13:09, Boris Brezillon wrote:
> > On Wed, 6 Sep 2023 10:32:24 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >
> >  
> >>>>>> +Introducing external (or shared) buffer objects
> >>>>>> +===============================================
> >>>>>> +
> >>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
> >>>>>> +can't share their reservation object with a single gpu_vm, but
> >>>>>> will rather
> >>>>>> +have a reservation object of their own. The shared objects bound to a
> >>>>>> +gpu_vm using one or many
> >>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> >>>>>> +protected by the gpu_vm lock. One could in theory protect it also
> >>>>>> with
> >>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
> >>>>>> typically
> >>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> >>>>>> +the current locking helpers, that is typically not done. Also see
> >>>>>> +below for userptr gpu_vmas.
> >>>>>> +
> >>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> >>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
> >>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we  
> >>>>> I need to think a bit more about locking of extobj and evicted
> >>>>> object tracking
> >>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
> >>>>> within the
> >>>>> fence signalling critical path as mentioned in [1].
> >>>>>
> >>>>> In order to support that, we'd need to protect extobjs with a
> >>>>> separate lock,
> >>>>> and while iterating extobjs to acquire the dma-resv lock drop the
> >>>>> lock within
> >>>>> the loop before we actually acquire the dma-resv lock. Maple tree
> >>>>> supports that
> >>>>> already and this can be fully done within the GPUVA manager; no need
> >>>>> for the
> >>>>> driver to care about that.  
> >>>> So do I understand correctly that this because you want to update the
> >>>> gpuvm state while operations are progressing asynchronously?
> >>>>
> >>>> If so, I wonder whether that could really be done? For example to
> >>>> allocate enough memory for page-tables etc, you need to know the
> >>>> details of the operations at IOCTL execution time, and to know the
> >>>> details you need to know the state from the previous operation?  
> >>>
> >>> Right, sync and async bind can't run fully concurrently, but you could
> >>> "inject" a
> >>> sync one between two async ones such that the sync ones executed from
> >>> the IOCTL
> >>> directly while async execution is stalled meanwhile. This would be
> >>> possible because
> >>> the actual drm_gpuva_ops would be calculated within the async
> >>> execution path rather
> >>> than in the IOCTL. But yes, page-table management must be designed to
> >>> support that.  
> > FWIW, the panthor driver is designed this way (note that I'm not
> > supporting GEM eviction yet, so there might be subtleties I missed).  
> 
> The problem is that once you've published your VM_BIND out-fence, any 
> code path required to signal that fence may not allocate memory nor 
> grab any locks that allow allocating memory while held, including 
> dma_resv locks, and that means all required page-table memory needs to 
> be allocated synchronously in the IOCTL,

Yep, that's already what I do, by over-provisioning for the worst case
scenario (page table tree is empty), and returning unused pages after
the operation is done.
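
The worst-case pre-allocation described above can be sketched as follows, assuming a 4-level page table with 4 KiB pages, 512 entries per level, and a pre-allocated root (illustrative parameters only, not Mali's or Xe's actual layout):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4 KiB pages */

/*
 * Worst-case number of page-table pages needed to map
 * [start, start + len) into an empty tree: one table page per
 * distinct index at each level below the (pre-allocated) root.
 * 512 entries per table means 9 index bits per level.
 */
static uint64_t worst_case_pt_pages(uint64_t start, uint64_t len, int levels)
{
	uint64_t end = start + len - 1;
	uint64_t pages = 0;
	int shift = PAGE_SHIFT + 9;	/* VA span of one leaf table */

	for (int l = 0; l < levels - 1; l++, shift += 9)
		pages += (end >> shift) - (start >> shift) + 1;

	return pages;
}
```

Unused pages from this estimate are what gets returned to the fast-alloc pool after the operation completes.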

> and all evicted bos need to be 
> made resident in the IOCTL,

Yep, I'm pinning memory to BOs in that path too.

> and at least in the xe driver the amount of 
> memory we need to allocate depends on the vm state, so we can't really 
> update the vm state asynchronously either.

For Mali, we can calculate the maximum amount of pages we'll need for a
MAP operation, by assuming the page table is empty. Then it's just a
matter of returning unused pages to a fast-alloc pool so we can
speed-up further page table allocations (we're using a kmem_cache here,
since the page table update is done by the CPU and memory is shared on
Arm, but there's no reason you can't have your own cache
implementation).
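
A minimal userspace stand-in for such a fast-alloc pool might look like this (a toy free list, not the kernel's kmem_cache API):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy free list standing in for a kmem_cache: page-table pages that
 * were over-provisioned but not consumed go back on the list, so the
 * next VM_BIND can allocate them cheaply.  Illustrative only.
 */
struct pt_page { struct pt_page *next; };

struct pt_cache {
	struct pt_page *free_list;
	int cached;		/* pages currently on the free list */
};

static void *pt_cache_alloc(struct pt_cache *c)
{
	if (c->free_list) {		/* fast path: reuse a cached page */
		struct pt_page *p = c->free_list;

		c->free_list = p->next;
		c->cached--;
		return p;
	}
	return malloc(4096);		/* slow path: fresh page */
}

static void pt_cache_free(struct pt_cache *c, void *page)
{
	struct pt_page *p = page;

	p->next = c->free_list;
	c->free_list = p;
	c->cached++;
}
```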

> 
> But as long as any async binding work required for signalling the 
> VM_BIND out-fence is properly annotated with 
> dma_fence_begin_signalling() and dma_fence_end_signalling() and there 
> aren't any lockdep splats, things should be good. It would trigger on 
> both memory allocation and attempts to grab a dma_resv lock.

I have dma_fence_{begin,end}_signalling() annotations in the
::run_job() path, and no lockdep complaint spotted so far.

> 
> 
> >  
> >> OK, well one of the main motivations for Xe is to be able to pipeline
> >> interleaving binds and execs if needed, like so:
> >>
> >> - Bind vmas for scene 1.
> >> - Submit scene 1.
> >> - Unbind vmas for scene 1.
> >> - Bind vmas for scene 2.
> >> - Submit scene 2.
> >> - Unbind vmas for scene 2.
> >>
> >> And being able to *submit* all of the above while the async binding of
> >> vmas for scene (step 1) has not yet completed.
> >> I can't really see how this could be done, while obeying dma-fence
> >> rules, unless state is updated synchronously while submitting?  
> > The idea in this case is to detect when a GPU job dependency is a
> > VM_BIND out-fence, turn drm_sched_fence->parent into an
> > xxx_vm_bind_job_fence object that's holding the GEM that's about to be
> > mapped (AFAICT, we don't need to do anything for unmap operations), and
> > then add our GPU job fence to this BO. This should not only guarantee
> > that the GEMs we depend on are mapped before the GPU job is executed
> > (the fence wait does that), but also that such yet-to-be-mapped GEMs
> > won't be evicted just after they've been mapped and before the GPU had
> > a chance to execute (unless I'm missing something, adding our GPU job
> > fence to the BO being targeted by a pending VM_BIND(async,map) operation
> > solves this problem).

It's not exactly that, because we'd need to add the GEMs of all the
pending VM_BIND(map) jobs that come before the expressed dependency, not
just the one attached to the dependency itself. But after chatting with
Danilo, I realized we might not even need to track the GEMs being
mapped at the fence level if we call drm_gpuva_extobj_insert() in the
ioctl(VM_BIND) path:

- drm_gpuva_extobj_insert() will make sure the GEM is added to
  the ext-object map even before it's actually mapped to the VM (for
  private GEMs, it doesn't matter, because they are using the VM resv,
  so any private GEM mapped will automatically receive the VM resv
  updates).

Now, when a GPU job is queued, we do all the VM GEM preparation, which
includes the following steps:

- drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
  resident
- Iterate over all ext-objs to add our fence (I'm skipping the slot
  reservation step that's implied). Because drm_gpuva_extobj_insert()
  was called early, we also get all the GEMs that are not yet mapped,
  but are about to be mapped. This means they won't be evicted until
  after our job is done.
- add our fence to the VM resv

Unless I'm missing something, this should guarantee that all GEMs are
resident and mapped when the job is executed.
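
The preparation steps above can be modeled in a few lines (all structures are invented for illustration; the real interface is the drm_gpuva manager discussed in this thread):

```c
#include <assert.h>

/*
 * Toy model of the steps above; all structures are invented for
 * illustration (the real interface is the drm_gpuva manager).  GEMs
 * enter the VM's ext-object set at ioctl(VM_BIND) time, before they
 * are actually mapped, so job submission fences them too.
 */
#define MAX_EXTOBJ 8

struct model_gem {
	int resident;
	int fences;		/* job fences on this GEM's own resv */
};

struct model_vm {
	struct model_gem *extobj[MAX_EXTOBJ];
	int nr_extobj;
	int vm_fences;		/* fences on the shared VM resv */
};

/* Called from the VM_BIND ioctl, ahead of the async map execution. */
static void extobj_insert(struct model_vm *vm, struct model_gem *gem)
{
	vm->extobj[vm->nr_extobj++] = gem;
}

/* At job submission: validate evicted GEMs, fence ext-objects and VM. */
static void job_prepare(struct model_vm *vm)
{
	for (int i = 0; i < vm->nr_extobj; i++) {
		vm->extobj[i]->resident = 1;	/* make resident */
		vm->extobj[i]->fences++;	/* add our job fence */
	}
	vm->vm_fences++;	/* covers all VM-private GEMs */
}
```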

> 
> Yes, we're essentially doing the same. The issue here is that when we, 
> for example, *submit* "Bind vmas for scene 2",
> we need to know how much page-table memory to allocate,

This is solved with over-provisioning in our case.

> and what BOs to 
> make resident to be able to publish the out-fence.

That's basically what Danilo's latest gpuva_mgr patchset tries to
provide generic helpers for, by exposing functions to iterate over all
evicted GEMs (so we can make them resident) and adding a way to add
fences to all GEMs currently bound to the VM. That leaves external GEMs
that are about to be mapped, which, I think, is addressed by the
solution detailed above.

> That means we need to 
> know what the VM state would look like at the end of "Unbind vmas for 
> scene 1".

Not necessarily, as long as you know all the GEMs that are currently
mapped and those that are about to be mapped. The extobj set provides
exactly that for external GEMs.

> If the VM state is updated at submission time, that's all ok 
> but if it's updated at execution time, we'd have to guess what resources 
> to pre-allocate.

As long as you have enough resources pre-allocated to do the VM update, 
you're covered (not saying this is easy to guess on Intel, but it's doable on Mali, 
and the page table caching makes over-provisioning not too bad, as long
as we limit the number of in-flight VM_BIND jobs).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06 13:00               ` Boris Brezillon
  0 siblings, 0 replies; 45+ messages in thread
From: Boris Brezillon @ 2023-09-06 13:00 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Matthew Brost, Francois Dugast, linux-kernel, dri-devel,
	Danilo Krummrich, Oak Zeng, Rodrigo Vivi, intel-xe

On Wed, 6 Sep 2023 13:57:03 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> Hi, Boris
> 
> On 9/6/23 13:09, Boris Brezillon wrote:
> > On Wed, 6 Sep 2023 10:32:24 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >
> >  
> >>>>>> +Introducing external (or shared) buffer objects
> >>>>>> +===============================================
> >>>>>> +
> >>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
> >>>>>> +can't share their reservation object with a single gpu_vm, but
> >>>>>> will rather
> >>>>>> +have a reservation object of their own. The shared objects bound to a
> >>>>>> +gpu_vm using one or many
> >>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> >>>>>> +protected by the gpu_vm lock. One could in theory protect it also
> >>>>>> with
> >>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
> >>>>>> typically
> >>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> >>>>>> +the current locking helpers, that is typically not done. Also see
> >>>>>> +below for userptr gpu_vmas.
> >>>>>> +
> >>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> >>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
> >>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we  
> >>>>> I need to think a bit more about locking of extobj and evicted
> >>>>> object tracking
> >>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
> >>>>> within the
> >>>>> fence signalling critical path as mentioned in [1].
> >>>>>
> >>>>> In order to support that, we'd need to protect extobjs with a
> >>>>> separate lock,
> >>>>> and while iterating extobjs to acquire the dma-resv lock drop the
> >>>>> lock within
> >>>>> the loop before we actually acquire the dma-resv lock. Maple tree
> >>>>> supports that
> >>>>> already and this can be fully done within the GPUVA manager; no need
> >>>>> for the
> >>>>> driver to care about that.  
> >>>> So do I understand correctly that this because you want to update the
> >>>> gpuvm state while operations are progressing asynchronously?
> >>>>
> >>>> If so, I wonder whether that could really be done? For example to
> >>>> allocate enough memory for page-tables etc, you need to know the
> >>>> details of the operations at IOCTL execution time, and to know the
> >>>> details you need to know the state from the previous operation?  
> >>>
> >>> Right, sync and async bind can't run fully concurrently, but you could
> >>> "inject" a
> >>> sync one between two async ones such that the sync ones executed from
> >>> the IOCTL
> >>> directly while async execution is stalled meanwhile. This would be
> >>> possible because
> >>> the actual drm_gpuva_ops would be calculated within the async
> >>> execution path rather
> >>> than in the IOCTL. But yes, page-table management must be designed to
> >>> support that.  
> > FWIW, the panthor driver is designed this way (note that I'm not
> > supporting GEM eviction yet, so there might be subtleties I missed).  
> 
> The problem is that once you've published your VM_BIND out-fence, any
> code path required to signal that fence may not allocate memory nor
> grab any locks that allow allocating memory while held, including
> dma_resv locks, and that means all required page-table memory needs to
> be allocated synchronously in the IOCTL,

Yep, that's already what I do, by over-provisioning for the worst case
scenario (page table tree is empty), and returning unused pages after
the operation is done.
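For illustration, the worst-case estimate described here can be sketched in plain C. The geometry constants below (4 KiB leaf pages, 512 entries per table, 4 levels) are assumptions for the sake of the example, not any particular GPU MMU's actual layout, and a real driver would typically subtract the already-existing root table:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: worst-case number of page-table pages needed to
 * map [va, va + size) when the page-table tree is assumed empty.
 * Assumes a 4-level layout with 4 KiB pages and 512 entries per table. */
#define PTE_SHIFT 12	/* 4 KiB leaf pages */
#define LVL_BITS  9	/* 512 entries per table */
#define LEVELS    4

static uint64_t worst_case_pt_pages(uint64_t va, uint64_t size)
{
	uint64_t total = 0;
	int lvl;

	/* At each level, count how many tables the VA range can span. */
	for (lvl = 1; lvl <= LEVELS; lvl++) {
		unsigned int shift = PTE_SHIFT + LVL_BITS * lvl;
		uint64_t first = va >> shift;
		uint64_t last = (va + size - 1) >> shift;

		total += last - first + 1;
	}
	return total;
}
```

Allocating this many pages up front in the IOCTL guarantees the async VM update can never fail on allocation; whatever the update doesn't consume is handed back afterwards.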

> and all evicted bos need to be 
> made resident in the IOCTL,

Yep, I'm pinning memory to BOs in that path too.

> and at least in the xe driver the amount of 
> memory we need to allocate depends on the vm state, so we can't really 
> update the vm state asynchronously either.

For Mali, we can calculate the maximum amount of pages we'll need for a
MAP operation, by assuming the page table is empty. Then it's just a
matter of returning unused pages to a fast-alloc pool so we can
speed-up further page table allocations (we're using a kmem_cache here,
since the page table update is done by the CPU and memory is shared on
Arm, but there's no reason you can't have your own cache
implementation).
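A minimal userspace stand-in for the kmem_cache-backed pool described above might look as follows (names and the free-list design are illustrative; the real thing would use kmem_cache or the driver's own allocator):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy fast-alloc pool: unused over-provisioned page-table pages are
 * returned to a free list, so later binds hit the cache instead of the
 * system allocator. */
struct pt_page {
	struct pt_page *next;
};

struct pt_pool {
	struct pt_page *free;	/* cached, ready-to-reuse pages */
	unsigned long cached;
};

static void pt_pool_put(struct pt_pool *p, struct pt_page *pg)
{
	pg->next = p->free;
	p->free = pg;
	p->cached++;
}

static struct pt_page *pt_pool_get(struct pt_pool *p)
{
	struct pt_page *pg = p->free;

	if (pg) {		/* fast path: reuse a cached page */
		p->free = pg->next;
		p->cached--;
		return pg;
	}
	return malloc(4096);	/* slow path: fresh allocation */
}
```

Capping the number of in-flight VM_BIND jobs bounds how large this cache can grow, which is what keeps the over-provisioning tolerable.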

> 
> But as long as any async binding work required for signalling the 
> VM_BIND out-fence is properly annotated with 
> dma_fence_begin_signalling() and dma_fence_end_signalling() and there 
> aren't any lockdep splats, things should be good. It would trigger on 
> both memory allocation and attempts to grab a dma_resv lock.

I have dma_fence_{begin,end}_signalling() annotations in the
::run_job() path, and no lockdep complaint spotted so far.

> 
> 
> >  
> >> OK, well one of the main motivations for Xe is to be able to pipeline
> >> interleaving binds and execs if needed, like so:
> >>
> >> - Bind vmas for scene 1.
> >> - Submit scene 1.
> >> - Unbind vmas for scene 1.
> >> - Bind vmas for scene 2.
> >> - Submit scene 2.
> >> - Unbind vmas for scene 2.
> >>
> >> And being able to *submit* all of the above while the async binding of
> >> vmas for scene (step 1) has not yet completed.
> >> I can't really see how this could be done, while obeying dma-fence
> >> rules, unless state is updated synchronously while submitting?  
> > The idea in this case is to detect when a GPU job dependency is a
> > VM_BIND out-fence, turn drm_sched_fence->parent into an
> > xxx_vm_bind_job_fence object that's holding the GEM that's about to be
> > mapped (AFAICT, we don't need to do anything for unmap operations), and
> > then add our GPU job fence to this BO. This should not only guarantee
> > that the GEMs we depend on are mapped before the GPU job is executed
> > (the fence wait does that), but also that such yet-to-be-mapped GEMs
> > won't be evicted just after they've been mapped and before the GPU had
> > a chance to execute (unless I'm missing something, adding our GPU job
> > fence to the BO being targeted by a pending VM_BIND(async,map) operation
> > solves this problem).

It's not exactly that, because we'd need to add the GEMs of all the
pending VM_BIND(map) jobs that come before the expressed dependency, not
just the one attached to the dependency itself. But after chatting with
Danilo, I realized we might not even need to track the GEMs being
mapped at the fence level if we call drm_gpuva_extobj_insert() in the
ioctl(VM_BIND) path:

- drm_gpuva_extobj_insert() will make sure the GEM is added to
  the ext-object map even before it's actually mapped to the VM (for
  private GEMs, it doesn't matter, because they are using the VM resv,
  so any private GEM mapped will automatically receive the VM resv
  updates).

Now, when a GPU job is queued, we do all the VM GEM preparation, which
includes the following steps:

- drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
  resident
- Iterate over all ext-objs to add our fence (I'm skipping the slot
  reservation step that's implied). Because drm_gpuva_extobj_insert()
  was called early, we also get all the GEMs that are not yet mapped,
  but are about to be mapped. This means they won't be evicted until
  after our job is done
- add our fence to the VM resv

Unless I'm missing something, this should guarantee that all GEMs are
resident and mapped when the job is executed.
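The submission steps above can be modeled with a toy data structure (the names are illustrative stand-ins, not the real drm_gpuva_* API): because every external GEM is inserted into the extobj set already at ioctl(VM_BIND) time, a single walk of that set at job submission covers both already-mapped and about-to-be-mapped GEMs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of job preparation against a VM's extobj set. */
struct gem {
	bool evicted;		/* needs validation before use */
	int fences;		/* fences attached to this GEM's resv */
	struct gem *next;
};

struct vm {
	struct gem *extobjs;	/* all external GEMs, mapped or pending */
	int resv_fences;	/* fences on the shared VM resv */
};

static void vm_prepare_job(struct vm *vm)
{
	struct gem *g;

	for (g = vm->extobjs; g; g = g->next) {
		if (g->evicted)
			g->evicted = false;	/* validate: make resident */
		g->fences++;	/* keeps it resident until the job signals */
	}
	vm->resv_fences++;	/* private GEMs share the VM resv */
}
```

In this model, a GEM queued by a pending VM_BIND(map) but not yet mapped still gets the job fence, so it cannot be evicted between its mapping and the job's execution, which is the property argued for above.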

> 
> Yes, we're essentially doing the same. The issue here is that when we, 
> for example *submit* Bind vmas for scene 2,
> we need to know how much page-table memory to allocate,

This is solved with over-provisioning in our case.

> and what BOs to 
> make resident to be able to publish the out-fence.

That's basically what Danilo's latest gpuva_mgr patchset tries to
provide generic helpers for, by exposing functions to iterate over all
evicted GEMs (so we can make them resident) and adding a way to add
fences to all GEMs currently bound to the VM. That leaves external GEMs
that are about to be mapped, which, I think, is addressed by the
solution detailed above.

> That means we need to 
> know what the VM state would look like at the end of "Unbind vmas for 
> scene 1".

Not necessarily, as long as you know all the GEMs that are currently
mapped and those that are about to be mapped. The extobj set provides
exactly that for external GEMs.

> If the VM state is updated at submission time, that's all ok 
> but if it's updated at execution time, we'd have to guess what resources 
> to pre-allocate.

As long as you have enough resources pre-allocated to do the VM update
(not saying this is easy to guess on Intel, but it's doable on Mali,
and the page table caching makes over-provisioning not too bad, as long
as we limit the number of in-flight VM_BIND jobs).

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06 13:00               ` Boris Brezillon
  (?)
@ 2023-09-06 14:08                 ` Thomas Hellström
  -1 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06 14:08 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Danilo Krummrich, Matthew Brost, Francois Dugast, linux-kernel,
	Oak Zeng, dri-devel, Rodrigo Vivi, intel-xe

Hi, Boris,

On 9/6/23 15:00, Boris Brezillon wrote:
> On Wed, 6 Sep 2023 13:57:03 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>> Hi, Boris
>>
>> On 9/6/23 13:09, Boris Brezillon wrote:
>>> On Wed, 6 Sep 2023 10:32:24 +0200
>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>
>>>   
>>>>>>>> +Introducing external (or shared) buffer objects
>>>>>>>> +===============================================
>>>>>>>> +
>>>>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>>>>>>>> +can't share their reservation object with a single gpu_vm, but
>>>>>>>> will rather
>>>>>>>> +have a reservation object of their own. The shared objects bound to a
>>>>>>>> +gpu_vm using one or many
>>>>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>>>>>>> +protected by the gpu_vm lock. One could in theory protect it also
>>>>>>>> with
>>>>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
>>>>>>>> typically
>>>>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>>>>>>> +the current locking helpers, that is typically not done. Also see
>>>>>>>> +below for userptr gpu_vmas.
>>>>>>>> +
>>>>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>>>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>>>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>>>>>>> I need to think a bit more about locking of extobj and evicted
>>>>>>> object tracking
>>>>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
>>>>>>> within the
>>>>>>> fence signalling critical path as mentioned in [1].
>>>>>>>
>>>>>>> In order to support that, we'd need to protect extobjs with a
>>>>>>> separate lock,
>>>>>>> and while iterating extobjs to acquire the dma-resv lock drop the
>>>>>>> lock within
>>>>>>> the loop before we actually acquire the dma-resv lock. Maple tree
>>>>>>> supports that
>>>>>>> already and this can be fully done within the GPUVA manager; no need
>>>>>>> for the
>>>>>>> driver to care about that.
>>>>>> So do I understand correctly that this because you want to update the
>>>>>> gpuvm state while operations are progressing asynchronously?
>>>>>>
>>>>>> If so, I wonder whether that could really be done? For example to
>>>>>> allocate enough memory for page-tables etc, you need to know the
>>>>>> details of the operations at IOCTL execution time, and to know the
>>>>>> details you need to know the state from the previous operation?
>>>>> Right, sync and async bind can't run fully concurrently, but you could
>>>>> "inject" a
>>>>> sync one between two async ones such that the sync ones executed from
>>>>> the IOCTL
>>>>> directly while async execution is stalled meanwhile. This would be
>>>>> possible because
>>>>> the actual drm_gpuva_ops would be calculated within the async
>>>>> execution path rather
>>>>> than in the IOCTL. But yes, page-table management must be designed to
>>>>> support that.
>>> FWIW, the panthor driver is designed this way (note that I'm not
>>> supporting GEM eviction yet, so there might be subtleties I missed).
>> The problem is that once you've published your VM_BIND out-fence, any
>> code path required to signal that fence may not allocate memory nor
>> grab any locks that allow allocating memory while held, including
>> dma_resv locks, and that means all required page-table memory needs to
>> be allocated synchronously in the IOCTL,
> Yep, that's already what I do, by over-provisioning for the worst case
> scenario (page table tree is empty), and returning unused pages after
> the operation is done.
>
>> and all evicted bos need to be
>> made resident in the IOCTL,
> Yep, I'm pinning memory to BOs in that path too.
>
>> and at least in the xe driver the amount of
>> memory we need to allocate depends on the vm state, so we can't really
>> update the vm state asynchronously either.
> For Mali, we can calculate the maximum amount of pages we'll need for a
> MAP operation, by assuming the page table is empty. Then it's just a
> matter of returning unused pages to a fast-alloc pool so we can
> speed-up further page table allocations (we're using a kmem_cache here,
> since the page table update is done by the CPU and memory is shared on
> Arm, but there's no reason you can't have your own cache
> implementation).
>
>> But as long as any async binding work required for signalling the
>> VM_BIND out-fence is properly annotated with
>> dma_fence_begin_signalling() and dma_fence_end_signalling() and there
>> aren't any lockdep splats, things should be good. It would trigger on
>> both memory allocation and attempts to grab a dma_resv lock.
> I have dma_fence_{begin,end}_signalling() annotations in the
> ::run_job() path, and no lockdep complaint spotted so far.
>
>>
>>>   
>>>> OK, well one of the main motivations for Xe is to be able to pipeline
>>>> interleaving binds and execs if needed, like so:
>>>>
>>>> - Bind vmas for scene 1.
>>>> - Submit scene 1.
>>>> - Unbind vmas for scene 1.
>>>> - Bind vmas for scene 2.
>>>> - Submit scene 2.
>>>> - Unbind vmas for scene 2.
>>>>
>>>> And being able to *submit* all of the above while the async binding of
>>>> vmas for scene (step 1) has not yet completed.
>>>> I can't really see how this could be done, while obeying dma-fence
>>>> rules, unless state is updated synchronously while submitting?
>>> The idea in this case is to detect when a GPU job dependency is a
>>> VM_BIND out-fence, turn drm_sched_fence->parent into an
>>> xxx_vm_bind_job_fence object that's holding the GEM that's about to be
>>> mapped (AFAICT, we don't need to do anything for unmap operations), and
>>> then add our GPU job fence to this BO. This should not only guarantee
>>> that the GEMs we depend on are mapped before the GPU job is executed
>>> (the fence wait does that), but also that such yet-to-be-mapped GEMs
>>> won't be evicted just after they've been mapped and before the GPU had
>>> a chance to execute (unless I'm missing something, adding our GPU job
>>> fence to the BO being targeted by a pending VM_BIND(async,map) operation
>>> solves this problem).
> It's not exactly that, because we'd need to add the GEMs of all the
> pending VM_BIND(map) jobs that come before the expressed dependency, not
> just the one attached to the dependency itself. But after chatting with
> Danilo, I realized we might not even need to track the GEMs being
> mapped at the fence level if we call drm_gpuva_extobj_insert() in the
> ioctl(VM_BIND) path:
>
> - drm_gpuva_extobj_insert() will make sure the GEM is added to
>    the ext-object map even before it's actually mapped to the VM (for
>    private GEMs, it doesn't matter, because they are using the VM resv,
>    so any private GEM mapped will automatically receive the VM resv
>    updates).
>
> Now, when a GPU job is queued, we do all the VM GEM preparation, which
> includes the following steps:
>
> - drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
>    resident
> - Iterate over all ext-objs to add our fence (I'm skipping the slot
>    reservation step that's implied). Because drm_gpuva_extobj_insert()
>    was called early, we also get all the GEMs that are not yet mapped,
>    but are about to be mapped. This means they won't be evicted until
>    after our job is done
> - add our fence to the VM resv
>
> Unless I'm missing something, this should guarantee that all GEMs are
> resident and mapped when the job is executed.
>
>> Yes, we're essentially doing the same. The issue here is that when we,
>> for example *submit* Bind vmas for scene 2,
>> we need to know how much page-table memory to allocate,
> This is solved with over-provisioning in our case.
>
>> and what BOs to
>> make resident to be able to publish the out-fence.
> That's basically what Danilo's latest gpuva_mgr patchset tries to
> provide generic helpers for, by exposing functions to iterate over all
> evicted GEMs (so we can make them resident) and adding a way to add
> fences to all GEMs currently bound to the VM. That leaves external GEMs
> that are about to be mapped, which, I think, is addressed by the
> solution detailed above.
>
>> That means we need to
>> know what the VM state would look like at the end of "Unbind vmas for
>> scene 1".
> Not necessarily, as long as you know all the GEMs that are currently
> mapped and those that are about to be mapped. The extobj set provides
> exactly that for external GEMs.
>
>> If the VM state is updated at submission time, that's all ok
>> but if it's updated at execution time, we'd have to guess what resources
>> to pre-allocate.
> As long as you have enough resources pre-allocated to do the VM update
> (not saying this is easy to guess on Intel, but it's doable on Mali,
> and the page table caching makes over-provisioning not too bad, as long
> as we limit the number of in-flight VM_BIND jobs).

OK, then it sounds like we're on the same page. I guess it would in
theory be possible to pre-allocate all needed resources on xe as well,
but if the vm state lock is made an inner lock in order for us to be
able to grab it within the dma-fence critical section, then it comes
with a number of drawbacks as well:
* Over-allocation of resources.
* Need to spawn a cpu-thread for the async part (currently we utilize 
the GPU for that).
* Probably looking at locking inversions wrt userptr?
* Probably looking at locking inversions wrt recoverable pagefaults?
* Mismatch with the cpu mmap() / munmap() interface where the mmap_sem 
is the outermost lock.

So for us it currently looks like the sync state update is the
preferred one... But OTOH we haven't fully implemented the unwinding yet...

/Thomas






^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
@ 2023-09-06 14:08                 ` Thomas Hellström
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06 14:08 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Matthew Brost, Francois Dugast, linux-kernel, dri-devel,
	Danilo Krummrich, Oak Zeng, Rodrigo Vivi, intel-xe

Hi, Boris,

On 9/6/23 15:00, Boris Brezillon wrote:
> On Wed, 6 Sep 2023 13:57:03 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>> Hi, Boris
>>
>> On 9/6/23 13:09, Boris Brezillon wrote:
>>> On Wed, 6 Sep 2023 10:32:24 +0200
>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>
>>>   
>>>>>>>> +Introducing external (or shared) buffer objects
>>>>>>>> +===============================================
>>>>>>>> +
>>>>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>>>>>>>> +can't share their reservation object with a single gpu_vm, but
>>>>>>>> will rather
>>>>>>>> +have a reservation object of their own. The shared objects bound to a
>>>>>>>> +gpu_vm using one or many
>>>>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>>>>>>> +protected by the gpu_vm lock. One could in theory protect it also
>>>>>>>> with
>>>>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
>>>>>>>> typically
>>>>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>>>>>>> +the current locking helpers, that is typically not done. Also see
>>>>>>>> +below for userptr gpu_vmas.
>>>>>>>> +
>>>>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>>>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>>>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>>>>>>> I need to think a bit more about locking of extobj and evicted
>>>>>>> object tracking
>>>>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
>>>>>>> within the
>>>>>>> fence signalling critical path as mentioend in [1].
>>>>>>>
>>>>>>> In order to support that, we'd need to protect extobjs with a
>>>>>>> separate lock,
>>>>>>> and while iterating extobjs to acquire the dma-resv lock drop the
>>>>>>> lock within
>>>>>>> the loop before we actually acquire the dma-resv lock. Maple tree
>>>>>>> supports that
>>>>>>> already and this can be fully done within the GPUVA manager; no need
>>>>>>> for the
>>>>>>> driver to care about that.
>>>>>> So do I understand correctly that this because you want to update the
>>>>>> gpuvm state while operations are progressing asynchronously?
>>>>>>
>>>>>> If so, I wonder whether that could really be done? For example to
>>>>>> allocate enough memory for page-tables etc, you need to know the
>>>>>> details of the operations at IOCTL execution time, and to know the
>>>>>> details you need to know the state from the previous operation?
>>>>> Right, sync and async bind can't run fully concurrently, but you could
>>>>> "inject" a
>>>>> sync one between two async ones such that the sync ones executed from
>>>>> the IOCTL
>>>>> directly while async execution is stalled meanwhile. This would be
>>>>> possible because
>>>>> the actual drm_gpuva_ops would be calculated within the async
>>>>> execution path rather
>>>>> than in the IOCTL. But yes, page-table management must be desinged to
>>>>> support that.
>>> FWIW, the panthor driver is designed this way (note that I'm not
>>> supporting GEM eviction yet, so there might be subtleties I missed).
>> The problem is that once you've published your VM_BIND out-fence, any
>> code path required to signal that fence may notallocate memory nor or
>> grab any locks that allows allocating memory while held including
>> dma_resv locks, and that means all required page-table memory needs to
>> be allocated synchronously in the IOCTL,
> Yep, that's already what I do, by over-provisioning for the worst case
> scenario (page table tree is empty), and returning unused pages after
> the operation is done.
>
>> and all evicted bos need to be
>> made resident in the IOCTL,
> Yep, I'm pinning memory to BOs in that path too.
>
>> and at least in the xe driver the amount of
>> memory we need to allocate depends on the vm state, so we can't really
>> update the vm state asynchronously either.
> For Mali, we can calculate the maximum amount of pages we'll need for a
> MAP operation, by assuming the page table is empty. Then it's just a
> matter of returning unused pages to a fast-alloc pool so we can
> speed-up further page table allocations (we're using a kmem_cache here,
> since the page table update is done by the CPU and memory is shared on
> Arm, but there's no reason you can't have your own cache
> implementation).
>
>> But as long as any async binding work required for signalling the
>> VM_BIND out-fence is properly annotated with
>> dma_fence_begin_signalling() and dma_fence_end_signalling() and there
>> aren't any lockdep splats, things should be good. It would trigger on
>> both memory allocation and attempts to grab a dma_resv lock.
> I have dma_fence_{begin,end}_signalling() annotations in the
> ::run_job() path, and no lockdep complaint spotted so far.
>
>>
>>>   
>>>> OK, well one of the main motivations for Xe is to be able to pipeline
>>>> interleaving binds and execs if needed, like so:
>>>>
>>>> - Bind vmas for scene 1.
>>>> - Submit scene 1.
>>>> - Unbind vmas for scene 1.
>>>> - Bind vmas for scene 2.
>>>> - Submit scene 2.
>>>> - Unbind vmas for scene 2.
>>>>
>>>> And being able to *submit* all of the above while the async binding of
>>>> vmas for scene (step 1) has not yet completed.
>>>> I can't really see how this could be done, while obeying dma-fence
>>>> rules, unless state is updated synchronously while submitting?
>>> The idea in this case is to detect when a GPU job dependency is a
>>> VM_BIND out-fence, turn drm_sched_fence->parent into an
>>> xxx_vm_bind_job_fence object that's holding the GEM that's about to be
>>> mapped (AFAICT, we don't need to do anything for unmap operations), and
>>> then add our GPU job fence to this BO. This should not only guarantee
>>> that the GEMs we depend on are mapped before the GPU job is executed
>>> (the fence wait does that), but also that such yet-to-be-mapped GEMs
>>> won't be evicted just after they've been mapped and before the GPU had
>>> a chance to execute (unless I'm missing something, adding our GPU job
>>> fence to the BO being targeted by a pending VM_BIND(async,map) operation
>>> solves this problem).
> It's not exactly that, because we'd need to add the GEMs of all the
> pending VM_BIND(map) jobs that come before the expressed dependency, not
> just the one attached to the dependency itself. But after chatting with
> Danilo, I realized we might not even need to track the GEMs being
> mapped at the fence level if we call drm_gpuva_extobj_insert() in the
> ioctl(VM_BIND) path:
>
> - drm_gpuva_extobj_insert() will make sure the GEM is added to
>    the ext-object map even before it's actually mapped to the VM (for
>    private GEMs, it doesn't matter, because they are using the VM resv,
>    so any private GEM mapped will automatically receive the VM resv
>    updates).
>
> Now, when a GPU job is queued, we do all the VM GEM preparation, which
> includes the following steps:
>
> - drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
>    resident
> - Iterate over all ext-objs to add our fence (I'm skipping the slot
>    reservation step that's implied). Because drm_gpuva_extobj_insert()
>    was called early, we also get all the GEMs that are not yet mapped,
>    but are about to be mapped. This means they won't be evicted until
>    after our job is done
> - add our fence to the VM resv
>
> Unless I'm missing something, this should guarantee that all GEMs are
> resident and mapped when the job is executed.
>
>> Yes, we're essentially doing the same. The issue here is that when we,
>> for example *submit* Bind vmas for scene 2,
>> we need to know how much page-table memory to allocate,
> This is solved with over-provisioning in our case.
>
>> and what BOs to
>> make resident to be able to publish the out-fence.
> That's basically what Danilo's latest gpuva_mgr patchset tries to
> provide generic helpers for, by exposing functions to iterate over all
> evicted GEMs (so we can make them resident) and adding a way to add
> fences to all GEMs currently bound to the VM. That leaves external GEMs
> that are about to be mapped, which, I think, is addressed by the
> solution detailed above.
>
>> That means we need to
>> know what the VM state would look like at the end of "Unbind vmas for
>> scene 1".
> Not necessarily, as long as you know all the GEMs that are currently
> mapped and those that are about to be mapped. The extobj set provides
> exactly that for external GEMs.
>
>> If the VM state is updated at submission time, that's all ok
>> but if it's updated at execution time, we'd have to guess what resources
>> to pre-allocate.
> As long as you have enough resources pre-allocated to do the VM update
> (not saying this is easy to guess on Intel, but it's doable on Mali,
> and the page table caching makes over-provisioning not too bad, as long
> as we limit the number of in-flight VM_BIND jobs).
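
The worst-case sizing discussed above can be made concrete. For a radix
page table with 512 eight-byte entries per 4 KiB table (x86-64/arm64
style, 4 KiB pages) and a pre-existing root, the maximum number of
page-table pages a MAP of [va, va + size) can consume when the tree is
empty is just the count of distinct tables the range touches at each
level. A userspace sketch (the constants and function name are
illustrative, not taken from any driver):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages */
#define TABLE_SHIFT 9	/* 512 entries per 4 KiB table */

/*
 * Worst-case number of page-table pages needed to map [va, va + size),
 * assuming everything below the (pre-existing) root level is empty.
 * At each level the range touches a contiguous run of tables, so the
 * count is last_index - first_index + 1.
 */
static uint64_t worst_case_pt_pages(uint64_t va, uint64_t size, int levels)
{
	uint64_t first = va, last = va + size - 1, pages = 0;
	unsigned int shift = PAGE_SHIFT + TABLE_SHIFT; /* leaf tables cover 2 MiB */
	int l;

	for (l = 0; l < levels - 1; l++, shift += TABLE_SHIFT)
		pages += (last >> shift) - (first >> shift) + 1;
	return pages;
}
```

For a single 4 KiB page this gives 3 tables on a 4-level layout; a range
crossing a 2 MiB or 1 GiB boundary pays one extra table per crossed
boundary, which is what keeps per-operation over-provisioning bounded
and cheap to compute.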

OK, then it sounds like we're on the same page. I guess it would in theory be 
possible to pre-allocate all needed resources on xe as well, but if the 
vm state lock is made an inner lock in order for us to be able to grab 
it within the dma-fence critical section, then it comes with a number of 
drawbacks as well:
* Over-allocation of resources.
* Need to spawn a cpu-thread for the async part (currently we utilize 
the GPU for that).
* Probably looking at locking inversions wrt userptr?
* Probably looking at locking inversions wrt recoverable pagefaults?
* Mismatch with the cpu mmap() / munmap() interface where the mmap_sem 
is the outermost lock.

So for us it currently looks like the sync state update is the 
preferred one... But OTOH we haven't fully implemented the unwinding yet...

/Thomas
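
Boris's early-insertion argument in the quoted exchange can be checked
with a toy model: the GEM enters the VM's extobj set in the
ioctl(VM_BIND) path, and job preparation then validates and attaches the
job fence to every extobj, so a yet-to-be-mapped GEM cannot be evicted
until the job fence signals. Everything below is an illustrative
userspace stand-in, not the drm_gpuvm API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a GEM is evictable only when no unsignalled fence is
 * attached; the VM keeps an extobj set populated at ioctl(VM_BIND)
 * time, before the mapping actually takes effect. */
struct gem {
	bool resident;
	int unsignalled_fences;
};

#define MAX_EXTOBJ 16
struct vm {
	struct gem *extobj[MAX_EXTOBJ];
	size_t nextobj;
};

/* ioctl(VM_BIND) path: publish the GEM in the extobj set early. */
static void extobj_insert(struct vm *vm, struct gem *g)
{
	vm->extobj[vm->nextobj++] = g;
}

/* Job-queue path: make every extobj resident, then attach the job
 * fence so none can be evicted before the job completes. */
static void prepare_job(struct vm *vm)
{
	for (size_t i = 0; i < vm->nextobj; i++) {
		vm->extobj[i]->resident = true;		/* validate */
		vm->extobj[i]->unsignalled_fences++;	/* add job fence */
	}
}

/* Eviction is only legal once all attached fences have signalled. */
static bool try_evict(struct gem *g)
{
	if (g->unsignalled_fences)
		return false;
	g->resident = false;
	return true;
}

/* Job fence signals: drop our fence from every extobj. */
static void job_done(struct vm *vm)
{
	for (size_t i = 0; i < vm->nextobj; i++)
		vm->extobj[i]->unsignalled_fences--;
}
```

The point of the model is the ordering: because extobj_insert() runs at
ioctl time, prepare_job() also fences GEMs whose map has not executed
yet, which is exactly the residency guarantee discussed above.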






^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Intel-xe] [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06 14:08                 ` Thomas Hellström
  (?)
@ 2023-09-06 14:54                   ` Boris Brezillon
  -1 siblings, 0 replies; 45+ messages in thread
From: Boris Brezillon @ 2023-09-06 14:54 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Francois Dugast, linux-kernel, dri-devel, Danilo Krummrich,
	Rodrigo Vivi, intel-xe

Hi Thomas,

On Wed, 6 Sep 2023 16:08:07 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> Hi, Boris,
> 
> On 9/6/23 15:00, Boris Brezillon wrote:
> > On Wed, 6 Sep 2023 13:57:03 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >  
> >> Hi, Boris
> >>
> >> On 9/6/23 13:09, Boris Brezillon wrote:  
> >>> On Wed, 6 Sep 2023 10:32:24 +0200
> >>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>
> >>>     
> >>>>>>>> +Introducing external (or shared) buffer objects
> >>>>>>>> +===============================================
> >>>>>>>> +
> >>>>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
> >>>>>>>> +can't share their reservation object with a single gpu_vm, but
> >>>>>>>> will rather
> >>>>>>>> +have a reservation object of their own. The shared objects bound to a
> >>>>>>>> +gpu_vm using one or many
> >>>>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
> >>>>>>>> +protected by the gpu_vm lock. One could in theory protect it also
> >>>>>>>> with
> >>>>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
> >>>>>>>> typically
> >>>>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
> >>>>>>>> +the current locking helpers, that is typically not done. Also see
> >>>>>>>> +below for userptr gpu_vmas.
> >>>>>>>> +
> >>>>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
> >>>>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
> >>>>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we  
> >>>>>>> I need to think a bit more about locking of extobj and evicted
> >>>>>>> object tracking
> >>>>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
> >>>>>>> within the
> >>>>>>> fence signalling critical path as mentioned in [1].
> >>>>>>>
> >>>>>>> In order to support that, we'd need to protect extobjs with a
> >>>>>>> separate lock,
> >>>>>>> and while iterating extobjs to acquire the dma-resv lock drop the
> >>>>>>> lock within
> >>>>>>> the loop before we actually acquire the dma-resv lock. Maple tree
> >>>>>>> supports that
> >>>>>>> already and this can be fully done within the GPUVA manager; no need
> >>>>>>> for the
> >>>>>>> driver to care about that.  
> >>>>>> So do I understand correctly that this is because you want to update the
> >>>>>> gpuvm state while operations are progressing asynchronously?
> >>>>>>
> >>>>>> If so, I wonder whether that could really be done? For example to
> >>>>>> allocate enough memory for page-tables etc, you need to know the
> >>>>>> details of the operations at IOCTL execution time, and to know the
> >>>>>> details you need to know the state from the previous operation?  
> >>>>> Right, sync and async bind can't run fully concurrently, but you could
> >>>>> "inject" a
> >>>>> sync one between two async ones such that the sync ones executed from
> >>>>> the IOCTL
> >>>>> directly while async execution is stalled meanwhile. This would be
> >>>>> possible because
> >>>>> the actual drm_gpuva_ops would be calculated within the async
> >>>>> execution path rather
> >>>>> than in the IOCTL. But yes, page-table management must be designed to  
> >>>>> support that.  
> >>> FWIW, the panthor driver is designed this way (note that I'm not
> >>> supporting GEM eviction yet, so there might be subtleties I missed).  
> >> The problem is that once you've published your VM_BIND out-fence, any
> >> code path required to signal that fence may not allocate memory nor
> >> grab any locks that allow allocating memory while held, including
> >> dma_resv locks, and that means all required page-table memory needs to
> >> be allocated synchronously in the IOCTL,  
> > Yep, that's already what I do, by over-provisioning for the worst case
> > scenario (page table tree is empty), and returning unused pages after
> > the operation is done.
> >  
> >> and all evicted bos need to be
> >> made resident in the IOCTL,  
> > Yep, I'm pinning memory to BOs in that path too.
> >  
> >> and at least in the xe driver the amount of
> >> memory we need to allocate depends on the vm state, so we can't really
> >> update the vm state asynchronously either.  
> > For Mali, we can calculate the maximum amount of pages we'll need for a
> > MAP operation, by assuming the page table is empty. Then it's just a
> > matter of returning unused pages to a fast-alloc pool so we can
> > speed-up further page table allocations (we're using a kmem_cache here,
> > since the page table update is done by the CPU and memory is shared on
> > Arm, but there's no reason you can't have your own cache
> > implementation).
> >  
> >> But as long as any async binding work required for signalling the
> >> VM_BIND out-fence is properly annotated with
> >> dma_fence_begin_signalling() and dma_fence_end_signalling() and there
> >> aren't any lockdep splats, things should be good. It would trigger on
> >> both memory allocation and attempts to grab a dma_resv lock.  
> > I have dma_fence_{begin,end}_signalling() annotations in the
> > ::run_job() path, and no lockdep complaint spotted so far.
> >  
> >>  
> >>>     
> >>>> OK, well one of the main motivations for Xe is to be able to pipeline
> >>>> interleaving binds and execs if needed, like so:
> >>>>
> >>>> - Bind vmas for scene 1.
> >>>> - Submit scene 1.
> >>>> - Unbind vmas for scene 1.
> >>>> - Bind vmas for scene 2.
> >>>> - Submit scene 2.
> >>>> - Unbind vmas for scene 2.
> >>>>
> >>>> And being able to *submit* all of the above while the async binding of
> >>>> vmas for scene (step 1) has not yet completed.
> >>>> I can't really see how this could be done, while obeying dma-fence
> >>>> rules, unless state is updated synchronously while submitting?  
> >>> The idea in this case is to detect when a GPU job dependency is a
> >>> VM_BIND out-fence, turn drm_sched_fence->parent into an
> >>> xxx_vm_bind_job_fence object that's holding the GEM that's about to be
> >>> mapped (AFAICT, we don't need to do anything for unmap operations), and
> >>> then add our GPU job fence to this BO. This should not only guarantee
> >>> that the GEMs we depend on are mapped before the GPU job is executed
> >>> (the fence wait does that), but also that such yet-to-be-mapped GEMs
> >>> won't be evicted just after they've been mapped and before the GPU had
> >>> a chance to execute (unless I'm missing something, adding our GPU job
> >>> fence to the BO being targeted by a pending VM_BIND(async,map) operation
> >>> solves this problem).  
> > It's not exactly that, because we'd need to add the GEMs of all the
> > pending VM_BIND(map) jobs that come before the expressed dependency, not
> > just the one attached to the dependency itself. But after chatting with
> > Danilo, I realized we might not even need to track the GEMs being
> > mapped at the fence level if we call drm_gpuva_extobj_insert() in the
> > ioctl(VM_BIND) path:
> >
> > - drm_gpuva_extobj_insert() will make sure the GEM is added to
> >    the ext-object map even before it's actually mapped to the VM (for
> >    private GEMs, it doesn't matter, because they are using the VM resv,
> >    so any private GEM mapped will automatically receive the VM resv
> >    updates).
> >
> > Now, when a GPU job is queued, we do all the VM GEM preparation, which
> > includes the following steps:
> >
> > - drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
> >    resident
> > - Iterate over all ext-objs to add our fence (I'm skipping the slot
> >    reservation step that's implied). Because drm_gpuva_extobj_insert()
> >    was called early, we also get all the GEMs that are not yet mapped,
> >    but are about to be mapped. This means they won't be evicted until
> >    after our job is done
> > - add our fence to the VM resv
> >
> > Unless I'm missing something, this should guarantee that all GEMs are
> > resident and mapped when the job is executed.
> >  
> >> Yes, we're essentially doing the same. The issue here is that when we,
> >> for example *submit* Bind vmas for scene 2,
> >> we need to know how much page-table memory to allocate,  
> > This is solved with over-provisioning in our case.
> >  
> >> and what BOs to
> >> make resident to be able to publish the out-fence.  
> > That's basically what Danilo's latest gpuva_mgr patchset tries to
> > provide generic helpers for, by exposing functions to iterate over all
> > evicted GEMs (so we can make them resident) and adding a way to add
> > fences to all GEMs currently bound to the VM. That leaves external GEMs
> > that are about to be mapped, which, I think, is addressed by the
> > solution detailed above.
> >  
> >> That means we need to
> >> know what the VM state would look like at the end of "Unbind vmas for
> >> scene 1".  
> > Not necessarily, as long as you know all the GEMs that are currently
> > mapped and those that are about to be mapped. The extobj set provides
> > exactly that for external GEMs.
> >  
> >> If the VM state is updated at submission time, that's all ok
> >> but if it's updated at execution time, we'd have to guess what resources
> >> to pre-allocate.  
> > As long as you have enough resources pre-allocated to do the VM update
> > (not saying this is easy to guess on Intel, but it's doable on Mali,
> > and the page table caching makes over-provisioning not too bad, as long
> > as we limit the number of in-flight VM_BIND jobs).  
> 
> OK, then it sounds like we're on the same page. I guess it would in theory be 
> possible to pre-allocate all needed resources on xe as well, but if the 
> vm state lock is made an inner lock in order for us to be able to grab 
> it within the dma-fence critical section, then it comes with a number of 
> drawbacks as well:
> * Over-allocation of resources.
> * Need to spawn a cpu-thread for the async part (currently we utilize 
> the GPU for that).

I guess the async CPU part is the logic returning unused resources to
the cache. You can use a work item/wq for that instead of a thread, but
yes, there's some work to be done on the CPU, indeed.

> * Probably looking at locking inversions wrt userptr?
> * Probably looking at locking inversions wrt recoverable pagefaults?

Okay, I clearly didn't look at userptr, and I briefly looked at
alloc-on-fault but didn't finish/test my implementation since I didn't
have a use case for it.

For the use cases we have, we only need to take the VM lock (the lock
protecting the VM state) when executing a VM operation (map/unmap), and
that's in the dma-signalling path where we do no allocation (thanks
to the pre-allocation logic) and no attempt to acquire a resv lock.

Tbh, I'm not even sure we'd need a lock if that wasn't for the debugfs
gpuva dumper, because drm_sched makes it so VM operations are
serialized (VM ops happen on the CPU, and there's one thread dequeuing
drm_sched_jobs).

The extobj set is protected using another lock in Danilo's
implementation (and my open-coded implementation did something similar,
though slightly broken apparently), so maybe that's the one you're
worried about.

> * Mismatch with the cpu mmap() / munmap() interface where the mmap_sem 
> is the outermost lock.
> 
> So for us it currently looks like the sync state update is the 
> preferred one...

Just to clarify things, I'm not trying to convince you to use the async
model, just saying that's what we went for, and, at first glance, it
didn't seem completely insane to me. But if there's something
fundamentally broken in this approach, I think I'd like to figure it
out early, so thanks for all your inputs :-).

Regards,

Boris
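
The CPU-side step discussed above, returning unused pre-allocated pages
to the cache once the operation completes, can be sketched as follows.
The pool here is a plain free-list standing in for the kmem_cache Boris
mentions, and all names are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for a kmem_cache-backed fast-alloc pool:
 * pre-allocate the worst case up front, consume what the operation
 * actually needs, give the remainder back to the pool. */
struct pt_pool {
	void *free[64];
	size_t nfree;
};

static void *pool_get(struct pt_pool *p)
{
	return p->nfree ? p->free[--p->nfree] : malloc(4096);
}

static void pool_put(struct pt_pool *p, void *page)
{
	if (p->nfree < 64)
		p->free[p->nfree++] = page;
	else
		free(page);
}

/* Synchronous ioctl path: reserve worst_case pages before publishing
 * the out-fence, so the async path never has to allocate. */
static size_t prealloc(struct pt_pool *p, void **pages, size_t worst_case)
{
	for (size_t i = 0; i < worst_case; i++)
		pages[i] = pool_get(p);
	return worst_case;
}

/* Async completion: keep the `used` pages, return the rest. */
static void finish_op(struct pt_pool *p, void **pages, size_t got, size_t used)
{
	for (size_t i = used; i < got; i++)
		pool_put(p, pages[i]);
}
```

In the real flow prealloc() would run synchronously in the ioctl before
the out-fence is published, and finish_op() from the async side (e.g. a
work item), which is safe in the signalling path because pool_put()
never allocates.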

^ permalink raw reply	[flat|nested] 45+ messages in thread

> >>> that the GEMs we depend on are mapped before the GPU job is executed
> >>> (the fence wait does that), but also that such yet-to-be-mapped GEMs
> >>> won't be evicted just after they've been mapped and before the GPU had
> >>> a chance to execute (unless I'm missing something, adding our GPU job
> >>> fence to the BO being targeted by a pending VM_BIND(async,map) operation
> >>> solves this problem).  
> > It's not exactly that, because we'd need to add the GEMs of all the
> > pending VM_BIND(map) jobs that come before the expressed dependency, not
> > just the one attached to the dependency itself. But after chatting with
> > Danilo, I realized we might not even need to track the GEMs being
> > mapped at the fence level if we call drm_gpuva_extobj_insert() in the
> > ioctl(VM_BIND) path:
> >
> > - drm_gpuva_extobj_insert() will make sure the GEM is added to
> >    the ext-object map even before it's actually mapped to the VM (for
> >    private GEMs, it doesn't matter, because they are using the VM resv,
> >    so any private GEM mapped will automatically receive the VM resv
> >    updates).
> >
> > Now, when a GPU job is queued, we do all the VM GEM preparation, which
> > includes the following steps:
> >
> > - drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
> >    resident
> > - Iterate over all ext-objs to add our fence (I'm skipping the slot
> >    reservation step that's implied). Because drm_gpuva_extobj_insert()
> >    was called early, we also get all the GEMs that are not yet mapped,
> >    but are about to be mapped. This means they won't be evicted until
> >    after our job is done
> > - add our fence to the VM resv
> >
> > Unless I'm missing something, this should guarantee that all GEMs are
> > resident and mapped when the job is executed.
> >  
> >> Yes, we're essentially doing the same. The issue here is that when we,
> >> for example *submit* Bind vmas for scene 2,
> >> we need to know how much page-table memory to allocate,  
> > This is solved with over-provisioning in our case.
> >  
> >> and what BOs to
> >> make resident to be able to publish the out-fence.  
> > That's basically what Danilo's latest gpuva_mgr patchset tries to
> > provide generic helpers for, by exposing functions to iterate over all
> > evicted GEMs (so we can make them resident) and adding a way to add
> > fences to all GEMs currently bound to the VM. That leaves external GEMs
> > that are about to be mapped, which, I think, is addressed by the
> > solution detailed above.
> >  
> >> That means we need to
> >> know what the VM state would look like at the end of "Unbind vmas for
> >> scene 1".  
> > Not necessarily, as long as you know all the GEMs that are currently
> > mapped and those that are about to be mapped. The extobj set provides
> > exactly that for external GEMs.
> >  
> >> If the VM state is updated at submission time, that's all ok
> >> but if it's updated at execution time, we'd have to guess what resources
> >> to pre-allocate.  
> > As long as you have enough resources pre-allocated to do the VM update
> > (not saying this is easy to guess on Intel, but it's doable on Mali,
> > and the page table caching makes over-provisioning not too bad, as long
> > as we limit the number of in-flight VM_BIND jobs).  
> 
> OK, then it sounds like we're on the same page. I guess it would in theory be 
> possible to pre-allocate all needed resources on xe as well, but if the 
> vm state lock is made an inner lock in order for us to be able to grab 
> it within the dma-fence critical section, then it comes with a number of 
> drawbacks as well:
> * Over-allocation of resources.
> * Need to spawn a cpu-thread for the async part (currently we utilize 
> the GPU for that).

I guess the async CPU part is the logic returning unused resources to
the cache. You can use a work item/wq for that instead of a thread, but
yes, there's some work to be done on the CPU, indeed.

> * Probably looking at locking inversions wrt userptr?
> * Probably looking at locking inversions wrt recoverable pagefaults?

Okay, I clearly didn't look at userptr, and I briefly looked at
alloc-on-fault but didn't finish/test my implementation since I didn't
have a use case for it.

For the use cases we have, we only need to take the VM lock (the lock
protecting the VM state) when executing a VM operation (map/unmap), and
that's in the dma-signalling path where we do no allocation (thanks
to the pre-allocation logic) and no attempt to acquire a resv lock.

Tbh, I'm not even sure we'd need a lock if that wasn't for the debugfs
gpuva dumper, because drm_sched makes it so VM operations are
serialized (VM ops happen on the CPU, and there's one thread dequeuing
drm_sched_jobs).

The extobj set is protected using another lock in Danilo's
implementation (and my open-coded implementation did something similar,
though slightly broken apparently), so maybe that's the one you're
worried about.

> * Mismatch with the cpu mmap() / munmap() interface where the mmap_sem 
> is the outermost lock.
> 
> So for us it currently looks like the sync state update is the 
> preferred one...

Just to clarify things, I'm not trying to convince you to use the async
model, just saying that's what we went for, and, at first glance, it
didn't seem completely insane to me. But if there's something
fundamentally broken in this approach, I think I'd like to figure it
out early, so thanks for all your inputs :-).

Regards,

Boris

^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH v2] Documentation/gpu: VM_BIND locking document
  2023-09-06 14:54                   ` Boris Brezillon
  (?)
@ 2023-09-06 15:07                     ` Thomas Hellström
  -1 siblings, 0 replies; 45+ messages in thread
From: Thomas Hellström @ 2023-09-06 15:07 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Danilo Krummrich, Matthew Brost, Francois Dugast, linux-kernel,
	Oak Zeng, dri-devel, Rodrigo Vivi, intel-xe

Hi, Boris

On 9/6/23 16:54, Boris Brezillon wrote:
> Hi Thomas,
>
> On Wed, 6 Sep 2023 16:08:07 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>> Hi, Boris,
>>
>> On 9/6/23 15:00, Boris Brezillon wrote:
>>> On Wed, 6 Sep 2023 13:57:03 +0200
>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>   
>>>> Hi, Boris
>>>>
>>>> On 9/6/23 13:09, Boris Brezillon wrote:
>>>>> On Wed, 6 Sep 2023 10:32:24 +0200
>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>>>
>>>>>      
>>>>>>>>>> +Introducing external (or shared) buffer objects
>>>>>>>>>> +===============================================
>>>>>>>>>> +
>>>>>>>>>> +Since shared buffer objects may be shared by multiple gpu_vm's they
>>>>>>>>>> +can't share their reservation object with a single gpu_vm, but
>>>>>>>>>> will rather
>>>>>>>>>> +have a reservation object of their own. The shared objects bound to a
>>>>>>>>>> +gpu_vm using one or many
>>>>>>>>>> +gpu_vmas are therefore typically put on a per-gpu_vm list which is
>>>>>>>>>> +protected by the gpu_vm lock. One could in theory protect it also
>>>>>>>>>> with
>>>>>>>>>> +the ``gpu_vm->resv``, but since the list of dma_resvs to take is
>>>>>>>>>> typically
>>>>>>>>>> +built before the ``gpu_vm->resv`` is locked due to a limitation in
>>>>>>>>>> +the current locking helpers, that is typically not done. Also see
>>>>>>>>>> +below for userptr gpu_vmas.
>>>>>>>>>> +
>>>>>>>>>> +At eviction time we now need to invalidate *all* gpu_vmas of a shared
>>>>>>>>>> +object, but we can no longer be certain that we hold the gpu_vm's
>>>>>>>>>> +dma_resv of all the object's gpu_vmas. We can only be certain that we
>>>>>>>>> I need to think a bit more about locking of extobj and evicted
>>>>>>>>> object tracking
>>>>>>>>> in the case of processing 'drm_gpuva_ops' directly through callbacks
>>>>>>>>> within the
>>>>>>>>> fence signalling critical path as mentioned in [1].
>>>>>>>>>
>>>>>>>>> In order to support that, we'd need to protect extobjs with a
>>>>>>>>> separate lock,
>>>>>>>>> and while iterating extobjs to acquire the dma-resv lock drop the
>>>>>>>>> lock within
>>>>>>>>> the loop before we actually acquire the dma-resv lock. Maple tree
>>>>>>>>> supports that
>>>>>>>>> already and this can be fully done within the GPUVA manager; no need
>>>>>>>>> for the
>>>>>>>>> driver to care about that.
>>>>>>>> So do I understand correctly that this is because you want to update the
>>>>>>>> gpuvm state while operations are progressing asynchronously?
>>>>>>>>
>>>>>>>> If so, I wonder whether that could really be done? For example to
>>>>>>>> allocate enough memory for page-tables etc, you need to know the
>>>>>>>> details of the operations at IOCTL execution time, and to know the
>>>>>>>> details you need to know the state from the previous operation?
>>>>>>> Right, sync and async bind can't run fully concurrently, but you
>>>>>>> could "inject" a sync one between two async ones such that the sync
>>>>>>> one is executed from the IOCTL directly while async execution is
>>>>>>> stalled meanwhile. This would be possible because the actual
>>>>>>> drm_gpuva_ops would be calculated within the async execution path
>>>>>>> rather than in the IOCTL. But yes, page-table management must be
>>>>>>> designed to support that.
>>>>> FWIW, the panthor driver is designed this way (note that I'm not
>>>>> supporting GEM eviction yet, so there might be subtleties I missed).
>>>> The problem is that once you've published your VM_BIND out-fence, any
>>>> code path required to signal that fence may not allocate memory nor
>>>> grab any locks that allow allocating memory while held, including
>>>> dma_resv locks, and that means all required page-table memory needs to
>>>> be allocated synchronously in the IOCTL,
>>> Yep, that's already what I do, by over-provisioning for the worst case
>>> scenario (page table tree is empty), and returning unused pages after
>>> the operation is done.
>>>   
>>>> and all evicted bos need to be
>>>> made resident in the IOCTL,
>>> Yep, I'm pinning memory to BOs in that path too.
>>>   
>>>> and at least in the xe driver the amount of
>>>> memory we need to allocate depends on the vm state, so we can't really
>>>> update the vm state asynchronously either.
>>> For Mali, we can calculate the maximum amount of pages we'll need for a
>>> MAP operation, by assuming the page table is empty. Then it's just a
>>> matter of returning unused pages to a fast-alloc pool so we can
>>> speed-up further page table allocations (we're using a kmem_cache here,
>>> since the page table update is done by the CPU and memory is shared on
>>> Arm, but there's no reason you can't have your own cache
>>> implementation).
>>>   
>>>> But as long as any async binding work required for signalling the
>>>> VM_BIND out-fence is properly annotated with
>>>> dma_fence_begin_signalling() and dma_fence_end_signalling() and there
>>>> aren't any lockdep splats, things should be good. It would trigger on
>>>> both memory allocation and attempts to grab a dma_resv lock.
>>> I have dma_fence_{begin,end}_signalling() annotations in the
>>> ::run_job() path, and no lockdep complaint spotted so far.
>>>   
>>>>   
>>>>>      
>>>>>> OK, well one of the main motivations for Xe is to be able to pipeline
>>>>>> interleaving binds and execs if needed, like so:
>>>>>>
>>>>>> - Bind vmas for scene 1.
>>>>>> - Submit scene 1.
>>>>>> - Unbind vmas for scene 1.
>>>>>> - Bind vmas for scene 2.
>>>>>> - Submit scene 2.
>>>>>> - Unbind vmas for scene 2.
>>>>>>
>>>>>> And being able to *submit* all of the above while the async binding of
>>>>>> vmas for scene (step 1) has not yet completed.
>>>>>> I can't really see how this could be done, while obeying dma-fence
>>>>>> rules, unless state is updated synchronously while submitting?
>>>>> The idea in this case is to detect when a GPU job dependency is a
>>>>> VM_BIND out-fence, turn drm_sched_fence->parent into an
>>>>> xxx_vm_bind_job_fence object that's holding the GEM that's about to be
>>>>> mapped (AFAICT, we don't need to do anything for unmap operations), and
>>>>> then add our GPU job fence to this BO. This should not only guarantee
>>>>> that the GEMs we depend on are mapped before the GPU job is executed
>>>>> (the fence wait does that), but also that such yet-to-be-mapped GEMs
>>>>> won't be evicted just after they've been mapped and before the GPU had
>>>>> a chance to execute (unless I'm missing something, adding our GPU job
>>>>> fence to the BO being targeted by a pending VM_BIND(async,map) operation
>>>>> solves this problem).
>>> It's not exactly that, because we'd need to add the GEMs of all the
>>> pending VM_BIND(map) jobs that come before the expressed dependency, not
>>> just the one attached to the dependency itself. But after chatting with
>>> Danilo, I realized we might not even need to track the GEMs being
>>> mapped at the fence level if we call drm_gpuva_extobj_insert() in the
>>> ioctl(VM_BIND) path:
>>>
>>> - drm_gpuva_extobj_insert() will make sure the GEM is added to
>>>     the ext-object map even before it's actually mapped to the VM (for
>>>     private GEMs, it doesn't matter, because they are using the VM resv,
>>>     so any private GEM mapped will automatically receive the VM resv
>>>     updates).
>>>
>>> Now, when a GPU job is queued, we do all the VM GEM preparation, which
>>> includes the following steps:
>>>
>>> - drm_gpuva_manager_validate() will make already-bound-but-evicted GEMs
>>>     resident
>>> - Iterate over all ext-objs to add our fence (I'm skipping the slot
>>>     reservation step that's implied). Because drm_gpuva_extobj_insert()
>>>     was called early, we also get all the GEMs that are not yet mapped,
>>>     but are about to be mapped. This means they won't be evicted until
>>>     after our job is done
>>> - add our fence to the VM resv
>>>
>>> Unless I'm missing something, this should guarantee that all GEMs are
>>> resident and mapped when the job is executed.
>>>   
>>>> Yes, we're essentially doing the same. The issue here is that when we,
>>>> for example *submit* Bind vmas for scene 2,
>>>> we need to know how much page-table memory to allocate,
>>> This is solved with over-provisioning in our case.
>>>   
>>>> and what BOs to
>>>> make resident to be able to publish the out-fence.
>>> That's basically what Danilo's latest gpuva_mgr patchset tries to
>>> provide generic helpers for, by exposing functions to iterate over all
>>> evicted GEMs (so we can make them resident) and adding a way to add
>>> fences to all GEMs currently bound to the VM. That leaves external GEMs
>>> that are about to be mapped, which, I think, is addressed by the
>>> solution detailed above.
>>>   
>>>> That means we need to
>>>> know what the VM state would look like at the end of "Unbind vmas for
>>>> scene 1".
>>> Not necessarily, as long as you know all the GEMs that are currently
>>> mapped and those that are about to be mapped. The extobj set provides
>>> exactly that for external GEMs.
>>>   
>>>> If the VM state is updated at submission time, that's all ok
>>>> but if it's updated at execution time, we'd have to guess what resources
>>>> to pre-allocate.
>>> As long as you have enough resources pre-allocated to do the VM update
>>> (not saying this is easy to guess on Intel, but it's doable on Mali,
>>> and the page table caching makes over-provisioning not too bad, as long
>>> as we limit the number of in-flight VM_BIND jobs).
>> OK, then it sounds like we're on the same page. I guess it would in theory be
>> possible to pre-allocate all needed resources on xe as well, but if the
>> vm state lock is made an inner lock in order for us to be able to grab
>> it within the dma-fence critical section, then it comes with a number of
>> drawbacks as well:
>> * Over-allocation of resources.
>> * Need to spawn a cpu-thread for the async part (currently we utilize
>> the GPU for that).
> I guess the async CPU part is the logic returning unused resources to
> the cache. You can use a work item/wq for that instead of a thread, but
> yes, there's some work to be done on the CPU, indeed.
>
>> * Probably looking at locking inversions wrt userptr?
>> * Probably looking at locking inversions wrt recoverable pagefaults?
> Okay, I clearly didn't look at userptr, and I briefly looked at
> alloc-on-fault but didn't finish/test my implementation since I didn't
> have a use case for it.
>
> For the use cases we have, we only need to take the VM lock (the lock
> protecting the VM state) when executing a VM operation (map/unmap), and
> that's in the dma-signalling path where we do no allocation (thanks
> to the pre-allocation logic) and no attempt to acquire a resv lock.
>
> Tbh, I'm not even sure we'd need a lock if that wasn't for the debugfs
> gpuva dumper, because drm_sched makes it so VM operations are
> serialized (VM ops happen on the CPU, and there's one thread dequeuing
> drm_sched_jobs).
>
> The extobj set is protected using another lock in Danilo's
> implementation (and my open-coded implementation did something similar,
> though slightly broken apparently), so maybe that's the one you're
> worried about.
>
>> * Mismatch with the cpu mmap() / munmap() interface where the mmap_sem
>> is the outermost lock.
>>
>> So for us it currently looks like the sync state update is the
>> preferred one...
> Just to clarify things, I'm not trying to convince you to use the async
> model, just saying that's what we went for, and, at first glance, it
> didn't seem completely insane to me. But if there's something
> fundamentally broken in this approach, I think I'd like to figure it
> out early, so thanks for all your inputs :-).

Yeah, I think these discussions are really beneficial and help at least 
me understand other choices as well, and if they can help provide good 
input to the design of the GPUVA manager, that's a great benefit as well.

In the end, we have a task to document the VM_BIND locking as part of 
merging Xe, and if there are different flavours, I'll look at 
documenting these to, and in what manner they differ.

Thanks,

Thomas


>
> Regards,
>
> Boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2023-09-06 15:07 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-16  9:15 [PATCH v2] Documentation/gpu: VM_BIND locking document Thomas Hellström
2023-08-16  9:15 ` [Intel-xe] " Thomas Hellström
2023-08-16  9:15 ` Thomas Hellström
2023-08-16  9:56 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-08-16  9:57 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-08-16  9:58 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-08-16 10:02 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-08-16 10:02 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-08-16 10:02 ` [Intel-xe] ✗ CI.checksparse: warning " Patchwork
2023-08-16 10:26 ` [Intel-xe] ✓ CI.BAT: success " Patchwork
2023-08-17  2:05 ` [PATCH v2] " kernel test robot
2023-08-17  2:05   ` kernel test robot
2023-08-17  2:05   ` [Intel-xe] " kernel test robot
2023-08-31 19:30 ` Rodrigo Vivi
2023-08-31 19:30   ` [Intel-xe] " Rodrigo Vivi
2023-08-31 19:30   ` Rodrigo Vivi
2023-09-05 19:50 ` Danilo Krummrich
2023-09-05 19:50   ` Danilo Krummrich
2023-09-05 19:50   ` [Intel-xe] " Danilo Krummrich
2023-09-06  7:06   ` Thomas Hellström
2023-09-06  7:06     ` [Intel-xe] " Thomas Hellström
2023-09-06  7:06     ` Thomas Hellström
2023-09-06  8:00     ` Danilo Krummrich
2023-09-06  8:00       ` Danilo Krummrich
2023-09-06  8:00       ` [Intel-xe] " Danilo Krummrich
2023-09-06  8:32       ` Thomas Hellström
2023-09-06  8:32         ` [Intel-xe] " Thomas Hellström
2023-09-06 11:09         ` Boris Brezillon
2023-09-06 11:09           ` [Intel-xe] " Boris Brezillon
2023-09-06 11:09           ` Boris Brezillon
2023-09-06 11:57           ` Thomas Hellström
2023-09-06 11:57             ` [Intel-xe] " Thomas Hellström
2023-09-06 11:57             ` Thomas Hellström
2023-09-06 13:00             ` Boris Brezillon
2023-09-06 13:00               ` [Intel-xe] " Boris Brezillon
2023-09-06 13:00               ` Boris Brezillon
2023-09-06 14:08               ` Thomas Hellström
2023-09-06 14:08                 ` [Intel-xe] " Thomas Hellström
2023-09-06 14:08                 ` Thomas Hellström
2023-09-06 14:54                 ` [Intel-xe] " Boris Brezillon
2023-09-06 14:54                   ` Boris Brezillon
2023-09-06 14:54                   ` Boris Brezillon
2023-09-06 15:07                   ` Thomas Hellström
2023-09-06 15:07                     ` [Intel-xe] " Thomas Hellström
2023-09-06 15:07                     ` Thomas Hellström
