dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
@ 2023-02-17 13:44 Danilo Krummrich
  2023-02-17 13:44 ` [PATCH drm-next v2 01/16] drm: execution context for GEM buffers Danilo Krummrich
                   ` (16 more replies)
  0 siblings, 17 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:44 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

This patch series provides a new UAPI for the Nouveau driver in order to
support Vulkan features, such as sparse bindings and sparse residency.

Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
keep track of GPU virtual address (VA) mappings in a more generic way.

The DRM GPUVA manager is indented to help drivers implement userspace-manageable
GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
serves the following purposes in this context.

    1) Provide infrastructure to track GPU VA allocations and mappings,
       making use of the maple_tree.

    2) Generically connect GPU VA mappings to their backing buffers, in
       particular DRM GEM objects.

    3) Provide a common implementation to perform more complex mapping
       operations on the GPU VA space. In particular splitting and merging
       of GPU VA mappings, e.g. for intersecting mapping requests or partial
       unmap requests.

The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, itself
providing the following new interfaces.

    1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
       for UMDs to specify the portion of VA space managed by the kernel and
       userspace, respectively.

    2) Allocate and free a VA space region as well as bind and unbind memory
       to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

    3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use of the DRM
scheduler to queue jobs and support asynchronous processing with DRM syncobjs
as synchronization mechanism.

By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.

The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution context
for GEM buffers) by Christian König. Since the patch implementing drm_exec was
not yet merged into drm-next it is part of this series, as well as a small fix
for this patch, which was found while testing this series.

This patch series is also available at [1].

There is a Mesa NVK merge request by Dave Airlie [2] implementing the
corresponding userspace parts for this series.

The Vulkan CTS test suite passes the sparse binding and sparse residency test
cases for the new UAPI together with Dave's Mesa work.

There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
VA manager's logic through Nouveau's new UAPI and should be considered just as
helper for implementation.

However, I absolutely intend to change those test cases to proper kunit test
cases for the DRM GPUVA manager, once and if we agree on it's usefulness and
design.

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
    https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
[2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
[3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind

Changes in V2:
==============
  Nouveau:
    - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
      signalling critical sections. Updates to the VA space are split up in three
      separate stages, where only the 2. stage executes in a fence signalling
      critical section:

        1. update the VA space, allocate new structures and page tables
        2. (un-)map the requested memory bindings
        3. free structures and page tables

    - Separated generic job scheduler code from specific job implementations.
    - Separated the EXEC and VM_BIND implementation of the UAPI.
    - Reworked the locking parts of the nvkm/vmm RAW interface, such that
      (un-)map operations can be executed in fence signalling critical sections.

  GPUVA Manager:
    - made drm_gpuva_regions optional for users of the GPUVA manager
    - allow NULL GEMs for drm_gpuva entries
    - swichted from drm_mm to maple_tree for track drm_gpuva / drm_gpuva_region
      entries
    - provide callbacks for users to allocate custom drm_gpuva_op structures to
      allow inheritance
    - added user bits to drm_gpuva_flags
    - added a prefetch operation type in order to support generating prefetch
      operations in the same way other operations generated
    - hand the responsibility for mutual exclusion for a GEM's
      drm_gpuva list to the user; simplified corresponding (un-)link functions

  Maple Tree:
    - I added two maple tree patches to the series, one to support custom tree
      walk macros and one to hand the locking responsibility to the user of the
      GPUVA manager without pre-defined lockdep checks.

TODO
====
  Maple Tree:
    - Maple tree uses the 'unsinged long' type for node entries. While this
      works for 64bit, it's incompatible with the DRM GPUVA Manager on 32bit,
      since the DRM GPUVA Manager uses the u64 type and so do drivers using it.
      While it's questionable whether a 32bit kernel and a > 32bit GPU address
      space make any sense, it creates tons of compiler warnings when compiling
      for 32bit. Maybe it makes sense to expand the maple tree API to let users
      decide which size to pick - other ideas / proposals are welcome.

Christian König (1):
  drm: execution context for GEM buffers

Danilo Krummrich (15):
  drm/exec: fix memory leak in drm_exec_prepare_obj()
  maple_tree: split up MA_STATE() macro
  maple_tree: add flag MT_FLAGS_LOCK_NONE
  drm: manager to keep track of GPUs VA mappings
  drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  drm/nouveau: new VM_BIND uapi interfaces
  drm/nouveau: get vmm via nouveau_cli_vmm()
  drm/nouveau: bo: initialize GEM GPU VA interface
  drm/nouveau: move usercopy helpers to nouveau_drv.h
  drm/nouveau: fence: fail to emit when fence context is killed
  drm/nouveau: chan: provide nouveau_channel_kill()
  drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  drm/nouveau: implement uvmm for user mode bindings
  drm/nouveau: implement new VM_BIND UAPI
  drm/nouveau: debugfs: implement DRM GPU VA debugfs

 Documentation/gpu/driver-uapi.rst             |   11 +
 Documentation/gpu/drm-mm.rst                  |   43 +
 drivers/gpu/drm/Kconfig                       |    6 +
 drivers/gpu/drm/Makefile                      |    3 +
 drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
 drivers/gpu/drm/drm_debugfs.c                 |   56 +
 drivers/gpu/drm/drm_exec.c                    |  294 +++
 drivers/gpu/drm/drm_gem.c                     |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c               | 1704 +++++++++++++++++
 drivers/gpu/drm/nouveau/Kbuild                |    3 +
 drivers/gpu/drm/nouveau/Kconfig               |    2 +
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   26 +-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   19 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   20 +-
 drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
 drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
 drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
 drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         |   26 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c        |  322 ++++
 drivers/gpu/drm/nouveau/nouveau_exec.h        |   39 +
 drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
 drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |   57 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |  467 +++++
 drivers/gpu/drm/nouveau/nouveau_sched.h       |   96 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 1536 +++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h        |  138 ++
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            |  100 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  213 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  197 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   25 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |   16 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   16 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |   27 +-
 include/drm/drm_debugfs.h                     |   25 +
 include/drm/drm_drv.h                         |    6 +
 include/drm/drm_exec.h                        |  144 ++
 include/drm/drm_gem.h                         |   75 +
 include/drm/drm_gpuva_mgr.h                   |  714 +++++++
 include/linux/maple_tree.h                    |   27 +-
 include/uapi/drm/nouveau_drm.h                |  220 +++
 lib/maple_tree.c                              |    7 +-
 51 files changed, 6808 insertions(+), 209 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
 create mode 100644 include/drm/drm_exec.h
 create mode 100644 include/drm/drm_gpuva_mgr.h


base-commit: 48075a66fca613477ac1969b576a93ef5db0164f
-- 
2.39.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 01/16] drm: execution context for GEM buffers
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
@ 2023-02-17 13:44 ` Danilo Krummrich
  2023-02-17 16:00   ` Christian König
  2023-02-17 13:44 ` [PATCH drm-next v2 02/16] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:44 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-mm, nouveau, linux-kernel, dri-devel, linux-doc

From: Christian König <christian.koenig@amd.com>

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existinc TTMs execbuf util and intended to replace
it in the long term.

The basic functionality is that we abstracts the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.

v2: drop xarray and use dynamic resized array instead, the locking
    overhead is unecessary and measureable.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst       |  12 ++
 drivers/gpu/drm/Kconfig            |   6 +
 drivers/gpu/drm/Makefile           |   2 +
 drivers/gpu/drm/amd/amdgpu/Kconfig |   1 +
 drivers/gpu/drm/drm_exec.c         | 295 +++++++++++++++++++++++++++++
 include/drm/drm_exec.h             | 144 ++++++++++++++
 6 files changed, 460 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a79fd3549ff8..a52e6f4117d6 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -493,6 +493,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index f42d4c6a19f2..1573d658fbb5 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -200,6 +200,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ab4460fcd63f..d40defbb0347 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 5341b6b242c3..279fb3bba810 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -11,6 +11,7 @@ config DRM_AMDGPU
 	select DRM_SCHED
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
 	select POWER_SUPPLY
 	select HWMON
 	select I2C
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..ed2106c22786
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,295 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc..).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked fences slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ *	struct drm_gem_object *obj;
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_lock(&exec, boB, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct dma_exec for more details.
+ */
+
+/* Dummy value used to initially enter the retry loop */
+#define DRM_EXEC_DUMMY (void*)~0
+
+/* Initialize the drm_exec_objects container */
+static void drm_exec_objects_init(struct drm_exec_objects *container)
+{
+	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	container->max_objects = container->objects ?
+		PAGE_SIZE / sizeof(void *) : 0;
+	container->num_objects = 0;
+}
+
+/* Cleanup the drm_exec_objects container */
+static void drm_exec_objects_fini(struct drm_exec_objects *container)
+{
+	kvfree(container->objects);
+}
+
+/* Make sure we have enough room and add an object the container */
+static int drm_exec_objects_add(struct drm_exec_objects *container,
+				struct drm_gem_object *obj)
+{
+	if (unlikely(container->num_objects == container->max_objects)) {
+		size_t size = container->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		container->objects = tmp;
+		container->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	container->objects[container->num_objects++] = obj;
+	return 0;
+}
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_duplicate_object(exec, index, obj)
+		drm_gem_object_put(obj);
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptible
+ *
+ * Initialize the object and make sure that we can track locked and duplicate
+ * objects.
+ */
+void drm_exec_init(struct drm_exec *exec, bool interruptible)
+{
+	exec->interruptible = interruptible;
+	drm_exec_objects_init(&exec->locked);
+	drm_exec_objects_init(&exec->duplicates);
+	exec->contended = DRM_EXEC_DUMMY;
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finilize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	drm_exec_objects_fini(&exec->locked);
+	drm_exec_objects_fini(&exec->duplicates);
+	if (exec->contended != DRM_EXEC_DUMMY) {
+		drm_gem_object_put(exec->contended);
+		ww_acquire_fini(&exec->ticket);
+	}
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_cleanup - cleanup when contention is detected
+ * @exec: the drm_exec object to cleanup
+ *
+ * Cleanup the current state and return true if we should stay inside the retry
+ * loop, false if there wasn't any contention detected and we can keep the
+ * objects locked.
+ */
+bool drm_exec_cleanup(struct drm_exec *exec)
+{
+	if (likely(!exec->contended)) {
+		ww_acquire_done(&exec->ticket);
+		return false;
+	}
+
+	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
+		exec->contended = NULL;
+		ww_acquire_init(&exec->ticket, &reservation_ww_class);
+		return true;
+	}
+
+	drm_exec_unlock_all(exec);
+	exec->locked.num_objects = 0;
+	exec->duplicates.num_objects = 0;
+	return true;
+}
+EXPORT_SYMBOL(drm_exec_cleanup);
+
+/* Track the locked object in the xa and reserve fences */
+static int drm_exec_obj_locked(struct drm_exec_objects *container,
+			       struct drm_gem_object *obj,
+			       unsigned int num_fences)
+{
+	int ret;
+
+	if (container) {
+		ret = drm_exec_objects_add(container, obj);
+		if (ret)
+			return ret;
+	}
+
+	if (num_fences) {
+		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		if (ret)
+			goto error_erase;
+	}
+
+	return 0;
+
+error_erase:
+	if (container) {
+		--container->num_objects;
+		drm_gem_object_put(obj);
+	}
+	return ret;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->interruptible) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
+	if (unlikely(ret))
+		dma_resv_unlock(obj->resv);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * succesfully locked objects are put into the locked container. Duplicates
+ * detected as well and automatically moved into the duplicates container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->interruptible)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret == -EALREADY)) {
+		struct drm_exec_objects *container = &exec->duplicates;
+
+		/*
+		 * If this is the first locked GEM object it was most likely
+		 * just contended. So don't add it to the duplicates, just
+		 * reserve the fence slots.
+		 */
+		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
+			container = NULL;
+
+		ret = drm_exec_obj_locked(container, obj, num_fences);
+		if (ret)
+			return ret;
+
+	} else if (unlikely(ret)) {
+		return ret;
+
+	} else {
+		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
+		if (ret)
+			goto error_unlock;
+	}
+
+	drm_gem_object_get(obj);
+	return 0;
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..f73981c6292e
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * struct drm_exec_objects - Container for GEM objects in a drm_exec
+ */
+struct drm_exec_objects {
+	unsigned int		num_objects;
+	unsigned int		max_objects;
+	struct drm_gem_object	**objects;
+};
+
+/**
+ * drm_exec_objects_for_each - iterate over all the objects inside the container
+ */
+#define drm_exec_objects_for_each(array, index, obj)		\
+	for (index = 0, obj = (array)->objects[0];		\
+	     index < (array)->num_objects;			\
+	     ++index, obj = (array)->objects[index])
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @interruptible: If locks should be taken interruptible
+	 */
+	bool			interruptible;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @locked: container for the locked GEM objects
+	 */
+	struct drm_exec_objects	locked;
+
+	/**
+	 * @duplicates: container for the duplicated GEM objects
+	 */
+	struct drm_exec_objects	duplicates;
+
+	/**
+	 * @contended: contended GEM object we backet of for.
+	 */
+	struct drm_gem_object	*contended;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->locked, index, obj)
+
+/**
+ * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the duplicate GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
+
+/**
+ * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
+ * @exec: drm_exec object
+ *
+ * Core functionality of the drm_exec object. Loops until all GEM objects are
+ * prepared and no more contention exists.
+ *
+ * At the beginning of the loop it is guaranteed that no GEM object is locked.
+ */
+#define drm_exec_while_not_all_locked(exec)	\
+	while (drm_exec_cleanup(exec))
+
+/**
+ * drm_exec_continue_on_contention - continue the loop when we need to cleanup
+ * @exec: drm_exec object
+ *
+ * Control flow helper to continue when a contention was detected and we need to
+ * clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_continue_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		continue
+
+/**
+ * drm_exec_break_on_contention - break a subordinal loop on contention
+ * @exec: drm_exec object
+ *
+ * Control flow helper to break a subordinal loop when a contention was detected
+ * and we need to clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_break_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		break
+
+/**
+ * drm_exec_is_contended - check for contention
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has run into some contention while
+ * locking a GEM object and needs to clean up.
+ */
+static inline bool drm_exec_is_contended(struct drm_exec *exec)
+{
+	return !!exec->contended;
+}
+
+/**
+ * drm_exec_has_duplicates - check for duplicated GEM object
+ * @exec: drm_exec object
+ *
+ * Return true if the drm_exec object has encountered some already locked GEM
+ * objects while trying to lock them. This can happen if multiple GEM objects
+ * share the same underlying resv object.
+ */
+static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
+{
+	return exec->duplicates.num_objects > 0;
+}
+
+void drm_exec_init(struct drm_exec *exec, bool interruptible);
+void drm_exec_fini(struct drm_exec *exec);
+bool drm_exec_cleanup(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+
+#endif
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 02/16] drm/exec: fix memory leak in drm_exec_prepare_obj()
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
  2023-02-17 13:44 ` [PATCH drm-next v2 01/16] drm: execution context for GEM buffers Danilo Krummrich
@ 2023-02-17 13:44 ` Danilo Krummrich
  2023-02-17 13:44 ` [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro Danilo Krummrich
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:44 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Don't call drm_gem_object_get() unconditionally.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/drm_exec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
index ed2106c22786..5713a589a6a3 100644
--- a/drivers/gpu/drm/drm_exec.c
+++ b/drivers/gpu/drm/drm_exec.c
@@ -282,7 +282,6 @@ int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
 			goto error_unlock;
 	}
 
-	drm_gem_object_get(obj);
 	return 0;
 
 error_unlock:
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
  2023-02-17 13:44 ` [PATCH drm-next v2 01/16] drm: execution context for GEM buffers Danilo Krummrich
  2023-02-17 13:44 ` [PATCH drm-next v2 02/16] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
@ 2023-02-17 13:44 ` Danilo Krummrich
  2023-02-17 18:34   ` Liam R. Howlett
  2023-02-17 19:45   ` Matthew Wilcox
  2023-02-17 13:44 ` [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE Danilo Krummrich
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:44 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Split up the MA_STATE() macro such that components using the maple tree
can easily inherit from struct ma_state and build custom tree walk
macros to hide their internals from users.

Example:

struct sample_iter {
	struct ma_state mas;
	struct sample_mgr *mgr;
	struct sample_entry *entry;
};

\#define SAMPLE_ITER(name, __mgr) \
	struct sample_iter name = { \
		.mas = __MA_STATE(&(__mgr)->mt, 0, 0),
		.mgr = __mgr,
		.entry = NULL,
	}

\#define sample_iter_for_each_range(it__, start__, end__) \
	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 include/linux/maple_tree.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index e594db58a0f1..ca04c900e51a 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -424,8 +424,8 @@ struct ma_wr_state {
 #define MA_ERROR(err) \
 		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
 
-#define MA_STATE(name, mt, first, end)					\
-	struct ma_state name = {					\
+#define __MA_STATE(mt, first, end)					\
+	{								\
 		.tree = mt,						\
 		.index = first,						\
 		.last = end,						\
@@ -435,6 +435,9 @@ struct ma_wr_state {
 		.alloc = NULL,						\
 	}
 
+#define MA_STATE(name, mt, first, end)					\
+	struct ma_state name = __MA_STATE(mt, first, end)
+
 #define MA_WR_STATE(name, ma_state, wr_entry)				\
 	struct ma_wr_state name = {					\
 		.mas = ma_state,					\
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (2 preceding siblings ...)
  2023-02-17 13:44 ` [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro Danilo Krummrich
@ 2023-02-17 13:44 ` Danilo Krummrich
  2023-02-17 18:18   ` Liam R. Howlett
  2023-02-17 19:38   ` Matthew Wilcox
  2023-02-17 13:44 ` [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:44 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Generic components making use of the maple tree (such as the
DRM GPUVA Manager) delegate the responsibility of ensuring mutual
exclusion to their users.

While such components could inherit the concept of an external lock,
some users might just serialize the access to the component and hence to
the internal maple tree.

In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
indicate not to do any internal lockdep checks.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 include/linux/maple_tree.h | 20 +++++++++++++++-----
 lib/maple_tree.c           |  7 ++++---
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index ca04c900e51a..f795e5def8d0 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -170,10 +170,11 @@ enum maple_type {
 #define MT_FLAGS_USE_RCU	0x02
 #define MT_FLAGS_HEIGHT_OFFSET	0x02
 #define MT_FLAGS_HEIGHT_MASK	0x7C
-#define MT_FLAGS_LOCK_MASK	0x300
+#define MT_FLAGS_LOCK_MASK	0x700
 #define MT_FLAGS_LOCK_IRQ	0x100
 #define MT_FLAGS_LOCK_BH	0x200
 #define MT_FLAGS_LOCK_EXTERN	0x300
+#define MT_FLAGS_LOCK_NONE	0x400
 
 #define MAPLE_HEIGHT_MAX	31
 
@@ -559,11 +560,16 @@ static inline void mas_set(struct ma_state *mas, unsigned long index)
 	mas_set_range(mas, index, index);
 }
 
-static inline bool mt_external_lock(const struct maple_tree *mt)
+static inline bool mt_lock_external(const struct maple_tree *mt)
 {
 	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_EXTERN;
 }
 
+static inline bool mt_lock_none(const struct maple_tree *mt)
+{
+	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_NONE;
+}
+
 /**
  * mt_init_flags() - Initialise an empty maple tree with flags.
  * @mt: Maple Tree
@@ -577,7 +583,7 @@ static inline bool mt_external_lock(const struct maple_tree *mt)
 static inline void mt_init_flags(struct maple_tree *mt, unsigned int flags)
 {
 	mt->ma_flags = flags;
-	if (!mt_external_lock(mt))
+	if (!mt_lock_external(mt) && !mt_lock_none(mt))
 		spin_lock_init(&mt->ma_lock);
 	rcu_assign_pointer(mt->ma_root, NULL);
 }
@@ -612,9 +618,11 @@ static inline void mt_clear_in_rcu(struct maple_tree *mt)
 	if (!mt_in_rcu(mt))
 		return;
 
-	if (mt_external_lock(mt)) {
+	if (mt_lock_external(mt)) {
 		BUG_ON(!mt_lock_is_held(mt));
 		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
+	} else if (mt_lock_none(mt)) {
+		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
 	} else {
 		mtree_lock(mt);
 		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
@@ -631,9 +639,11 @@ static inline void mt_set_in_rcu(struct maple_tree *mt)
 	if (mt_in_rcu(mt))
 		return;
 
-	if (mt_external_lock(mt)) {
+	if (mt_lock_external(mt)) {
 		BUG_ON(!mt_lock_is_held(mt));
 		mt->ma_flags |= MT_FLAGS_USE_RCU;
+	} else if (mt_lock_none(mt)) {
+		mt->ma_flags |= MT_FLAGS_USE_RCU;
 	} else {
 		mtree_lock(mt);
 		mt->ma_flags |= MT_FLAGS_USE_RCU;
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 26e2045d3cda..f51c0fd4eaad 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -802,8 +802,8 @@ static inline void __rcu **ma_slots(struct maple_node *mn, enum maple_type mt)
 
 static inline bool mt_locked(const struct maple_tree *mt)
 {
-	return mt_external_lock(mt) ? mt_lock_is_held(mt) :
-		lockdep_is_held(&mt->ma_lock);
+	return mt_lock_external(mt) ? mt_lock_is_held(mt) :
+		mt_lock_none(mt) ? true : lockdep_is_held(&mt->ma_lock);
 }
 
 static inline void *mt_slot(const struct maple_tree *mt,
@@ -6120,7 +6120,8 @@ bool mas_nomem(struct ma_state *mas, gfp_t gfp)
 		return false;
 	}
 
-	if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
+	if (gfpflags_allow_blocking(gfp) &&
+	    !mt_lock_external(mas->tree) && !mt_lock_none(mas->tree)) {
 		mtree_unlock(mas->tree);
 		mas_alloc_nodes(mas, gfp);
 		mtree_lock(mas->tree);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (3 preceding siblings ...)
  2023-02-17 13:44 ` [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE Danilo Krummrich
@ 2023-02-17 13:44 ` Danilo Krummrich
  2023-02-18  1:05   ` kernel test robot
                     ` (2 more replies)
  2023-02-17 13:48 ` [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
                   ` (11 subsequent siblings)
  16 siblings, 3 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:44 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm,
	Danilo Krummrich, Dave Airlie

Add infrastructure to keep track of GPU virtual address (VA) mappings
with a decicated VA space manager implementation.

New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
start implementing, allow userspace applications to request multiple and
arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide infrastructure to track GPU VA allocations and mappings,
   making use of the maple_tree.

2) Generically connect GPU VA mappings to their backing buffers, in
   particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
   operations on the GPU VA space. In particular splitting and merging
   of GPU VA mappings, e.g. for intersecting mapping requests or partial
   unmap requests.

Suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/drm-mm.rst    |   31 +
 drivers/gpu/drm/Makefile        |    1 +
 drivers/gpu/drm/drm_gem.c       |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
 include/drm/drm_drv.h           |    6 +
 include/drm/drm_gem.h           |   75 ++
 include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
 7 files changed, 2534 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 include/drm/drm_gpuva_mgr.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a52e6f4117d6..c9f120cfe730 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
 .. kernel-doc:: drivers/gpu/drm/drm_mm.c
    :export:
 
+DRM GPU VA Manager
+==================
+
+Overview
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Overview
+
+Split and Merge
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Split and Merge
+
+Locking
+-------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Locking
+
+
+DRM GPU VA Manager Function References
+--------------------------------------
+
+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :export:
+
 DRM Buddy Allocator
 ===================
 
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index d40defbb0347..4d098efffb98 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -45,6 +45,7 @@ drm-y := \
 	drm_vblank.o \
 	drm_vblank_work.o \
 	drm_vma_manager.o \
+	drm_gpuva_mgr.o \
 	drm_writeback.o
 drm-$(CONFIG_DRM_LEGACY) += \
 	drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 59a0bb5ebd85..65115fe88627 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	if (!obj->resv)
 		obj->resv = &obj->_resv;
 
+	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
+		drm_gem_gpuva_init(obj);
+
 	drm_vma_node_reset(&obj->vma_node);
 	INIT_LIST_HEAD(&obj->lru_node);
 }
diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
new file mode 100644
index 000000000000..19f583704562
--- /dev/null
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -0,0 +1,1704 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <drm/drm_gem.h>
+#include <drm/drm_gpuva_mgr.h>
+
+/**
+ * DOC: Overview
+ *
+ * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
+ * of a GPU's virtual address (VA) space and manages the corresponding virtual
+ * mappings represented by &drm_gpuva objects. It also keeps track of the
+ * mapping's backing &drm_gem_object buffers.
+ *
+ * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
+ * &drm_gpuva objects representing all existent GPU VA mappings using this
+ * &drm_gem_object as backing buffer.
+ *
+ * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can
+ * only be created within a previously allocated &drm_gpuva_region, which
+ * represents a reserved portion of the GPU VA space. GPU VA mappings are not
+ * allowed to span over a &drm_gpuva_region's boundary.
+ *
+ * GPU VA regions can also be flagged as sparse, which allows drivers to create
+ * sparse mappings for a whole GPU VA region in order to support Vulkan
+ * 'Sparse Resources'.
+ *
+ * The GPU VA manager internally uses &maple_tree structures to manage the
+ * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
+ * space.
+ *
+ * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver
+ * the &drm_gpuva_manager contains a special region representing the portion of
+ * VA space reserved by the kernel. This node is initialized together with the
+ * GPU VA manager instance and removed when the GPU VA manager is destroyed.
+ *
+ * In a typical application drivers would embed struct drm_gpuva_manager,
+ * struct drm_gpuva_region and struct drm_gpuva within their own driver
+ * specific structures, there won't be any memory allocations of it's own nor
+ * memory allocations of &drm_gpuva or &drm_gpuva_region entries.
+ */
+
+/**
+ * DOC: Split and Merge
+ *
+ * The DRM GPU VA manager also provides an algorithm implementing splitting and
+ * merging of existent GPU VA mappings with the ones that are requested to be
+ * mapped or unmapped. This feature is required by the Vulkan API to implement
+ * Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to this as
+ * VM BIND.
+ *
+ * Drivers can call drm_gpuva_sm_map() to receive a sequence of callbacks
+ * containing map, unmap and remap operations for a given newly requested
+ * mapping. The sequence of callbacks represents the set of operations to
+ * execute in order to integrate the new mapping cleanly into the current state
+ * of the GPU VA space.
+ *
+ * Depending on how the new GPU VA mapping intersects with the existent mappings
+ * of the GPU VA space the &drm_gpuva_fn_ops callbacks contain an arbitrary
+ * amount of unmap operations, a maximum of two remap operations and a single
+ * map operation. The caller might receive no callback at all if no operation is
+ * required, e.g. if the requested mapping already exists in the exact same way.
+ *
+ * The single map operation, if existent, represents the original map operation
+ * requested by the caller. Please note that this operation might be altered
+ * comparing it with the original map operation, e.g. because it was merged with
+ * an already  existent mapping. Hence, drivers must execute this map operation
+ * instead of the original one passed to drm_gpuva_sm_map().
+ *
+ * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
+ * &drm_gpuva to unmap is physically contiguous with the original mapping
+ * request. Optionally, if 'keep' is set, drivers may keep the actual page table
+ * entries for this &drm_gpuva, adding the missing page table entries only and
+ * update the &drm_gpuva_manager's view of things accordingly.
+ *
+ * Drivers may do the same optimization, namely delta page table updates, also
+ * for remap operations. This is possible since &drm_gpuva_op_remap consists of
+ * one unmap operation and one or two map operations, such that drivers can
+ * derive the page table update delta accordingly.
+ *
+ * Note that there can't be more than two existent mappings to split up, one at
+ * the beginning and one at the end of the new mapping, hence there is a
+ * maximum of two remap operations.
+ *
+ * Generally, the DRM GPU VA manager never merges mappings across the
+ * boundaries of &drm_gpuva_regions. This is the case since merging between
+ * GPU VA regions would result into unmap and map operations to be issued for
+ * both regions involved although the original mapping request was referred to
+ * one specific GPU VA region only. Since the other GPU VA region, the one not
+ * explicitly requested to be altered, might be in use by the GPU, we are not
+ * allowed to issue any map/unmap operations for this region.
+ *
+ * To update the &drm_gpuva_manager's view of the GPU VA space
+ * drm_gpuva_insert() and drm_gpuva_remove() should be used.
+ *
+ * Analogous to drm_gpuva_sm_map() drm_gpuva_sm_unmap() uses &drm_gpuva_fn_ops
+ * to call back into the driver in order to unmap a range of GPU VA space. The
+ * logic behind this function is way simpler though: For all existent mappings
+ * enclosed by the given range unmap operations are created. For mappings which
+ * are only partically located within the given range, remap operations are
+ * created such that those mappings are split up and re-mapped partically.
+ *
+ * The following diagram depicts the basic relationships of existent GPU VA
+ * mappings, a newly requested mapping and the resulting mappings as implemented
+ * by drm_gpuva_sm_map() - it doesn't cover any arbitrary combinations of these.
+ *
+ * 1) Requested mapping is identical, hence noop.
+ *
+ *    ::
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ * 2) Requested mapping is identical, except for the BO offset, hence replace
+ *    the mapping.
+ *
+ *    ::
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	req: |-----------| (bo_offset=m)
+ *
+ *	     0     a     1
+ *	new: |-----------| (bo_offset=m)
+ *
+ *
+ * 3) Requested mapping is identical, except for the backing BO, hence replace
+ *    the mapping.
+ *
+ *    ::
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     b     1
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     b     1
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ * 4) Existent mapping is a left aligned subset of the requested one, hence
+ *    replace the existent one.
+ *
+ *    ::
+ *
+ *	     0  a  1
+ *	old: |-----|       (bo_offset=n)
+ *
+ *	     0     a     2
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *    .. note::
+ *       We expect to see the same result for a request with a different BO
+ *       and/or non-contiguous BO offset.
+ *
+ *
+ * 5) Requested mapping's range is a left aligned subset of the existent one,
+ *    but backed by a different BO. Hence, map the requested mapping and split
+ *    the existent one adjusting it's BO offset.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0  b  1
+ *	req: |-----|       (bo_offset=n)
+ *
+ *	     0  b  1  a' 2
+ *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
+ *
+ *    .. note::
+ *       We expect to see the same result for a request with a different BO
+ *       and/or non-contiguous BO offset.
+ *
+ *
+ * 6) Existent mapping is a superset of the requested mapping, hence noop.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0  a  1
+ *	req: |-----|       (bo_offset=n)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ * 7) Requested mapping's range is a right aligned subset of the existent one,
+ *    but backed by a different BO. Hence, map the requested mapping and split
+ *    the existent one, without adjusting the BO offset.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	           1  b  2
+ *	req:       |-----| (bo_offset=m)
+ *
+ *	     0  a  1  b  2
+ *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
+ *
+ *
+ * 8) Existent mapping is a superset of the requested mapping, hence noop.
+ *
+ *    ::
+ *
+ *	      0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	           1  a  2
+ *	req:       |-----| (bo_offset=n+1)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ * 9) Existent mapping is overlapped at the end by the requested mapping backed
+ *    by a different BO. Hence, map the requested mapping and split up the
+ *    existent one, without adjusting the BO offset.
+ *
+ *    ::
+ *
+ *	     0     a     2
+ *	old: |-----------|       (bo_offset=n)
+ *
+ *	           1     b     3
+ *	req:       |-----------| (bo_offset=m)
+ *
+ *	     0  a  1     b     3
+ *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
+ *
+ *
+ * 10) Existent mapping is overlapped by the requested mapping, both having the
+ *     same backing BO with a contiguous offset. Hence, merge both mappings.
+ *
+ *     ::
+ *
+ *	      0     a     2
+ *	 old: |-----------|       (bo_offset=n)
+ *
+ *	            1     a     3
+ *	 req:       |-----------| (bo_offset=n+1)
+ *
+ *	      0        a        3
+ *	 new: |-----------------| (bo_offset=n)
+ *
+ *
+ * 11) Requested mapping's range is a centered subset of the existent one
+ *     having a different backing BO. Hence, map the requested mapping and split
+ *     up the existent one in two mappings, adjusting the BO offset of the right
+ *     one accordingly.
+ *
+ *     ::
+ *
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
+ *
+ *	            1  b  2
+ *	 req:       |-----|       (bo_offset=m)
+ *
+ *	      0  a  1  b  2  a' 3
+ *	 new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
+ *
+ *
+ * 12) Requested mapping is a contiguous subset of the existent one, hence noop.
+ *
+ *     ::
+ *
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
+ *
+ *	            1  a  2
+ *	 req:       |-----|       (bo_offset=n+1)
+ *
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
+ *
+ *
+ * 13) Existent mapping is a right aligned subset of the requested one, hence
+ *     replace the existent one.
+ *
+ *     ::
+ *
+ *	            1  a  2
+ *	 old:       |-----| (bo_offset=n+1)
+ *
+ *	      0     a     2
+ *	 req: |-----------| (bo_offset=n)
+ *
+ *	      0     a     2
+ *	 new: |-----------| (bo_offset=n)
+ *
+ *     .. note::
+ *        We expect to see the same result for a request with a different bo
+ *        and/or non-contiguous bo_offset.
+ *
+ *
+ * 14) Existent mapping is a centered subset of the requested one, hence
+ *     replace the existent one.
+ *
+ *     ::
+ *
+ *	            1  a  2
+ *	 old:       |-----| (bo_offset=n+1)
+ *
+ *	      0        a       3
+ *	 req: |----------------| (bo_offset=n)
+ *
+ *	      0        a       3
+ *	 new: |----------------| (bo_offset=n)
+ *
+ *     .. note::
+ *        We expect to see the same result for a request with a different bo
+ *        and/or non-contiguous bo_offset.
+ *
+ *
+ * 15) Existent mappings is overlapped at the beginning by the requested mapping
+ *     backed by a different BO. Hence, map the requested mapping and split up
+ *     the existent one, adjusting it's BO offset accordingly.
+ *
+ *     ::
+ *
+ *	            1     a     3
+ *	 old:       |-----------| (bo_offset=n)
+ *
+ *	      0     b     2
+ *	 req: |-----------|       (bo_offset=m)
+ *
+ *	      0     b     2  a' 3
+ *	 new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
+ *
+ *
+ * 16) Requested mapping fills the gap between two existent mappings all having
+ *     the same backing BO, such that all three have a contiguous BO offset.
+ *     Hence, merge all mappings.
+ *
+ *     ::
+ *
+ *	      0     a     1
+ *	 old: |-----------|                        (bo_offset=n)
+ *
+ *	                             2     a     3
+ *	 old':                       |-----------| (bo_offset=n+2)
+ *
+ *	                 1     a     2
+ *	 req:            |-----------|             (bo_offset=n+1)
+ *
+ *	                       a
+ *	 new: |----------------------------------| (bo_offset=n)
+ */
+
+/**
+ * DOC: Locking
+ *
+ * Generally, the GPU VA manager does not take care of locking itself, it is
+ * the drivers responsibility to take care about locking. Drivers might want to
+ * protect the following operations: inserting, removing and iterating
+ * &drm_gpuva and &drm_gpuva_region objects as well as generating all kinds of
+ * operations, such as split / merge or prefetch.
+ *
+ * The GPU VA manager also does not take care of the locking of the backing
+ * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to
+ * enforce mutual exclusion.
+ */
+
+
+static int __drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
+				     struct drm_gpuva_region *reg);
+static void __drm_gpuva_region_remove(struct drm_gpuva_region *reg);
+
+/**
+ * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
+ * @mgr: pointer to the &drm_gpuva_manager to initialize
+ * @name: the name of the GPU VA space
+ * @start_offset: the start offset of the GPU VA space
+ * @range: the size of the GPU VA space
+ * @reserve_offset: the start of the kernel reserved GPU VA area
+ * @reserve_range: the size of the kernel reserved GPU VA area
+ * @ops: &drm_gpuva_fn_ops called on &drm_gpuva_sm_map / &drm_gpuva_sm_unmap
+ * @flags: the feature flags of the &drm_gpuva_manager
+ *
+ * The &drm_gpuva_manager must be initialized with this function before use.
+ *
+ * Note that @mgr must be cleared to 0 before calling this function. The given
+ * &name is expected to be managed by the surrounding driver structures.
+ */
+void
+drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+		       const char *name,
+		       u64 start_offset, u64 range,
+		       u64 reserve_offset, u64 reserve_range,
+		       struct drm_gpuva_fn_ops *ops,
+		       enum drm_gpuva_mgr_flags flags)
+{
+	mt_init_flags(&mgr->region_mt, MT_FLAGS_LOCK_NONE);
+	mt_init_flags(&mgr->va_mt, MT_FLAGS_LOCK_NONE);
+
+	mgr->mm_start = start_offset;
+	mgr->mm_range = range;
+
+	mgr->name = name ? name : "unknown";
+	mgr->ops = ops;
+	mgr->flags = flags;
+
+	memset(&mgr->kernel_alloc_region, 0, sizeof(struct drm_gpuva_region));
+	mgr->kernel_alloc_region.va.addr = reserve_offset;
+	mgr->kernel_alloc_region.va.range = reserve_range;
+
+	__drm_gpuva_region_insert(mgr, &mgr->kernel_alloc_region);
+}
+EXPORT_SYMBOL(drm_gpuva_manager_init);
+
+/**
+ * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
+ * @mgr: pointer to the &drm_gpuva_manager to clean up
+ *
+ * Note that it is a bug to call this function on a manager that still
+ * holds GPU VA mappings.
+ */
+void
+drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
+{
+	mgr->name = NULL;
+	__drm_gpuva_region_remove(&mgr->kernel_alloc_region);
+
+	WARN(!mtree_empty(&mgr->va_mt),
+	     "GPUVA tree is not empty, potentially leaking memory.");
+	__mt_destroy(&mgr->va_mt);
+
+	WARN(!mtree_empty(&mgr->region_mt),
+	     "GPUVA region tree is not empty, potentially leaking memory.");
+	__mt_destroy(&mgr->region_mt);
+}
+EXPORT_SYMBOL(drm_gpuva_manager_destroy);
+
+static inline bool
+drm_gpuva_in_mm_range(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	u64 end = addr + range;
+	u64 mm_start = mgr->mm_start;
+	u64 mm_end = mm_start + mgr->mm_range;
+
+	return addr < mm_end && mm_start < end;
+}
+
+static inline bool
+drm_gpuva_in_kernel_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	u64 end = addr + range;
+	u64 kstart = mgr->kernel_alloc_region.va.addr;
+	u64 kend = kstart + mgr->kernel_alloc_region.va.range;
+
+	return addr < kend && kstart < end;
+}
+
+static struct drm_gpuva_region *
+drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	DRM_GPUVA_REGION_ITER(it, mgr);
+
+	/* Find the VA region the requested range is strictly enclosed by. */
+	drm_gpuva_iter_for_each_range(it, addr, addr + range) {
+		struct drm_gpuva_region *reg = it.reg;
+
+		if (reg->va.addr <= addr &&
+		    reg->va.addr + reg->va.range >= addr + range &&
+		    reg != &mgr->kernel_alloc_region)
+			return reg;
+	}
+
+	return NULL;
+}
+
+static bool
+drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	return !!drm_gpuva_in_region(mgr, addr, range);
+}
+
+/**
+ * drm_gpuva_remove_iter - removes the iterators current element
+ * @it: the &drm_gpuva_iterator
+ *
+ * This removes the element the iterator currently points to.
+ */
+void
+drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
+{
+	mas_erase(&it->mas);
+}
+EXPORT_SYMBOL(drm_gpuva_iter_remove);
+
+/**
+ * drm_gpuva_insert - insert a &drm_gpuva
+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
+ * @va: the &drm_gpuva to insert
+ * @addr: the start address of the GPU VA
+ * @range: the range of the GPU VA
+ *
+ * Insert a &drm_gpuva with a given address and range into a
+ * &drm_gpuva_manager.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		 struct drm_gpuva *va)
+{
+	u64 addr = va->va.addr;
+	u64 range = va->va.range;
+	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
+	struct drm_gpuva_region *reg = NULL;
+	int ret;
+
+	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
+		return -EINVAL;
+
+	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
+		return -EINVAL;
+
+	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
+		reg = drm_gpuva_in_region(mgr, addr, range);
+		if (unlikely(!reg))
+			return -EINVAL;
+	}
+
+	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
+		return -EEXIST;
+
+	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
+	if (unlikely(ret))
+		return ret;
+
+	va->mgr = mgr;
+	va->region = reg;
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_gpuva_insert);
+
+/**
+ * drm_gpuva_remove - remove a &drm_gpuva
+ * @va: the &drm_gpuva to remove
+ *
+ * This removes the given &va from the underlaying tree.
+ */
+void
+drm_gpuva_remove(struct drm_gpuva *va)
+{
+	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
+
+	mas_erase(&mas);
+}
+EXPORT_SYMBOL(drm_gpuva_remove);
+
+/**
+ * drm_gpuva_link - link a &drm_gpuva
+ * @va: the &drm_gpuva to link
+ *
+ * This adds the given &va to the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * This function expects the caller to protect the GEM's GPUVA list against
+ * concurrent access.
+ */
+void
+drm_gpuva_link(struct drm_gpuva *va)
+{
+	if (likely(va->gem.obj))
+		list_add_tail(&va->head, &va->gem.obj->gpuva.list);
+}
+EXPORT_SYMBOL(drm_gpuva_link);
+
+/**
+ * drm_gpuva_unlink - unlink a &drm_gpuva
+ * @va: the &drm_gpuva to unlink
+ *
+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * This function expects the caller to protect the GEM's GPUVA list against
+ * concurrent access.
+ */
+void
+drm_gpuva_unlink(struct drm_gpuva *va)
+{
+	if (likely(va->gem.obj))
+		list_del_init(&va->head);
+}
+EXPORT_SYMBOL(drm_gpuva_unlink);
+
+/**
+ * drm_gpuva_find_first - find the first &drm_gpuva in the given range
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuvas address
+ * @range: the &drm_gpuvas range
+ *
+ * Returns: the first &drm_gpuva within the given range
+ */
+struct drm_gpuva *
+drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
+		     u64 addr, u64 range)
+{
+	MA_STATE(mas, &mgr->va_mt, addr, 0);
+
+	return mas_find(&mas, addr + range - 1);
+}
+EXPORT_SYMBOL(drm_gpuva_find_first);
+
+/**
+ * drm_gpuva_find - find a &drm_gpuva
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuvas address
+ * @range: the &drm_gpuvas range
+ *
+ * Returns: the &drm_gpuva at a given &addr and with a given &range
+ */
+struct drm_gpuva *
+drm_gpuva_find(struct drm_gpuva_manager *mgr,
+	       u64 addr, u64 range)
+{
+	struct drm_gpuva *va;
+
+	va = drm_gpuva_find_first(mgr, addr, range);
+	if (!va)
+		goto out;
+
+	if (va->va.range != range)
+		goto out;
+
+	return va;
+
+out:
+	return NULL;
+}
+EXPORT_SYMBOL(drm_gpuva_find);
+
+/**
+ * drm_gpuva_find_prev - find the &drm_gpuva before the given address
+ * @mgr: the &drm_gpuva_manager to search in
+ * @start: the given GPU VA's start address
+ *
+ * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
+ *
+ * Note that if there is any free space between the GPU VA mappings no mapping
+ * is returned.
+ *
+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
+ */
+struct drm_gpuva *
+drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
+{
+	MA_STATE(mas, &mgr->va_mt, start, 0);
+
+	if (start <= mgr->mm_start ||
+	    start > (mgr->mm_start + mgr->mm_range))
+		return NULL;
+
+	return mas_prev(&mas, start - 1);
+}
+EXPORT_SYMBOL(drm_gpuva_find_prev);
+
+/**
+ * drm_gpuva_find_next - find the &drm_gpuva after the given address
+ * @mgr: the &drm_gpuva_manager to search in
+ * @end: the given GPU VA's end address
+ *
+ * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
+ *
+ * Note that if there is any free space between the GPU VA mappings no mapping
+ * is returned.
+ *
+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
+ */
+struct drm_gpuva *
+drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
+{
+	MA_STATE(mas, &mgr->va_mt, end - 1, 0);
+
+	if (end < mgr->mm_start ||
+	    end >= (mgr->mm_start + mgr->mm_range))
+		return NULL;
+
+	return mas_next(&mas, end);
+}
+EXPORT_SYMBOL(drm_gpuva_find_next);
+
+static int
+__drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
+			  struct drm_gpuva_region *reg)
+{
+	u64 addr = reg->va.addr;
+	u64 range = reg->va.range;
+	MA_STATE(mas, &mgr->region_mt, addr, addr + range - 1);
+	int ret;
+
+	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
+		return -EINVAL;
+
+	ret = mas_store_gfp(&mas, reg, GFP_KERNEL);
+	if (unlikely(ret))
+		return ret;
+
+	reg->mgr = mgr;
+
+	return 0;
+}
+
+/**
+ * drm_gpuva_region_insert - insert a &drm_gpuva_region
+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
+ * @reg: the &drm_gpuva_region to insert
+ * @addr: the start address of the GPU VA
+ * @range: the range of the GPU VA
+ *
+ * Insert a &drm_gpuva_region with a given address and range into a
+ * &drm_gpuva_manager.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
+			struct drm_gpuva_region *reg)
+{
+	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
+		return -EINVAL;
+
+	return __drm_gpuva_region_insert(mgr, reg);
+}
+EXPORT_SYMBOL(drm_gpuva_region_insert);
+
+static void
+__drm_gpuva_region_remove(struct drm_gpuva_region *reg)
+{
+	struct drm_gpuva_manager *mgr = reg->mgr;
+	MA_STATE(mas, &mgr->region_mt, reg->va.addr, 0);
+
+	mas_erase(&mas);
+}
+
+/**
+ * drm_gpuva_region_remove - remove a &drm_gpuva_region
+ * @reg: the &drm_gpuva to remove
+ *
+ * This removes the given &reg from the underlaying tree.
+ */
+void
+drm_gpuva_region_remove(struct drm_gpuva_region *reg)
+{
+	struct drm_gpuva_manager *mgr = reg->mgr;
+
+	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
+		return;
+
+	if (unlikely(reg == &mgr->kernel_alloc_region)) {
+		WARN(1, "Can't destroy kernel reserved region.\n");
+		return;
+	}
+
+	if (unlikely(!drm_gpuva_region_empty(reg)))
+		WARN(1, "GPU VA region should be empty on destroy.\n");
+
+	__drm_gpuva_region_remove(reg);
+}
+EXPORT_SYMBOL(drm_gpuva_region_remove);
+
+/**
+ * drm_gpuva_region_empty - indicate whether a &drm_gpuva_region is empty
+ * @reg: the &drm_gpuva to destroy
+ *
+ * Returns: true if the &drm_gpuva_region is empty, false otherwise
+ */
+bool
+drm_gpuva_region_empty(struct drm_gpuva_region *reg)
+{
+	DRM_GPUVA_ITER(it, reg->mgr);
+
+	drm_gpuva_iter_for_each_range(it, reg->va.addr,
+				      reg->va.addr +
+				      reg->va.range)
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL(drm_gpuva_region_empty);
+
+/**
+ * drm_gpuva_region_find_first - find the first &drm_gpuva_region in the given
+ * range
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuva_regions address
+ * @range: the &drm_gpuva_regions range
+ *
+ * Returns: the first &drm_gpuva_region within the given range
+ */
+struct drm_gpuva_region *
+drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
+			    u64 addr, u64 range)
+{
+	MA_STATE(mas, &mgr->region_mt, addr, 0);
+
+	return mas_find(&mas, addr + range - 1);
+}
+EXPORT_SYMBOL(drm_gpuva_region_find_first);
+
+/**
+ * drm_gpuva_region_find - find a &drm_gpuva_region
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuva_regions address
+ * @range: the &drm_gpuva_regions range
+ *
+ * Returns: the &drm_gpuva_region at a given &addr and with a given &range
+ */
+struct drm_gpuva_region *
+drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
+		      u64 addr, u64 range)
+{
+	struct drm_gpuva_region *reg;
+
+	reg = drm_gpuva_region_find_first(mgr, addr, range);
+	if (!reg)
+		goto out;
+
+	if (reg->va.range != range)
+		goto out;
+
+	return reg;
+
+out:
+	return NULL;
+}
+EXPORT_SYMBOL(drm_gpuva_region_find);
+
+static int
+op_map_cb(int (*step)(struct drm_gpuva_op *, void *),
+	  void *priv,
+	  u64 addr, u64 range,
+	  struct drm_gem_object *obj, u64 offset)
+{
+	struct drm_gpuva_op op = {};
+
+	op.op = DRM_GPUVA_OP_MAP;
+	op.map.va.addr = addr;
+	op.map.va.range = range;
+	op.map.gem.obj = obj;
+	op.map.gem.offset = offset;
+
+	return step(&op, priv);
+}
+
+static int
+op_remap_cb(int (*step)(struct drm_gpuva_op *, void *),
+	    void *priv,
+	    struct drm_gpuva_op_map *prev,
+	    struct drm_gpuva_op_map *next,
+	    struct drm_gpuva_op_unmap *unmap)
+{
+	struct drm_gpuva_op op = {};
+	struct drm_gpuva_op_remap *r;
+
+	op.op = DRM_GPUVA_OP_REMAP;
+	r = &op.remap;
+	r->prev = prev;
+	r->next = next;
+	r->unmap = unmap;
+
+	return step(&op, priv);
+}
+
+static int
+op_unmap_cb(int (*step)(struct drm_gpuva_op *, void *),
+	    void *priv,
+	    struct drm_gpuva *va, bool merge)
+{
+	struct drm_gpuva_op op = {};
+
+	op.op = DRM_GPUVA_OP_UNMAP;
+	op.unmap.va = va;
+	op.unmap.keep = merge;
+
+	return step(&op, priv);
+}
+
+static inline bool
+gpuva_should_merge(struct drm_gpuva *va)
+{
+	/* Never merge mappings with NULL GEMs. */
+	return !!va->gem.obj;
+}
+
+static int
+__drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
+		   struct drm_gpuva_fn_ops *ops, void *priv,
+		   u64 req_addr, u64 req_range,
+		   struct drm_gem_object *req_obj, u64 req_offset)
+{
+	DRM_GPUVA_ITER(it, mgr);
+	int (*step)(struct drm_gpuva_op *, void *);
+	struct drm_gpuva *va, *prev = NULL;
+	u64 req_end = req_addr + req_range;
+	bool skip_pmerge = false, skip_nmerge = false;
+	int ret;
+
+	step = ops->sm_map_step;
+
+	if (unlikely(!drm_gpuva_in_mm_range(mgr, req_addr, req_range)))
+		return -EINVAL;
+
+	if (unlikely(drm_gpuva_in_kernel_region(mgr, req_addr, req_range)))
+		return -EINVAL;
+
+	if ((mgr->flags & DRM_GPUVA_MANAGER_REGIONS) &&
+	    !drm_gpuva_in_any_region(mgr, req_addr, req_range))
+		return -EINVAL;
+
+	drm_gpuva_iter_for_each_range(it, req_addr, req_end) {
+		struct drm_gpuva *va = it.va;
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->va.addr;
+		u64 range = va->va.range;
+		u64 end = addr + range;
+		bool merge = gpuva_should_merge(va);
+
+		/* Generally, we want to skip merging with potential mappings
+		 * left and right of the requested one when we found a
+		 * collision, since merging happens in this loop already.
+		 *
+		 * However, there is one exception when the requested mapping
+		 * spans into a free VM area. If this is the case we might
+		 * still hit the boundary of another mapping before and/or
+		 * after the free VM area.
+		 */
+		skip_pmerge = true;
+		skip_nmerge = true;
+
+		if (addr == req_addr) {
+			merge &= obj == req_obj &&
+				 offset == req_offset;
+
+			if (end == req_end) {
+				if (merge)
+					goto done;
+
+				ret = op_unmap_cb(step, priv, va, false);
+				if (ret)
+					return ret;
+				break;
+			}
+
+			if (end < req_end) {
+				skip_nmerge = false;
+				ret = op_unmap_cb(step, priv, va, merge);
+				if (ret)
+					return ret;
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = range - req_range,
+					.gem.obj = obj,
+					.gem.offset = offset + req_range,
+				};
+				struct drm_gpuva_op_unmap u = { .va = va };
+
+				if (merge)
+					goto done;
+
+				ret = op_remap_cb(step, priv, NULL, &n, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+		} else if (addr < req_addr) {
+			u64 ls_range = req_addr - addr;
+			struct drm_gpuva_op_map p = {
+				.va.addr = addr,
+				.va.range = ls_range,
+				.gem.obj = obj,
+				.gem.offset = offset,
+			};
+			struct drm_gpuva_op_unmap u = { .va = va };
+
+			merge &= obj == req_obj &&
+				 offset + ls_range == req_offset;
+
+			if (end == req_end) {
+				if (merge)
+					goto done;
+
+				ret = op_remap_cb(step, priv, &p, NULL, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+
+			if (end < req_end) {
+				u64 new_addr = addr;
+				u64 new_range = req_range + ls_range;
+				u64 new_offset = offset;
+
+				/* We validated that the requested mapping is
+				 * within a single VA region already.
+				 * Since it overlaps the current mapping (which
+				 * can't cross a VA region boundary) we can be
+				 * sure that we're still within the boundaries
+				 * of the same VA region after merging.
+				 */
+				if (merge) {
+					req_offset = new_offset;
+					req_addr = new_addr;
+					req_range = new_range;
+					ret = op_unmap_cb(step, priv, va, true);
+					if (ret)
+						return ret;
+					goto next;
+				}
+
+				ret = op_remap_cb(step, priv, &p, NULL, &u);
+				if (ret)
+					return ret;
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = end - req_end,
+					.gem.obj = obj,
+					.gem.offset = offset + ls_range +
+						      req_range,
+				};
+
+				if (merge)
+					goto done;
+
+				ret = op_remap_cb(step, priv, &p, &n, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+		} else if (addr > req_addr) {
+			merge &= obj == req_obj &&
+				 offset == req_offset +
+					   (addr - req_addr);
+
+			if (!prev)
+				skip_pmerge = false;
+
+			if (end == req_end) {
+				ret = op_unmap_cb(step, priv, va, merge);
+				if (ret)
+					return ret;
+				break;
+			}
+
+			if (end < req_end) {
+				skip_nmerge = false;
+				ret = op_unmap_cb(step, priv, va, merge);
+				if (ret)
+					return ret;
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = end - req_end,
+					.gem.obj = obj,
+					.gem.offset = offset + req_end - addr,
+				};
+				struct drm_gpuva_op_unmap u = { .va = va };
+				u64 new_end = end;
+				u64 new_range = new_end - req_addr;
+
+				/* We validated that the requested mapping is
+				 * within a single VA region already.
+				 * Since it overlaps the current mapping (which
+				 * can't cross a VA region boundary) we can be
+				 * sure that we're still within the boundaries
+				 * of the same VA region after merging.
+				 */
+				if (merge) {
+					req_end = new_end;
+					req_range = new_range;
+					ret = op_unmap_cb(step, priv, va, true);
+					if (ret)
+						return ret;
+					break;
+				}
+
+				ret = op_remap_cb(step, priv, NULL, &n, &u);
+				if (ret)
+					return ret;
+				break;
+			}
+		}
+next:
+		prev = va;
+	}
+
+	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
+	if (va) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->va.addr;
+		u64 range = va->va.range;
+		u64 new_offset = offset;
+		u64 new_addr = addr;
+		u64 new_range = req_range + range;
+		bool merge = gpuva_should_merge(va) &&
+			     obj == req_obj &&
+			     offset + range == req_offset;
+
+		if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS)
+			merge &= drm_gpuva_in_any_region(mgr, new_addr,
+							 new_range);
+
+		if (merge) {
+			ret = op_unmap_cb(step, priv, va, true);
+			if (ret)
+				return ret;
+
+			req_offset = new_offset;
+			req_addr = new_addr;
+			req_range = new_range;
+		}
+	}
+
+	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
+	if (va) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->va.addr;
+		u64 range = va->va.range;
+		u64 end = addr + range;
+		u64 new_range = req_range + range;
+		u64 new_end = end;
+		bool merge = gpuva_should_merge(va) &&
+			     obj == req_obj &&
+			     offset == req_offset + req_range;
+
+		if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS)
+			merge &= drm_gpuva_in_any_region(mgr, req_addr,
+							 new_range);
+
+		if (merge) {
+			ret = op_unmap_cb(step, priv, va, true);
+			if (ret)
+				return ret;
+
+			req_range = new_range;
+			req_end = new_end;
+		}
+	}
+
+	ret = op_map_cb(step, priv,
+			req_addr, req_range,
+			req_obj, req_offset);
+	if (ret)
+		return ret;
+
+done:
+	return 0;
+}
+
+static int
+__drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr,
+		     struct drm_gpuva_fn_ops *ops, void *priv,
+		     u64 req_addr, u64 req_range)
+{
+	DRM_GPUVA_ITER(it, mgr);
+	int (*step)(struct drm_gpuva_op *, void *);
+	u64 req_end = req_addr + req_range;
+	int ret;
+
+	step = ops->sm_unmap_step;
+
+	drm_gpuva_iter_for_each_range(it, req_addr, req_end) {
+		struct drm_gpuva *va = it.va;
+		struct drm_gpuva_op_map prev = {}, next = {};
+		bool prev_split = false, next_split = false;
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->va.addr;
+		u64 range = va->va.range;
+		u64 end = addr + range;
+
+		if (addr < req_addr) {
+			prev.va.addr = addr;
+			prev.va.range = req_addr - addr;
+			prev.gem.obj = obj;
+			prev.gem.offset = offset;
+
+			prev_split = true;
+		}
+
+		if (end > req_end) {
+			next.va.addr = req_end;
+			next.va.range = end - req_end;
+			next.gem.obj = obj;
+			next.gem.offset = offset + (req_end - addr);
+
+			next_split = true;
+		}
+
+		if (prev_split || next_split) {
+			struct drm_gpuva_op_unmap unmap = { .va = va };
+
+			ret = op_remap_cb(step, priv, &prev, &next, &unmap);
+			if (ret)
+				return ret;
+		} else {
+			ret = op_unmap_cb(step, priv, va, false);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * drm_gpuva_sm_map - creates the &drm_gpuva_op split/merge steps
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the new mapping
+ * @req_range: the range of the new mapping
+ * @req_obj: the &drm_gem_object to map
+ * @req_offset: the offset within the &drm_gem_object
+ * @priv: pointer to a driver private data structure
+ *
+ * This function iterates the given range of the GPU VA space. It utilizes the
+ * &drm_gpuva_fn_ops to call back into the driver providing the split and merge
+ * steps.
+ *
+ * Drivers may use these callbacks to update the GPU VA space right away within
+ * the callback. In case the driver decides to copy and store the operations for
+ * later processing neither this function nor &drm_gpuva_sm_unmap is allowed to
+ * be called before the &drm_gpuva_manager's view of the GPU VA space was
+ * updated with the previous set of operations. To update the
+ * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * A sequence of callbacks can contain map, unmap and remap operations, but
+ * the sequence of callbacks might also be empty if no operation is required,
+ * e.g. if the requested mapping already exists in the exact same way.
+ *
+ * There can be an arbitrary amount of unmap operations, a maximum of two remap
+ * operations and a single map operation. The latter one, if existent,
+ * represents the original map operation requested by the caller. Please note
+ * that the map operation might has been modified, e.g. if it was merged with
+ * an existent mapping.
+ *
+ * Returns: 0 on success or a negative error code
+ */
+int
+drm_gpuva_sm_map(struct drm_gpuva_manager *mgr, void *priv,
+		 u64 req_addr, u64 req_range,
+		 struct drm_gem_object *req_obj, u64 req_offset)
+{
+	if (!mgr->ops || !mgr->ops->sm_map_step)
+		return -EINVAL;
+
+	return __drm_gpuva_sm_map(mgr, mgr->ops, priv,
+				  req_addr, req_range,
+				  req_obj, req_offset);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_map);
+
+/**
+ * drm_gpuva_sm_unmap - creates the &drm_gpuva_ops to split on unmap
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the range to unmap
+ * @req_range: the range of the mappings to unmap
+ * @ops: the &drm_gpuva_fn_ops callbacks to provide the split/merge steps
+ * @priv: pointer to a driver private data structure
+ *
+ * This function iterates the given range of the GPU VA space. It utilizes the
+ * &drm_gpuva_fn_ops to call back into the driver providing the operations to
+ * unmap and, if required, split existent mappings.
+ *
+ * Drivers may use these callbacks to update the GPU VA space right away within
+ * the callback. In case the driver decides to copy and store the operations for
+ * later processing neither this function nor &drm_gpuva_sm_map is allowed to be
+ * called before the &drm_gpuva_manager's view of the GPU VA space was updated
+ * with the previous set of operations. To update the &drm_gpuva_manager's view
+ * of the GPU VA space drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
+ * drm_gpuva_destroy_unlocked() should be used.
+ *
+ * A sequence of callbacks can contain unmap and remap operations, depending on
+ * whether there are actual overlapping mappings to split.
+ *
+ * There can be an arbitrary amount of unmap operations and a maximum of two
+ * remap operations.
+ *
+ * Returns: 0 on success or a negative error code
+ */
+int
+drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr, void *priv,
+		   u64 req_addr, u64 req_range)
+{
+	if (!mgr->ops || !mgr->ops->sm_unmap_step)
+		return -EINVAL;
+
+	return __drm_gpuva_sm_unmap(mgr, mgr->ops, priv,
+				    req_addr, req_range);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_unmap);
+
+static struct drm_gpuva_op *
+gpuva_op_alloc(struct drm_gpuva_manager *mgr)
+{
+	struct drm_gpuva_fn_ops *fn = mgr->ops;
+	struct drm_gpuva_op *op;
+
+	if (fn && fn->op_alloc)
+		op = fn->op_alloc();
+	else
+		op = kzalloc(sizeof(*op), GFP_KERNEL);
+
+	if (unlikely(!op))
+		return NULL;
+
+	return op;
+}
+
+static void
+gpuva_op_free(struct drm_gpuva_manager *mgr,
+	      struct drm_gpuva_op *op)
+{
+	struct drm_gpuva_fn_ops *fn = mgr->ops;
+
+	if (fn && fn->op_free)
+		fn->op_free(op);
+	else
+		kfree(op);
+}
+
+int drm_gpuva_sm_step(struct drm_gpuva_op *__op, void *priv)
+{
+	struct {
+		struct drm_gpuva_manager *mgr;
+		struct drm_gpuva_ops *ops;
+	} *args = priv;
+	struct drm_gpuva_manager *mgr = args->mgr;
+	struct drm_gpuva_ops *ops = args->ops;
+	struct drm_gpuva_op *op;
+
+	op = gpuva_op_alloc(mgr);
+	if (unlikely(!op))
+		goto err;
+
+	memcpy(op, __op, sizeof(*op));
+
+	if (op->op == DRM_GPUVA_OP_REMAP) {
+		struct drm_gpuva_op_remap *__r = &__op->remap;
+		struct drm_gpuva_op_remap *r = &op->remap;
+
+		r->unmap = kmemdup(__r->unmap, sizeof(*r->unmap),
+				   GFP_KERNEL);
+		if (unlikely(!r->unmap))
+			goto err_free_op;
+
+		if (__r->prev) {
+			r->prev = kmemdup(__r->prev, sizeof(*r->prev),
+					  GFP_KERNEL);
+			if (unlikely(!r->prev))
+				goto err_free_unmap;
+		}
+
+		if (__r->next) {
+			r->next = kmemdup(__r->next, sizeof(*r->next),
+					  GFP_KERNEL);
+			if (unlikely(!r->next))
+				goto err_free_prev;
+		}
+	}
+
+	list_add_tail(&op->entry, &ops->list);
+
+	return 0;
+
+err_free_unmap:
+	kfree(op->remap.unmap);
+err_free_prev:
+	kfree(op->remap.prev);
+err_free_op:
+	gpuva_op_free(mgr, op);
+err:
+	return -ENOMEM;
+}
+
+static struct drm_gpuva_fn_ops gpuva_list_ops = {
+	.sm_map_step = drm_gpuva_sm_step,
+	.sm_unmap_step = drm_gpuva_sm_step,
+};
+
+/**
+ * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the new mapping
+ * @req_range: the range of the new mapping
+ * @req_obj: the &drm_gem_object to map
+ * @req_offset: the offset within the &drm_gem_object
+ *
+ * This function creates a list of operations to perform splitting and merging
+ * of existent mapping(s) with the newly requested one.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain map, unmap and remap operations, but it
+ * also can be empty if no operation is required, e.g. if the requested mapping
+ * already exists is the exact same way.
+ *
+ * There can be an arbitrary amount of unmap operations, a maximum of two remap
+ * operations and a single map operation. The latter one, if existent,
+ * represents the original map operation requested by the caller. Please note
+ * that the map operation might has been modified, e.g. if it was merged with an
+ * existent mapping.
+ *
+ * Note that before calling this function again with another mapping request it
+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space. The
+ * previously obtained operations must be either processed or abandoned. To
+ * update the &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
+			    u64 req_addr, u64 req_range,
+			    struct drm_gem_object *req_obj, u64 req_offset)
+{
+	struct drm_gpuva_ops *ops;
+	struct {
+		struct drm_gpuva_manager *mgr;
+		struct drm_gpuva_ops *ops;
+	} args;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (unlikely(!ops))
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	args.mgr = mgr;
+	args.ops = ops;
+
+	ret = __drm_gpuva_sm_map(mgr, &gpuva_list_ops, &args,
+				 req_addr, req_range,
+				 req_obj, req_offset);
+	if (ret) {
+		kfree(ops);
+		return ERR_PTR(ret);
+	}
+
+	return ops;
+}
+EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
+
+/**
+ * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the range to unmap
+ * @req_range: the range of the mappings to unmap
+ *
+ * This function creates a list of operations to perform unmapping and, if
+ * required, splitting of the mappings overlapping the unmap range.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain unmap and remap operations, depending on
+ * whether there are actual overlapping mappings to split.
+ *
+ * There can be an arbitrary amount of unmap operations and a maximum of two
+ * remap operations.
+ *
+ * Note that before calling this function again with another range to unmap it
+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space. The
+ * previously obtained operations must be processed or abandoned. To update the
+ * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
+ * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
+ * used.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 req_addr, u64 req_range)
+{
+	struct drm_gpuva_ops *ops;
+	struct {
+		struct drm_gpuva_manager *mgr;
+		struct drm_gpuva_ops *ops;
+	} args;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (unlikely(!ops))
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	args.mgr = mgr;
+	args.ops = ops;
+
+	ret = __drm_gpuva_sm_unmap(mgr, &gpuva_list_ops, &args,
+				   req_addr, req_range);
+	if (ret) {
+		kfree(ops);
+		return ERR_PTR(ret);
+	}
+
+	return ops;
+}
+EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
+
+/**
+ * drm_gpuva_prefetch_ops_create - creates the &drm_gpuva_ops to prefetch
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the range to prefetch
+ * @req_range: the range of the mappings to prefetch
+ *
+ * This function creates a list of operations to perform prefetching.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain prefetch operations.
+ *
+ * There can be an arbitrary amount of prefetch operations.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_prefetch_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 addr, u64 range)
+{
+	DRM_GPUVA_ITER(it, mgr);
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva_op *op;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	drm_gpuva_iter_for_each_range(it, addr, addr + range) {
+		op = gpuva_op_alloc(mgr);
+		if (!op) {
+			ret = -ENOMEM;
+			goto err_free_ops;
+		}
+
+		op->op = DRM_GPUVA_OP_PREFETCH;
+		op->prefetch.va = it.va;
+		list_add_tail(&op->entry, &ops->list);
+	}
+
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(mgr, ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_prefetch_ops_create);
+
+/**
+ * drm_gpuva_gem_unmap_ops_create - creates the &drm_gpuva_ops to unmap a GEM
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @obj: the &drm_gem_object to unmap
+ *
+ * This function creates a list of operations to perform unmapping for every
+ * GPUVA attached to a GEM.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and consists out of an
+ * arbitrary amount of unmap operations.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * It is the callers responsibility to protect the GEMs GPUVA list against
+ * concurrent access.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			       struct drm_gem_object *obj)
+{
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva_op *op;
+	struct drm_gpuva *va;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	drm_gem_for_each_gpuva(va, obj) {
+		op = gpuva_op_alloc(mgr);
+		if (!op) {
+			ret = -ENOMEM;
+			goto err_free_ops;
+		}
+
+		op->op = DRM_GPUVA_OP_UNMAP;
+		op->unmap.va = va;
+		list_add_tail(&op->entry, &ops->list);
+	}
+
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(mgr, ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_gem_unmap_ops_create);
+
+
+/**
+ * drm_gpuva_ops_free - free the given &drm_gpuva_ops
+ * @mgr: the &drm_gpuva_manager the ops were created for
+ * @ops: the &drm_gpuva_ops to free
+ *
+ * Frees the given &drm_gpuva_ops structure including all the ops associated
+ * with it.
+ */
+void
+drm_gpuva_ops_free(struct drm_gpuva_manager *mgr,
+		   struct drm_gpuva_ops *ops)
+{
+	struct drm_gpuva_op *op, *next;
+
+	drm_gpuva_for_each_op_safe(op, next, ops) {
+		list_del(&op->entry);
+
+		if (op->op == DRM_GPUVA_OP_REMAP) {
+			kfree(op->remap.prev);
+			kfree(op->remap.next);
+			kfree(op->remap.unmap);
+		}
+
+		gpuva_op_free(mgr, op);
+	}
+
+	kfree(ops);
+}
+EXPORT_SYMBOL(drm_gpuva_ops_free);
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 1d76d0686b03..4fe4a1552948 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -104,6 +104,12 @@ enum drm_driver_feature {
 	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
 	 */
 	DRIVER_COMPUTE_ACCEL            = BIT(7),
+	/**
+	 * @DRIVER_GEM_GPUVA:
+	 *
+	 * Driver supports user defined GPU VA bindings for GEM objects.
+	 */
+	DRIVER_GEM_GPUVA		= BIT(8),
 
 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 772a4adf5287..4a3679034966 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -36,6 +36,8 @@
 
 #include <linux/kref.h>
 #include <linux/dma-resv.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
 
 #include <drm/drm_vma_manager.h>
 
@@ -337,6 +339,17 @@ struct drm_gem_object {
 	 */
 	struct dma_resv _resv;
 
+	/**
+	 * @gpuva:
+	 *
+	 * Provides the list and list mutex of GPU VAs attached to this
+	 * GEM object.
+	 */
+	struct {
+		struct list_head list;
+		struct mutex mutex;
+	} gpuva;
+
 	/**
 	 * @funcs:
 	 *
@@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
 unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
 			       bool (*shrink)(struct drm_gem_object *obj));
 
+/**
+ * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
+ * @obj: the &drm_gem_object
+ *
+ * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
+ * protecting it.
+ *
+ * Calling this function is only necessary for drivers intending to support the
+ * &drm_driver_feature DRIVER_GEM_GPUVA.
+ */
+static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
+{
+	INIT_LIST_HEAD(&obj->gpuva.list);
+	mutex_init(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
+ * @obj: the &drm_gem_object
+ *
+ * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
+ */
+static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
+{
+	mutex_lock(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
+ * @obj: the &drm_gem_object
+ *
+ * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
+ */
+static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
+{
+	mutex_unlock(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_for_each_gpuva - iternator to walk over a list of gpuvas
+ * @entry: &drm_gpuva structure to assign to in each iteration step
+ * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuva_manager.
+ */
+#define drm_gem_for_each_gpuva(entry, obj) \
+	list_for_each_entry(entry, &obj->gpuva.list, head)
+
+/**
+ * drm_gem_for_each_gpuva_safe - iternator to safely walk over a list of gpuvas
+ * @entry: &drm_gpuva structure to assign to in each iteration step
+ * @next: &next &drm_gpuva to store the next step
+ * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
+ * it is save against removal of elements.
+ */
+#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
+	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
+
 #endif /* __DRM_GEM_H__ */
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
new file mode 100644
index 000000000000..d245d01e37a9
--- /dev/null
+++ b/include/drm/drm_gpuva_mgr.h
@@ -0,0 +1,714 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __DRM_GPUVA_MGR_H__
+#define __DRM_GPUVA_MGR_H__
+
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/maple_tree.h>
+#include <linux/mm.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+struct drm_gpuva_manager;
+struct drm_gpuva_fn_ops;
+
+/**
+ * struct drm_gpuva_region - structure to track a portion of GPU VA space
+ *
+ * This structure represents a portion of a GPUs VA space and is associated
+ * with a &drm_gpuva_manager.
+ *
+ * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
+ * placed within a &drm_gpuva_region.
+ */
+struct drm_gpuva_region {
+	/**
+	 * @mgr: the &drm_gpuva_manager this object is associated with
+	 */
+	struct drm_gpuva_manager *mgr;
+
+	/**
+	 * @va: structure containing the address and range of the &drm_gpuva_region
+	 */
+	struct {
+		/**
+		 * @addr: the start address
+		 */
+		u64 addr;
+
+		/*
+		 * @range: the range
+		 */
+		u64 range;
+	} va;
+
+	/**
+	 * @sparse: indicates whether this region is sparse
+	 */
+	bool sparse;
+};
+
+int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
+			    struct drm_gpuva_region *reg);
+void drm_gpuva_region_remove(struct drm_gpuva_region *reg);
+
+bool
+drm_gpuva_region_empty(struct drm_gpuva_region *reg);
+
+struct drm_gpuva_region *
+drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
+		      u64 addr, u64 range);
+struct drm_gpuva_region *
+drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
+			    u64 addr, u64 range);
+
+/**
+ * enum drm_gpuva_flags - flags for struct drm_gpuva
+ */
+enum drm_gpuva_flags {
+	/**
+	 * @DRM_GPUVA_EVICTED:
+	 *
+	 * Flag indicating that the &drm_gpuva's backing GEM is evicted.
+	 */
+	DRM_GPUVA_EVICTED = (1 << 0),
+
+	/**
+	 * @DRM_GPUVA_USERBITS: user defined bits
+	 */
+	DRM_GPUVA_USERBITS = (1 << 1),
+};
+
+/**
+ * struct drm_gpuva - structure to track a GPU VA mapping
+ *
+ * This structure represents a GPU VA mapping and is associated with a
+ * &drm_gpuva_manager.
+ *
+ * Typically, this structure is embedded in bigger driver structures.
+ */
+struct drm_gpuva {
+	/**
+	 * @mgr: the &drm_gpuva_manager this object is associated with
+	 */
+	struct drm_gpuva_manager *mgr;
+
+	/**
+	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
+	 */
+	struct drm_gpuva_region *region;
+
+	/**
+	 * @head: the &list_head to attach this object to a &drm_gem_object
+	 */
+	struct list_head head;
+
+	/**
+	 * @flags: the &drm_gpuva_flags for this mapping
+	 */
+	enum drm_gpuva_flags flags;
+
+	/**
+	 * @va: structure containing the address and range of the &drm_gpuva
+	 */
+	struct {
+		/**
+		 * @addr: the start address
+		 */
+		u64 addr;
+
+		/*
+		 * @range: the range
+		 */
+		u64 range;
+	} va;
+
+	/**
+	 * @gem: structure containing the &drm_gem_object and it's offset
+	 */
+	struct {
+		/**
+		 * @offset: the offset within the &drm_gem_object
+		 */
+		u64 offset;
+
+		/**
+		 * @obj: the mapped &drm_gem_object
+		 */
+		struct drm_gem_object *obj;
+	} gem;
+};
+
+void drm_gpuva_link(struct drm_gpuva *va);
+void drm_gpuva_unlink(struct drm_gpuva *va);
+
+int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		     struct drm_gpuva *va);
+void drm_gpuva_remove(struct drm_gpuva *va);
+
+struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
+				 u64 addr, u64 range);
+struct drm_gpuva *drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
+				       u64 addr, u64 range);
+struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
+struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
+
+/**
+ * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted
+ * @va: the &drm_gpuva to set the evict flag for
+ * @evict: indicates whether the &drm_gpuva is evicted
+ */
+static inline void drm_gpuva_evict(struct drm_gpuva *va, bool evict)
+{
+	if (evict)
+		va->flags |= DRM_GPUVA_EVICTED;
+	else
+		va->flags &= ~DRM_GPUVA_EVICTED;
+}
+
+/**
+ * drm_gpuva_evicted - indicates whether the backing BO of this &drm_gpuva
+ * is evicted
+ * @va: the &drm_gpuva to check
+ */
+static inline bool drm_gpuva_evicted(struct drm_gpuva *va)
+{
+	return va->flags & DRM_GPUVA_EVICTED;
+}
+
+/**
+ * enum drm_gpuva_mgr_flags - the feature flags for the &drm_gpuva_manager
+ */
+enum drm_gpuva_mgr_flags {
+	/**
+	 * @DRM_GPUVA_MANAGER_REGIONS:
+	 *
+	 * Enable the &drm_gpuva_manager to separately track &drm_gpuva_regions.
+	 *
+	 * &drm_gpuva_regions represent a reserved portion of VA space drivers
+	 * can create mappings in. If regions are enabled, &drm_gpuvas can be
+	 * created within an existing &drm_gpuva_region only and merge
+	 * operations never indicate merging over region boundaries.
+	 */
+	DRM_GPUVA_MANAGER_REGIONS = (1 << 0),
+};
+
+/**
+ * struct drm_gpuva_manager - DRM GPU VA Manager
+ *
+ * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
+ * &maple_tree structures. Typically, this structure is embedded in bigger
+ * driver structures.
+ *
+ * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
+ * pages.
+ *
+ * There should be one manager instance per GPU virtual address space.
+ */
+struct drm_gpuva_manager {
+	/**
+	 * @name: the name of the DRM GPU VA space
+	 */
+	const char *name;
+
+	/**
+	 * @mm_start: start of the VA space
+	 */
+	u64 mm_start;
+
+	/**
+	 * @mm_range: length of the VA space
+	 */
+	u64 mm_range;
+
+	/**
+	 * @region_mt: the &maple_tree to track GPU VA regions
+	 */
+	struct maple_tree region_mt;
+
+	/**
+	 * @va_mt: the &maple_tree to track GPU VA mappings
+	 */
+	struct maple_tree va_mt;
+
+	/**
+	 * @kernel_alloc_region:
+	 *
+	 * &drm_gpuva_region representing the address space cutout reserved for
+	 * the kernel
+	 */
+	struct drm_gpuva_region kernel_alloc_region;
+
+	/**
+	 * @ops: &drm_gpuva_fn_ops providing the split/merge steps to drivers
+	 */
+	struct drm_gpuva_fn_ops *ops;
+
+	/**
+	 * @flags: the feature flags of the &drm_gpuva_manager
+	 */
+	enum drm_gpuva_mgr_flags flags;
+};
+
+void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+			    const char *name,
+			    u64 start_offset, u64 range,
+			    u64 reserve_offset, u64 reserve_range,
+			    struct drm_gpuva_fn_ops *ops,
+			    enum drm_gpuva_mgr_flags flags);
+void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
+
+/**
+ * struct drm_gpuva_iterator - iterator for walking the internal (maple) tree
+ */
+struct drm_gpuva_iterator {
+	/**
+	 * @mas: the maple tree iterator (maple advanced state)
+	 */
+	struct ma_state mas;
+
+	/**
+	 * @mgr: the &drm_gpuva_manager to iterate
+	 */
+	struct drm_gpuva_manager *mgr;
+
+	union {
+		/**
+		 * @va: the current &drm_gpuva entry
+		 */
+		struct drm_gpuva *va;
+
+		/**
+		 * @reg: the current &drm_gpuva_region entry
+		 */
+		struct drm_gpuva_region *reg;
+
+		/**
+		 * @entry: the current entry
+		 */
+		void *entry;
+	};
+};
+
+void drm_gpuva_iter_remove(struct drm_gpuva_iterator *it);
+
+/**
+ * DRM_GPUVA_ITER - create an iterator structure to iterate the &drm_gpuva tree
+ * @name: the name of the &drm_gpuva_iterator to create
+ * @mgr: the &drm_gpuva_manager to iterate
+ */
+#define DRM_GPUVA_ITER(name, mgr__)				\
+	struct drm_gpuva_iterator name = {			\
+		.mas = __MA_STATE(&(mgr__)->va_mt, 0, 0),	\
+		.mgr = mgr__,					\
+		.va = NULL,					\
+	}
+
+/**
+ * DRM_GPUVA_REGION_ITER - create an iterator structure to iterate the
+ * &drm_gpuva_region tree
+ * @name: the name of the &drm_gpuva_iterator to create
+ * @mgr: the &drm_gpuva_manager to iterate
+ */
+#define DRM_GPUVA_REGION_ITER(name, mgr__)			\
+	struct drm_gpuva_iterator name = {			\
+		.mas = __MA_STATE(&(mgr__)->region_mt, 0, 0),	\
+		.mgr = mgr__,					\
+		.reg = NULL,					\
+	}
+
+/**
+ * drm_gpuva_iter_for_each_range - iternator to walk over a range of entries
+ * @it__: &drm_gpuva_iterator structure to assign to in each iteration step
+ * @start__: starting offset, the first entry will overlap this
+ * @end__: ending offset, the last entry will start before this (but may overlap)
+ *
+ * This function can be used to iterate both &drm_gpuva objects and
+ * &drm_gpuva_region objects.
+ *
+ * It is safe against the removal of elements using &drm_gpuva_iter_remove,
+ * however it is not safe against the removal of elements using
+ * &drm_gpuva_remove and &drm_gpuva_region_remove.
+ */
+#define drm_gpuva_iter_for_each_range(it__, start__, end__) \
+	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
+	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))
+
+/**
+ * drm_gpuva_iter_for_each - iternator to walk over all existing entries
+ * @it__: &drm_gpuva_iterator structure to assign to in each iteration step
+ *
+ * This function can be used to iterate both &drm_gpuva objects and
+ * &drm_gpuva_region objects.
+ *
+ * It is safe against the removal of elements using &drm_gpuva_iter_remove,
+ * however it is not safe against the removal of elements using
+ * &drm_gpuva_remove and &drm_gpuva_region_remove.
+ */
+#define drm_gpuva_iter_for_each(it__) \
+	drm_gpuva_iter_for_each_range(it__, (it__).mgr->mm_start, \
+				      (it__).mgr->mm_start + (it__).mgr->mm_range)
+
+/**
+ * enum drm_gpuva_op_type - GPU VA operation type
+ *
+ * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager.
+ */
+enum drm_gpuva_op_type {
+	/**
+	 * @DRM_GPUVA_OP_MAP: the map op type
+	 */
+	DRM_GPUVA_OP_MAP,
+
+	/**
+	 * @DRM_GPUVA_OP_REMAP: the remap op type
+	 */
+	DRM_GPUVA_OP_REMAP,
+
+	/**
+	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
+	 */
+	DRM_GPUVA_OP_UNMAP,
+
+	/**
+	 * @DRM_GPUVA_OP_PREFETCH: the prefetch op type
+	 */
+	DRM_GPUVA_OP_PREFETCH,
+};
+
+/**
+ * struct drm_gpuva_op_map - GPU VA map operation
+ *
+ * This structure represents a single map operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_map {
+	/**
+	 * @va: structure containing address and range of a map
+	 * operation
+	 */
+	struct {
+		/**
+		 * @addr: the base address of the new mapping
+		 */
+		u64 addr;
+
+		/**
+		 * @range: the range of the new mapping
+		 */
+		u64 range;
+	} va;
+
+	/**
+	 * @gem: structure containing the &drm_gem_object and it's offset
+	 */
+	struct {
+		/**
+		 * @offset: the offset within the &drm_gem_object
+		 */
+		u64 offset;
+
+		/**
+		 * @obj: the &drm_gem_object to map
+		 */
+		struct drm_gem_object *obj;
+	} gem;
+};
+
+/**
+ * struct drm_gpuva_op_unmap - GPU VA unmap operation
+ *
+ * This structure represents a single unmap operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_unmap {
+	/**
+	 * @va: the &drm_gpuva to unmap
+	 */
+	struct drm_gpuva *va;
+
+	/**
+	 * @keep:
+	 *
+	 * Indicates whether this &drm_gpuva is physically contiguous with the
+	 * original mapping request.
+	 *
+	 * Optionally, if &keep is set, drivers may keep the actual page table
+	 * mappings for this &drm_gpuva, adding the missing page table entries
+	 * only and update the &drm_gpuva_manager accordingly.
+	 */
+	bool keep;
+};
+
+/**
+ * struct drm_gpuva_op_remap - GPU VA remap operation
+ *
+ * This represents a single remap operation generated by the DRM GPU VA manager.
+ *
+ * A remap operation is generated when an existing GPU VA mmapping is split up
+ * by inserting a new GPU VA mapping or by partially unmapping existent
+ * mapping(s), hence it consists of a maximum of two map and one unmap
+ * operation.
+ *
+ * The @unmap operation takes care of removing the original existing mapping.
+ * @prev is used to remap the preceding part, @next the subsequent part.
+ *
+ * If either a new mapping's start address is aligned with the start address
+ * of the old mapping or the new mapping's end address is aligned with the
+ * end address of the old mapping, either @prev or @next is NULL.
+ *
+ * Note, the reason for a dedicated remap operation, rather than arbitrary
+ * unmap and map operations, is to give drivers the chance of extracting driver
+ * specific data for creating the new mappings from the unmap operations's
+ * &drm_gpuva structure which typically is embedded in larger driver specific
+ * structures.
+ */
+struct drm_gpuva_op_remap {
+	/**
+	 * @prev: the preceding part of a split mapping
+	 */
+	struct drm_gpuva_op_map *prev;
+
+	/**
+	 * @next: the subsequent part of a split mapping
+	 */
+	struct drm_gpuva_op_map *next;
+
+	/**
+	 * @unmap: the unmap operation for the original existing mapping
+	 */
+	struct drm_gpuva_op_unmap *unmap;
+};
+
+/**
+ * struct drm_gpuva_op_prefetch - GPU VA prefetch operation
+ *
+ * This structure represents a single prefetch operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_prefetch {
+	/**
+	 * @va: the &drm_gpuva to prefetch
+	 */
+	struct drm_gpuva *va;
+};
+
+/**
+ * struct drm_gpuva_op - GPU VA operation
+ *
+ * This structure represents a single generic operation.
+ *
+ * The particular type of the operation is defined by @op.
+ */
+struct drm_gpuva_op {
+	/**
+	 * @entry:
+	 *
+	 * The &list_head used to distribute instances of this struct within
+	 * &drm_gpuva_ops.
+	 */
+	struct list_head entry;
+
+	/**
+	 * @op: the type of the operation
+	 */
+	enum drm_gpuva_op_type op;
+
+	union {
+		/**
+		 * @map: the map operation
+		 */
+		struct drm_gpuva_op_map map;
+
+		/**
+		 * @remap: the remap operation
+		 */
+		struct drm_gpuva_op_remap remap;
+
+		/**
+		 * @unmap: the unmap operation
+		 */
+		struct drm_gpuva_op_unmap unmap;
+
+		/**
+		 * @prefetch: the prefetch operation
+		 */
+		struct drm_gpuva_op_prefetch prefetch;
+	};
+};
+
+/**
+ * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
+ */
+struct drm_gpuva_ops {
+	/**
+	 * @list: the &list_head
+	 */
+	struct list_head list;
+};
+
+/**
+ * drm_gpuva_for_each_op - iterator to walk over &drm_gpuva_ops
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations.
+ */
+#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_for_each_op_safe - iterator to safely walk over &drm_gpuva_ops
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @next: &next &drm_gpuva_op to store the next step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations. It is
+ * implemented with list_for_each_safe(), so save against removal of elements.
+ */
+#define drm_gpuva_for_each_op_safe(op, next, ops) \
+	list_for_each_entry_safe(op, next, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_for_each_op_from_reverse - iterate backwards from the given point
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations beginning
+ * from the given operation in reverse order.
+ */
+#define drm_gpuva_for_each_op_from_reverse(op, ops) \
+	list_for_each_entry_from_reverse(op, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_first_op - returns the first &drm_gpuva_op from &drm_gpuva_ops
+ * @ops: the &drm_gpuva_ops to get the fist &drm_gpuva_op from
+ */
+#define drm_gpuva_first_op(ops) \
+	list_first_entry(&(ops)->list, struct drm_gpuva_op, entry)
+
+/**
+ * drm_gpuva_last_op - returns the last &drm_gpuva_op from &drm_gpuva_ops
+ * @ops: the &drm_gpuva_ops to get the last &drm_gpuva_op from
+ */
+#define drm_gpuva_last_op(ops) \
+	list_last_entry(&(ops)->list, struct drm_gpuva_op, entry)
+
+/**
+ * drm_gpuva_prev_op - previous &drm_gpuva_op in the list
+ * @op: the current &drm_gpuva_op
+ */
+#define drm_gpuva_prev_op(op) list_prev_entry(op, entry)
+
+/**
+ * drm_gpuva_next_op - next &drm_gpuva_op in the list
+ * @op: the current &drm_gpuva_op
+ */
+#define drm_gpuva_next_op(op) list_next_entry(op, entry)
+
+struct drm_gpuva_ops *
+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
+			    u64 addr, u64 range,
+			    struct drm_gem_object *obj, u64 offset);
+struct drm_gpuva_ops *
+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 addr, u64 range);
+
+struct drm_gpuva_ops *
+drm_gpuva_prefetch_ops_create(struct drm_gpuva_manager *mgr,
+				 u64 addr, u64 range);
+
+struct drm_gpuva_ops *
+drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			       struct drm_gem_object *obj);
+
+void drm_gpuva_ops_free(struct drm_gpuva_manager *mgr,
+			struct drm_gpuva_ops *ops);
+
+/**
+ * struct drm_gpuva_fn_ops - callbacks for split/merge steps
+ *
+ * This structure defines the callbacks used by &drm_gpuva_sm_map and
+ * &drm_gpuva_sm_unmap to provide the split/merge steps for map and unmap
+ * operations to drivers.
+ */
+struct drm_gpuva_fn_ops {
+	/**
+	 * @op_alloc: called when the &drm_gpuva_manager allocates
+	 * a struct drm_gpuva_op
+	 *
+	 * Some drivers may want to embed struct drm_gpuva_op into driver
+	 * specific structures. By implementing this callback drivers can
+	 * allocate memory accordingly.
+	 *
+	 * This callback is optional.
+	 */
+	struct drm_gpuva_op *(*op_alloc)(void);
+
+	/**
+	 * @op_free: called when the &drm_gpuva_manager frees a
+	 * struct drm_gpuva_op
+	 *
+	 * Some drivers may want to embed struct drm_gpuva_op into driver
+	 * specific structures. By implementing this callback drivers can
+	 * free the previously allocated memory accordingly.
+	 *
+	 * This callback is optional.
+	 */
+	void (*op_free)(struct drm_gpuva_op *op);
+
+	/**
+	 * @sm_map_step: called from &drm_gpuva_sm_map providing the split and
+	 * merge steps
+	 *
+	 * This callback provides a single split / merge step or, if no split
+	 * and merge is indicated, the original map operation.
+	 *
+	 * The &priv pointer is equal to the one drivers pass to
+	 * &drm_gpuva_sm_map.
+	 */
+	int (*sm_map_step)(struct drm_gpuva_op *op, void *priv);
+
+	/**
+	 * @sm_unmap_step: called from &drm_gpuva_sm_map providing the split and
+	 * merge steps
+	 *
+	 * This callback provides a single split step or, if no split is
+	 * indicated, the plain unmap operations of the corresponding unmap
+	 * range originally passed to &drm_gpuva_sm_unmap.
+	 *
+	 * The &priv pointer is equal to the one drivers pass to
+	 * &drm_gpuva_sm_unmap.
+	 */
+	int (*sm_unmap_step)(struct drm_gpuva_op *op, void *priv);
+};
+
+int drm_gpuva_sm_map(struct drm_gpuva_manager *mgr, void *priv,
+		     u64 addr, u64 range,
+		     struct drm_gem_object *obj, u64 offset);
+
+int drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr, void *priv,
+		       u64 addr, u64 range);
+
+#endif /* __DRM_GPUVA_MGR_H__ */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (4 preceding siblings ...)
  2023-02-17 13:44 ` [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-18  2:47   ` kernel test robot
  2023-02-17 13:48 ` [PATCH drm-next v2 07/16] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

This commit adds a function to dump a DRM GPU VA space and a macro for
drivers to register the struct drm_info_list 'gpuvas' entry.

Most likely, most drivers might maintain one DRM GPU VA space per struct
drm_file, but there might also be drivers not having a fixed relation
between DRM GPU VA spaces and a DRM core infrastructure, hence we need the
indirection via the driver iterating it's maintained DRM GPU VA spaces.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/drm_debugfs.c | 56 +++++++++++++++++++++++++++++++++++
 include/drm/drm_debugfs.h     | 25 ++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 4f643a490dc3..0a8e3fdd5f6f 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -39,6 +39,7 @@
 #include <drm/drm_file.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_managed.h>
+#include <drm/drm_gpuva_mgr.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -175,6 +176,61 @@ static const struct file_operations drm_debugfs_fops = {
 	.release = single_release,
 };
 
+/**
+ * drm_debugfs_gpuva_info - dump the given DRM GPU VA space
+ * @m: pointer to the &seq_file to write
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ *
+ * Dumps the GPU VA regions and mappings of a given DRM GPU VA manager.
+ *
+ * For each DRM GPU VA space drivers should call this function from their
+ * &drm_info_list's show callback.
+ *
+ * Returns: 0 on success, -ENODEV if the &mgr is not initialized
+ */
+int drm_debugfs_gpuva_info(struct seq_file *m,
+			   struct drm_gpuva_manager *mgr)
+{
+	DRM_GPUVA_ITER(it, mgr);
+	DRM_GPUVA_REGION_ITER(__it, mgr);
+
+	if (!mgr->name)
+		return -ENODEV;
+
+	seq_printf(m, "DRM GPU VA space (%s)\n", mgr->name);
+	seq_puts  (m, "\n");
+	seq_puts  (m, " VA regions  | start              | range              | end                | sparse\n");
+	seq_puts  (m, "------------------------------------------------------------------------------------\n");
+	seq_printf(m, " VA space    | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
+		   mgr->mm_start, mgr->mm_range, mgr->mm_start + mgr->mm_range);
+	seq_puts  (m, "-----------------------------------------------------------------------------------\n");
+	drm_gpuva_iter_for_each(__it) {
+		struct drm_gpuva_region *reg = __it.reg;
+
+		if (reg == &mgr->kernel_alloc_region) {
+			seq_printf(m, " kernel node | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
+				   reg->va.addr, reg->va.range, reg->va.addr + reg->va.range);
+			continue;
+		}
+
+		seq_printf(m, "             | 0x%016llx | 0x%016llx | 0x%016llx | %s\n",
+			   reg->va.addr, reg->va.range, reg->va.addr + reg->va.range,
+			   reg->sparse ? "true" : "false");
+	}
+	seq_puts(m, "\n");
+	seq_puts(m, " VAs | start              | range              | end                | object             | object offset\n");
+	seq_puts(m, "-------------------------------------------------------------------------------------------------------------\n");
+	drm_gpuva_iter_for_each(it) {
+		struct drm_gpuva *va = it.va;
+
+		seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
+			   va->va.addr, va->va.range, va->va.addr + va->va.range,
+			   (u64)va->gem.obj, va->gem.offset);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_debugfs_gpuva_info);
 
 /**
  * drm_debugfs_create_files - Initialize a given set of debugfs files for DRM
diff --git a/include/drm/drm_debugfs.h b/include/drm/drm_debugfs.h
index 7616f457ce70..cb2c1956a214 100644
--- a/include/drm/drm_debugfs.h
+++ b/include/drm/drm_debugfs.h
@@ -34,6 +34,22 @@
 
 #include <linux/types.h>
 #include <linux/seq_file.h>
+
+#include <drm/drm_gpuva_mgr.h>
+
+/**
+ * DRM_DEBUGFS_GPUVA_INFO - &drm_info_list entry to dump a GPU VA space
+ * @show: the &drm_info_list's show callback
+ * @data: driver private data
+ *
+ * Drivers should use this macro to define a &drm_info_list entry to provide a
+ * debugfs file for dumping the GPU VA space regions and mappings.
+ *
+ * For each DRM GPU VA space drivers should call drm_debugfs_gpuva_info() from
+ * their @show callback.
+ */
+#define DRM_DEBUGFS_GPUVA_INFO(show, data) {"gpuvas", show, DRIVER_GEM_GPUVA, data}
+
 /**
  * struct drm_info_list - debugfs info list entry
  *
@@ -134,6 +150,9 @@ void drm_debugfs_add_file(struct drm_device *dev, const char *name,
 
 void drm_debugfs_add_files(struct drm_device *dev,
 			   const struct drm_debugfs_info *files, int count);
+
+int drm_debugfs_gpuva_info(struct seq_file *m,
+			   struct drm_gpuva_manager *mgr);
 #else
 static inline void drm_debugfs_create_files(const struct drm_info_list *files,
 					    int count, struct dentry *root,
@@ -155,6 +174,12 @@ static inline void drm_debugfs_add_files(struct drm_device *dev,
 					 const struct drm_debugfs_info *files,
 					 int count)
 {}
+
+static inline int drm_debugfs_gpuva_info(struct seq_file *m,
+					 struct drm_gpuva_manager *mgr)
+{
+	return 0;
+}
 #endif
 
 #endif /* _DRM_DEBUGFS_H_ */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 07/16] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (5 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 08/16] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm,
	Danilo Krummrich, Dave Airlie

This commit provides the interfaces for the new UAPI motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
   DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
   VA area.

2) Bind and unbind GPU VA space mappings via the new
   DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
asynchronous processing with DRM syncobjs as synchronization mechanism.

The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.

Co-authored-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/driver-uapi.rst |   8 ++
 include/uapi/drm/nouveau_drm.h    | 220 ++++++++++++++++++++++++++++++
 2 files changed, 228 insertions(+)

diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
index 4411e6919a3d..9c7ca6e33a68 100644
--- a/Documentation/gpu/driver-uapi.rst
+++ b/Documentation/gpu/driver-uapi.rst
@@ -6,3 +6,11 @@ drm/i915 uAPI
 =============
 
 .. kernel-doc:: include/uapi/drm/i915_drm.h
+
+drm/nouveau uAPI
+================
+
+VM_BIND / EXEC uAPI
+-------------------
+
+.. kernel-doc:: include/uapi/drm/nouveau_drm.h
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 853a327433d3..dbe0afbd858a 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -126,6 +126,220 @@ struct drm_nouveau_gem_cpu_fini {
 	__u32 handle;
 };
 
+/**
+ * struct drm_nouveau_sync - sync object
+ *
+ * This structure serves as synchronization mechanism for (potentially)
+ * asynchronous operations such as EXEC or VM_BIND.
+ */
+struct drm_nouveau_sync {
+	/**
+	 * @flags: the flags for a sync object
+	 *
+	 * The first 8 bits are used to determine the type of the sync object.
+	 */
+	__u32 flags;
+#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
+#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
+#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
+	/**
+	 * @handle: the handle of the sync object
+	 */
+	__u32 handle;
+	/**
+	 * @timeline_value:
+	 *
+	 * The timeline point of the sync object in case the syncobj is of
+	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
+	 */
+	__u64 timeline_value;
+};
+
+/**
+ * struct drm_nouveau_vm_init - GPU VA space init structure
+ *
+ * Used to initialize the GPU's VA space for a user client, telling the kernel
+ * which portion of the VA space is managed by the UMD and kernel respectively.
+ */
+struct drm_nouveau_vm_init {
+	/**
+	 * @unmanaged_addr: start address of the kernel managed VA space region
+	 */
+	__u64 unmanaged_addr;
+	/**
+	 * @unmanaged_size: size of the kernel managed VA space region in bytes
+	 */
+	__u64 unmanaged_size;
+};
+
+/**
+ * struct drm_nouveau_vm_bind_op - VM_BIND operation
+ *
+ * This structure represents a single VM_BIND operation. UMDs should pass
+ * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
+ */
+struct drm_nouveau_vm_bind_op {
+	/**
+	 * @op: the operation type
+	 */
+	__u32 op;
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
+ *
+ * The alloc operation is used to reserve a VA space region within the GPU's VA
+ * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
+ * instruct the kernel to create sparse mappings for the given region.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_MAP:
+ *
+ * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
+ * a previously allocated VA space region. If the region is sparse, existing
+ * sparse mappings are overwritten.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
+ *
+ * Unmap an existing mapping in the GPU's VA space. If the region the mapping
+ * is located in is a sparse region, new sparse mappings are created where the
+ * unmapped (memory backed) mapping was mapped previously.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
+	/**
+	 * @flags: the flags for a &drm_nouveau_vm_bind_op
+	 */
+	__u32 flags;
+/**
+ * @DRM_NOUVEAU_VM_BIND_SPARSE:
+ *
+ * Indicates that an allocated VA space region should be sparse.
+ */
+#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
+	/**
+	 * @handle: the handle of the DRM GEM object to map
+	 */
+	__u32 handle;
+	/**
+	 * @pad: 32 bit padding, should be 0
+	 */
+	__u32 pad;
+	/**
+	 * @addr:
+	 *
+	 * the address the VA space region or (memory backed) mapping should be mapped to
+	 */
+	__u64 addr;
+	/**
+	 * @bo_offset: the offset within the BO backing the mapping
+	 */
+	__u64 bo_offset;
+	/**
+	 * @range: the size of the requested mapping in bytes
+	 */
+	__u64 range;
+};
+
+/**
+ * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
+ */
+struct drm_nouveau_vm_bind {
+	/**
+	 * @op_count: the number of &drm_nouveau_vm_bind_op
+	 */
+	__u32 op_count;
+	/**
+	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
+	 */
+	__u32 flags;
+/**
+ * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
+ *
+ * Indicates that the given VM_BIND operation should be executed asynchronously
+ * by the kernel.
+ *
+ * If this flag is not supplied the kernel executes the associated operations
+ * synchronously and doesn't accept any &drm_nouveau_sync objects.
+ */
+#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
+	/**
+	 * @wait_count: the number of wait &drm_nouveau_syncs
+	 */
+	__u32 wait_count;
+	/**
+	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
+	 */
+	__u32 sig_count;
+	/**
+	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
+	 */
+	__u64 wait_ptr;
+	/**
+	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
+	 */
+	__u64 sig_ptr;
+	/**
+	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
+	 */
+	__u64 op_ptr;
+};
+
+/**
+ * struct drm_nouveau_exec_push - EXEC push operation
+ *
+ * This structure represents a single EXEC push operation. UMDs should pass an
+ * array of this structure via struct drm_nouveau_exec's &push_ptr field.
+ */
+struct drm_nouveau_exec_push {
+	/**
+	 * @va: the virtual address of the push buffer mapping
+	 */
+	__u64 va;
+	/**
+	 * @va_len: the length of the push buffer mapping
+	 */
+	__u64 va_len;
+};
+
+/**
+ * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
+ */
+struct drm_nouveau_exec {
+	/**
+	 * @channel: the channel to execute the push buffer in
+	 */
+	__u32 channel;
+	/**
+	 * @push_count: the number of &drm_nouveau_exec_push ops
+	 */
+	__u32 push_count;
+	/**
+	 * @wait_count: the number of wait &drm_nouveau_syncs
+	 */
+	__u32 wait_count;
+	/**
+	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
+	 */
+	__u32 sig_count;
+	/**
+	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
+	 */
+	__u64 wait_ptr;
+	/**
+	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
+	 */
+	__u64 sig_ptr;
+	/**
+	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
+	 */
+	__u64 push_ptr;
+};
+
 #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
 #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
@@ -136,6 +350,9 @@ struct drm_nouveau_gem_cpu_fini {
 #define DRM_NOUVEAU_NVIF               0x07
 #define DRM_NOUVEAU_SVM_INIT           0x08
 #define DRM_NOUVEAU_SVM_BIND           0x09
+#define DRM_NOUVEAU_VM_INIT            0x10
+#define DRM_NOUVEAU_VM_BIND            0x11
+#define DRM_NOUVEAU_EXEC               0x12
 #define DRM_NOUVEAU_GEM_NEW            0x40
 #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
 #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
@@ -197,6 +414,9 @@ struct drm_nouveau_svm_bind {
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
 #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
 
+#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
+#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
+#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
 #if defined(__cplusplus)
 }
 #endif
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 08/16] drm/nouveau: get vmm via nouveau_cli_vmm()
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (6 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 07/16] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 09/16] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Provide a getter function for the client's current vmm context. Since
we'll add a new (u)vmm context for UMD bindings in subsequent commits,
this will keep the code clean.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h  | 9 +++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c  | 6 +++---
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 288eebc70a67..f3039c1f87c9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -204,7 +204,7 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 	struct nouveau_drm *drm = cli->drm;
 	struct nouveau_bo *nvbo;
 	struct nvif_mmu *mmu = &cli->mmu;
-	struct nvif_vmm *vmm = cli->svm.cli ? &cli->svm.vmm : &cli->vmm.vmm;
+	struct nvif_vmm *vmm = &nouveau_cli_vmm(cli)->vmm;
 	int i, pi = -1;
 
 	if (!*size) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index e648ecd0c1a0..1068abe41024 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -148,7 +148,7 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct nvif_device *device,
 
 	chan->device = device;
 	chan->drm = drm;
-	chan->vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	chan->vmm = nouveau_cli_vmm(cli);
 	atomic_set(&chan->killed, 0);
 
 	/* allocate memory for dma push buffer */
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b5de312a523f..81350e685b50 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -112,6 +112,15 @@ struct nouveau_cli_work {
 	struct dma_fence_cb cb;
 };
 
+static inline struct nouveau_vmm *
+nouveau_cli_vmm(struct nouveau_cli *cli)
+{
+	if (cli->svm.cli)
+		return &cli->svm;
+
+	return &cli->vmm;
+}
+
 void nouveau_cli_work_queue(struct nouveau_cli *, struct dma_fence *,
 			    struct nouveau_cli_work *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f77e44958037..08689ced4f6a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -103,7 +103,7 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
 
@@ -180,7 +180,7 @@ nouveau_gem_object_close(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : & cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
 
@@ -269,7 +269,7 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
 {
 	struct nouveau_cli *cli = nouveau_cli(file_priv);
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 
 	if (is_power_of_2(nvbo->valid_domains))
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 09/16] drm/nouveau: bo: initialize GEM GPU VA interface
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (7 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 08/16] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 10/16] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Initialize the GEM's DRM GPU VA manager interface in preparation for the
(u)vmm implementation, provided by subsequent commits, to make use of it.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f3039c1f87c9..bf6984c8754c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -215,11 +215,14 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 	nvbo = kzalloc(sizeof(struct nouveau_bo), GFP_KERNEL);
 	if (!nvbo)
 		return ERR_PTR(-ENOMEM);
+
 	INIT_LIST_HEAD(&nvbo->head);
 	INIT_LIST_HEAD(&nvbo->entry);
 	INIT_LIST_HEAD(&nvbo->vma_list);
 	nvbo->bo.bdev = &drm->ttm.bdev;
 
+	drm_gem_gpuva_init(&nvbo->bo.base);
+
 	/* This is confusing, and doesn't actually mean we want an uncached
 	 * mapping, but is what NOUVEAU_GEM_DOMAIN_COHERENT gets translated
 	 * into in nouveau_gem_new().
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 10/16] drm/nouveau: move usercopy helpers to nouveau_drv.h
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (8 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 09/16] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 11/16] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Move the usercopy helpers to a common driver header file to make it
usable for the new API added in subsequent commits.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_drv.h | 26 ++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c | 26 --------------------------
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 81350e685b50..20a7f31b9082 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -130,6 +130,32 @@ nouveau_cli(struct drm_file *fpriv)
 	return fpriv ? fpriv->driver_priv : NULL;
 }
 
+static inline void
+u_free(void *addr)
+{
+	kvfree(addr);
+}
+
+static inline void *
+u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
+{
+	void *mem;
+	void __user *userptr = (void __force __user *)(uintptr_t)user;
+
+	size *= nmemb;
+
+	mem = kvmalloc(size, GFP_KERNEL);
+	if (!mem)
+		return ERR_PTR(-ENOMEM);
+
+	if (copy_from_user(mem, userptr, size)) {
+		u_free(mem);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return mem;
+}
+
 #include <nvif/object.h>
 #include <nvif/parent.h>
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 08689ced4f6a..4369c8dc8b5b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -613,32 +613,6 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 	return 0;
 }
 
-static inline void
-u_free(void *addr)
-{
-	kvfree(addr);
-}
-
-static inline void *
-u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
-{
-	void *mem;
-	void __user *userptr = (void __force __user *)(uintptr_t)user;
-
-	size *= nmemb;
-
-	mem = kvmalloc(size, GFP_KERNEL);
-	if (!mem)
-		return ERR_PTR(-ENOMEM);
-
-	if (copy_from_user(mem, userptr, size)) {
-		u_free(mem);
-		return ERR_PTR(-EFAULT);
-	}
-
-	return mem;
-}
-
 static int
 nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 				struct drm_nouveau_gem_pushbuf *req,
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 11/16] drm/nouveau: fence: fail to emit when fence context is killed
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (9 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 10/16] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 12/16] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs processing push buffers and emitting
fences.

If a fence context is killed, e.g. due to a channel fault, jobs which
are already queued for execution might still emit new fences. In such a
case a job would hang forever.

To fix that, fail to emit a new fence on a killed fence context with
-ENODEV to unblock the job.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 7 +++++++
 drivers/gpu/drm/nouveau/nouveau_fence.h | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ee5e9d40c166..62c70d9a32e6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -96,6 +96,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
 		if (nouveau_fence_signal(fence))
 			nvif_event_block(&fctx->event);
 	}
+	fctx->killed = 1;
 	spin_unlock_irqrestore(&fctx->lock, flags);
 }
 
@@ -226,6 +227,12 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 		dma_fence_get(&fence->base);
 		spin_lock_irq(&fctx->lock);
 
+		if (unlikely(fctx->killed)) {
+			spin_unlock_irq(&fctx->lock);
+			dma_fence_put(&fence->base);
+			return -ENODEV;
+		}
+
 		if (nouveau_fence_update(chan, fctx))
 			nvif_event_block(&fctx->event);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 0ca2bc85adf6..00a08699bb58 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -45,7 +45,7 @@ struct nouveau_fence_chan {
 	char name[32];
 
 	struct nvif_event event;
-	int notify_ref, dead;
+	int notify_ref, dead, killed;
 };
 
 struct nouveau_fence_priv {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 12/16] drm/nouveau: chan: provide nouveau_channel_kill()
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (10 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 11/16] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs processing push buffers and emitting fences.

If a job times out, we need a way to recover from this situation. For
now, simply kill the channel to unblock all hung up jobs and signal
userspace that the device is dead on the next EXEC or VM_BIND ioctl.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_chan.c | 14 +++++++++++---
 drivers/gpu/drm/nouveau/nouveau_chan.h |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 1068abe41024..6f47e997d9cf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -40,6 +40,14 @@ MODULE_PARM_DESC(vram_pushbuf, "Create DMA push buffers in VRAM");
 int nouveau_vram_pushbuf;
 module_param_named(vram_pushbuf, nouveau_vram_pushbuf, int, 0400);
 
+void
+nouveau_channel_kill(struct nouveau_channel *chan)
+{
+	atomic_set(&chan->killed, 1);
+	if (chan->fence)
+		nouveau_fence_context_kill(chan->fence, -ENODEV);
+}
+
 static int
 nouveau_channel_killed(struct nvif_event *event, void *repv, u32 repc)
 {
@@ -47,9 +55,9 @@ nouveau_channel_killed(struct nvif_event *event, void *repv, u32 repc)
 	struct nouveau_cli *cli = (void *)chan->user.client;
 
 	NV_PRINTK(warn, cli, "channel %d killed!\n", chan->chid);
-	atomic_set(&chan->killed, 1);
-	if (chan->fence)
-		nouveau_fence_context_kill(chan->fence, -ENODEV);
+
+	if (unlikely(!atomic_read(&chan->killed)))
+		nouveau_channel_kill(chan);
 
 	return NVIF_EVENT_DROP;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.h b/drivers/gpu/drm/nouveau/nouveau_chan.h
index e06a8ffed31a..e483f4a254da 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.h
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.h
@@ -65,6 +65,7 @@ int  nouveau_channel_new(struct nouveau_drm *, struct nvif_device *, bool priv,
 			 u32 vram, u32 gart, struct nouveau_channel **);
 void nouveau_channel_del(struct nouveau_channel **);
 int  nouveau_channel_idle(struct nouveau_channel *);
+void nouveau_channel_kill(struct nouveau_channel *);
 
 extern int nouveau_vram_pushbuf;
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (11 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 12/16] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-18  1:16   ` kernel test robot
  2023-02-17 13:48 ` [PATCH drm-next v2 14/16] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

The new VM_BIND UAPI uses the DRM GPU VA manager to manage the VA space.
Hence, we a need a way to manipulate the MMUs page tables without going
through the internal range allocator implemented by nvkm/vmm.

This patch adds a raw interface for nvkm/vmm to pass the resposibility
for managing the address space and the corresponding map/unmap/sparse
operations to the upper layers.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |  26 ++-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |  19 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |  20 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c         |   2 +-
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |   4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            | 100 +++++++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    | 213 ++++++++++++++++--
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c | 197 ++++++++++++----
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |  25 ++
 .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |  16 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |  16 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |  27 ++-
 12 files changed, 566 insertions(+), 99 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index 9c7ff56831c5..a5a182b3c28d 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -3,7 +3,10 @@
 struct nvif_vmm_v0 {
 	__u8  version;
 	__u8  page_nr;
-	__u8  managed;
+#define NVIF_VMM_V0_TYPE_UNMANAGED                                         0x00
+#define NVIF_VMM_V0_TYPE_MANAGED                                           0x01
+#define NVIF_VMM_V0_TYPE_RAW                                               0x02
+	__u8  type;
 	__u8  pad03[5];
 	__u64 addr;
 	__u64 size;
@@ -17,6 +20,7 @@ struct nvif_vmm_v0 {
 #define NVIF_VMM_V0_UNMAP                                                  0x04
 #define NVIF_VMM_V0_PFNMAP                                                 0x05
 #define NVIF_VMM_V0_PFNCLR                                                 0x06
+#define NVIF_VMM_V0_RAW                                                    0x07
 #define NVIF_VMM_V0_MTHD(i)                                         ((i) + 0x80)
 
 struct nvif_vmm_page_v0 {
@@ -66,6 +70,26 @@ struct nvif_vmm_unmap_v0 {
 	__u64 addr;
 };
 
+struct nvif_vmm_raw_v0 {
+	__u8 version;
+#define NVIF_VMM_RAW_V0_GET	0x0
+#define NVIF_VMM_RAW_V0_PUT	0x1
+#define NVIF_VMM_RAW_V0_MAP	0x2
+#define NVIF_VMM_RAW_V0_UNMAP	0x3
+#define NVIF_VMM_RAW_V0_SPARSE	0x4
+	__u8  op;
+	__u8  sparse;
+	__u8  ref;
+	__u8  shift;
+	__u32 argc;
+	__u8  pad01[7];
+	__u64 addr;
+	__u64 size;
+	__u64 offset;
+	__u64 memory;
+	__u64 argv;
+};
+
 struct nvif_vmm_pfnmap_v0 {
 	__u8  version;
 	__u8  page;
diff --git a/drivers/gpu/drm/nouveau/include/nvif/vmm.h b/drivers/gpu/drm/nouveau/include/nvif/vmm.h
index a2ee92201ace..0ecedd0ee0a5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/vmm.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/vmm.h
@@ -4,6 +4,12 @@
 struct nvif_mem;
 struct nvif_mmu;
 
+enum nvif_vmm_type {
+	UNMANAGED,
+	MANAGED,
+	RAW,
+};
+
 enum nvif_vmm_get {
 	ADDR,
 	PTES,
@@ -30,8 +36,9 @@ struct nvif_vmm {
 	int page_nr;
 };
 
-int nvif_vmm_ctor(struct nvif_mmu *, const char *name, s32 oclass, bool managed,
-		  u64 addr, u64 size, void *argv, u32 argc, struct nvif_vmm *);
+int nvif_vmm_ctor(struct nvif_mmu *, const char *name, s32 oclass,
+		  enum nvif_vmm_type, u64 addr, u64 size, void *argv, u32 argc,
+		  struct nvif_vmm *);
 void nvif_vmm_dtor(struct nvif_vmm *);
 int nvif_vmm_get(struct nvif_vmm *, enum nvif_vmm_get, bool sparse,
 		 u8 page, u8 align, u64 size, struct nvif_vma *);
@@ -39,4 +46,12 @@ void nvif_vmm_put(struct nvif_vmm *, struct nvif_vma *);
 int nvif_vmm_map(struct nvif_vmm *, u64 addr, u64 size, void *argv, u32 argc,
 		 struct nvif_mem *, u64 offset);
 int nvif_vmm_unmap(struct nvif_vmm *, u64);
+
+int nvif_vmm_raw_get(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift);
+int nvif_vmm_raw_put(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift);
+int nvif_vmm_raw_map(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift,
+		     void *argv, u32 argc, struct nvif_mem *mem, u64 offset);
+int nvif_vmm_raw_unmap(struct nvif_vmm *vmm, u64 addr, u64 size,
+		       u8 shift, bool sparse);
+int nvif_vmm_raw_sparse(struct nvif_vmm *vmm, u64 addr, u64 size, bool ref);
 #endif
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
index 70e7887ef4b4..2fd2f2433fc7 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
@@ -17,6 +17,7 @@ struct nvkm_vma {
 	bool part:1; /* Region was split from an allocated region by map(). */
 	bool busy:1; /* Region busy (for temporarily preventing user access). */
 	bool mapped:1; /* Region contains valid pages. */
+	bool no_comp:1; /* Force no memory compression. */
 	struct nvkm_memory *memory; /* Memory currently mapped into VMA. */
 	struct nvkm_tags *tags; /* Compression tag reference. */
 };
@@ -27,10 +28,26 @@ struct nvkm_vmm {
 	const char *name;
 	u32 debug;
 	struct kref kref;
-	struct mutex mutex;
+
+	struct {
+		struct mutex vmm;
+		struct mutex ref;
+		struct mutex map;
+	} mutex;
 
 	u64 start;
 	u64 limit;
+	struct {
+		struct {
+			u64 addr;
+			u64 size;
+		} p;
+		struct {
+			u64 addr;
+			u64 size;
+		} n;
+		bool raw;
+	} managed;
 
 	struct nvkm_vmm_pt *pd;
 	struct list_head join;
@@ -70,6 +87,7 @@ struct nvkm_vmm_map {
 
 	const struct nvkm_vmm_page *page;
 
+	bool no_comp;
 	struct nvkm_tags *tags;
 	u64 next;
 	u64 type;
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a74ba8d84ba7..186351ecf72f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -350,7 +350,7 @@ nouveau_svmm_init(struct drm_device *dev, void *data,
 	 * VMM instead of the standard one.
 	 */
 	ret = nvif_vmm_ctor(&cli->mmu, "svmVmm",
-			    cli->vmm.vmm.object.oclass, true,
+			    cli->vmm.vmm.object.oclass, MANAGED,
 			    args->unmanaged_addr, args->unmanaged_size,
 			    &(struct gp100_vmm_v0) {
 				.fault_replay = true,
diff --git a/drivers/gpu/drm/nouveau/nouveau_vmm.c b/drivers/gpu/drm/nouveau/nouveau_vmm.c
index 67d6619fcd5e..a6602c012671 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vmm.c
@@ -128,8 +128,8 @@ nouveau_vmm_fini(struct nouveau_vmm *vmm)
 int
 nouveau_vmm_init(struct nouveau_cli *cli, s32 oclass, struct nouveau_vmm *vmm)
 {
-	int ret = nvif_vmm_ctor(&cli->mmu, "drmVmm", oclass, false, PAGE_SIZE,
-				0, NULL, 0, &vmm->vmm);
+	int ret = nvif_vmm_ctor(&cli->mmu, "drmVmm", oclass, UNMANAGED,
+				PAGE_SIZE, 0, NULL, 0, &vmm->vmm);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nvif/vmm.c b/drivers/gpu/drm/nouveau/nvif/vmm.c
index 6053d6dc2184..99296f03371a 100644
--- a/drivers/gpu/drm/nouveau/nvif/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvif/vmm.c
@@ -104,6 +104,90 @@ nvif_vmm_get(struct nvif_vmm *vmm, enum nvif_vmm_get type, bool sparse,
 	return ret;
 }
 
+int
+nvif_vmm_raw_get(struct nvif_vmm *vmm, u64 addr, u64 size,
+		 u8 shift)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_GET,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_put(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_PUT,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_map(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift,
+		 void *argv, u32 argc, struct nvif_mem *mem, u64 offset)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_MAP,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+		.memory = nvif_handle(&mem->object),
+		.offset = offset,
+		.argv = (u64)(uintptr_t)argv,
+		.argc = argc,
+	};
+
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_unmap(struct nvif_vmm *vmm, u64 addr, u64 size,
+		   u8 shift, bool sparse)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_UNMAP,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+		.sparse = sparse,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_sparse(struct nvif_vmm *vmm, u64 addr, u64 size, bool ref)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_SPARSE,
+		.addr = addr,
+		.size = size,
+		.ref = ref,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
 void
 nvif_vmm_dtor(struct nvif_vmm *vmm)
 {
@@ -112,8 +196,9 @@ nvif_vmm_dtor(struct nvif_vmm *vmm)
 }
 
 int
-nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass, bool managed,
-	      u64 addr, u64 size, void *argv, u32 argc, struct nvif_vmm *vmm)
+nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass,
+	      enum nvif_vmm_type type, u64 addr, u64 size, void *argv, u32 argc,
+	      struct nvif_vmm *vmm)
 {
 	struct nvif_vmm_v0 *args;
 	u32 argn = sizeof(*args) + argc;
@@ -125,9 +210,18 @@ nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass, bool managed,
 	if (!(args = kmalloc(argn, GFP_KERNEL)))
 		return -ENOMEM;
 	args->version = 0;
-	args->managed = managed;
 	args->addr = addr;
 	args->size = size;
+
+	switch (type) {
+	case UNMANAGED: args->type = NVIF_VMM_V0_TYPE_UNMANAGED; break;
+	case MANAGED: args->type = NVIF_VMM_V0_TYPE_MANAGED; break;
+	case RAW: args->type = NVIF_VMM_V0_TYPE_RAW; break;
+	default:
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
 	memcpy(args->data, argv, argc);
 
 	ret = nvif_object_ctor(&mmu->object, name ? name : "nvifVmm", 0,
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
index 524cd3c0e3fe..38b7ced934b1 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
@@ -58,10 +58,13 @@ nvkm_uvmm_mthd_pfnclr(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_vmm_in_managed_range(vmm, addr, size) && vmm->managed.raw)
+		return -EINVAL;
+
 	if (size) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		ret = nvkm_vmm_pfn_unmap(vmm, addr, size);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 
 	return ret;
@@ -88,10 +91,13 @@ nvkm_uvmm_mthd_pfnmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_vmm_in_managed_range(vmm, addr, size) && vmm->managed.raw)
+		return -EINVAL;
+
 	if (size) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		ret = nvkm_vmm_pfn_map(vmm, page, addr, size, phys);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 
 	return ret;
@@ -113,7 +119,10 @@ nvkm_uvmm_mthd_unmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
-	mutex_lock(&vmm->mutex);
+	if (nvkm_vmm_in_managed_range(vmm, addr, 0) && vmm->managed.raw)
+		return -EINVAL;
+
+	mutex_lock(&vmm->mutex.vmm);
 	vma = nvkm_vmm_node_search(vmm, addr);
 	if (ret = -ENOENT, !vma || vma->addr != addr) {
 		VMM_DEBUG(vmm, "lookup %016llx: %016llx",
@@ -134,7 +143,7 @@ nvkm_uvmm_mthd_unmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	nvkm_vmm_unmap_locked(vmm, vma, false);
 	ret = 0;
 done:
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	return ret;
 }
 
@@ -159,13 +168,16 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_vmm_in_managed_range(vmm, addr, size) && vmm->managed.raw)
+		return -EINVAL;
+
 	memory = nvkm_umem_search(client, handle);
 	if (IS_ERR(memory)) {
 		VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
 		return PTR_ERR(memory);
 	}
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	if (ret = -ENOENT, !(vma = nvkm_vmm_node_search(vmm, addr))) {
 		VMM_DEBUG(vmm, "lookup %016llx", addr);
 		goto fail;
@@ -198,7 +210,7 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 		}
 	}
 	vma->busy = true;
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 
 	ret = nvkm_memory_map(memory, offset, vmm, vma, argv, argc);
 	if (ret == 0) {
@@ -207,11 +219,11 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 		return 0;
 	}
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	vma->busy = false;
 	nvkm_vmm_unmap_region(vmm, vma);
 fail:
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	nvkm_memory_unref(&memory);
 	return ret;
 }
@@ -232,7 +244,7 @@ nvkm_uvmm_mthd_put(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	vma = nvkm_vmm_node_search(vmm, args->v0.addr);
 	if (ret = -ENOENT, !vma || vma->addr != addr || vma->part) {
 		VMM_DEBUG(vmm, "lookup %016llx: %016llx %d", addr,
@@ -248,7 +260,7 @@ nvkm_uvmm_mthd_put(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	nvkm_vmm_put_locked(vmm, vma);
 	ret = 0;
 done:
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	return ret;
 }
 
@@ -275,10 +287,10 @@ nvkm_uvmm_mthd_get(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	ret = nvkm_vmm_get_locked(vmm, getref, mapref, sparse,
 				  page, align, size, &vma);
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	if (ret)
 		return ret;
 
@@ -314,6 +326,167 @@ nvkm_uvmm_mthd_page(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	return 0;
 }
 
+static inline int
+nvkm_uvmm_page_index(struct nvkm_uvmm *uvmm, u64 size, u8 shift, u8 *refd)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	const struct nvkm_vmm_page *page;
+
+	if (likely(shift)) {
+		for (page = vmm->func->page; page->shift; page++) {
+			if (shift == page->shift)
+				break;
+		}
+
+		if (!page->shift || !IS_ALIGNED(size, 1ULL << page->shift)) {
+			VMM_DEBUG(vmm, "page %d %016llx", shift, size);
+			return -EINVAL;
+		}
+	} else {
+		return -EINVAL;
+	}
+	*refd = page - vmm->func->page;
+
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_get(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	return nvkm_vmm_raw_get(vmm, args->addr, args->size, refd);
+}
+
+static int
+nvkm_uvmm_mthd_raw_put(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	nvkm_vmm_raw_put(vmm, args->addr, args->size, refd);
+
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_map(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_client *client = uvmm->object.client;
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	struct nvkm_vma vma = {
+		.addr = args->addr,
+		.size = args->size,
+		.used = true,
+		.mapref = false,
+		.no_comp = true,
+	};
+	struct nvkm_memory *memory;
+	u64 handle = args->memory;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	vma.page = vma.refd = refd;
+
+	memory = nvkm_umem_search(client, args->memory);
+	if (IS_ERR(memory)) {
+		VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
+		return PTR_ERR(memory);
+	}
+
+	ret = nvkm_memory_map(memory, args->offset, vmm, &vma,
+			      (void *)args->argv, args->argc);
+
+	nvkm_memory_unref(&vma.memory);
+	nvkm_memory_unref(&memory);
+	return ret;
+}
+
+static int
+nvkm_uvmm_mthd_raw_unmap(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	nvkm_vmm_raw_unmap(vmm, args->addr, args->size,
+			   args->sparse, refd);
+
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_sparse(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	return nvkm_vmm_raw_sparse(vmm, args->addr, args->size, args->ref);
+}
+
+static int
+nvkm_uvmm_mthd_raw(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
+{
+	union {
+		struct nvif_vmm_raw_v0 v0;
+	} *args = argv;
+	int ret = -ENOSYS;
+
+	if (!uvmm->vmm->managed.raw)
+		return -EINVAL;
+
+	if ((ret = nvif_unpack(ret, &argv, &argc, args->v0, 0, 0, true)))
+		return ret;
+
+	switch (args->v0.op) {
+	case NVIF_VMM_RAW_V0_GET:
+		return nvkm_uvmm_mthd_raw_get(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_PUT:
+		return nvkm_uvmm_mthd_raw_put(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_MAP:
+		return nvkm_uvmm_mthd_raw_map(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_UNMAP:
+		return nvkm_uvmm_mthd_raw_unmap(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_SPARSE:
+		return nvkm_uvmm_mthd_raw_sparse(uvmm, &args->v0);
+	default:
+		return -EINVAL;
+	};
+}
+
 static int
 nvkm_uvmm_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
 {
@@ -326,6 +499,7 @@ nvkm_uvmm_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
 	case NVIF_VMM_V0_UNMAP : return nvkm_uvmm_mthd_unmap (uvmm, argv, argc);
 	case NVIF_VMM_V0_PFNMAP: return nvkm_uvmm_mthd_pfnmap(uvmm, argv, argc);
 	case NVIF_VMM_V0_PFNCLR: return nvkm_uvmm_mthd_pfnclr(uvmm, argv, argc);
+	case NVIF_VMM_V0_RAW   : return nvkm_uvmm_mthd_raw   (uvmm, argv, argc);
 	case NVIF_VMM_V0_MTHD(0x00) ... NVIF_VMM_V0_MTHD(0x7f):
 		if (uvmm->vmm->func->mthd) {
 			return uvmm->vmm->func->mthd(uvmm->vmm,
@@ -366,10 +540,11 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 	struct nvkm_uvmm *uvmm;
 	int ret = -ENOSYS;
 	u64 addr, size;
-	bool managed;
+	bool managed, raw;
 
 	if (!(ret = nvif_unpack(ret, &argv, &argc, args->v0, 0, 0, more))) {
-		managed = args->v0.managed != 0;
+		managed = args->v0.type == NVIF_VMM_V0_TYPE_MANAGED;
+		raw = args->v0.type == NVIF_VMM_V0_TYPE_RAW;
 		addr = args->v0.addr;
 		size = args->v0.size;
 	} else
@@ -377,12 +552,13 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 
 	if (!(uvmm = kzalloc(sizeof(*uvmm), GFP_KERNEL)))
 		return -ENOMEM;
+
 	nvkm_object_ctor(&nvkm_uvmm, oclass, &uvmm->object);
 	*pobject = &uvmm->object;
 
 	if (!mmu->vmm) {
-		ret = mmu->func->vmm.ctor(mmu, managed, addr, size, argv, argc,
-					  NULL, "user", &uvmm->vmm);
+		ret = mmu->func->vmm.ctor(mmu, managed || raw, addr, size,
+					  argv, argc, NULL, "user", &uvmm->vmm);
 		if (ret)
 			return ret;
 
@@ -393,6 +569,7 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 
 		uvmm->vmm = nvkm_vmm_ref(mmu->vmm);
 	}
+	uvmm->vmm->managed.raw = raw;
 
 	page = uvmm->vmm->func->page;
 	args->v0.page_nr = 0;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
index ae793f400ba1..eb5fcadcb39a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
@@ -676,41 +676,18 @@ nvkm_vmm_ptes_sparse(struct nvkm_vmm *vmm, u64 addr, u64 size, bool ref)
 	return 0;
 }
 
-static void
-nvkm_vmm_ptes_unmap_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
-			u64 addr, u64 size, bool sparse, bool pfn)
-{
-	const struct nvkm_vmm_desc_func *func = page->desc->func;
-	nvkm_vmm_iter(vmm, page, addr, size, "unmap + unref",
-		      false, pfn, nvkm_vmm_unref_ptes, NULL, NULL,
-		      sparse ? func->sparse : func->invalid ? func->invalid :
-							      func->unmap);
-}
-
-static int
-nvkm_vmm_ptes_get_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
-		      u64 addr, u64 size, struct nvkm_vmm_map *map,
-		      nvkm_vmm_pte_func func)
-{
-	u64 fail = nvkm_vmm_iter(vmm, page, addr, size, "ref + map", true,
-				 false, nvkm_vmm_ref_ptes, func, map, NULL);
-	if (fail != ~0ULL) {
-		if ((size = fail - addr))
-			nvkm_vmm_ptes_unmap_put(vmm, page, addr, size, false, false);
-		return -ENOMEM;
-	}
-	return 0;
-}
-
 static void
 nvkm_vmm_ptes_unmap(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 		    u64 addr, u64 size, bool sparse, bool pfn)
 {
 	const struct nvkm_vmm_desc_func *func = page->desc->func;
+
+	mutex_lock(&vmm->mutex.map);
 	nvkm_vmm_iter(vmm, page, addr, size, "unmap", false, pfn,
 		      NULL, NULL, NULL,
 		      sparse ? func->sparse : func->invalid ? func->invalid :
 							      func->unmap);
+	mutex_unlock(&vmm->mutex.map);
 }
 
 static void
@@ -718,33 +695,108 @@ nvkm_vmm_ptes_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 		  u64 addr, u64 size, struct nvkm_vmm_map *map,
 		  nvkm_vmm_pte_func func)
 {
+	mutex_lock(&vmm->mutex.map);
 	nvkm_vmm_iter(vmm, page, addr, size, "map", false, false,
 		      NULL, func, map, NULL);
+	mutex_unlock(&vmm->mutex.map);
 }
 
 static void
-nvkm_vmm_ptes_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
-		  u64 addr, u64 size)
+nvkm_vmm_ptes_put_locked(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			 u64 addr, u64 size)
 {
 	nvkm_vmm_iter(vmm, page, addr, size, "unref", false, false,
 		      nvkm_vmm_unref_ptes, NULL, NULL, NULL);
 }
 
+static void
+nvkm_vmm_ptes_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+		  u64 addr, u64 size)
+{
+	mutex_lock(&vmm->mutex.ref);
+	nvkm_vmm_ptes_put_locked(vmm, page, addr, size);
+	mutex_unlock(&vmm->mutex.ref);
+}
+
 static int
 nvkm_vmm_ptes_get(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 		  u64 addr, u64 size)
 {
-	u64 fail = nvkm_vmm_iter(vmm, page, addr, size, "ref", true, false,
-				 nvkm_vmm_ref_ptes, NULL, NULL, NULL);
+	u64 fail;
+
+	mutex_lock(&vmm->mutex.ref);
+	fail = nvkm_vmm_iter(vmm, page, addr, size, "ref", true, false,
+			     nvkm_vmm_ref_ptes, NULL, NULL, NULL);
 	if (fail != ~0ULL) {
 		if (fail != addr)
-			nvkm_vmm_ptes_put(vmm, page, addr, fail - addr);
+			nvkm_vmm_ptes_put_locked(vmm, page, addr, fail - addr);
+		mutex_unlock(&vmm->mutex.ref);
+		return -ENOMEM;
+	}
+	mutex_unlock(&vmm->mutex.ref);
+	return 0;
+}
+
+static void
+__nvkm_vmm_ptes_unmap_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			  u64 addr, u64 size, bool sparse, bool pfn)
+{
+	const struct nvkm_vmm_desc_func *func = page->desc->func;
+
+	nvkm_vmm_iter(vmm, page, addr, size, "unmap + unref",
+		      false, pfn, nvkm_vmm_unref_ptes, NULL, NULL,
+		      sparse ? func->sparse : func->invalid ? func->invalid :
+							      func->unmap);
+}
+
+static void
+nvkm_vmm_ptes_unmap_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			u64 addr, u64 size, bool sparse, bool pfn)
+{
+	if (vmm->managed.raw) {
+		nvkm_vmm_ptes_unmap(vmm, page, addr, size, sparse, pfn);
+		nvkm_vmm_ptes_put(vmm, page, addr, size);
+	} else {
+		__nvkm_vmm_ptes_unmap_put(vmm, page, addr, size, sparse, pfn);
+	}
+}
+
+static int
+__nvkm_vmm_ptes_get_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			u64 addr, u64 size, struct nvkm_vmm_map *map,
+			nvkm_vmm_pte_func func)
+{
+	u64 fail = nvkm_vmm_iter(vmm, page, addr, size, "ref + map", true,
+				 false, nvkm_vmm_ref_ptes, func, map, NULL);
+	if (fail != ~0ULL) {
+		if ((size = fail - addr))
+			nvkm_vmm_ptes_unmap_put(vmm, page, addr, size, false, false);
 		return -ENOMEM;
 	}
 	return 0;
 }
 
-static inline struct nvkm_vma *
+static int
+nvkm_vmm_ptes_get_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+		      u64 addr, u64 size, struct nvkm_vmm_map *map,
+		      nvkm_vmm_pte_func func)
+{
+	int ret;
+
+	if (vmm->managed.raw) {
+		ret = nvkm_vmm_ptes_get(vmm, page, addr, size);
+		if (ret)
+			return ret;
+
+		nvkm_vmm_ptes_map(vmm, page, addr, size, map, func);
+
+		return 0;
+	} else {
+		return __nvkm_vmm_ptes_get_map(vmm, page, addr, size, map, func);
+	}
+}
+
+struct nvkm_vma *
 nvkm_vma_new(u64 addr, u64 size)
 {
 	struct nvkm_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
@@ -1045,7 +1097,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 	vmm->debug = mmu->subdev.debug;
 	kref_init(&vmm->kref);
 
-	__mutex_init(&vmm->mutex, "&vmm->mutex", key ? key : &_key);
+	__mutex_init(&vmm->mutex.vmm, "&vmm->mutex.vmm", key ? key : &_key);
+	mutex_init(&vmm->mutex.ref);
+	mutex_init(&vmm->mutex.map);
 
 	/* Locate the smallest page size supported by the backend, it will
 	 * have the deepest nesting of page tables.
@@ -1101,6 +1155,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 		if (addr && (ret = nvkm_vmm_ctor_managed(vmm, 0, addr)))
 			return ret;
 
+		vmm->managed.p.addr = 0;
+		vmm->managed.p.size = addr;
+
 		/* NVKM-managed area. */
 		if (size) {
 			if (!(vma = nvkm_vma_new(addr, size)))
@@ -1114,6 +1171,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 		size = vmm->limit - addr;
 		if (size && (ret = nvkm_vmm_ctor_managed(vmm, addr, size)))
 			return ret;
+
+		vmm->managed.n.addr = addr;
+		vmm->managed.n.size = size;
 	} else {
 		/* Address-space fully managed by NVKM, requiring calls to
 		 * nvkm_vmm_get()/nvkm_vmm_put() to allocate address-space.
@@ -1362,9 +1422,9 @@ void
 nvkm_vmm_unmap(struct nvkm_vmm *vmm, struct nvkm_vma *vma)
 {
 	if (vma->memory) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		nvkm_vmm_unmap_locked(vmm, vma, false);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 }
 
@@ -1423,6 +1483,8 @@ nvkm_vmm_map_locked(struct nvkm_vmm *vmm, struct nvkm_vma *vma,
 	nvkm_vmm_pte_func func;
 	int ret;
 
+	map->no_comp = vma->no_comp;
+
 	/* Make sure we won't overrun the end of the memory object. */
 	if (unlikely(nvkm_memory_size(map->memory) < map->offset + vma->size)) {
 		VMM_DEBUG(vmm, "overrun %016llx %016llx %016llx",
@@ -1507,10 +1569,15 @@ nvkm_vmm_map(struct nvkm_vmm *vmm, struct nvkm_vma *vma, void *argv, u32 argc,
 	     struct nvkm_vmm_map *map)
 {
 	int ret;
-	mutex_lock(&vmm->mutex);
+
+	if (nvkm_vmm_in_managed_range(vmm, vma->addr, vma->size) &&
+	    vmm->managed.raw)
+		return nvkm_vmm_map_locked(vmm, vma, argv, argc, map);
+
+	mutex_lock(&vmm->mutex.vmm);
 	ret = nvkm_vmm_map_locked(vmm, vma, argv, argc, map);
 	vma->busy = false;
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	return ret;
 }
 
@@ -1620,9 +1687,9 @@ nvkm_vmm_put(struct nvkm_vmm *vmm, struct nvkm_vma **pvma)
 {
 	struct nvkm_vma *vma = *pvma;
 	if (vma) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		nvkm_vmm_put_locked(vmm, vma);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 		*pvma = NULL;
 	}
 }
@@ -1769,9 +1836,49 @@ int
 nvkm_vmm_get(struct nvkm_vmm *vmm, u8 page, u64 size, struct nvkm_vma **pvma)
 {
 	int ret;
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	ret = nvkm_vmm_get_locked(vmm, false, true, false, page, 0, size, pvma);
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
+	return ret;
+}
+
+void
+nvkm_vmm_raw_unmap(struct nvkm_vmm *vmm, u64 addr, u64 size,
+		   bool sparse, u8 refd)
+{
+	const struct nvkm_vmm_page *page = &vmm->func->page[refd];
+
+	nvkm_vmm_ptes_unmap(vmm, page, addr, size, sparse, false);
+}
+
+void
+nvkm_vmm_raw_put(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd)
+{
+	const struct nvkm_vmm_page *page = vmm->func->page;
+
+	nvkm_vmm_ptes_put(vmm, &page[refd], addr, size);
+}
+
+int
+nvkm_vmm_raw_get(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd)
+{
+	const struct nvkm_vmm_page *page = vmm->func->page;
+
+	if (unlikely(!size))
+		return -EINVAL;
+
+	return nvkm_vmm_ptes_get(vmm, &page[refd], addr, size);
+}
+
+int
+nvkm_vmm_raw_sparse(struct nvkm_vmm *vmm, u64 addr, u64 size, bool ref)
+{
+	int ret;
+
+	mutex_lock(&vmm->mutex.ref);
+	ret = nvkm_vmm_ptes_sparse(vmm, addr, size, ref);
+	mutex_unlock(&vmm->mutex.ref);
+
 	return ret;
 }
 
@@ -1779,9 +1886,9 @@ void
 nvkm_vmm_part(struct nvkm_vmm *vmm, struct nvkm_memory *inst)
 {
 	if (inst && vmm && vmm->func->part) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		vmm->func->part(vmm, inst);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 }
 
@@ -1790,9 +1897,9 @@ nvkm_vmm_join(struct nvkm_vmm *vmm, struct nvkm_memory *inst)
 {
 	int ret = 0;
 	if (vmm->func->join) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		ret = vmm->func->join(vmm, inst);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 	return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
index f6188aa9171c..f9bc30cdb2b3 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
@@ -163,6 +163,7 @@ int nvkm_vmm_new_(const struct nvkm_vmm_func *, struct nvkm_mmu *,
 		  u32 pd_header, bool managed, u64 addr, u64 size,
 		  struct lock_class_key *, const char *name,
 		  struct nvkm_vmm **);
+struct nvkm_vma *nvkm_vma_new(u64 addr, u64 size);
 struct nvkm_vma *nvkm_vmm_node_search(struct nvkm_vmm *, u64 addr);
 struct nvkm_vma *nvkm_vmm_node_split(struct nvkm_vmm *, struct nvkm_vma *,
 				     u64 addr, u64 size);
@@ -173,6 +174,30 @@ void nvkm_vmm_put_locked(struct nvkm_vmm *, struct nvkm_vma *);
 void nvkm_vmm_unmap_locked(struct nvkm_vmm *, struct nvkm_vma *, bool pfn);
 void nvkm_vmm_unmap_region(struct nvkm_vmm *, struct nvkm_vma *);
 
+int nvkm_vmm_raw_get(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd);
+void nvkm_vmm_raw_put(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd);
+void nvkm_vmm_raw_unmap(struct nvkm_vmm *vmm, u64 addr, u64 size,
+			bool sparse, u8 refd);
+int nvkm_vmm_raw_sparse(struct nvkm_vmm *, u64 addr, u64 size, bool ref);
+
+static inline bool
+nvkm_vmm_in_managed_range(struct nvkm_vmm *vmm, u64 start, u64 size)
+{
+	u64 p_start = vmm->managed.p.addr;
+	u64 p_end = p_start + vmm->managed.p.size;
+	u64 n_start = vmm->managed.n.addr;
+	u64 n_end = n_start + vmm->managed.n.size;
+	u64 end = start + size;
+
+	if (start >= p_start && end <= p_end)
+		return true;
+
+	if (start >= n_start && end <= n_end)
+		return true;
+
+	return false;
+}
+
 #define NVKM_VMM_PFN_ADDR                                 0xfffffffffffff000ULL
 #define NVKM_VMM_PFN_ADDR_SHIFT                                              12
 #define NVKM_VMM_PFN_APER                                 0x00000000000000f0ULL
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
index 5438384d9a67..5e857c02e9aa 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
@@ -287,15 +287,17 @@ gf100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
 			return -EINVAL;
 		}
 
-		ret = nvkm_memory_tags_get(memory, device, tags,
-					   nvkm_ltc_tags_clear,
-					   &map->tags);
-		if (ret) {
-			VMM_DEBUG(vmm, "comp %d", ret);
-			return ret;
+		if (!map->no_comp) {
+			ret = nvkm_memory_tags_get(memory, device, tags,
+						   nvkm_ltc_tags_clear,
+						   &map->tags);
+			if (ret) {
+				VMM_DEBUG(vmm, "comp %d", ret);
+				return ret;
+			}
 		}
 
-		if (map->tags->mn) {
+		if (!map->no_comp && map->tags->mn) {
 			u64 tags = map->tags->mn->offset + (map->offset >> 17);
 			if (page->shift == 17 || !gm20x) {
 				map->type |= tags << 44;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index 17899fc95b2d..f3630d0e0d55 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -453,15 +453,17 @@ gp100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
 			return -EINVAL;
 		}
 
-		ret = nvkm_memory_tags_get(memory, device, tags,
-					   nvkm_ltc_tags_clear,
-					   &map->tags);
-		if (ret) {
-			VMM_DEBUG(vmm, "comp %d", ret);
-			return ret;
+		if (!map->no_comp) {
+			ret = nvkm_memory_tags_get(memory, device, tags,
+						   nvkm_ltc_tags_clear,
+						   &map->tags);
+			if (ret) {
+				VMM_DEBUG(vmm, "comp %d", ret);
+				return ret;
+			}
 		}
 
-		if (map->tags->mn) {
+		if (!map->no_comp && map->tags->mn) {
 			tags = map->tags->mn->offset + (map->offset >> 16);
 			map->ctag |= ((1ULL << page->shift) >> 16) << 36;
 			map->type |= tags << 36;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c
index b7548dcd72c7..ff08ad5005a9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c
@@ -296,19 +296,22 @@ nv50_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
 			return -EINVAL;
 		}
 
-		ret = nvkm_memory_tags_get(memory, device, tags, NULL,
-					   &map->tags);
-		if (ret) {
-			VMM_DEBUG(vmm, "comp %d", ret);
-			return ret;
-		}
+		if (!map->no_comp) {
+			ret = nvkm_memory_tags_get(memory, device, tags, NULL,
+						   &map->tags);
+			if (ret) {
+				VMM_DEBUG(vmm, "comp %d", ret);
+				return ret;
+			}
 
-		if (map->tags->mn) {
-			u32 tags = map->tags->mn->offset + (map->offset >> 16);
-			map->ctag |= (u64)comp << 49;
-			map->type |= (u64)comp << 47;
-			map->type |= (u64)tags << 49;
-			map->next |= map->ctag;
+			if (map->tags->mn) {
+				u32 tags = map->tags->mn->offset +
+					   (map->offset >> 16);
+				map->ctag |= (u64)comp << 49;
+				map->type |= (u64)comp << 47;
+				map->type |= (u64)tags << 49;
+				map->next |= map->ctag;
+			}
 		}
 	}
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 14/16] drm/nouveau: implement uvmm for user mode bindings
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (12 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 15/16] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

uvmm provides the driver abstraction around the DRM GPU VA manager
connecting it to the nouveau infrastructure.

It handles the split and merge operations provided by the DRM GPU VA
manager for map operations colliding with existent mappings and takes
care of the driver specific locking around the DRM GPU VA manager.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/Kbuild          |    1 +
 drivers/gpu/drm/nouveau/nouveau_abi16.c |    7 +
 drivers/gpu/drm/nouveau/nouveau_bo.c    |  147 +--
 drivers/gpu/drm/nouveau/nouveau_bo.h    |    2 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c   |    2 +
 drivers/gpu/drm/nouveau/nouveau_drv.h   |   48 +
 drivers/gpu/drm/nouveau/nouveau_gem.c   |   25 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h   |    5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c |    2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  | 1090 +++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h  |  110 +++
 11 files changed, 1378 insertions(+), 61 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h

diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
index 5e5617006da5..ee281bb76463 100644
--- a/drivers/gpu/drm/nouveau/Kbuild
+++ b/drivers/gpu/drm/nouveau/Kbuild
@@ -47,6 +47,7 @@ nouveau-y += nouveau_prime.o
 nouveau-y += nouveau_sgdma.o
 nouveau-y += nouveau_ttm.o
 nouveau-y += nouveau_vmm.o
+nouveau-y += nouveau_uvmm.o
 
 # DRM - modesetting
 nouveau-$(CONFIG_DRM_NOUVEAU_BACKLIGHT) += nouveau_backlight.o
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index 82dab51d8aeb..36cc80eb0e20 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -261,6 +261,13 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 	if (!drm->channel)
 		return nouveau_abi16_put(abi16, -ENODEV);
 
+	/* If uvmm wasn't initialized until now disable it completely to prevent
+	 * userspace from mixing up UAPIs.
+	 *
+	 * The client lock is already acquired by nouveau_abi16_get().
+	 */
+	__nouveau_cli_uvmm_disable(cli);
+
 	device = &abi16->device;
 	engine = NV_DEVICE_HOST_RUNLIST_ENGINES_GR;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index bf6984c8754c..f3d73d6edd46 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -199,7 +199,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, int *align, u64 *size)
 
 struct nouveau_bo *
 nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
-		 u32 tile_mode, u32 tile_flags)
+		 u32 tile_mode, u32 tile_flags, bool internal)
 {
 	struct nouveau_drm *drm = cli->drm;
 	struct nouveau_bo *nvbo;
@@ -235,68 +235,103 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 			nvbo->force_coherent = true;
 	}
 
-	if (cli->device.info.family >= NV_DEVICE_INFO_V0_FERMI) {
-		nvbo->kind = (tile_flags & 0x0000ff00) >> 8;
-		if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
-			kfree(nvbo);
-			return ERR_PTR(-EINVAL);
+	nvbo->contig = !(tile_flags & NOUVEAU_GEM_TILE_NONCONTIG);
+	if (!nouveau_cli_uvmm(cli) || internal) {
+		/* for BO noVM allocs, don't assign kinds */
+		if (cli->device.info.family >= NV_DEVICE_INFO_V0_FERMI) {
+			nvbo->kind = (tile_flags & 0x0000ff00) >> 8;
+			if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
+				kfree(nvbo);
+				return ERR_PTR(-EINVAL);
+			}
+
+			nvbo->comp = mmu->kind[nvbo->kind] != nvbo->kind;
+		} else if (cli->device.info.family >= NV_DEVICE_INFO_V0_TESLA) {
+			nvbo->kind = (tile_flags & 0x00007f00) >> 8;
+			nvbo->comp = (tile_flags & 0x00030000) >> 16;
+			if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
+				kfree(nvbo);
+				return ERR_PTR(-EINVAL);
+			}
+		} else {
+			nvbo->zeta = (tile_flags & 0x00000007);
 		}
+		nvbo->mode = tile_mode;
+
+		/* Determine the desirable target GPU page size for the buffer. */
+		for (i = 0; i < vmm->page_nr; i++) {
+			/* Because we cannot currently allow VMM maps to fail
+			 * during buffer migration, we need to determine page
+			 * size for the buffer up-front, and pre-allocate its
+			 * page tables.
+			 *
+			 * Skip page sizes that can't support needed domains.
+			 */
+			if (cli->device.info.family > NV_DEVICE_INFO_V0_CURIE &&
+			    (domain & NOUVEAU_GEM_DOMAIN_VRAM) && !vmm->page[i].vram)
+				continue;
+			if ((domain & NOUVEAU_GEM_DOMAIN_GART) &&
+			    (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
+				continue;
 
-		nvbo->comp = mmu->kind[nvbo->kind] != nvbo->kind;
-	} else
-	if (cli->device.info.family >= NV_DEVICE_INFO_V0_TESLA) {
-		nvbo->kind = (tile_flags & 0x00007f00) >> 8;
-		nvbo->comp = (tile_flags & 0x00030000) >> 16;
-		if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
+			/* Select this page size if it's the first that supports
+			 * the potential memory domains, or when it's compatible
+			 * with the requested compression settings.
+			 */
+			if (pi < 0 || !nvbo->comp || vmm->page[i].comp)
+				pi = i;
+
+			/* Stop once the buffer is larger than the current page size. */
+			if (*size >= 1ULL << vmm->page[i].shift)
+				break;
+		}
+
+		if (WARN_ON(pi < 0)) {
 			kfree(nvbo);
 			return ERR_PTR(-EINVAL);
 		}
-	} else {
-		nvbo->zeta = (tile_flags & 0x00000007);
-	}
-	nvbo->mode = tile_mode;
-	nvbo->contig = !(tile_flags & NOUVEAU_GEM_TILE_NONCONTIG);
-
-	/* Determine the desirable target GPU page size for the buffer. */
-	for (i = 0; i < vmm->page_nr; i++) {
-		/* Because we cannot currently allow VMM maps to fail
-		 * during buffer migration, we need to determine page
-		 * size for the buffer up-front, and pre-allocate its
-		 * page tables.
-		 *
-		 * Skip page sizes that can't support needed domains.
-		 */
-		if (cli->device.info.family > NV_DEVICE_INFO_V0_CURIE &&
-		    (domain & NOUVEAU_GEM_DOMAIN_VRAM) && !vmm->page[i].vram)
-			continue;
-		if ((domain & NOUVEAU_GEM_DOMAIN_GART) &&
-		    (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
-			continue;
 
-		/* Select this page size if it's the first that supports
-		 * the potential memory domains, or when it's compatible
-		 * with the requested compression settings.
-		 */
-		if (pi < 0 || !nvbo->comp || vmm->page[i].comp)
-			pi = i;
-
-		/* Stop once the buffer is larger than the current page size. */
-		if (*size >= 1ULL << vmm->page[i].shift)
-			break;
-	}
+		/* Disable compression if suitable settings couldn't be found. */
+		if (nvbo->comp && !vmm->page[pi].comp) {
+			if (mmu->object.oclass >= NVIF_CLASS_MMU_GF100)
+				nvbo->kind = mmu->kind[nvbo->kind];
+			nvbo->comp = 0;
+		}
+		nvbo->page = vmm->page[pi].shift;
+	} else {
+		/* reject other tile flags when in VM mode. */
+		if (tile_mode)
+			return ERR_PTR(-EINVAL);
+		if (tile_flags & ~NOUVEAU_GEM_TILE_NONCONTIG)
+			return ERR_PTR(-EINVAL);
 
-	if (WARN_ON(pi < 0)) {
-		kfree(nvbo);
-		return ERR_PTR(-EINVAL);
-	}
+		/* Determine the desirable target GPU page size for the buffer. */
+		for (i = 0; i < vmm->page_nr; i++) {
+			/* Because we cannot currently allow VMM maps to fail
+			 * during buffer migration, we need to determine page
+			 * size for the buffer up-front, and pre-allocate its
+			 * page tables.
+			 *
+			 * Skip page sizes that can't support needed domains.
+			 */
+			if ((domain & NOUVEAU_GEM_DOMAIN_VRAM) && !vmm->page[i].vram)
+				continue;
+			if ((domain & NOUVEAU_GEM_DOMAIN_GART) &&
+			    (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
+				continue;
 
-	/* Disable compression if suitable settings couldn't be found. */
-	if (nvbo->comp && !vmm->page[pi].comp) {
-		if (mmu->object.oclass >= NVIF_CLASS_MMU_GF100)
-			nvbo->kind = mmu->kind[nvbo->kind];
-		nvbo->comp = 0;
+			if (pi < 0)
+				pi = i;
+			/* Stop once the buffer is larger than the current page size. */
+			if (*size >= 1ULL << vmm->page[i].shift)
+				break;
+		}
+		if (WARN_ON(pi < 0)) {
+			kfree(nvbo);
+			return ERR_PTR(-EINVAL);
+		}
+		nvbo->page = vmm->page[pi].shift;
 	}
-	nvbo->page = vmm->page[pi].shift;
 
 	nouveau_bo_fixup_align(nvbo, align, size);
 
@@ -334,7 +369,7 @@ nouveau_bo_new(struct nouveau_cli *cli, u64 size, int align,
 	int ret;
 
 	nvbo = nouveau_bo_alloc(cli, &size, &align, domain, tile_mode,
-				tile_flags);
+				tile_flags, true);
 	if (IS_ERR(nvbo))
 		return PTR_ERR(nvbo);
 
@@ -938,6 +973,7 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object *bo,
 		list_for_each_entry(vma, &nvbo->vma_list, head) {
 			nouveau_vma_map(vma, mem);
 		}
+		nouveau_uvmm_bo_map_all(nvbo, mem);
 	} else {
 		list_for_each_entry(vma, &nvbo->vma_list, head) {
 			ret = dma_resv_wait_timeout(bo->base.resv,
@@ -946,6 +982,7 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object *bo,
 			WARN_ON(ret <= 0);
 			nouveau_vma_unmap(vma);
 		}
+		nouveau_uvmm_bo_unmap_all(nvbo);
 	}
 
 	if (new_reg)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index 774dd93ca76b..cb85207d9e8f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -73,7 +73,7 @@ extern struct ttm_device_funcs nouveau_bo_driver;
 
 void nouveau_bo_move_init(struct nouveau_drm *);
 struct nouveau_bo *nouveau_bo_alloc(struct nouveau_cli *, u64 *size, int *align,
-				    u32 domain, u32 tile_mode, u32 tile_flags);
+				    u32 domain, u32 tile_mode, u32 tile_flags, bool internal);
 int  nouveau_bo_init(struct nouveau_bo *, u64 size, int align, u32 domain,
 		     struct sg_table *sg, struct dma_resv *robj);
 int  nouveau_bo_new(struct nouveau_cli *, u64 size, int align, u32 domain,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index cc7c5b4a05fd..cde843156700 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -68,6 +68,7 @@
 #include "nouveau_platform.h"
 #include "nouveau_svm.h"
 #include "nouveau_dmem.h"
+#include "nouveau_uvmm.h"
 
 DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
 			"DRM_UT_CORE",
@@ -190,6 +191,7 @@ nouveau_cli_fini(struct nouveau_cli *cli)
 	WARN_ON(!list_empty(&cli->worker));
 
 	usif_client_fini(cli);
+	nouveau_uvmm_fini(&cli->uvmm);
 	nouveau_vmm_fini(&cli->svm);
 	nouveau_vmm_fini(&cli->vmm);
 	nvif_mmu_dtor(&cli->mmu);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 20a7f31b9082..d634f1054d65 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -64,6 +64,7 @@ struct platform_device;
 #include "nouveau_fence.h"
 #include "nouveau_bios.h"
 #include "nouveau_vmm.h"
+#include "nouveau_uvmm.h"
 
 struct nouveau_drm_tile {
 	struct nouveau_fence *fence;
@@ -91,6 +92,8 @@ struct nouveau_cli {
 	struct nvif_mmu mmu;
 	struct nouveau_vmm vmm;
 	struct nouveau_vmm svm;
+	struct nouveau_uvmm uvmm;
+
 	const struct nvif_mclass *mem;
 
 	struct list_head head;
@@ -112,15 +115,60 @@ struct nouveau_cli_work {
 	struct dma_fence_cb cb;
 };
 
+static inline struct nouveau_uvmm *
+nouveau_cli_uvmm(struct nouveau_cli *cli)
+{
+	if (!cli || !cli->uvmm.vmm.cli)
+		return NULL;
+
+	return &cli->uvmm;
+}
+
+static inline struct nouveau_uvmm *
+nouveau_cli_uvmm_locked(struct nouveau_cli *cli)
+{
+	struct nouveau_uvmm *uvmm;
+
+	mutex_lock(&cli->mutex);
+	uvmm = nouveau_cli_uvmm(cli);
+	mutex_unlock(&cli->mutex);
+
+	return uvmm;
+}
+
 static inline struct nouveau_vmm *
 nouveau_cli_vmm(struct nouveau_cli *cli)
 {
+	struct nouveau_uvmm *uvmm;
+
+	uvmm = nouveau_cli_uvmm(cli);
+	if (uvmm)
+		return &uvmm->vmm;
+
 	if (cli->svm.cli)
 		return &cli->svm;
 
 	return &cli->vmm;
 }
 
+static inline void
+__nouveau_cli_uvmm_disable(struct nouveau_cli *cli)
+{
+	struct nouveau_uvmm *uvmm;
+
+	uvmm = nouveau_cli_uvmm(cli);
+	if (!uvmm)
+		cli->uvmm.disabled = true;
+}
+
+static inline void
+nouveau_cli_uvmm_disable(struct nouveau_cli *cli)
+{
+	mutex_lock(&cli->mutex);
+	__nouveau_cli_uvmm_disable(cli);
+	mutex_unlock(&cli->mutex);
+}
+
 void nouveau_cli_work_queue(struct nouveau_cli *, struct dma_fence *,
 			    struct nouveau_cli_work *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 4369c8dc8b5b..10c60b0a8dc8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -120,7 +120,11 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
 		goto out;
 	}
 
-	ret = nouveau_vma_new(nvbo, vmm, &vma);
+	/* only create a VMA on binding */
+	if (!nouveau_cli_uvmm(cli))
+		ret = nouveau_vma_new(nvbo, vmm, &vma);
+	else
+		ret = 0;
 	pm_runtime_mark_last_busy(dev);
 	pm_runtime_put_autosuspend(dev);
 out:
@@ -187,6 +191,9 @@ nouveau_gem_object_close(struct drm_gem_object *gem, struct drm_file *file_priv)
 	if (vmm->vmm.object.oclass < NVIF_CLASS_VMM_NV50)
 		return;
 
+	if (nouveau_cli_uvmm(cli))
+		return;
+
 	ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
 	if (ret)
 		return;
@@ -231,7 +238,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
 		domain |= NOUVEAU_GEM_DOMAIN_CPU;
 
 	nvbo = nouveau_bo_alloc(cli, &size, &align, domain, tile_mode,
-				tile_flags);
+				tile_flags, false);
 	if (IS_ERR(nvbo))
 		return PTR_ERR(nvbo);
 
@@ -279,13 +286,15 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
 	else
 		rep->domain = NOUVEAU_GEM_DOMAIN_VRAM;
 	rep->offset = nvbo->offset;
-	if (vmm->vmm.object.oclass >= NVIF_CLASS_VMM_NV50) {
+	if (vmm->vmm.object.oclass >= NVIF_CLASS_VMM_NV50 &&
+	    !nouveau_cli_uvmm(cli)) {
 		vma = nouveau_vma_find(nvbo, vmm);
 		if (!vma)
 			return -EINVAL;
 
 		rep->offset = vma->addr;
-	}
+	} else
+		rep->offset = 0;
 
 	rep->size = nvbo->bo.base.size;
 	rep->map_handle = drm_vma_node_offset_addr(&nvbo->bo.base.vma_node);
@@ -310,6 +319,11 @@ nouveau_gem_ioctl_new(struct drm_device *dev, void *data,
 	struct nouveau_bo *nvbo = NULL;
 	int ret = 0;
 
+	/* If uvmm wasn't initialized until now disable it completely to prevent
+	 * userspace from mixing up UAPIs.
+	 */
+	nouveau_cli_uvmm_disable(cli);
+
 	ret = nouveau_gem_new(cli, req->info.size, req->align,
 			      req->info.domain, req->info.tile_mode,
 			      req->info.tile_flags, &nvbo);
@@ -715,6 +729,9 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 	if (unlikely(!abi16))
 		return -ENOMEM;
 
+	if (unlikely(nouveau_cli_uvmm(cli)))
+		return -ENOSYS;
+
 	list_for_each_entry(temp, &abi16->channels, head) {
 		if (temp->chan->chid == req->channel) {
 			chan = temp->chan;
diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.h b/drivers/gpu/drm/nouveau/nouveau_mem.h
index 76c86d8bb01e..5365a3d3a17f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_mem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_mem.h
@@ -35,4 +35,9 @@ int nouveau_mem_vram(struct ttm_resource *, bool contig, u8 page);
 int nouveau_mem_host(struct ttm_resource *, struct ttm_tt *);
 void nouveau_mem_fini(struct nouveau_mem *);
 int nouveau_mem_map(struct nouveau_mem *, struct nvif_vmm *, struct nvif_vma *);
+int
+nouveau_mem_map_fixed(struct nouveau_mem *mem,
+		      struct nvif_vmm *vmm,
+		      u8 kind, u64 addr,
+		      u64 offset, u64 range);
 #endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index f42c2b1b0363..6a883b9a799a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -50,7 +50,7 @@ struct drm_gem_object *nouveau_gem_prime_import_sg_table(struct drm_device *dev,
 
 	dma_resv_lock(robj, NULL);
 	nvbo = nouveau_bo_alloc(&drm->client, &size, &align,
-				NOUVEAU_GEM_DOMAIN_GART, 0, 0);
+				NOUVEAU_GEM_DOMAIN_GART, 0, 0, true);
 	if (IS_ERR(nvbo)) {
 		obj = ERR_CAST(nvbo);
 		goto unlock;
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
new file mode 100644
index 000000000000..2f7747a5a917
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -0,0 +1,1090 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+/*
+ * Locking:
+ *
+ * The uvmm mutex protects any operations on the GPU VA space provided by the
+ * DRM GPU VA manager.
+ *
+ * The DRM GEM GPUVA lock protects a GEM's GPUVA list. It also protects single
+ * map/unmap operations against a BO move, which itself walks the GEM's GPUVA
+ * list in order to map/unmap it's entries.
+ *
+ * We'd also need to protect the DRM_GPUVA_EVICTED flag for each individual
+ * GPUVA, however this isn't necessary since any read or write to this flag
+ * happens when we already took the DRM GEM GPUVA lock of the backing GEM of
+ * the particular GPUVA.
+ */
+
+#include "nouveau_drv.h"
+#include "nouveau_gem.h"
+#include "nouveau_mem.h"
+#include "nouveau_uvmm.h"
+
+#include <nvif/vmm.h>
+#include <nvif/mem.h>
+
+#include <nvif/class.h>
+#include <nvif/if000c.h>
+#include <nvif/if900d.h>
+
+#define NOUVEAU_VA_SPACE_BITS		47 /* FIXME */
+#define NOUVEAU_VA_SPACE_START		0x0
+#define NOUVEAU_VA_SPACE_END		(1ULL << NOUVEAU_VA_SPACE_BITS)
+
+struct uvmm_map_args {
+	u64 addr;
+	u64 range;
+	u8 kind;
+};
+
+int
+nouveau_uvmm_validate_range(struct nouveau_uvmm *uvmm, u64 addr, u64 range)
+{
+	u64 end = addr + range;
+	u64 unmanaged_end = uvmm->unmanaged_addr +
+			    uvmm->unmanaged_size;
+
+	if (addr & ~PAGE_MASK)
+		return -EINVAL;
+
+	if (range & ~PAGE_MASK)
+		return -EINVAL;
+
+	if (end <= addr)
+		return -EINVAL;
+
+	if (addr < NOUVEAU_VA_SPACE_START ||
+	    end > NOUVEAU_VA_SPACE_END)
+		return -EINVAL;
+
+	if (addr < unmanaged_end &&
+	    end > uvmm->unmanaged_addr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int
+nouveau_uvmm_vmm_sparse_ref(struct nouveau_uvmm *uvmm,
+			    u64 addr, u64 range)
+{
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+
+	return nvif_vmm_raw_sparse(vmm, addr, range, true);
+}
+
+static int
+nouveau_uvmm_vmm_sparse_unref(struct nouveau_uvmm *uvmm,
+			      u64 addr, u64 range)
+{
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+
+	return nvif_vmm_raw_sparse(vmm, addr, range, false);
+}
+
+static int
+nouveau_uvmm_vmm_get(struct nouveau_uvmm *uvmm,
+		     u64 addr, u64 range)
+{
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+
+	return nvif_vmm_raw_get(vmm, addr, range, PAGE_SHIFT);
+}
+
+static int
+nouveau_uvmm_vmm_put(struct nouveau_uvmm *uvmm,
+		     u64 addr, u64 range)
+{
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+
+	return nvif_vmm_raw_put(vmm, addr, range, PAGE_SHIFT);
+}
+
+static int
+nouveau_uvmm_vmm_unmap(struct nouveau_uvmm *uvmm,
+		       u64 addr, u64 range, bool sparse)
+{
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+
+	return nvif_vmm_raw_unmap(vmm, addr, range, PAGE_SHIFT, sparse);
+}
+
+static int
+nouveau_uvmm_vmm_map(struct nouveau_uvmm *uvmm,
+		     u64 addr, u64 range,
+		     u64 bo_offset, u8 kind,
+		     struct nouveau_mem *mem)
+{
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+	union {
+		struct gf100_vmm_map_v0 gf100;
+	} args;
+	u32 argc = 0;
+
+	switch (vmm->object.oclass) {
+	case NVIF_CLASS_VMM_GF100:
+	case NVIF_CLASS_VMM_GM200:
+	case NVIF_CLASS_VMM_GP100:
+		args.gf100.version = 0;
+		if (mem->mem.type & NVIF_MEM_VRAM)
+			args.gf100.vol = 0;
+		else
+			args.gf100.vol = 1;
+		args.gf100.ro = 0;
+		args.gf100.priv = 0;
+		args.gf100.kind = kind;
+		argc = sizeof(args.gf100);
+		break;
+	default:
+		WARN_ON(1);
+		return -ENOSYS;
+	}
+
+	return nvif_vmm_raw_map(vmm, addr, range, PAGE_SHIFT,
+				&args, argc,
+				&mem->mem, bo_offset);
+}
+
+static int
+nouveau_uvma_region_sparse_unref(struct nouveau_uvma_region *reg)
+{
+	u64 addr = reg->region.va.addr << PAGE_SHIFT;
+	u64 range = reg->region.va.range << PAGE_SHIFT;
+
+	if (!reg->region.sparse)
+		return 0;
+
+	return nouveau_uvmm_vmm_sparse_unref(reg->uvmm, addr, range);
+}
+
+static int
+nouveau_uvma_vmm_put(struct nouveau_uvma *uvma)
+{
+	u64 addr = uvma->va.va.addr << PAGE_SHIFT;
+	u64 range = uvma->va.va.range << PAGE_SHIFT;
+
+	return nouveau_uvmm_vmm_put(uvma->uvmm, addr, range);
+}
+
+static int
+nouveau_uvma_map(struct nouveau_uvma *uvma,
+		 struct nouveau_mem *mem)
+{
+	u64 addr = uvma->va.va.addr << PAGE_SHIFT;
+	u64 offset = uvma->va.gem.offset << PAGE_SHIFT;
+	u64 range = uvma->va.va.range << PAGE_SHIFT;
+
+	return nouveau_uvmm_vmm_map(uvma->uvmm, addr, range,
+				    offset, uvma->kind, mem);
+}
+
+static int
+nouveau_uvma_unmap(struct nouveau_uvma *uvma)
+{
+	u64 addr = uvma->va.va.addr << PAGE_SHIFT;
+	u64 range = uvma->va.va.range << PAGE_SHIFT;
+	bool sparse = uvma->va.region->sparse;
+
+	if (drm_gpuva_evicted(&uvma->va))
+		return 0;
+
+	return nouveau_uvmm_vmm_unmap(uvma->uvmm, addr, range, sparse);
+}
+
+static int
+nouveau_uvma_alloc(struct nouveau_uvma **puvma)
+{
+	*puvma = kzalloc(sizeof(**puvma), GFP_KERNEL);
+	if (!*puvma)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void
+nouveau_uvma_free(struct nouveau_uvma *uvma)
+{
+	kfree(uvma);
+}
+
+static int
+__nouveau_uvma_insert(struct nouveau_uvmm *uvmm,
+		      struct nouveau_uvma *uvma)
+{
+	return drm_gpuva_insert(&uvmm->umgr, &uvma->va);
+}
+
+static int
+nouveau_uvma_insert(struct nouveau_uvmm *uvmm,
+		    struct nouveau_uvma *uvma,
+		    struct drm_gem_object *obj,
+		    u64 bo_offset, u64 addr,
+		    u64 range, u8 kind)
+{
+	int ret;
+
+	addr >>= PAGE_SHIFT;
+	bo_offset >>= PAGE_SHIFT;
+	range >>= PAGE_SHIFT;
+
+	uvma->uvmm = uvmm;
+	uvma->kind = kind;
+	uvma->va.va.addr = addr;
+	uvma->va.va.range = range;
+	uvma->va.gem.offset = bo_offset;
+	uvma->va.gem.obj = obj;
+
+	ret = __nouveau_uvma_insert(uvmm, uvma);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static void
+nouveau_uvma_remove(struct nouveau_uvma *uvma)
+{
+	drm_gpuva_remove(&uvma->va);
+}
+
+static void
+nouveau_uvma_gem_get(struct nouveau_uvma *uvma)
+{
+	drm_gem_object_get(uvma->va.gem.obj);
+}
+
+static void
+nouveau_uvma_gem_put(struct nouveau_uvma *uvma)
+{
+	drm_gem_object_put(uvma->va.gem.obj);
+}
+
+static int
+nouveau_uvma_region_alloc(struct nouveau_uvma_region **preg)
+{
+	*preg = kzalloc(sizeof(**preg), GFP_KERNEL);
+	if (!*preg)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void
+nouveau_uvma_region_free(struct nouveau_uvma_region *reg)
+{
+	kfree(reg);
+}
+
+static int
+__nouveau_uvma_region_insert(struct nouveau_uvmm *uvmm,
+			     struct nouveau_uvma_region *reg)
+{
+	return drm_gpuva_region_insert(&uvmm->umgr, &reg->region);
+}
+
+static int
+nouveau_uvma_region_insert(struct nouveau_uvmm *uvmm,
+			   struct nouveau_uvma_region *reg,
+			   u64 addr, u64 range,
+			   bool sparse)
+{
+	int ret;
+
+	reg->uvmm = uvmm;
+	reg->region.va.addr = addr >> PAGE_SHIFT;
+	reg->region.va.range = range >> PAGE_SHIFT;
+	reg->region.sparse = sparse;
+
+	ret = __nouveau_uvma_region_insert(uvmm, reg);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+int
+nouveau_uvma_region_create(struct nouveau_uvmm *uvmm,
+			   u64 addr, u64 range,
+			   bool sparse)
+{
+	struct nouveau_uvma_region *reg;
+	int ret;
+
+	ret = nouveau_uvma_region_alloc(&reg);
+	if (ret)
+		return ret;
+
+	ret = nouveau_uvma_region_insert(uvmm, reg, addr, range, sparse);
+	if (ret)
+		goto err_free_region;
+
+	if (sparse) {
+		ret = nouveau_uvmm_vmm_sparse_ref(uvmm, addr, range);
+		if (ret)
+			goto err_region_remove;
+	}
+
+	return 0;
+
+err_region_remove:
+	drm_gpuva_region_remove(&reg->region);
+err_free_region:
+	nouveau_uvma_region_free(reg);
+	return ret;
+}
+
+static struct nouveau_uvma_region *
+nouveau_uvma_region_find(struct nouveau_uvmm *uvmm,
+			 u64 addr, u64 range)
+{
+	struct drm_gpuva_region *reg;
+
+	reg = drm_gpuva_region_find(&uvmm->umgr,
+				    addr >> PAGE_SHIFT,
+				    range >> PAGE_SHIFT);
+	if (!reg)
+		return NULL;
+
+	return uvma_region_from_va_region(reg);
+}
+
+static void
+nouveau_uvma_region_remove(struct nouveau_uvma_region *reg)
+{
+	drm_gpuva_region_remove(&reg->region);
+}
+
+int
+__nouveau_uvma_region_destroy(struct nouveau_uvma_region *reg)
+{
+	struct nouveau_uvmm *uvmm = reg->uvmm;
+	u64 addr = reg->region.va.addr << PAGE_SHIFT;
+	u64 range = reg->region.va.range << PAGE_SHIFT;
+	bool sparse = reg->region.sparse;
+
+	if (!drm_gpuva_region_empty(&reg->region))
+		return -EBUSY;
+
+	nouveau_uvma_region_remove(reg);
+
+	if (sparse)
+		nouveau_uvmm_vmm_sparse_unref(uvmm, addr, range);
+
+	nouveau_uvma_region_free(reg);
+
+	return 0;
+}
+
+int
+nouveau_uvma_region_destroy(struct nouveau_uvmm *uvmm,
+			    u64 addr, u64 range)
+{
+	struct nouveau_uvma_region *reg;
+
+	reg = nouveau_uvma_region_find(uvmm, addr, range);
+	if (!reg)
+		return -ENOENT;
+
+	return __nouveau_uvma_region_destroy(reg);
+}
+
+static void
+op_map_prepare_unwind(struct nouveau_uvma *uvma)
+{
+	nouveau_uvma_gem_put(uvma);
+	nouveau_uvma_remove(uvma);
+	nouveau_uvma_free(uvma);
+}
+
+static void
+op_unmap_prepare_unwind(struct drm_gpuva *va)
+{
+	drm_gpuva_insert(va->mgr, va);
+}
+
+static void
+uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
+		       struct nouveau_uvma_alloc *new,
+		       struct drm_gpuva_ops *ops,
+		       struct drm_gpuva_op *last,
+		       struct uvmm_map_args *args)
+{
+	struct drm_gpuva_op *op = last;
+	u64 vmm_get_start = args ? args->addr : 0;
+	u64 vmm_get_end = args ? args->addr + args->range : 0;
+
+	/* Unwind GPUVA space. */
+	drm_gpuva_for_each_op_from_reverse(op, ops) {
+		switch (op->op) {
+		case DRM_GPUVA_OP_MAP:
+			op_map_prepare_unwind(new->map);
+			break;
+		case DRM_GPUVA_OP_REMAP: {
+			struct drm_gpuva_op_remap *r = &op->remap;
+
+			if (r->next)
+				op_map_prepare_unwind(new->next);
+
+			if (r->prev)
+				op_map_prepare_unwind(new->prev);
+
+			op_unmap_prepare_unwind(r->unmap->va);
+			break;
+		}
+		case DRM_GPUVA_OP_UNMAP:
+			op_unmap_prepare_unwind(op->unmap.va);
+			break;
+		default:
+			break;
+		}
+	}
+
+	/* Unmap operation don't allocate page tables, hence skip the following
+	 * page table unwind.
+	 */
+	if (!args)
+		return;
+
+	drm_gpuva_for_each_op(op, ops) {
+		switch (op->op) {
+		case DRM_GPUVA_OP_MAP: {
+			u64 vmm_get_range = vmm_get_end - vmm_get_start;
+
+			if (vmm_get_range)
+				nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
+						     vmm_get_range);
+			break;
+		}
+		case DRM_GPUVA_OP_REMAP: {
+			struct drm_gpuva_op_remap *r = &op->remap;
+			struct drm_gpuva *va = r->unmap->va;
+			u64 ustart = va->va.addr << PAGE_SHIFT;
+			u64 urange = va->va.range << PAGE_SHIFT;
+			u64 uend = ustart + urange;
+
+			if (r->prev)
+				vmm_get_start = uend;
+
+			if (r->next)
+				vmm_get_end = ustart;
+
+			if (r->prev && r->next)
+				vmm_get_start = vmm_get_end = 0;
+
+			break;
+		}
+		case DRM_GPUVA_OP_UNMAP: {
+			struct drm_gpuva_op_unmap *u = &op->unmap;
+			struct drm_gpuva *va = u->va;
+			u64 ustart = va->va.addr << PAGE_SHIFT;
+			u64 urange = va->va.range << PAGE_SHIFT;
+			u64 uend = ustart + urange;
+
+			/* Nothing to do for mappings we merge with. */
+			if (uend == vmm_get_start ||
+			    ustart == vmm_get_end)
+				break;
+
+			if (ustart > vmm_get_start) {
+				u64 vmm_get_range = ustart - vmm_get_start;
+
+				nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
+						     vmm_get_range);
+			}
+			vmm_get_start = uend;
+			break;
+		}
+		default:
+			break;
+		}
+
+		if (op == last)
+			break;
+	}
+}
+
+void
+nouveau_uvmm_sm_map_prepare_unwind(struct nouveau_uvmm *uvmm,
+				   struct nouveau_uvma_alloc *new,
+				   struct drm_gpuva_ops *ops,
+				   u64 addr, u64 range)
+{
+	struct drm_gpuva_op *last = drm_gpuva_last_op(ops);
+	struct uvmm_map_args args = {
+		.addr = addr,
+		.range = range,
+	};
+
+	uvmm_sm_prepare_unwind(uvmm, new, ops, last, &args);
+}
+
+void
+nouveau_uvmm_sm_unmap_prepare_unwind(struct nouveau_uvmm *uvmm,
+				     struct nouveau_uvma_alloc *new,
+				     struct drm_gpuva_ops *ops)
+{
+	struct drm_gpuva_op *last = drm_gpuva_last_op(ops);
+
+	uvmm_sm_prepare_unwind(uvmm, new, ops, last, NULL);
+}
+
+static int
+op_map_prepare(struct nouveau_uvmm *uvmm,
+	       struct nouveau_uvma **puvma,
+	       struct drm_gpuva_op_map *m,
+	       struct uvmm_map_args *args)
+{
+	struct nouveau_uvma *uvma;
+	int ret;
+
+	ret = nouveau_uvma_alloc(&uvma);
+	if (ret)
+		goto err;
+
+	ret = nouveau_uvma_insert(uvmm, uvma, m->gem.obj,
+				   m->gem.offset << PAGE_SHIFT,
+				   m->va.addr << PAGE_SHIFT,
+				   m->va.range << PAGE_SHIFT,
+				   args->kind);
+	if (ret)
+		goto err_free_uvma;
+
+	/* Keep a reference until this uvma is destroyed. */
+	nouveau_uvma_gem_get(uvma);
+
+	*puvma = uvma;
+	return 0;
+
+err_free_uvma:
+	nouveau_uvma_free(uvma);
+err:
+	*puvma = NULL;
+	return ret;
+}
+
+static void
+op_unmap_prepare(struct drm_gpuva_op_unmap *u)
+{
+	struct nouveau_uvma *uvma = uvma_from_va(u->va);
+
+	nouveau_uvma_remove(uvma);
+}
+
+static int
+uvmm_sm_prepare(struct nouveau_uvmm *uvmm,
+		struct nouveau_uvma_alloc *new,
+		struct drm_gpuva_ops *ops,
+		struct uvmm_map_args *args)
+{
+	struct drm_gpuva_op *op;
+	u64 vmm_get_start = args ? args->addr : 0;
+	u64 vmm_get_end = args ? args->addr + args->range : 0;
+	int ret;
+
+	drm_gpuva_for_each_op(op, ops) {
+		switch (op->op) {
+		case DRM_GPUVA_OP_MAP: {
+			u64 vmm_get_range = vmm_get_end - vmm_get_start;
+
+			ret = op_map_prepare(uvmm, &new->map, &op->map, args);
+			if (ret)
+				goto unwind;
+
+			if (args && vmm_get_range) {
+				ret = nouveau_uvmm_vmm_get(uvmm, vmm_get_start,
+							   vmm_get_range);
+				if (ret) {
+					op_map_prepare_unwind(new->map);
+					goto unwind;
+				}
+			}
+			break;
+		}
+		case DRM_GPUVA_OP_REMAP: {
+			struct drm_gpuva_op_remap *r = &op->remap;
+			struct drm_gpuva *va = r->unmap->va;
+			struct uvmm_map_args remap_args = {
+				.kind = uvma_from_va(va)->kind,
+			};
+			u64 ustart = va->va.addr << PAGE_SHIFT;
+			u64 urange = va->va.range << PAGE_SHIFT;
+			u64 uend = ustart + urange;
+
+			op_unmap_prepare(r->unmap);
+
+			if (r->prev) {
+				ret = op_map_prepare(uvmm, &new->prev, r->prev,
+						     &remap_args);
+				if (ret)
+					goto unwind;
+
+				if (args)
+					vmm_get_start = uend;
+			}
+
+			if (r->next) {
+				ret = op_map_prepare(uvmm, &new->next, r->next,
+						     &remap_args);
+				if (ret) {
+					if (r->prev)
+						op_map_prepare_unwind(new->prev);
+					goto unwind;
+				}
+
+				if (args)
+					vmm_get_end = ustart;
+			}
+
+			if (args && (r->prev && r->next))
+				vmm_get_start = vmm_get_end = 0;
+
+			break;
+		}
+		case DRM_GPUVA_OP_UNMAP: {
+			struct drm_gpuva_op_unmap *u = &op->unmap;
+			struct drm_gpuva *va = u->va;
+			u64 ustart = va->va.addr << PAGE_SHIFT;
+			u64 urange = va->va.range << PAGE_SHIFT;
+			u64 uend = ustart + urange;
+
+			op_unmap_prepare(u);
+
+			if (!args)
+				break;
+
+			/* Nothing to do for mappings we merge with. */
+			if (uend == vmm_get_start ||
+			    ustart == vmm_get_end)
+				break;
+
+			if (ustart > vmm_get_start) {
+				u64 vmm_get_range = ustart - vmm_get_start;
+
+				ret = nouveau_uvmm_vmm_get(uvmm, vmm_get_start,
+							   vmm_get_range);
+				if (ret) {
+					op_unmap_prepare_unwind(va);
+					goto unwind;
+				}
+			}
+			vmm_get_start = uend;
+
+			break;
+		}
+		default:
+			ret = -EINVAL;
+			goto unwind;
+		}
+	}
+
+	return 0;
+
+unwind:
+	if (op != drm_gpuva_first_op(ops))
+		uvmm_sm_prepare_unwind(uvmm, new, ops,
+				       drm_gpuva_prev_op(op),
+				       args);
+	return ret;
+}
+
+int
+nouveau_uvmm_sm_map_prepare(struct nouveau_uvmm *uvmm,
+			    struct nouveau_uvma_alloc *new,
+			    struct drm_gpuva_ops *ops,
+			    u64 addr, u64 range, u8 kind)
+{
+	struct uvmm_map_args args = {
+		.addr = addr,
+		.range = range,
+		.kind = kind,
+	};
+
+	return uvmm_sm_prepare(uvmm, new, ops, &args);
+}
+
+int
+nouveau_uvmm_sm_unmap_prepare(struct nouveau_uvmm *uvmm,
+			      struct nouveau_uvma_alloc *new,
+			      struct drm_gpuva_ops *ops)
+{
+	return uvmm_sm_prepare(uvmm, new, ops, NULL);
+}
+
+struct drm_gpuva_ops *
+nouveau_uvmm_sm_map_ops(struct nouveau_uvmm *uvmm,
+			u64 addr, u64 range,
+			struct drm_gem_object *obj, u64 offset)
+{
+	return drm_gpuva_sm_map_ops_create(&uvmm->umgr,
+					   addr >> PAGE_SHIFT,
+					   range >> PAGE_SHIFT,
+					   obj, offset >> PAGE_SHIFT);
+}
+
+struct drm_gpuva_ops *
+nouveau_uvmm_sm_unmap_ops(struct nouveau_uvmm *uvmm,
+			  u64 addr, u64 range)
+{
+	return drm_gpuva_sm_unmap_ops_create(&uvmm->umgr,
+					     addr >> PAGE_SHIFT,
+					     range >> PAGE_SHIFT);
+}
+
+static struct drm_gem_object *
+op_gem_obj(struct drm_gpuva_op *op)
+{
+	switch (op->op) {
+	case DRM_GPUVA_OP_MAP:
+		return op->map.gem.obj;
+	case DRM_GPUVA_OP_REMAP:
+		return op->remap.unmap->va->gem.obj;
+	case DRM_GPUVA_OP_UNMAP:
+		return op->unmap.va->gem.obj;
+	default:
+		WARN(1, "Unknown operation.\n");
+		return NULL;
+	}
+}
+
+static void
+op_map(struct nouveau_uvma *uvma)
+{
+	struct nouveau_bo *nvbo = nouveau_gem_object(uvma->va.gem.obj);
+
+	nouveau_uvma_map(uvma, nouveau_mem(nvbo->bo.resource));
+	drm_gpuva_link(&uvma->va);
+}
+
+static void
+op_unmap(struct drm_gpuva_op_unmap *u)
+{
+	struct drm_gpuva *va = u->va;
+	struct nouveau_uvma *uvma = uvma_from_va(va);
+
+	/* nouveau_uvma_unmap() does not try to unmap if backing BO is
+	 * evicted.
+	 */
+	if (!u->keep)
+		nouveau_uvma_unmap(uvma);
+	drm_gpuva_unlink(va);
+}
+
+static void
+op_unmap_range(struct drm_gpuva_op_unmap *u,
+	       u64 addr, u64 range)
+{
+	struct nouveau_uvma *uvma = uvma_from_va(u->va);
+	bool sparse = uvma->va.region->sparse;
+
+	addr <<= PAGE_SHIFT;
+	range <<= PAGE_SHIFT;
+
+	if (!drm_gpuva_evicted(u->va))
+		nouveau_uvmm_vmm_unmap(uvma->uvmm, addr, range, sparse);
+
+	drm_gpuva_unlink(u->va);
+}
+
+static void
+op_remap(struct drm_gpuva_op_remap *r,
+	 struct nouveau_uvma_alloc *new)
+{
+	struct drm_gpuva_op_unmap *u = r->unmap;
+	struct nouveau_uvma *uvma = uvma_from_va(u->va);
+	u64 addr = uvma->va.va.addr;
+	u64 range = uvma->va.va.range;
+
+	if (r->prev) {
+		addr = r->prev->va.addr + r->prev->va.range;
+		drm_gpuva_link(&new->prev->va);
+	}
+
+	if (r->next) {
+		range = r->next->va.addr - addr;
+		drm_gpuva_link(&new->next->va);
+	}
+
+	op_unmap_range(u, addr, range);
+}
+
+static int
+uvmm_sm(struct nouveau_uvmm *uvmm,
+	struct nouveau_uvma_alloc *new,
+	struct drm_gpuva_ops *ops)
+{
+	struct drm_gpuva_op *op;
+
+	drm_gpuva_for_each_op(op, ops) {
+		struct drm_gem_object *obj = op_gem_obj(op);
+
+		if (!obj)
+			return -EINVAL;
+
+		drm_gem_gpuva_lock(obj);
+		switch (op->op) {
+		case DRM_GPUVA_OP_MAP:
+			op_map(new->map);
+			break;
+		case DRM_GPUVA_OP_REMAP: {
+			op_remap(&op->remap, new);
+			break;
+		}
+		case DRM_GPUVA_OP_UNMAP:
+			op_unmap(&op->unmap);
+			break;
+		default:
+			break;
+		}
+		drm_gem_gpuva_unlock(obj);
+	}
+
+	return 0;
+}
+
+int
+nouveau_uvmm_sm_map(struct nouveau_uvmm *uvmm,
+		    struct nouveau_uvma_alloc *new,
+		    struct drm_gpuva_ops *ops)
+{
+	return uvmm_sm(uvmm, new, ops);
+}
+
+int
+nouveau_uvmm_sm_unmap(struct nouveau_uvmm *uvmm,
+		      struct nouveau_uvma_alloc *new,
+		      struct drm_gpuva_ops *ops)
+{
+	return uvmm_sm(uvmm, new, ops);
+}
+
+static void
+uvmm_sm_cleanup(struct nouveau_uvmm *uvmm,
+		struct nouveau_uvma_alloc *new,
+		struct drm_gpuva_ops *ops, bool unmap)
+{
+	struct drm_gpuva_op *op;
+
+	drm_gpuva_for_each_op(op, ops) {
+		switch (op->op) {
+		case DRM_GPUVA_OP_MAP:
+			break;
+		case DRM_GPUVA_OP_REMAP: {
+			struct drm_gpuva_op_remap *r = &op->remap;
+			struct drm_gpuva_op_map *p = r->prev;
+			struct drm_gpuva_op_map *n = r->next;
+			struct drm_gpuva *va = r->unmap->va;
+			struct nouveau_uvma *uvma = uvma_from_va(va);
+
+			if (unmap) {
+				u64 addr = va->va.addr << PAGE_SHIFT;
+				u64 end = addr + (va->va.range << PAGE_SHIFT);
+
+				if (p)
+					addr = (p->va.addr << PAGE_SHIFT) +
+					       (n->va.range << PAGE_SHIFT);
+
+				if (n)
+					end = n->va.addr << PAGE_SHIFT;
+
+				nouveau_uvmm_vmm_put(uvmm, addr, end - addr);
+			}
+
+			nouveau_uvma_gem_put(uvma);
+			nouveau_uvma_free(uvma);
+			break;
+		}
+		case DRM_GPUVA_OP_UNMAP: {
+			struct drm_gpuva_op_unmap *u = &op->unmap;
+			struct drm_gpuva *va = u->va;
+			struct nouveau_uvma *uvma = uvma_from_va(va);
+
+			if (unmap)
+				nouveau_uvma_vmm_put(uvma);
+
+			nouveau_uvma_gem_put(uvma);
+			nouveau_uvma_free(uvma);
+			break;
+		}
+		default:
+			break;
+		}
+	}
+}
+
+void
+nouveau_uvmm_sm_map_cleanup(struct nouveau_uvmm *uvmm,
+			    struct nouveau_uvma_alloc *new,
+			    struct drm_gpuva_ops *ops)
+{
+	uvmm_sm_cleanup(uvmm, new, ops, false);
+}
+
+void
+nouveau_uvmm_sm_unmap_cleanup(struct nouveau_uvmm *uvmm,
+			      struct nouveau_uvma_alloc *new,
+			      struct drm_gpuva_ops *ops)
+{
+	uvmm_sm_cleanup(uvmm, new, ops, true);
+}
+
+void
+nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbo, struct nouveau_mem *mem)
+{
+	struct drm_gem_object *obj = &nvbo->bo.base;
+	struct drm_gpuva *va;
+
+	drm_gem_gpuva_lock(obj);
+	drm_gem_for_each_gpuva(va, obj) {
+		struct nouveau_uvma *uvma = uvma_from_va(va);
+
+		nouveau_uvma_map(uvma, mem);
+		drm_gpuva_evict(va, false);
+	}
+	drm_gem_gpuva_unlock(obj);
+}
+
+void
+nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo)
+{
+	struct drm_gem_object *obj = &nvbo->bo.base;
+	struct drm_gpuva *va;
+
+	drm_gem_gpuva_lock(obj);
+	drm_gem_for_each_gpuva(va, obj) {
+		struct nouveau_uvma *uvma = uvma_from_va(va);
+
+		nouveau_uvma_unmap(uvma);
+		drm_gpuva_evict(va, true);
+	}
+	drm_gem_gpuva_unlock(obj);
+}
+
+int
+nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
+		  struct drm_nouveau_vm_init *init)
+{
+	int ret;
+	u64 unmanaged_end = init->unmanaged_addr + init->unmanaged_size;
+
+	mutex_init(&uvmm->mutex);
+
+	mutex_lock(&cli->mutex);
+
+	if (unlikely(cli->uvmm.disabled)) {
+		ret = -ENOSYS;
+		goto out_unlock;
+	}
+
+	if (unmanaged_end <= init->unmanaged_addr) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (unmanaged_end > NOUVEAU_VA_SPACE_END) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	uvmm->unmanaged_addr = init->unmanaged_addr;
+	uvmm->unmanaged_size = init->unmanaged_size;
+
+	drm_gpuva_manager_init(&uvmm->umgr, cli->name,
+			       NOUVEAU_VA_SPACE_START >> PAGE_SHIFT,
+			       NOUVEAU_VA_SPACE_END >> PAGE_SHIFT,
+			       init->unmanaged_addr >> PAGE_SHIFT,
+			       init->unmanaged_size >> PAGE_SHIFT,
+			       NULL, DRM_GPUVA_MANAGER_REGIONS);
+
+	ret = nvif_vmm_ctor(&cli->mmu, "uvmm",
+			    cli->vmm.vmm.object.oclass, RAW,
+			    init->unmanaged_addr, init->unmanaged_size,
+			    NULL, 0, &cli->uvmm.vmm.vmm);
+	if (ret)
+		goto out_free_gpuva_mgr;
+
+	cli->uvmm.vmm.cli = cli;
+	mutex_unlock(&cli->mutex);
+
+	return 0;
+
+out_free_gpuva_mgr:
+	drm_gpuva_manager_destroy(&uvmm->umgr);
+out_unlock:
+	mutex_unlock(&cli->mutex);
+	return ret;
+}
+
+void
+nouveau_uvmm_fini(struct nouveau_uvmm *uvmm)
+{
+	DRM_GPUVA_ITER(it, &uvmm->umgr);
+	DRM_GPUVA_REGION_ITER(__it, &uvmm->umgr);
+	struct nouveau_cli *cli = uvmm->vmm.cli;
+
+	if (!cli)
+		return;
+
+	nouveau_uvmm_lock(uvmm);
+	drm_gpuva_iter_for_each(it) {
+		struct drm_gpuva *va = it.va;
+		struct nouveau_uvma *uvma = uvma_from_va(va);
+		struct drm_gem_object *obj = va->gem.obj;
+
+		drm_gpuva_iter_remove(&it);
+
+		drm_gem_gpuva_lock(obj);
+		nouveau_uvma_unmap(uvma);
+		drm_gpuva_unlink(va);
+		drm_gem_gpuva_unlock(obj);
+
+		nouveau_uvma_vmm_put(uvma);
+
+		nouveau_uvma_gem_put(uvma);
+		nouveau_uvma_free(uvma);
+	}
+
+	drm_gpuva_iter_for_each(__it) {
+		struct drm_gpuva_region *reg = __it.reg;
+		struct nouveau_uvma_region *ureg = uvma_region_from_va_region(reg);
+
+		if (unlikely(reg == &uvmm->umgr.kernel_alloc_region))
+			continue;
+
+		drm_gpuva_iter_remove(&__it);
+
+		nouveau_uvma_region_sparse_unref(ureg);
+		nouveau_uvma_region_free(ureg);
+	}
+	nouveau_uvmm_unlock(uvmm);
+
+	mutex_lock(&cli->mutex);
+	nouveau_vmm_fini(&uvmm->vmm);
+	drm_gpuva_manager_destroy(&uvmm->umgr);
+	mutex_unlock(&cli->mutex);
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.h b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
new file mode 100644
index 000000000000..858840e9e0c5
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef __NOUVEAU_UVMM_H__
+#define __NOUVEAU_UVMM_H__
+
+#include <drm/drm_gpuva_mgr.h>
+
+#include "nouveau_drv.h"
+
+struct nouveau_uvmm {
+	struct nouveau_vmm vmm;
+	struct drm_gpuva_manager umgr;
+	struct mutex mutex;
+
+	u64 unmanaged_addr;
+	u64 unmanaged_size;
+
+	bool disabled;
+};
+
+struct nouveau_uvma_region {
+	struct drm_gpuva_region region;
+	struct nouveau_uvmm *uvmm;
+};
+
+struct nouveau_uvma {
+	struct drm_gpuva va;
+	struct nouveau_uvmm *uvmm;
+	u64 handle;
+	u8 kind;
+};
+
+struct nouveau_uvma_alloc {
+	struct nouveau_uvma *map;
+	struct nouveau_uvma *prev;
+	struct nouveau_uvma *next;
+};
+
+#define uvmm_from_mgr(x) container_of((x), struct nouveau_uvmm, umgr)
+#define uvma_from_va(x) container_of((x), struct nouveau_uvma, va)
+#define uvma_region_from_va_region(x) container_of((x), struct nouveau_uvma_region, region)
+
+int nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
+		      struct drm_nouveau_vm_init *init);
+void nouveau_uvmm_fini(struct nouveau_uvmm *uvmm);
+
+void nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbov, struct nouveau_mem *mem);
+void nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo);
+
+int nouveau_uvmm_validate_range(struct nouveau_uvmm *uvmm,
+				u64 addr, u64 range);
+
+int nouveau_uvma_region_create(struct nouveau_uvmm *uvmm,
+			       u64 addr, u64 range,
+			       bool sparse);
+int __nouveau_uvma_region_destroy(struct nouveau_uvma_region *reg);
+int nouveau_uvma_region_destroy(struct nouveau_uvmm *uvmm,
+				u64 addr, u64 range);
+
+struct drm_gpuva_ops *
+nouveau_uvmm_sm_map_ops(struct nouveau_uvmm *uvmm,
+			u64 addr, u64 range,
+			struct drm_gem_object *obj, u64 offset);
+struct drm_gpuva_ops *
+nouveau_uvmm_sm_unmap_ops(struct nouveau_uvmm *uvmm,
+			  u64 addr, u64 range);
+
+void
+nouveau_uvmm_sm_map_prepare_unwind(struct nouveau_uvmm *uvmm,
+				   struct nouveau_uvma_alloc *new,
+				   struct drm_gpuva_ops *ops,
+				   u64 addr, u64 range);
+void
+nouveau_uvmm_sm_unmap_prepare_unwind(struct nouveau_uvmm *uvmm,
+				     struct nouveau_uvma_alloc *new,
+				     struct drm_gpuva_ops *ops);
+
+int nouveau_uvmm_sm_map_prepare(struct nouveau_uvmm *uvmm,
+				struct nouveau_uvma_alloc *new,
+				struct drm_gpuva_ops *ops,
+				u64 addr, u64 range, u8 kind);
+int nouveau_uvmm_sm_unmap_prepare(struct nouveau_uvmm *uvmm,
+				  struct nouveau_uvma_alloc *new,
+				  struct drm_gpuva_ops *ops);
+
+int nouveau_uvmm_sm_map(struct nouveau_uvmm *uvmm,
+			struct nouveau_uvma_alloc *new,
+			struct drm_gpuva_ops *ops);
+int nouveau_uvmm_sm_unmap(struct nouveau_uvmm *uvmm,
+			  struct nouveau_uvma_alloc *new,
+			  struct drm_gpuva_ops *ops);
+
+void nouveau_uvmm_sm_map_cleanup(struct nouveau_uvmm *uvmm,
+				 struct nouveau_uvma_alloc *new,
+				 struct drm_gpuva_ops *ops);
+void nouveau_uvmm_sm_unmap_cleanup(struct nouveau_uvmm *uvmm,
+				   struct nouveau_uvma_alloc *new,
+				   struct drm_gpuva_ops *ops);
+
+static inline void nouveau_uvmm_lock(struct nouveau_uvmm *uvmm)
+{
+	mutex_lock(&uvmm->mutex);
+}
+
+static inline void nouveau_uvmm_unlock(struct nouveau_uvmm *uvmm)
+{
+	mutex_unlock(&uvmm->mutex);
+}
+
+#endif
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 15/16] drm/nouveau: implement new VM_BIND UAPI
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (13 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 14/16] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-02-17 13:48 ` [PATCH drm-next v2 16/16] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
  2023-03-09  9:12 ` [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Boris Brezillon
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
   DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
   space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
   to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
   UMDs can request the named operations to be processed either
   synchronously or asynchronously. It supports DRM syncobjs
   (incl. timelines) as synchronization mechanism. The management of the
   GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
   execution happens asynchronously. It supports DRM syncobj (incl.
   timelines) as synchronization mechanism. DRM GEM object locking is
   handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/driver-uapi.rst       |   3 +
 drivers/gpu/drm/nouveau/Kbuild          |   2 +
 drivers/gpu/drm/nouveau/Kconfig         |   2 +
 drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  24 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c  | 322 ++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_exec.h  |  39 ++
 drivers/gpu/drm/nouveau/nouveau_sched.c | 467 ++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_sched.h |  96 +++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  | 446 ++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h  |  28 ++
 13 files changed, 1451 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
index 9c7ca6e33a68..c08bcbb95fb3 100644
--- a/Documentation/gpu/driver-uapi.rst
+++ b/Documentation/gpu/driver-uapi.rst
@@ -13,4 +13,7 @@ drm/nouveau uAPI
 VM_BIND / EXEC uAPI
 -------------------
 
+.. kernel-doc:: drivers/gpu/drm/nouveau/nouveau_exec.c
+    :doc: Overview
+
 .. kernel-doc:: include/uapi/drm/nouveau_drm.h
diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
index ee281bb76463..cf6b3a80c0c8 100644
--- a/drivers/gpu/drm/nouveau/Kbuild
+++ b/drivers/gpu/drm/nouveau/Kbuild
@@ -47,6 +47,8 @@ nouveau-y += nouveau_prime.o
 nouveau-y += nouveau_sgdma.o
 nouveau-y += nouveau_ttm.o
 nouveau-y += nouveau_vmm.o
+nouveau-y += nouveau_exec.o
+nouveau-y += nouveau_sched.o
 nouveau-y += nouveau_uvmm.o
 
 # DRM - modesetting
diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index a70bd65e1400..c52e8096cca4 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -10,6 +10,8 @@ config DRM_NOUVEAU
 	select DRM_KMS_HELPER
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
+	select DRM_SCHED
 	select I2C
 	select I2C_ALGOBIT
 	select BACKLIGHT_CLASS_DEVICE if DRM_NOUVEAU_BACKLIGHT
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index 36cc80eb0e20..694777a58bca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -35,6 +35,7 @@
 #include "nouveau_chan.h"
 #include "nouveau_abi16.h"
 #include "nouveau_vmm.h"
+#include "nouveau_sched.h"
 
 static struct nouveau_abi16 *
 nouveau_abi16(struct drm_file *file_priv)
@@ -125,6 +126,17 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
 {
 	struct nouveau_abi16_ntfy *ntfy, *temp;
 
+	/* When a client exits without waiting for it's queued up jobs to
+	 * finish it might happen that we fault the channel. This is due to
+	 * drm_file_free() calling drm_gem_release() before the postclose()
+	 * callback. Hence, we can't tear down this scheduler entity before
+	 * uvmm mappings are unmapped. Currently, we can't detect this case.
+	 *
+	 * However, this should be rare and harmless, since the channel isn't
+	 * needed anymore.
+	 */
+	nouveau_sched_entity_fini(&chan->sched_entity);
+
 	/* wait for all activity to stop before cleaning up */
 	if (chan->chan)
 		nouveau_channel_idle(chan->chan);
@@ -311,6 +323,10 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 	if (ret)
 		goto done;
 
+	ret = nouveau_sched_entity_init(&chan->sched_entity, &drm->sched);
+	if (ret)
+		goto done;
+
 	init->channel = chan->chan->chid;
 
 	if (device->info.family >= NV_DEVICE_INFO_V0_TESLA)
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 27eae85f33e6..8209eb28feaf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -26,6 +26,7 @@ struct nouveau_abi16_chan {
 	struct nouveau_bo *ntfy;
 	struct nouveau_vma *ntfy_vma;
 	struct nvkm_mm  heap;
+	struct nouveau_sched_entity sched_entity;
 };
 
 struct nouveau_abi16 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index cde843156700..a5b1c7e7d24f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -68,7 +68,9 @@
 #include "nouveau_platform.h"
 #include "nouveau_svm.h"
 #include "nouveau_dmem.h"
+#include "nouveau_exec.h"
 #include "nouveau_uvmm.h"
+#include "nouveau_sched.h"
 
 DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
 			"DRM_UT_CORE",
@@ -190,6 +192,7 @@ nouveau_cli_fini(struct nouveau_cli *cli)
 	flush_work(&cli->work);
 	WARN_ON(!list_empty(&cli->worker));
 
+	nouveau_sched_entity_fini(&cli->sched_entity);
 	usif_client_fini(cli);
 	nouveau_uvmm_fini(&cli->uvmm);
 	nouveau_vmm_fini(&cli->svm);
@@ -297,6 +300,11 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
 	}
 
 	cli->mem = &mems[ret];
+
+	ret = nouveau_sched_entity_init(&cli->sched_entity, &drm->sched);
+	if (ret)
+		goto done;
+
 	return 0;
 done:
 	if (ret)
@@ -609,8 +617,13 @@ nouveau_drm_device_init(struct drm_device *dev)
 		pm_runtime_put(dev->dev);
 	}
 
-	return 0;
+	ret = nouveau_sched_init(&drm->sched, drm);
+	if (ret)
+		goto fail_sched_init;
 
+	return 0;
+fail_sched_init:
+	nouveau_display_fini(dev, false, false);
 fail_dispinit:
 	nouveau_display_destroy(dev);
 fail_dispctor:
@@ -635,6 +648,8 @@ nouveau_drm_device_fini(struct drm_device *dev)
 	struct nouveau_cli *cli, *temp_cli;
 	struct nouveau_drm *drm = nouveau_drm(dev);
 
+	nouveau_sched_fini(&drm->sched);
+
 	if (nouveau_pmops_runtime()) {
 		pm_runtime_get_sync(dev->dev);
 		pm_runtime_forbid(dev->dev);
@@ -1175,6 +1190,9 @@ nouveau_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_PREP, nouveau_gem_ioctl_cpu_prep, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_FINI, nouveau_gem_ioctl_cpu_fini, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_VM_INIT, nouveau_uvmm_ioctl_vm_init, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_VM_BIND, nouveau_uvmm_ioctl_vm_bind, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_EXEC, nouveau_exec_ioctl_exec, DRM_RENDER_ALLOW),
 };
 
 long
@@ -1223,7 +1241,9 @@ static struct drm_driver
 driver_stub = {
 	.driver_features = DRIVER_GEM |
 			   DRIVER_MODESET |
-			   DRIVER_RENDER,
+			   DRIVER_RENDER |
+			   DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE |
+			   DRIVER_GEM_GPUVA,
 	.open = nouveau_drm_open,
 	.postclose = nouveau_drm_postclose,
 	.lastclose = nouveau_vga_lastclose,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index d634f1054d65..94de792ef3ca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -10,8 +10,8 @@
 #define DRIVER_DATE		"20120801"
 
 #define DRIVER_MAJOR		1
-#define DRIVER_MINOR		3
-#define DRIVER_PATCHLEVEL	1
+#define DRIVER_MINOR		4
+#define DRIVER_PATCHLEVEL	0
 
 /*
  * 1.1.1:
@@ -63,6 +63,7 @@ struct platform_device;
 
 #include "nouveau_fence.h"
 #include "nouveau_bios.h"
+#include "nouveau_sched.h"
 #include "nouveau_vmm.h"
 #include "nouveau_uvmm.h"
 
@@ -94,6 +95,8 @@ struct nouveau_cli {
 	struct nouveau_vmm svm;
 	struct nouveau_uvmm uvmm;
 
+	struct nouveau_sched_entity sched_entity;
+
 	const struct nvif_mclass *mem;
 
 	struct list_head head;
@@ -305,6 +308,8 @@ struct nouveau_drm {
 		struct mutex lock;
 		bool component_registered;
 	} audio;
+
+	struct drm_gpu_scheduler sched;
 };
 
 static inline struct nouveau_drm *
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c b/drivers/gpu/drm/nouveau/nouveau_exec.c
new file mode 100644
index 000000000000..536956a79279
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -0,0 +1,322 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <drm/drm_exec.h>
+
+#include "nouveau_drv.h"
+#include "nouveau_gem.h"
+#include "nouveau_mem.h"
+#include "nouveau_dma.h"
+#include "nouveau_exec.h"
+#include "nouveau_abi16.h"
+#include "nouveau_chan.h"
+#include "nouveau_sched.h"
+#include "nouveau_uvmm.h"
+
+/**
+ * DOC: Overview
+ *
+ * Nouveau's VM_BIND / EXEC UAPI consists of three ioctls: DRM_NOUVEAU_VM_INIT,
+ * DRM_NOUVEAU_VM_BIND and DRM_NOUVEAU_EXEC.
+ *
+ * In order to use the UAPI firstly a user client must initialize the VA space
+ * using the DRM_NOUVEAU_VM_INIT ioctl specifying which region of the VA space
+ * should be managed by the kernel and which by the UMD.
+ *
+ * The DRM_NOUVEAU_VM_BIND ioctl provides clients an interface to manage the
+ * userspace-managable portion of the VA space. It provides operations to
+ * allocate and free a VA space regions and operations to map and unmap memory
+ * within such a region. Bind operations crossing region boundaries are not
+ * permitted.
+ *
+ * When allocating a VA space region userspace may flag this region as sparse.
+ * If a region is flagged as sparse the kernel will take care that for the whole
+ * region sparse mappings are created. Subsequently requested actual memory
+ * backed mappings for a sparse region will take precedence over the sparse
+ * mappings. If the memory backed mappings are unmapped the kernel will make
+ * sure that sparse mappings will take their place again.
+ *
+ * When using the VM_BIND ioctl to request the kernel to map memory to a given
+ * virtual address in the GPU's VA space there is no guarantee that the actual
+ * mappings are created in the GPU's MMU. If the given memory is swapped out
+ * at the time the bind operation is executed the kernel will stash the mapping
+ * details into it's internal alloctor and create the actual MMU mappings once
+ * the memory is swapped back in. While this is transparent for userspace, it is
+ * guaranteed that all the backing memory is swapped back in and all the memory
+ * mappings, as requested by userspace previously, are actually mapped once the
+ * DRM_NOUVEAU_EXEC ioctl is called to submit an exec job.
+ *
+ * Contrary to VM_BIND map requests, unmap requests are allowed to span over VA
+ * space regions and completely untouched areas of the VA space.
+ *
+ * Generally, all rules for constellations like mapping and unmapping over
+ * boundaries of existing mappings are documented in the &drm_gpuva_manager.
+ *
+ * When a VA space region is freed, all existing mappings within this region are
+ * unmapped automatically.
+ *
+ * A VM_BIND job can be executed either synchronously or asynchronously. If
+ * exectued asynchronously, userspace may provide a list of syncobjs this job
+ * will wait for and/or a list of syncobj the kernel will trigger once the
+ * VM_BIND finished execution. If executed synchronously the ioctl will block
+ * until the bind job is finished and no syncobjs are permitted by the kernel.
+ *
+ * To execute a push buffer the UAPI provides the DRM_NOUVEAU_EXEC ioctl. EXEC
+ * jobs are always executed asynchronously, and, equal to VM_BIND jobs, provide
+ * the option to synchronize them with syncobjs.
+ *
+ * Besides that EXEC job can be scheduled for a specified channel to execute on.
+ *
+ * Since VM_BIND jobs update the GPU's VA space on job submit, EXEC jobs do have
+ * an up to date view of the VA space. However, the actual mappings might still
+ * be pending. Hence, EXEC jobs require to have the particular fences - of
+ * the corresponding VM_BIND jobs they depent on - attached to them.
+ */
+
+static int
+nouveau_exec_job_submit(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+	struct nouveau_cli *cli = exec_job->base.cli;
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
+	struct drm_exec *exec = &job->exec;
+	int ret;
+
+	nouveau_uvmm_lock(uvmm);
+	drm_exec_while_not_all_locked(exec) {
+		DRM_GPUVA_ITER(it, &uvmm->umgr);
+
+		drm_gpuva_iter_for_each(it) {
+			struct drm_gpuva *va = it.va;
+
+			ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
+			drm_exec_break_on_contention(exec);
+			if (ret)
+				return ret;
+		}
+	}
+	nouveau_uvmm_unlock(uvmm);
+
+	return 0;
+}
+
+static struct dma_fence *
+nouveau_exec_job_run(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+	struct nouveau_fence *fence;
+	int i, ret;
+
+	ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16);
+	if (ret) {
+		NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret);
+		return ERR_PTR(ret);
+	}
+
+	for (i = 0; i < exec_job->push.count; i++) {
+		nv50_dma_push(job->chan, exec_job->push.s[i].va,
+			      exec_job->push.s[i].va_len);
+	}
+
+	ret = nouveau_fence_new(job->chan, false, &fence);
+	if (ret) {
+		NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret);
+		WIND_RING(job->chan);
+		return ERR_PTR(ret);
+	}
+
+	return &fence->base;
+}
+static void
+nouveau_exec_job_free(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+
+	nouveau_base_job_free(job);
+
+	kfree(exec_job->push.s);
+	kfree(exec_job);
+}
+
+static enum drm_gpu_sched_stat
+nouveau_exec_job_timeout(struct nouveau_job *job)
+{
+	struct nouveau_channel *chan = job->chan;
+
+	if (unlikely(!atomic_read(&chan->killed)))
+		nouveau_channel_kill(chan);
+
+	NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
+		  chan->chid);
+
+	nouveau_sched_entity_fini(job->entity);
+
+	return DRM_GPU_SCHED_STAT_ENODEV;
+}
+
+static struct nouveau_job_ops nouveau_exec_job_ops = {
+	.submit = nouveau_exec_job_submit,
+	.run = nouveau_exec_job_run,
+	.free = nouveau_exec_job_free,
+	.timeout = nouveau_exec_job_timeout,
+};
+
+int
+nouveau_exec_job_init(struct nouveau_exec_job **pjob,
+		      struct nouveau_exec_job_args *args)
+{
+	struct nouveau_exec_job *job;
+	int ret;
+
+	job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	job->push.count = args->push.count;
+	job->push.s = kmemdup(args->push.s,
+			      sizeof(*args->push.s) *
+			      args->push.count,
+			      GFP_KERNEL);
+	if (!job->push.s) {
+		ret = -ENOMEM;
+		goto err_free_job;
+	}
+
+	job->base.ops = &nouveau_exec_job_ops;
+	job->base.resv_usage = DMA_RESV_USAGE_WRITE;
+
+	ret = nouveau_base_job_init(&job->base, &args->base);
+	if (ret)
+		goto err_free_pushs;
+
+	return 0;
+
+err_free_pushs:
+	kfree(job->push.s);
+err_free_job:
+	kfree(job);
+	*pjob = NULL;
+
+	return ret;
+}
+
+static int
+nouveau_exec(struct nouveau_exec_job_args *args)
+{
+	struct nouveau_exec_job *job;
+	int ret;
+
+	ret = nouveau_exec_job_init(&job, args);
+	if (ret)
+		return ret;
+
+	ret = nouveau_job_submit(&job->base);
+	if (ret)
+		goto err_job_fini;
+
+	return 0;
+
+err_job_fini:
+	nouveau_job_fini(&job->base);
+	return ret;
+}
+
+int
+nouveau_exec_ioctl_exec(struct drm_device *dev,
+			void *data,
+			struct drm_file *file_priv)
+{
+	struct nouveau_abi16 *abi16 = nouveau_abi16_get(file_priv);
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct nouveau_abi16_chan *chan16;
+	struct nouveau_channel *chan = NULL;
+	struct nouveau_exec_job_args args = {};
+	struct drm_nouveau_exec *req = data;
+	int ret = 0;
+
+	if (unlikely(!abi16))
+		return -ENOMEM;
+
+	/* abi16 locks already */
+	if (unlikely(!nouveau_cli_uvmm(cli)))
+		return nouveau_abi16_put(abi16, -ENOSYS);
+
+	list_for_each_entry(chan16, &abi16->channels, head) {
+		if (chan16->chan->chid == req->channel) {
+			chan = chan16->chan;
+			break;
+		}
+	}
+
+	if (!chan)
+		return nouveau_abi16_put(abi16, -ENOENT);
+
+	if (unlikely(atomic_read(&chan->killed)))
+		return nouveau_abi16_put(abi16, -ENODEV);
+
+	if (!chan->dma.ib_max)
+		return nouveau_abi16_put(abi16, -ENOSYS);
+
+	if (unlikely(req->push_count == 0))
+		goto out;
+
+	if (unlikely(req->push_count > NOUVEAU_GEM_MAX_PUSH)) {
+		NV_PRINTK(err, cli, "pushbuf push count exceeds limit: %d max %d\n",
+			 req->push_count, NOUVEAU_GEM_MAX_PUSH);
+		return nouveau_abi16_put(abi16, -EINVAL);
+	}
+
+	args.push.count = req->push_count;
+	args.push.s = u_memcpya(req->push_ptr, req->push_count,
+				sizeof(*args.push.s));
+	if (IS_ERR(args.push.s)) {
+		ret = PTR_ERR(args.push.s);
+		goto out;
+	}
+
+	ret = nouveau_job_ucopy_syncs(&args.base,
+				      req->wait_count, req->wait_ptr,
+				      req->sig_count, req->sig_ptr);
+	if (ret)
+		goto out_free_pushs;
+
+	args.base.sched_entity = &chan16->sched_entity;
+	args.base.chan = chan;
+	args.base.file_priv = file_priv;
+
+	ret = nouveau_exec(&args);
+	if (ret)
+		goto out_free_syncs;
+
+out_free_syncs:
+	u_free(args.base.out_sync.s);
+	u_free(args.base.in_sync.s);
+out_free_pushs:
+	u_free(args.push.s);
+out:
+	return nouveau_abi16_put(abi16, ret);
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.h b/drivers/gpu/drm/nouveau/nouveau_exec.h
new file mode 100644
index 000000000000..5bca1f9aba74
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef __NOUVEAU_EXEC_H__
+#define __NOUVEAU_EXEC_H__
+
+#include <drm/drm_exec.h>
+
+#include "nouveau_drv.h"
+#include "nouveau_sched.h"
+
+struct nouveau_exec_job_args {
+	struct nouveau_job_args base;
+	struct drm_exec exec;
+
+	struct {
+		struct drm_nouveau_exec_push *s;
+		u32 count;
+	} push;
+};
+
+struct nouveau_exec_job {
+	struct nouveau_job base;
+
+	struct {
+		struct drm_nouveau_exec_push *s;
+		u32 count;
+	} push;
+};
+
+#define to_nouveau_exec_job(job)		\
+		container_of((job), struct nouveau_exec_job, base)
+
+int nouveau_exec_job_init(struct nouveau_exec_job **job,
+			  struct nouveau_exec_job_args *args);
+
+int nouveau_exec_ioctl_exec(struct drm_device *dev, void *data,
+			    struct drm_file *file_priv);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
new file mode 100644
index 000000000000..8e32fcbc09ab
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -0,0 +1,467 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <linux/slab.h>
+#include <drm/gpu_scheduler.h>
+#include <drm/drm_syncobj.h>
+
+#include "nouveau_drv.h"
+#include "nouveau_gem.h"
+#include "nouveau_mem.h"
+#include "nouveau_dma.h"
+#include "nouveau_exec.h"
+#include "nouveau_abi16.h"
+#include "nouveau_chan.h"
+#include "nouveau_sched.h"
+
+/* FIXME
+ *
+ * We want to make sure that jobs currently executing can't be deferred by
+ * other jobs competing for the hardware. Otherwise we might end up with job
+ * timeouts just because of too many clients submitting too many jobs. We don't
+ * want jobs to time out because of system load, but because of the job being
+ * too bulky.
+ *
+ * For now allow for up to 16 concurrent jobs in flight until we know how many
+ * rings the hardware can process in parallel.
+ */
+#define NOUVEAU_SCHED_HW_SUBMISSIONS		16
+#define NOUVEAU_SCHED_JOB_TIMEOUT_MS		10000
+
+int
+nouveau_job_ucopy_syncs(struct nouveau_job_args *args,
+			u32 inc, u64 ins,
+			u32 outc, u64 outs)
+{
+	struct drm_nouveau_sync **s;
+	int ret;
+
+	if (inc) {
+		s = &args->in_sync.s;
+
+		args->in_sync.count = inc;
+		*s = u_memcpya(ins, inc, sizeof(**s));
+		if (IS_ERR(*s)) {
+			ret = PTR_ERR(*s);
+			goto err_out;
+		}
+	}
+
+	if (outc) {
+		s = &args->out_sync.s;
+
+		args->out_sync.count = outc;
+		*s = u_memcpya(outs, outc, sizeof(**s));
+		if (IS_ERR(*s)) {
+			ret = PTR_ERR(*s);
+			goto err_free_ins;
+		}
+	}
+
+	return 0;
+
+err_free_ins:
+	u_free(args->in_sync.s);
+err_out:
+	return ret;
+}
+
+int
+nouveau_base_job_init(struct nouveau_job *job,
+		      struct nouveau_job_args *args)
+{
+	struct nouveau_sched_entity *entity = args->sched_entity;
+	int ret;
+
+	job->file_priv = args->file_priv;
+	job->cli = nouveau_cli(args->file_priv);
+	job->chan = args->chan;
+	job->entity = entity;
+
+	job->in_sync.count = args->in_sync.count;
+	if (job->in_sync.count) {
+		if (job->sync)
+			return -EINVAL;
+
+		job->in_sync.data = kmemdup(args->in_sync.s,
+					 sizeof(*args->in_sync.s) *
+					 args->in_sync.count,
+					 GFP_KERNEL);
+		if (!job->in_sync.data)
+			return -ENOMEM;
+	}
+
+	job->out_sync.count = args->out_sync.count;
+	if (job->out_sync.count) {
+		if (job->sync) {
+			ret = -EINVAL;
+			goto err_free_in_sync;
+		}
+
+		job->out_sync.data = kmemdup(args->out_sync.s,
+					  sizeof(*args->out_sync.s) *
+					  args->out_sync.count,
+					  GFP_KERNEL);
+		if (!job->out_sync.data) {
+			ret = -ENOMEM;
+			goto err_free_in_sync;
+		}
+
+		job->out_sync.objs = kcalloc(job->out_sync.count,
+					     sizeof(*job->out_sync.objs),
+					     GFP_KERNEL);
+		if (!job->out_sync.objs) {
+			ret = -ENOMEM;
+			goto err_free_out_sync;
+		}
+
+		job->out_sync.chains = kcalloc(job->out_sync.count,
+					       sizeof(*job->out_sync.chains),
+					       GFP_KERNEL);
+		if (!job->out_sync.chains) {
+			ret = -ENOMEM;
+			goto err_free_objs;
+		}
+
+	}
+
+	ret = drm_sched_job_init(&job->base, &entity->base, NULL);
+	if (ret)
+		goto err_free_chains;
+
+	return 0;
+
+err_free_chains:
+	kfree(job->out_sync.chains);
+err_free_objs:
+	kfree(job->out_sync.objs);
+err_free_out_sync:
+	kfree(job->out_sync.data);
+err_free_in_sync:
+	kfree(job->in_sync.data);
+return ret;
+}
+
+void
+nouveau_base_job_free(struct nouveau_job *job)
+{
+	kfree(job->in_sync.data);
+	kfree(job->out_sync.data);
+	kfree(job->out_sync.objs);
+	kfree(job->out_sync.chains);
+}
+
+void nouveau_job_fini(struct nouveau_job *job)
+{
+	dma_fence_put(job->done_fence);
+	drm_sched_job_cleanup(&job->base);
+	job->ops->free(job);
+}
+
+static int
+sync_find_fence(struct nouveau_job *job,
+		struct drm_nouveau_sync *sync,
+		struct dma_fence **fence)
+{
+	u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
+	u64 point = 0;
+	int ret;
+
+	if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
+	    stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
+		return -EOPNOTSUPP;
+
+	if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
+		point = sync->timeline_value;
+
+	ret = drm_syncobj_find_fence(job->file_priv,
+				     sync->handle, point,
+				     sync->flags, fence);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int
+nouveau_job_add_deps(struct nouveau_job *job)
+{
+	struct dma_fence *in_fence = NULL;
+	int ret, i;
+
+	for (i = 0; i < job->in_sync.count; i++) {
+		struct drm_nouveau_sync *sync = &job->in_sync.data[i];
+
+		ret = sync_find_fence(job, sync, &in_fence);
+		if (ret) {
+			NV_PRINTK(warn, job->cli,
+				  "Failed to find syncobj (-> in): handle=%d\n",
+				  sync->handle);
+			return ret;
+		}
+
+		ret = drm_sched_job_add_dependency(&job->base, in_fence);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int
+nouveau_job_fence_attach_prepare(struct nouveau_job *job)
+{
+	int i, ret;
+
+	for (i = 0; i < job->out_sync.count; i++) {
+		struct drm_nouveau_sync *sync = &job->out_sync.data[i];
+		struct drm_syncobj **pobj = &job->out_sync.objs[i];
+		struct dma_fence_chain **pchain = &job->out_sync.chains[i];
+		u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
+
+		if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
+		    stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
+			ret = -EINVAL;
+			goto err_sync_cleanup;
+		}
+
+		*pobj = drm_syncobj_find(job->file_priv, sync->handle);
+		if (!*pobj) {
+			NV_PRINTK(warn, job->cli,
+				  "Failed to find syncobj (-> out): handle=%d\n",
+				  sync->handle);
+			ret = -ENOENT;
+			goto err_sync_cleanup;
+		}
+
+		if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
+			*pchain = dma_fence_chain_alloc();
+			if (!*pchain) {
+				ret = -ENOMEM;
+				goto err_sync_cleanup;
+			}
+		}
+	}
+
+	return 0;
+
+err_sync_cleanup:
+	for (i = 0; i < job->out_sync.count; i++) {
+		struct drm_syncobj *obj = job->out_sync.objs[i];
+		struct dma_fence_chain *chain = job->out_sync.chains[i];
+
+		if (obj)
+			drm_syncobj_put(obj);
+
+		if (chain)
+			dma_fence_chain_free(chain);
+	}
+	return ret;
+}
+
+static void
+nouveau_job_fence_attach(struct nouveau_job *job)
+{
+	struct dma_fence *fence = job->done_fence;
+	int i;
+
+	for (i = 0; i < job->out_sync.count; i++) {
+		struct drm_nouveau_sync *sync = &job->out_sync.data[i];
+		struct drm_syncobj *obj = job->out_sync.objs[i];
+		struct dma_fence_chain *chain = job->out_sync.chains[i];
+		u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
+
+		if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
+			drm_syncobj_add_point(obj, chain, fence,
+					      sync->timeline_value);
+		} else {
+			drm_syncobj_replace_fence(obj, fence);
+		}
+
+		drm_syncobj_put(obj);
+	}
+}
+
+static int
+nouveau_job_validate(struct nouveau_job *job)
+{
+	struct drm_exec *exec = &job->exec;
+	struct drm_gem_object *obj;
+	unsigned long index;
+	int ret;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		struct nouveau_bo *nvbo = nouveau_gem_object(obj);
+
+		ret = nouveau_bo_validate(nvbo, true, false);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void
+nouveau_job_resv_add_fence(struct nouveau_job *job)
+{
+	struct drm_exec *exec = &job->exec;
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		struct dma_resv *resv = obj->resv;
+
+		dma_resv_add_fence(resv, job->done_fence, job->resv_usage);
+	}
+}
+
+int
+nouveau_job_submit(struct nouveau_job *job)
+{
+	struct nouveau_sched_entity *entity = to_nouveau_sched_entity(job->base.entity);
+	int ret;
+
+	drm_exec_init(&job->exec, true);
+
+	/* Make sure the job appears on the sched_entity's queue in the same
+	 * order as it was submitted.
+	 */
+	mutex_lock(&entity->job.mutex);
+
+	if (job->ops->submit) {
+		ret = job->ops->submit(job);
+		if (ret)
+			goto err;
+	}
+
+	ret = nouveau_job_validate(job);
+	if (ret)
+		goto err;
+
+	ret = nouveau_job_add_deps(job);
+	if (ret)
+		goto err;
+
+	ret = nouveau_job_fence_attach_prepare(job);
+	if (ret)
+		goto err;
+
+	drm_sched_job_arm(&job->base);
+	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	nouveau_job_fence_attach(job);
+	nouveau_job_resv_add_fence(job);
+
+	drm_exec_fini(&job->exec);
+
+	drm_sched_entity_push_job(&job->base);
+
+	mutex_unlock(&entity->job.mutex);
+
+	if (job->sync)
+		dma_fence_wait(job->done_fence, true);
+
+	return 0;
+
+err:
+	mutex_unlock(&entity->job.mutex);
+	drm_exec_fini(&job->exec);
+	return ret;
+}
+
+static struct dma_fence *
+nouveau_job_run(struct nouveau_job *job)
+{
+	return job->ops->run(job);
+}
+
+static struct dma_fence *
+nouveau_sched_run_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_job *job = to_nouveau_job(sched_job);
+
+	return nouveau_job_run(job);
+}
+
+static enum drm_gpu_sched_stat
+nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_job *job = to_nouveau_job(sched_job);
+
+	if (job->ops->timeout)
+		return job->ops->timeout(job);
+
+	NV_PRINTK(warn, job->cli, "Job timed out.\n");
+
+	return DRM_GPU_SCHED_STAT_ENODEV;
+}
+
+static void
+nouveau_sched_free_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_job *job = to_nouveau_job(sched_job);
+
+	nouveau_job_fini(job);
+}
+
+int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
+			      struct drm_gpu_scheduler *sched)
+{
+
+	mutex_init(&entity->job.mutex);
+
+	return drm_sched_entity_init(&entity->base,
+				     DRM_SCHED_PRIORITY_NORMAL,
+				     &sched, 1, NULL);
+}
+
+void
+nouveau_sched_entity_fini(struct nouveau_sched_entity *entity)
+{
+	drm_sched_entity_destroy(&entity->base);
+}
+
+static const struct drm_sched_backend_ops nouveau_sched_ops = {
+	.run_job = nouveau_sched_run_job,
+	.timedout_job = nouveau_sched_timedout_job,
+	.free_job = nouveau_sched_free_job,
+};
+
+int nouveau_sched_init(struct drm_gpu_scheduler *sched,
+		       struct nouveau_drm *drm)
+{
+	long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
+
+	return drm_sched_init(sched, &nouveau_sched_ops,
+			      NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
+			      NULL, NULL, "nouveau", drm->dev->dev);
+}
+
+void nouveau_sched_fini(struct drm_gpu_scheduler *sched)
+{
+	drm_sched_fini(sched);
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
new file mode 100644
index 000000000000..407a62b18788
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef NOUVEAU_SCHED_H
+#define NOUVEAU_SCHED_H
+
+#include <linux/types.h>
+
+#include <drm/drm_exec.h>
+#include <drm/gpu_scheduler.h>
+
+#include "nouveau_drv.h"
+
+#define to_nouveau_job(sched_job)		\
+		container_of((sched_job), struct nouveau_job, base)
+
+struct nouveau_job_args {
+	struct nouveau_channel *chan;
+	struct drm_file *file_priv;
+	struct nouveau_sched_entity *sched_entity;
+
+	struct {
+		struct drm_nouveau_sync *s;
+		u32 count;
+	} in_sync;
+
+	struct {
+		struct drm_nouveau_sync *s;
+		u32 count;
+	} out_sync;
+};
+
+struct nouveau_job {
+	struct drm_sched_job base;
+
+	struct nouveau_sched_entity *entity;
+
+	struct drm_file *file_priv;
+	struct nouveau_cli *cli;
+	struct nouveau_channel *chan;
+
+	struct drm_exec exec;
+	enum dma_resv_usage resv_usage;
+	struct dma_fence *done_fence;
+
+	bool sync;
+
+	struct {
+		struct drm_nouveau_sync *data;
+		u32 count;
+	} in_sync;
+
+	struct {
+		struct drm_nouveau_sync *data;
+		struct drm_syncobj **objs;
+		struct dma_fence_chain **chains;
+		u32 count;
+	} out_sync;
+
+	struct nouveau_job_ops {
+		int (*submit)(struct nouveau_job *);
+		struct dma_fence *(*run)(struct nouveau_job *);
+		void (*free)(struct nouveau_job *);
+		enum drm_gpu_sched_stat (*timeout)(struct nouveau_job *);
+	} *ops;
+};
+
+int nouveau_job_ucopy_syncs(struct nouveau_job_args *args,
+			    u32 inc, u64 ins,
+			    u32 outc, u64 outs);
+
+int nouveau_base_job_init(struct nouveau_job *job,
+			  struct nouveau_job_args *args);
+void nouveau_base_job_free(struct nouveau_job *job);
+
+int nouveau_job_submit(struct nouveau_job *job);
+void nouveau_job_fini(struct nouveau_job *job);
+
+#define to_nouveau_sched_entity(entity)		\
+		container_of((entity), struct nouveau_sched_entity, base)
+
+struct nouveau_sched_entity {
+	struct drm_sched_entity base;
+	struct {
+		struct mutex mutex;
+	} job;
+};
+
+int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
+			      struct drm_gpu_scheduler *sched);
+void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity);
+
+int nouveau_sched_init(struct drm_gpu_scheduler *sched,
+		       struct nouveau_drm *drm);
+void nouveau_sched_fini(struct drm_gpu_scheduler *sched);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 2f7747a5a917..a23b71c31021 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -57,6 +57,40 @@
 #define NOUVEAU_VA_SPACE_START		0x0
 #define NOUVEAU_VA_SPACE_END		(1ULL << NOUVEAU_VA_SPACE_BITS)
 
+#define list_for_each_op(_op, _ops) list_for_each_entry(_op, _ops, entry)
+#define list_for_each_op_continue_reverse(_op, _ops) \
+	list_for_each_entry_continue_reverse(_op, _ops, entry)
+#define list_for_each_op_safe(_op, _n, _ops) list_for_each_entry_safe(_op, _n, _ops, entry)
+
+enum bind_op {
+	OP_ALLOC = DRM_NOUVEAU_VM_BIND_OP_ALLOC,
+	OP_FREE = DRM_NOUVEAU_VM_BIND_OP_FREE,
+	OP_MAP = DRM_NOUVEAU_VM_BIND_OP_MAP,
+	OP_UNMAP = DRM_NOUVEAU_VM_BIND_OP_UNMAP,
+};
+
+struct bind_job_op {
+	struct list_head entry;
+
+	enum bind_op op;
+	u32 flags;
+
+	struct {
+		u64 addr;
+		u64 range;
+	} va;
+
+	struct {
+		u32 handle;
+		u64 offset;
+		struct drm_gem_object *obj;
+	} gem;
+
+	struct nouveau_uvma_region *reg;
+	struct nouveau_uvma_alloc new;
+	struct drm_gpuva_ops *ops;
+};
+
 struct uvmm_map_args {
 	u64 addr;
 	u64 range;
@@ -953,6 +987,418 @@ nouveau_uvmm_sm_unmap_cleanup(struct nouveau_uvmm *uvmm,
 	uvmm_sm_cleanup(uvmm, new, ops, true);
 }
 
+static int
+bind_validate_op(struct nouveau_job *job,
+			struct bind_job_op *op)
+{
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+	struct drm_gem_object *obj = op->gem.obj;
+
+	if (op->op == OP_MAP) {
+		if (op->gem.offset & ~PAGE_MASK)
+			return -EINVAL;
+
+		if (obj->size <= op->gem.offset)
+			return -EINVAL;
+
+		if (op->va.range > (obj->size - op->gem.offset))
+			return -EINVAL;
+	}
+
+	return nouveau_uvmm_validate_range(uvmm, op->va.addr, op->va.range);
+}
+
+static int
+uvmm_bind_job_submit(struct nouveau_job *job)
+{
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+	struct nouveau_uvmm_bind_job *bind_job = to_uvmm_bind_job(job);
+	struct drm_exec *exec = &job->exec;
+	struct bind_job_op *op;
+	int ret;
+
+	/* We need to keep holding the uvmm lock until this function can't fail
+	 * anymore, since we need to be able to unind all GPUVA space changes on
+	 * failure.
+	 */
+	nouveau_uvmm_lock(uvmm);
+	list_for_each_op(op, &bind_job->ops) {
+
+		if (op->op == OP_MAP) {
+			op->gem.obj = drm_gem_object_lookup(job->file_priv,
+							    op->gem.handle);
+			if (!op->gem.obj) {
+				ret = -ENOENT;
+				goto unwind;
+			}
+		}
+
+		ret = bind_validate_op(job, op);
+		if (ret)
+			goto unwind;
+
+		switch (op->op) {
+		case OP_ALLOC: {
+			bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
+
+			ret = nouveau_uvma_region_create(uvmm,
+							 op->va.addr,
+							 op->va.range,
+							 sparse);
+			if (ret)
+				goto unwind;
+
+			break;
+		}
+		case OP_FREE:
+			op->reg = nouveau_uvma_region_find(uvmm, op->va.addr,
+							   op->va.range);
+			if (!op->reg)
+				goto unwind;
+
+			op->ops = nouveau_uvmm_sm_unmap_ops(uvmm,
+							    op->va.addr,
+							    op->va.range);
+			if (IS_ERR(op->ops)) {
+				ret = PTR_ERR(op->ops);
+				goto unwind;
+			}
+
+			ret = nouveau_uvmm_sm_unmap_prepare(uvmm, &op->new,
+							    op->ops);
+			if (ret)
+				goto unwind;
+
+			nouveau_uvma_region_remove(op->reg);
+
+			break;
+		case OP_MAP:
+			op->ops = nouveau_uvmm_sm_map_ops(uvmm,
+							  op->va.addr,
+							  op->va.range,
+							  op->gem.obj,
+							  op->gem.offset);
+			if (IS_ERR(op->ops)) {
+				ret = PTR_ERR(op->ops);
+				goto unwind;
+			}
+
+			ret = nouveau_uvmm_sm_map_prepare(uvmm, &op->new,
+							  op->ops, op->va.addr,
+							  op->va.range,
+							  op->flags && 0xff);
+			if (ret)
+				goto unwind;
+
+			break;
+		case OP_UNMAP:
+			op->ops = nouveau_uvmm_sm_unmap_ops(uvmm,
+							    op->va.addr,
+							    op->va.range);
+			if (IS_ERR(op->ops)) {
+				ret = PTR_ERR(op->ops);
+				goto unwind;
+			}
+
+			ret = nouveau_uvmm_sm_unmap_prepare(uvmm, &op->new,
+							    op->ops);
+			if (ret)
+				goto unwind;
+
+			break;
+		default:
+			ret = -EINVAL;
+			goto unwind;
+		}
+	}
+
+	drm_exec_while_not_all_locked(exec) {
+		list_for_each_op(op, &bind_job->ops) {
+			if (op->op != OP_MAP)
+				continue;
+
+			ret = drm_exec_prepare_obj(exec, op->gem.obj, 1);
+			drm_exec_break_on_contention(exec);
+			if (ret)
+				goto unwind;
+		}
+	}
+	nouveau_uvmm_unlock(uvmm);
+
+	return 0;
+
+unwind:
+	list_for_each_op_continue_reverse(op, &bind_job->ops) {
+		switch (op->op) {
+		case OP_ALLOC:
+			nouveau_uvma_region_destroy(uvmm, op->va.addr,
+						    op->va.range);
+			break;
+		case OP_FREE:
+			__nouveau_uvma_region_insert(uvmm, op->reg);
+			nouveau_uvmm_sm_unmap_prepare_unwind(uvmm, &op->new,
+							     op->ops);
+			break;
+		case OP_MAP:
+			nouveau_uvmm_sm_map_prepare_unwind(uvmm, &op->new,
+							   op->ops,
+							   op->va.addr,
+							   op->va.range);
+			break;
+		case OP_UNMAP:
+			nouveau_uvmm_sm_unmap_prepare_unwind(uvmm, &op->new,
+							     op->ops);
+			break;
+		}
+
+		drm_gpuva_ops_free(&uvmm->umgr, op->ops);
+		op->ops = NULL;
+		op->reg = NULL;
+	}
+
+	nouveau_uvmm_unlock(uvmm);
+	return ret;
+}
+
+static struct dma_fence *
+uvmm_bind_job_run(struct nouveau_job *job)
+{
+	struct nouveau_uvmm_bind_job *bind_job = to_uvmm_bind_job(job);
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+	struct bind_job_op *op;
+	int ret = 0;
+
+	list_for_each_op(op, &bind_job->ops) {
+		switch (op->op) {
+		case OP_ALLOC:
+			/* noop */
+			break;
+		case OP_MAP:
+			ret = nouveau_uvmm_sm_map(uvmm, &op->new, op->ops);
+			if (ret)
+				goto out;
+			break;
+		case OP_FREE:
+			fallthrough;
+		case OP_UNMAP:
+			ret = nouveau_uvmm_sm_unmap(uvmm, &op->new, op->ops);
+			if (ret)
+				goto out;
+			break;
+		}
+	}
+
+out:
+	if (ret)
+		NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
+	return ERR_PTR(ret);
+}
+
+static void
+uvmm_bind_job_free(struct nouveau_job *job)
+{
+	struct nouveau_uvmm_bind_job *bind_job = to_uvmm_bind_job(job);
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+	struct bind_job_op *op, *next;
+
+	list_for_each_op_safe(op, next, &bind_job->ops) {
+		struct drm_gem_object *obj = op->gem.obj;
+
+		/* When uvmm_bind_job_submit() failed op->ops and op->reg will
+		 * be NULL, hence we correctly skip the cleanup.
+		 */
+		switch (op->op) {
+		case OP_ALLOC:
+			/* noop */
+			break;
+		case OP_FREE:
+			if (!IS_ERR_OR_NULL(op->ops))
+				nouveau_uvmm_sm_unmap_cleanup(uvmm, &op->new,
+							      op->ops);
+
+			if (op->reg) {
+				nouveau_uvma_region_sparse_unref(op->reg);
+				nouveau_uvma_region_free(op->reg);
+			}
+
+			break;
+		case OP_MAP:
+			if (!IS_ERR_OR_NULL(op->ops))
+				nouveau_uvmm_sm_map_cleanup(uvmm, &op->new,
+							    op->ops);
+			break;
+		case OP_UNMAP:
+			if (!IS_ERR_OR_NULL(op->ops))
+				nouveau_uvmm_sm_unmap_cleanup(uvmm, &op->new,
+							      op->ops);
+			break;
+		}
+
+		if (!IS_ERR_OR_NULL(op->ops))
+			drm_gpuva_ops_free(&uvmm->umgr, op->ops);
+
+		if (obj)
+			drm_gem_object_put(obj);
+
+		list_del(&op->entry);
+		kfree(op);
+	}
+
+	nouveau_base_job_free(job);
+	kfree(bind_job);
+}
+
+static struct nouveau_job_ops nouveau_bind_job_ops = {
+	.submit = uvmm_bind_job_submit,
+	.run = uvmm_bind_job_run,
+	.free = uvmm_bind_job_free,
+};
+
+static int
+bind_job_op_from_uop(struct bind_job_op **pop,
+		     struct drm_nouveau_vm_bind_op *uop)
+{
+	struct bind_job_op *op;
+
+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	op->op = uop->op;
+	op->flags = uop->flags;
+	op->va.addr = uop->addr;
+	op->va.range = uop->range;
+	op->gem.handle = uop->handle;
+	op->gem.offset = uop->bo_offset;
+
+	return 0;
+}
+
+static void
+bind_job_ops_free(struct list_head *ops)
+{
+	struct bind_job_op *op, *next;
+
+	list_for_each_op_safe(op, next, ops) {
+		list_del(&op->entry);
+		kfree(op);
+	}
+}
+
+int
+nouveau_uvmm_bind_job_init(struct nouveau_uvmm_bind_job **pjob,
+			   struct nouveau_uvmm_bind_job_args *args)
+{
+	struct nouveau_uvmm_bind_job *job;
+	struct bind_job_op *op;
+	int i, ret;
+
+	job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&job->ops);
+
+	for (i = 0; i < args->op.count; i++) {
+		ret = bind_job_op_from_uop(&op, &args->op.s[i]);
+		if (ret)
+			goto err_free;
+
+		list_add_tail(&op->entry, &job->ops);
+	}
+
+	job->base.sync = !(args->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC);
+	job->base.ops = &nouveau_bind_job_ops;
+	job->base.resv_usage = DMA_RESV_USAGE_BOOKKEEP;
+
+	ret = nouveau_base_job_init(&job->base, &args->base);
+	if (ret)
+		goto err_free;
+
+	return 0;
+
+err_free:
+	bind_job_ops_free(&job->ops);
+	kfree(job);
+	*pjob = NULL;
+
+	return ret;
+}
+
+int
+nouveau_uvmm_ioctl_vm_init(struct drm_device *dev,
+			   void *data,
+			   struct drm_file *file_priv)
+{
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct drm_nouveau_vm_init *init = data;
+
+	return nouveau_uvmm_init(&cli->uvmm, cli, init);
+}
+
+static int
+nouveau_uvmm_vm_bind(struct nouveau_uvmm_bind_job_args *args)
+{
+	struct nouveau_uvmm_bind_job *job;
+	int ret;
+
+	ret = nouveau_uvmm_bind_job_init(&job, args);
+	if (ret)
+		return ret;
+
+	ret = nouveau_job_submit(&job->base);
+	if (ret)
+		goto err_job_fini;
+
+	return 0;
+
+err_job_fini:
+	nouveau_job_fini(&job->base);
+	return ret;
+}
+
+int
+nouveau_uvmm_ioctl_vm_bind(struct drm_device *dev,
+			   void *data,
+			   struct drm_file *file_priv)
+{
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct nouveau_uvmm_bind_job_args args = {};
+	struct drm_nouveau_vm_bind *req = data;
+	int ret = 0;
+
+	if (unlikely(!nouveau_cli_uvmm_locked(cli)))
+		return -ENOSYS;
+
+	args.flags = req->flags;
+
+	args.op.count = req->op_count;
+	args.op.s = u_memcpya(req->op_ptr, req->op_count,
+			      sizeof(*args.op.s));
+	if (IS_ERR(args.op.s))
+		return PTR_ERR(args.op.s);
+
+	ret = nouveau_job_ucopy_syncs(&args.base,
+				      req->wait_count, req->wait_ptr,
+				      req->sig_count, req->sig_ptr);
+	if (ret)
+		goto out_free_ops;
+
+	args.base.sched_entity = &cli->sched_entity;
+	args.base.file_priv = file_priv;
+
+	ret = nouveau_uvmm_vm_bind(&args);
+	if (ret)
+		goto out_free_syncs;
+
+out_free_syncs:
+	u_free(args.base.out_sync.s);
+	u_free(args.base.in_sync.s);
+out_free_ops:
+	u_free(args.op.s);
+	return ret;
+}
+
 void
 nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbo, struct nouveau_mem *mem)
 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.h b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
index 858840e9e0c5..d86a521100fa 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
@@ -36,6 +36,25 @@ struct nouveau_uvma_alloc {
 	struct nouveau_uvma *next;
 };
 
+struct nouveau_uvmm_bind_job {
+	struct nouveau_job base;
+
+	/* struct bind_job_op */
+	struct list_head ops;
+};
+
+struct nouveau_uvmm_bind_job_args {
+	struct nouveau_job_args base;
+	unsigned int flags;
+
+	struct {
+		struct drm_nouveau_vm_bind_op *s;
+		u32 count;
+	} op;
+};
+
+#define to_uvmm_bind_job(job) container_of((job), struct nouveau_uvmm_bind_job, base)
+
 #define uvmm_from_mgr(x) container_of((x), struct nouveau_uvmm, umgr)
 #define uvma_from_va(x) container_of((x), struct nouveau_uvma, va)
 #define uvma_region_from_va_region(x) container_of((x), struct nouveau_uvma_region, region)
@@ -97,6 +116,15 @@ void nouveau_uvmm_sm_unmap_cleanup(struct nouveau_uvmm *uvmm,
 				   struct nouveau_uvma_alloc *new,
 				   struct drm_gpuva_ops *ops);
 
+int nouveau_uvmm_bind_job_init(struct nouveau_uvmm_bind_job **pjob,
+			       struct nouveau_uvmm_bind_job_args *args);
+
+int nouveau_uvmm_ioctl_vm_init(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv);
+
+int nouveau_uvmm_ioctl_vm_bind(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv);
+
 static inline void nouveau_uvmm_lock(struct nouveau_uvmm *uvmm)
 {
 	mutex_lock(&uvmm->mutex);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH drm-next v2 16/16] drm/nouveau: debugfs: implement DRM GPU VA debugfs
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (14 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 15/16] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
@ 2023-02-17 13:48 ` Danilo Krummrich
  2023-03-09  9:12 ` [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Boris Brezillon
  16 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-17 13:48 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Danilo Krummrich

Provide the driver indirection iterating over all DRM GPU VA spaces to
enable the common 'gpuvas' debugfs file for dumping DRM GPU VA spaces.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_debugfs.c | 24 +++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
index 2a36d1ca8fda..7f6ccc5d1d86 100644
--- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
+++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
@@ -202,6 +202,29 @@ nouveau_debugfs_pstate_open(struct inode *inode, struct file *file)
 	return single_open(file, nouveau_debugfs_pstate_get, inode->i_private);
 }
 
+static int
+nouveau_debugfs_gpuva(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct nouveau_drm *drm = nouveau_drm(node->minor->dev);
+	struct nouveau_cli *cli;
+
+	mutex_lock(&drm->clients_lock);
+	list_for_each_entry(cli, &drm->clients, head) {
+		struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
+
+		if (!uvmm)
+			continue;
+
+		nouveau_uvmm_lock(uvmm);
+		drm_debugfs_gpuva_info(m, &uvmm->umgr);
+		nouveau_uvmm_unlock(uvmm);
+	}
+	mutex_unlock(&drm->clients_lock);
+
+	return 0;
+}
+
 static const struct file_operations nouveau_pstate_fops = {
 	.owner = THIS_MODULE,
 	.open = nouveau_debugfs_pstate_open,
@@ -213,6 +236,7 @@ static const struct file_operations nouveau_pstate_fops = {
 static struct drm_info_list nouveau_debugfs_list[] = {
 	{ "vbios.rom",  nouveau_debugfs_vbios_image, 0, NULL },
 	{ "strap_peek", nouveau_debugfs_strap_peek, 0, NULL },
+	DRM_DEBUGFS_GPUVA_INFO(nouveau_debugfs_gpuva, NULL),
 };
 #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list)
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 01/16] drm: execution context for GEM buffers
  2023-02-17 13:44 ` [PATCH drm-next v2 01/16] drm: execution context for GEM buffers Danilo Krummrich
@ 2023-02-17 16:00   ` Christian König
  2023-02-21 14:56     ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Christian König @ 2023-02-17 16:00 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-mm, nouveau, linux-kernel, dri-devel, linux-doc

Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
> From: Christian König <christian.koenig@amd.com>
>
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existinc TTMs execbuf util and intended to replace
> it in the long term.
>
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
>
> v2: drop xarray and use dynamic resized array instead, the locking
>      overhead is unecessary and measureable.

Question for Danilo and probably others: Does Nouveau make use of the 
duplicate tracking at some point?

Background is that I only have two or three use cases for this in 
radeon/amdgpu and would like to make it an optional feature.

Going to take a look at the rest of this series next week.

Regards,
Christian.

>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   Documentation/gpu/drm-mm.rst       |  12 ++
>   drivers/gpu/drm/Kconfig            |   6 +
>   drivers/gpu/drm/Makefile           |   2 +
>   drivers/gpu/drm/amd/amdgpu/Kconfig |   1 +
>   drivers/gpu/drm/drm_exec.c         | 295 +++++++++++++++++++++++++++++
>   include/drm/drm_exec.h             | 144 ++++++++++++++
>   6 files changed, 460 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 include/drm/drm_exec.h
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a79fd3549ff8..a52e6f4117d6 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -493,6 +493,18 @@ DRM Sync Objects
>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>      :export:
>   
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>   GPU Scheduler
>   =============
>   
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index f42d4c6a19f2..1573d658fbb5 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -200,6 +200,12 @@ config DRM_TTM
>   	  GPU memory types. Will be enabled automatically if a device driver
>   	  uses it.
>   
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>   config DRM_BUDDY
>   	tristate
>   	depends on DRM
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index ab4460fcd63f..d40defbb0347 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>   #
>   # Memory-management helpers
>   #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>   
>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 5341b6b242c3..279fb3bba810 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -11,6 +11,7 @@ config DRM_AMDGPU
>   	select DRM_SCHED
>   	select DRM_TTM
>   	select DRM_TTM_HELPER
> +	select DRM_EXEC
>   	select POWER_SUPPLY
>   	select HWMON
>   	select I2C
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..ed2106c22786
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,295 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *
> + *		ret = drm_exec_lock(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */
> +
> +/* Dummy value used to initially enter the retry loop */
> +#define DRM_EXEC_DUMMY (void*)~0
> +
> +/* Initialize the drm_exec_objects container */
> +static void drm_exec_objects_init(struct drm_exec_objects *container)
> +{
> +	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	container->max_objects = container->objects ?
> +		PAGE_SIZE / sizeof(void *) : 0;
> +	container->num_objects = 0;
> +}
> +
> +/* Cleanup the drm_exec_objects container */
> +static void drm_exec_objects_fini(struct drm_exec_objects *container)
> +{
> +	kvfree(container->objects);
> +}
> +
> +/* Make sure we have enough room and add an object the container */
> +static int drm_exec_objects_add(struct drm_exec_objects *container,
> +				struct drm_gem_object *obj)
> +{
> +	if (unlikely(container->num_objects == container->max_objects)) {
> +		size_t size = container->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		container->objects = tmp;
> +		container->max_objects += PAGE_SIZE / sizeof(void *);
> +	}
> +	drm_gem_object_get(obj);
> +	container->objects[container->num_objects++] = obj;
> +	return 0;
> +}
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_duplicate_object(exec, index, obj)
> +		drm_gem_object_put(obj);
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked and duplicate
> + * objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
> +{
> +	exec->interruptible = interruptible;
> +	drm_exec_objects_init(&exec->locked);
> +	drm_exec_objects_init(&exec->duplicates);
> +	exec->contended = DRM_EXEC_DUMMY;
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finilize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	drm_exec_objects_fini(&exec->locked);
> +	drm_exec_objects_fini(&exec->duplicates);
> +	if (exec->contended != DRM_EXEC_DUMMY) {
> +		drm_gem_object_put(exec->contended);
> +		ww_acquire_fini(&exec->ticket);
> +	}
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_cleanup - cleanup when contention is detected
> + * @exec: the drm_exec object to cleanup
> + *
> + * Cleanup the current state and return true if we should stay inside the retry
> + * loop, false if there wasn't any contention detected and we can keep the
> + * objects locked.
> + */
> +bool drm_exec_cleanup(struct drm_exec *exec)
> +{
> +	if (likely(!exec->contended)) {
> +		ww_acquire_done(&exec->ticket);
> +		return false;
> +	}
> +
> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> +		exec->contended = NULL;
> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> +		return true;
> +	}
> +
> +	drm_exec_unlock_all(exec);
> +	exec->locked.num_objects = 0;
> +	exec->duplicates.num_objects = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_exec_cleanup);
> +
> +/* Track the locked object in the xa and reserve fences */
> +static int drm_exec_obj_locked(struct drm_exec_objects *container,
> +			       struct drm_gem_object *obj,
> +			       unsigned int num_fences)
> +{
> +	int ret;
> +
> +	if (container) {
> +		ret = drm_exec_objects_add(container, obj);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (num_fences) {
> +		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> +		if (ret)
> +			goto error_erase;
> +	}
> +
> +	return 0;
> +
> +error_erase:
> +	if (container) {
> +		--container->num_objects;
> +		drm_gem_object_put(obj);
> +	}
> +	return ret;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->interruptible) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +
> +	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
> +	if (unlikely(ret))
> +		dma_resv_unlock(obj->resv);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * succesfully locked objects are put into the locked container. Duplicates
> + * detected as well and automatically moved into the duplicates container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->interruptible)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret == -EALREADY)) {
> +		struct drm_exec_objects *container = &exec->duplicates;
> +
> +		/*
> +		 * If this is the first locked GEM object it was most likely
> +		 * just contended. So don't add it to the duplicates, just
> +		 * reserve the fence slots.
> +		 */
> +		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
> +			container = NULL;
> +
> +		ret = drm_exec_obj_locked(container, obj, num_fences);
> +		if (ret)
> +			return ret;
> +
> +	} else if (unlikely(ret)) {
> +		return ret;
> +
> +	} else {
> +		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
> +		if (ret)
> +			goto error_unlock;
> +	}
> +
> +	drm_gem_object_get(obj);
> +	return 0;
> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..f73981c6292e
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,144 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * struct drm_exec_objects - Container for GEM objects in a drm_exec
> + */
> +struct drm_exec_objects {
> +	unsigned int		num_objects;
> +	unsigned int		max_objects;
> +	struct drm_gem_object	**objects;
> +};
> +
> +/**
> + * drm_exec_objects_for_each - iterate over all the objects inside the container
> + */
> +#define drm_exec_objects_for_each(array, index, obj)		\
> +	for (index = 0, obj = (array)->objects[0];		\
> +	     index < (array)->num_objects;			\
> +	     ++index, obj = (array)->objects[index])
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @locked: container for the locked GEM objects
> +	 */
> +	struct drm_exec_objects	locked;
> +
> +	/**
> +	 * @duplicates: container for the duplicated GEM objects
> +	 */
> +	struct drm_exec_objects	duplicates;
> +
> +	/**
> +	 * @contended: contended GEM object we backet of for.
> +	 */
> +	struct drm_gem_object	*contended;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->locked, index, obj)
> +
> +/**
> + * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the duplicate GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
> +
> +/**
> + * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
> + * @exec: drm_exec object
> + *
> + * Core functionality of the drm_exec object. Loops until all GEM objects are
> + * prepared and no more contention exists.
> + *
> + * At the beginning of the loop it is guaranteed that no GEM object is locked.
> + */
> +#define drm_exec_while_not_all_locked(exec)	\
> +	while (drm_exec_cleanup(exec))
> +
> +/**
> + * drm_exec_continue_on_contention - continue the loop when we need to cleanup
> + * @exec: drm_exec object
> + *
> + * Control flow helper to continue when a contention was detected and we need to
> + * clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_continue_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		continue
> +
> +/**
> + * drm_exec_break_on_contention - break a subordinal loop on contention
> + * @exec: drm_exec object
> + *
> + * Control flow helper to break a subordinal loop when a contention was detected
> + * and we need to clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_break_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		break
> +
> +/**
> + * drm_exec_is_contended - check for contention
> + * @exec: drm_exec object
> + *
> + * Returns true if the drm_exec object has run into some contention while
> + * locking a GEM object and needs to clean up.
> + */
> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
> +{
> +	return !!exec->contended;
> +}
> +
> +/**
> + * drm_exec_has_duplicates - check for duplicated GEM object
> + * @exec: drm_exec object
> + *
> + * Return true if the drm_exec object has encountered some already locked GEM
> + * objects while trying to lock them. This can happen if multiple GEM objects
> + * share the same underlying resv object.
> + */
> +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> +{
> +	return exec->duplicates.num_objects > 0;
> +}
> +
> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> +void drm_exec_fini(struct drm_exec *exec);
> +bool drm_exec_cleanup(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +
> +#endif


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-17 13:44 ` [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE Danilo Krummrich
@ 2023-02-17 18:18   ` Liam R. Howlett
  2023-02-17 19:38   ` Matthew Wilcox
  1 sibling, 0 replies; 64+ messages in thread
From: Liam R. Howlett @ 2023-02-17 18:18 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230217 08:44]:
> Generic components making use of the maple tree (such as the
> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> exclusion to their users.
> 
> While such components could inherit the concept of an external lock,
> some users might just serialize the access to the component and hence to
> the internal maple tree.
> 
> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> indicate not to do any internal lockdep checks.
> 
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  include/linux/maple_tree.h | 20 +++++++++++++++-----
>  lib/maple_tree.c           |  7 ++++---
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index ca04c900e51a..f795e5def8d0 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -170,10 +170,11 @@ enum maple_type {
>  #define MT_FLAGS_USE_RCU	0x02
>  #define MT_FLAGS_HEIGHT_OFFSET	0x02
>  #define MT_FLAGS_HEIGHT_MASK	0x7C
> -#define MT_FLAGS_LOCK_MASK	0x300
> +#define MT_FLAGS_LOCK_MASK	0x700
>  #define MT_FLAGS_LOCK_IRQ	0x100
>  #define MT_FLAGS_LOCK_BH	0x200
>  #define MT_FLAGS_LOCK_EXTERN	0x300
> +#define MT_FLAGS_LOCK_NONE	0x400

Please add this to the documentation above the flags as well.  We should
probably add enough context so that users don't just set this and then
use multiple writers.

>  
>  #define MAPLE_HEIGHT_MAX	31
>  
> @@ -559,11 +560,16 @@ static inline void mas_set(struct ma_state *mas, unsigned long index)
>  	mas_set_range(mas, index, index);
>  }
>  
> -static inline bool mt_external_lock(const struct maple_tree *mt)
> +static inline bool mt_lock_external(const struct maple_tree *mt)
>  {
>  	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_EXTERN;
>  }
>  
> +static inline bool mt_lock_none(const struct maple_tree *mt)
> +{
> +	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_NONE;
> +}
> +
>  /**
>   * mt_init_flags() - Initialise an empty maple tree with flags.
>   * @mt: Maple Tree
> @@ -577,7 +583,7 @@ static inline bool mt_external_lock(const struct maple_tree *mt)
>  static inline void mt_init_flags(struct maple_tree *mt, unsigned int flags)
>  {
>  	mt->ma_flags = flags;
> -	if (!mt_external_lock(mt))
> +	if (!mt_lock_external(mt) && !mt_lock_none(mt))
>  		spin_lock_init(&mt->ma_lock);
>  	rcu_assign_pointer(mt->ma_root, NULL);
>  }
> @@ -612,9 +618,11 @@ static inline void mt_clear_in_rcu(struct maple_tree *mt)
>  	if (!mt_in_rcu(mt))
>  		return;
>  
> -	if (mt_external_lock(mt)) {
> +	if (mt_lock_external(mt)) {
>  		BUG_ON(!mt_lock_is_held(mt));
>  		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
> +	} else if (mt_lock_none(mt)) {
> +		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
>  	} else {
>  		mtree_lock(mt);
>  		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
> @@ -631,9 +639,11 @@ static inline void mt_set_in_rcu(struct maple_tree *mt)
>  	if (mt_in_rcu(mt))
>  		return;
>  
> -	if (mt_external_lock(mt)) {
> +	if (mt_lock_external(mt)) {
>  		BUG_ON(!mt_lock_is_held(mt));
>  		mt->ma_flags |= MT_FLAGS_USE_RCU;
> +	} else if (mt_lock_none(mt)) {
> +		mt->ma_flags |= MT_FLAGS_USE_RCU;
>  	} else {
>  		mtree_lock(mt);
>  		mt->ma_flags |= MT_FLAGS_USE_RCU;
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 26e2045d3cda..f51c0fd4eaad 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -802,8 +802,8 @@ static inline void __rcu **ma_slots(struct maple_node *mn, enum maple_type mt)
>  
>  static inline bool mt_locked(const struct maple_tree *mt)
>  {
> -	return mt_external_lock(mt) ? mt_lock_is_held(mt) :
> -		lockdep_is_held(&mt->ma_lock);
> +	return mt_lock_external(mt) ? mt_lock_is_held(mt) :
> +		mt_lock_none(mt) ? true : lockdep_is_held(&mt->ma_lock);

It might be better to just make this two return statements for clarity.

>  }
>  
>  static inline void *mt_slot(const struct maple_tree *mt,
> @@ -6120,7 +6120,8 @@ bool mas_nomem(struct ma_state *mas, gfp_t gfp)
>  		return false;
>  	}
>  
> -	if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
> +	if (gfpflags_allow_blocking(gfp) &&
> +	    !mt_lock_external(mas->tree) && !mt_lock_none(mas->tree)) {
>  		mtree_unlock(mas->tree);
>  		mas_alloc_nodes(mas, gfp);
>  		mtree_lock(mas->tree);
> -- 
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro
  2023-02-17 13:44 ` [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro Danilo Krummrich
@ 2023-02-17 18:34   ` Liam R. Howlett
  2023-02-20 13:48     ` Danilo Krummrich
  2023-02-17 19:45   ` Matthew Wilcox
  1 sibling, 1 reply; 64+ messages in thread
From: Liam R. Howlett @ 2023-02-17 18:34 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230217 08:44]:
> Split up the MA_STATE() macro such that components using the maple tree
> can easily inherit from struct ma_state and build custom tree walk
> macros to hide their internals from users.
> 
> Example:
> 
> struct sample_iter {
> 	struct ma_state mas;
> 	struct sample_mgr *mgr;
> 	struct sample_entry *entry;
> };
> 
> \#define SAMPLE_ITER(name, __mgr) \
> 	struct sample_iter name = { \
> 		.mas = __MA_STATE(&(__mgr)->mt, 0, 0),
> 		.mgr = __mgr,
> 		.entry = NULL,
> 	}

I see this patch is to allow for anonymous maple states, this looks
good.

I've a lengthy comment about the iterator that I'm adding here to head
off anyone that may copy your example below.

> 
> \#define sample_iter_for_each_range(it__, start__, end__) \
> 	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
> 	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))

I see you've added something like the above in your patch set as well.
I'd like to point out that the index isn't the only state information
that needs to be altered here, and in fact, this could go very wrong.

The maple state has a node and an offset within that node.  If you set
the index to lower than the current position of your iterator and call
mas_find() then what happens is somewhat undefined.  I expect you will
get the wrong value (most likely either the current value or the very
next one that the iterator is already pointing to).  I believe you have
been using a fresh maple state for each iterator in your patches, but I
haven't had a deep look into your code yet.

We have methods of resetting the iterator and set the range (mas_set()
and mas_set_range()) which are safe for what you are doing, but they
will start the walk from the root node to the index again.

So, if you know what you are doing is safe, then the way you have
written it will work, but it's worth mentioning that this could occur.

It is also worth pointing out that it would be much safer to use a
function to do the above so you get type safety.. and I was asked to add
this to the VMA interface by Linus [1], which is on its way upstream [2].

1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/
2. https://lore.kernel.org/linux-mm/20230120162650.984577-1-Liam.Howlett@oracle.com/

> 
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  include/linux/maple_tree.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index e594db58a0f1..ca04c900e51a 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -424,8 +424,8 @@ struct ma_wr_state {
>  #define MA_ERROR(err) \
>  		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
>  
> -#define MA_STATE(name, mt, first, end)					\
> -	struct ma_state name = {					\
> +#define __MA_STATE(mt, first, end)					\
> +	{								\
>  		.tree = mt,						\
>  		.index = first,						\
>  		.last = end,						\
> @@ -435,6 +435,9 @@ struct ma_wr_state {
>  		.alloc = NULL,						\
>  	}
>  
> +#define MA_STATE(name, mt, first, end)					\
> +	struct ma_state name = __MA_STATE(mt, first, end)
> +
>  #define MA_WR_STATE(name, ma_state, wr_entry)				\
>  	struct ma_wr_state name = {					\
>  		.mas = ma_state,					\
> -- 
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-17 13:44 ` [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE Danilo Krummrich
  2023-02-17 18:18   ` Liam R. Howlett
@ 2023-02-17 19:38   ` Matthew Wilcox
  2023-02-20 14:00     ` Danilo Krummrich
  1 sibling, 1 reply; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-17 19:38 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, dri-devel, corbet, nouveau, ogabbay, linux-doc,
	linux-kernel, linux-mm, boris.brezillon, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
> Generic components making use of the maple tree (such as the
> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> exclusion to their users.
> 
> While such components could inherit the concept of an external lock,
> some users might just serialize the access to the component and hence to
> the internal maple tree.
> 
> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> indicate not to do any internal lockdep checks.

I'm really against this change.

First, we really should check that users have their locking right.
It's bitten us so many times when they get it wrong.

Second, having a lock allows us to defragment the slab cache.  The
patches to do that haven't gone anywhere recently, but if we drop the
requirement now, we'll never be able to compact ranges of memory that
have slabs allocated to them.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro
  2023-02-17 13:44 ` [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro Danilo Krummrich
  2023-02-17 18:34   ` Liam R. Howlett
@ 2023-02-17 19:45   ` Matthew Wilcox
  2023-02-20 13:48     ` Danilo Krummrich
  1 sibling, 1 reply; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-17 19:45 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, dri-devel, corbet, nouveau, ogabbay, linux-doc,
	linux-kernel, linux-mm, boris.brezillon, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

On Fri, Feb 17, 2023 at 02:44:09PM +0100, Danilo Krummrich wrote:
> \#define SAMPLE_ITER(name, __mgr) \
> 	struct sample_iter name = { \
> 		.mas = __MA_STATE(&(__mgr)->mt, 0, 0),

This is usually called MA_STATE_INIT()

> #define sample_iter_for_each_range(it__, start__, end__) \
> 	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
> 	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))

This is a bad iterator design.  It's usually best to do this:

	struct sample *sample;
	SAMPLE_ITERATOR(si, min);

	sample_iter_for_each(&si, sample, max) {
		frob(mgr, sample);
	}

I don't mind splitting apart MA_STATE_INIT from MA_STATE, and if you
do that, we can also use it in VMA_ITERATOR.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-17 13:44 ` [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
@ 2023-02-18  1:05   ` kernel test robot
  2023-02-21 18:20   ` Liam R. Howlett
  2023-02-22 10:25   ` Christian König
  2 siblings, 0 replies; 64+ messages in thread
From: kernel test robot @ 2023-02-18  1:05 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, Liam.Howlett, matthew.brost,
	boris.brezillon, alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm,
	Danilo Krummrich, oe-kbuild-all, Dave Airlie

Hi Danilo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 48075a66fca613477ac1969b576a93ef5db0164f]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230217-215101
base:   48075a66fca613477ac1969b576a93ef5db0164f
patch link:    https://lore.kernel.org/r/20230217134422.14116-6-dakr%40redhat.com
patch subject: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230218/202302180805.b0ab40V5-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/00132cc92b6745cfd51c0d5df4c246a848f2ceaa
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230217-215101
        git checkout 00132cc92b6745cfd51c0d5df4c246a848f2ceaa
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=mips olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash drivers/gpu/drm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302180805.b0ab40V5-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/drm_gpuva_mgr.c:1383:5: warning: no previous prototype for 'drm_gpuva_sm_step' [-Wmissing-prototypes]
    1383 | int drm_gpuva_sm_step(struct drm_gpuva_op *__op, void *priv)
         |     ^~~~~~~~~~~~~~~~~
--
>> drivers/gpu/drm/drm_gpuva_mgr.c:529: warning: expecting prototype for drm_gpuva_remove_iter(). Prototype was for drm_gpuva_iter_remove() instead
   drivers/gpu/drm/drm_gpuva_mgr.c:549: warning: Excess function parameter 'addr' description in 'drm_gpuva_insert'
   drivers/gpu/drm/drm_gpuva_mgr.c:549: warning: Excess function parameter 'range' description in 'drm_gpuva_insert'
   drivers/gpu/drm/drm_gpuva_mgr.c:765: warning: Excess function parameter 'addr' description in 'drm_gpuva_region_insert'
   drivers/gpu/drm/drm_gpuva_mgr.c:765: warning: Excess function parameter 'range' description in 'drm_gpuva_region_insert'
   drivers/gpu/drm/drm_gpuva_mgr.c:1345: warning: Excess function parameter 'ops' description in 'drm_gpuva_sm_unmap'
   drivers/gpu/drm/drm_gpuva_mgr.c:1589: warning: Function parameter or member 'addr' not described in 'drm_gpuva_prefetch_ops_create'
   drivers/gpu/drm/drm_gpuva_mgr.c:1589: warning: Function parameter or member 'range' not described in 'drm_gpuva_prefetch_ops_create'
   drivers/gpu/drm/drm_gpuva_mgr.c:1589: warning: Excess function parameter 'req_addr' description in 'drm_gpuva_prefetch_ops_create'
   drivers/gpu/drm/drm_gpuva_mgr.c:1589: warning: Excess function parameter 'req_range' description in 'drm_gpuva_prefetch_ops_create'


vim +/drm_gpuva_sm_step +1383 drivers/gpu/drm/drm_gpuva_mgr.c

  1382	
> 1383	int drm_gpuva_sm_step(struct drm_gpuva_op *__op, void *priv)
  1384	{
  1385		struct {
  1386			struct drm_gpuva_manager *mgr;
  1387			struct drm_gpuva_ops *ops;
  1388		} *args = priv;
  1389		struct drm_gpuva_manager *mgr = args->mgr;
  1390		struct drm_gpuva_ops *ops = args->ops;
  1391		struct drm_gpuva_op *op;
  1392	
  1393		op = gpuva_op_alloc(mgr);
  1394		if (unlikely(!op))
  1395			goto err;
  1396	
  1397		memcpy(op, __op, sizeof(*op));
  1398	
  1399		if (op->op == DRM_GPUVA_OP_REMAP) {
  1400			struct drm_gpuva_op_remap *__r = &__op->remap;
  1401			struct drm_gpuva_op_remap *r = &op->remap;
  1402	
  1403			r->unmap = kmemdup(__r->unmap, sizeof(*r->unmap),
  1404					   GFP_KERNEL);
  1405			if (unlikely(!r->unmap))
  1406				goto err_free_op;
  1407	
  1408			if (__r->prev) {
  1409				r->prev = kmemdup(__r->prev, sizeof(*r->prev),
  1410						  GFP_KERNEL);
  1411				if (unlikely(!r->prev))
  1412					goto err_free_unmap;
  1413			}
  1414	
  1415			if (__r->next) {
  1416				r->next = kmemdup(__r->next, sizeof(*r->next),
  1417						  GFP_KERNEL);
  1418				if (unlikely(!r->next))
  1419					goto err_free_prev;
  1420			}
  1421		}
  1422	
  1423		list_add_tail(&op->entry, &ops->list);
  1424	
  1425		return 0;
  1426	
  1427	err_free_unmap:
  1428		kfree(op->remap.unmap);
  1429	err_free_prev:
  1430		kfree(op->remap.prev);
  1431	err_free_op:
  1432		gpuva_op_free(mgr, op);
  1433	err:
  1434		return -ENOMEM;
  1435	}
  1436	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  2023-02-17 13:48 ` [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
@ 2023-02-18  1:16   ` kernel test robot
  0 siblings, 0 replies; 64+ messages in thread
From: kernel test robot @ 2023-02-18  1:16 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, Liam.Howlett, matthew.brost,
	boris.brezillon, alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm,
	Danilo Krummrich, oe-kbuild-all

Hi Danilo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 48075a66fca613477ac1969b576a93ef5db0164f]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230217-215101
base:   48075a66fca613477ac1969b576a93ef5db0164f
patch link:    https://lore.kernel.org/r/20230217134820.14672-8-dakr%40redhat.com
patch subject: [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230218/202302180839.s0w26kcJ-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b25c0bcfed93dd62ed732968d8987b92e10c4579
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230217-215101
        git checkout b25c0bcfed93dd62ed732968d8987b92e10c4579
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=mips olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash drivers/gpu/drm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302180839.s0w26kcJ-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h:4,
                    from drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h:5,
                    from drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:22:
   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c: In function 'nvkm_uvmm_mthd_raw_map':
>> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:422:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     422 |                               (void *)args->argv, args->argc);
         |                               ^
   drivers/gpu/drm/nouveau/include/nvkm/core/memory.h:66:43: note: in definition of macro 'nvkm_memory_map'
      66 |         (p)->func->map((p),(o),(vm),(va),(av),(ac))
         |                                           ^~


vim +422 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c

   388	
   389	static int
   390	nvkm_uvmm_mthd_raw_map(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
   391	{
   392		struct nvkm_client *client = uvmm->object.client;
   393		struct nvkm_vmm *vmm = uvmm->vmm;
   394		struct nvkm_vma vma = {
   395			.addr = args->addr,
   396			.size = args->size,
   397			.used = true,
   398			.mapref = false,
   399			.no_comp = true,
   400		};
   401		struct nvkm_memory *memory;
   402		u64 handle = args->memory;
   403		u8 refd;
   404		int ret;
   405	
   406		if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
   407			return -EINVAL;
   408	
   409		ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
   410		if (ret)
   411			return ret;
   412	
   413		vma.page = vma.refd = refd;
   414	
   415		memory = nvkm_umem_search(client, args->memory);
   416		if (IS_ERR(memory)) {
   417			VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
   418			return PTR_ERR(memory);
   419		}
   420	
   421		ret = nvkm_memory_map(memory, args->offset, vmm, &vma,
 > 422				      (void *)args->argv, args->argc);
   423	
   424		nvkm_memory_unref(&vma.memory);
   425		nvkm_memory_unref(&memory);
   426		return ret;
   427	}
   428	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  2023-02-17 13:48 ` [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
@ 2023-02-18  2:47   ` kernel test robot
  0 siblings, 0 replies; 64+ messages in thread
From: kernel test robot @ 2023-02-18  2:47 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, Liam.Howlett, matthew.brost,
	boris.brezillon, alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm,
	Danilo Krummrich, oe-kbuild-all

Hi Danilo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 48075a66fca613477ac1969b576a93ef5db0164f]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230217-215101
base:   48075a66fca613477ac1969b576a93ef5db0164f
patch link:    https://lore.kernel.org/r/20230217134820.14672-1-dakr%40redhat.com
patch subject: [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230218/202302181014.L0SHo3S1-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/e1a1c9659baee305780e1ce50c05e53e1d14b245
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230217-215101
        git checkout e1a1c9659baee305780e1ce50c05e53e1d14b245
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=mips olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash drivers/gpu/drm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302181014.L0SHo3S1-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/drm_debugfs.c: In function 'drm_debugfs_gpuva_info':
>> drivers/gpu/drm/drm_debugfs.c:228:28: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     228 |                            (u64)va->gem.obj, va->gem.offset);
         |                            ^


vim +228 drivers/gpu/drm/drm_debugfs.c

   178	
   179	/**
   180	 * drm_debugfs_gpuva_info - dump the given DRM GPU VA space
   181	 * @m: pointer to the &seq_file to write
   182	 * @mgr: the &drm_gpuva_manager representing the GPU VA space
   183	 *
   184	 * Dumps the GPU VA regions and mappings of a given DRM GPU VA manager.
   185	 *
   186	 * For each DRM GPU VA space drivers should call this function from their
   187	 * &drm_info_list's show callback.
   188	 *
   189	 * Returns: 0 on success, -ENODEV if the &mgr is not initialized
   190	 */
   191	int drm_debugfs_gpuva_info(struct seq_file *m,
   192				   struct drm_gpuva_manager *mgr)
   193	{
   194		DRM_GPUVA_ITER(it, mgr);
   195		DRM_GPUVA_REGION_ITER(__it, mgr);
   196	
   197		if (!mgr->name)
   198			return -ENODEV;
   199	
   200		seq_printf(m, "DRM GPU VA space (%s)\n", mgr->name);
   201		seq_puts  (m, "\n");
   202		seq_puts  (m, " VA regions  | start              | range              | end                | sparse\n");
   203		seq_puts  (m, "------------------------------------------------------------------------------------\n");
   204		seq_printf(m, " VA space    | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
   205			   mgr->mm_start, mgr->mm_range, mgr->mm_start + mgr->mm_range);
   206		seq_puts  (m, "-----------------------------------------------------------------------------------\n");
   207		drm_gpuva_iter_for_each(__it) {
   208			struct drm_gpuva_region *reg = __it.reg;
   209	
   210			if (reg == &mgr->kernel_alloc_region) {
   211				seq_printf(m, " kernel node | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
   212					   reg->va.addr, reg->va.range, reg->va.addr + reg->va.range);
   213				continue;
   214			}
   215	
   216			seq_printf(m, "             | 0x%016llx | 0x%016llx | 0x%016llx | %s\n",
   217				   reg->va.addr, reg->va.range, reg->va.addr + reg->va.range,
   218				   reg->sparse ? "true" : "false");
   219		}
   220		seq_puts(m, "\n");
   221		seq_puts(m, " VAs | start              | range              | end                | object             | object offset\n");
   222		seq_puts(m, "-------------------------------------------------------------------------------------------------------------\n");
   223		drm_gpuva_iter_for_each(it) {
   224			struct drm_gpuva *va = it.va;
   225	
   226			seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
   227				   va->va.addr, va->va.range, va->va.addr + va->va.range,
 > 228				   (u64)va->gem.obj, va->gem.offset);
   229		}
   230	
   231		return 0;
   232	}
   233	EXPORT_SYMBOL(drm_debugfs_gpuva_info);
   234	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro
  2023-02-17 19:45   ` Matthew Wilcox
@ 2023-02-20 13:48     ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-20 13:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, dri-devel, corbet, nouveau, ogabbay, linux-doc,
	linux-kernel, linux-mm, boris.brezillon, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

On 2/17/23 20:45, Matthew Wilcox wrote:
> On Fri, Feb 17, 2023 at 02:44:09PM +0100, Danilo Krummrich wrote:
>> \#define SAMPLE_ITER(name, __mgr) \
>> 	struct sample_iter name = { \
>> 		.mas = __MA_STATE(&(__mgr)->mt, 0, 0),
> 
> This is usually called MA_STATE_INIT()

Yep, that's better.

> 
>> #define sample_iter_for_each_range(it__, start__, end__) \
>> 	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
>> 	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))
> 
> This is a bad iterator design.  It's usually best to do this:
> 
> 	struct sample *sample;
> 	SAMPLE_ITERATOR(si, min);
> 
> 	sample_iter_for_each(&si, sample, max) {
> 		frob(mgr, sample);
> 	}
> 


The reason why I don't set index (and max) within SAMPLE_ITER() is that 
the range to iterate might not yet be known at that time, so I thought 
it could just be set in sample_iter_for_each_range().

However, I see that this might prevail users to assume that it's safe to 
iterate a range based on the same iterator instance multiple times 
though. Instead users should maybe move the tree walk to another 
function once the range is known.

The reason for the payload structure to be part of the iterator is that 
I have two maple trees in the GPUVA manager and hence two different 
payload types. Within the iterator structure they're just within a union 
allowing me to implement the tree walk macro just once rather than twice.

Anyway, I feel like your approach looks cleaner, hence I'll change it.

> I don't mind splitting apart MA_STATE_INIT from MA_STATE, and if you
> do that, we can also use it in VMA_ITERATOR.
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro
  2023-02-17 18:34   ` Liam R. Howlett
@ 2023-02-20 13:48     ` Danilo Krummrich
  2023-02-21 16:52       ` Liam R. Howlett
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-20 13:48 UTC (permalink / raw)
  To: Liam R. Howlett, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, dri-devel,
	nouveau, linux-doc, linux-mm, linux-kernel

On 2/17/23 19:34, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230217 08:44]:
>> Split up the MA_STATE() macro such that components using the maple tree
>> can easily inherit from struct ma_state and build custom tree walk
>> macros to hide their internals from users.
>>
>> Example:
>>
>> struct sample_iter {
>> 	struct ma_state mas;
>> 	struct sample_mgr *mgr;
>> 	struct sample_entry *entry;
>> };
>>
>> \#define SAMPLE_ITER(name, __mgr) \
>> 	struct sample_iter name = { \
>> 		.mas = __MA_STATE(&(__mgr)->mt, 0, 0),
>> 		.mgr = __mgr,
>> 		.entry = NULL,
>> 	}
> 
> I see this patch is to allow for anonymous maple states, this looks
> good.
> 
> I've a lengthy comment about the iterator that I'm adding here to head
> off anyone that may copy your example below.
> 
>>
>> \#define sample_iter_for_each_range(it__, start__, end__) \
>> 	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
>> 	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))
> 
> I see you've added something like the above in your patch set as well.
> I'd like to point out that the index isn't the only state information
> that needs to be altered here, and in fact, this could go very wrong.
> 
> The maple state has a node and an offset within that node.  If you set
> the index to lower than the current position of your iterator and call
> mas_find() then what happens is somewhat undefined.  I expect you will
> get the wrong value (most likely either the current value or the very
> next one that the iterator is already pointing to).  I believe you have
> been using a fresh maple state for each iterator in your patches, but I
> haven't had a deep look into your code yet.

Yes, I'm aware that I'd need to reset the whole iterator in order to 
re-use it.

Regarding the other considerations of the iterator design please see my 
answer to Matthew.

> 
> We have methods of resetting the iterator and set the range (mas_set()
> and mas_set_range()) which are safe for what you are doing, but they
> will start the walk from the root node to the index again.
> 
> So, if you know what you are doing is safe, then the way you have
> written it will work, but it's worth mentioning that this could occur.
> 
> It is also worth pointing out that it would be much safer to use a
> function to do the above so you get type safety.. and I was asked to add
> this to the VMA interface by Linus [1], which is on its way upstream [2].
> 
> 1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/
> 2. https://lore.kernel.org/linux-mm/20230120162650.984577-1-Liam.Howlett@oracle.com/

You mean having wrappers like sample_find() instead of directly using 
mas_find()?

> 
>>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   include/linux/maple_tree.h | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index e594db58a0f1..ca04c900e51a 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -424,8 +424,8 @@ struct ma_wr_state {
>>   #define MA_ERROR(err) \
>>   		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
>>   
>> -#define MA_STATE(name, mt, first, end)					\
>> -	struct ma_state name = {					\
>> +#define __MA_STATE(mt, first, end)					\
>> +	{								\
>>   		.tree = mt,						\
>>   		.index = first,						\
>>   		.last = end,						\
>> @@ -435,6 +435,9 @@ struct ma_wr_state {
>>   		.alloc = NULL,						\
>>   	}
>>   
>> +#define MA_STATE(name, mt, first, end)					\
>> +	struct ma_state name = __MA_STATE(mt, first, end)
>> +
>>   #define MA_WR_STATE(name, ma_state, wr_entry)				\
>>   	struct ma_wr_state name = {					\
>>   		.mas = ma_state,					\
>> -- 
>> 2.39.1
>>
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-17 19:38   ` Matthew Wilcox
@ 2023-02-20 14:00     ` Danilo Krummrich
  2023-02-20 15:10       ` Matthew Wilcox
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-20 14:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On 2/17/23 20:38, Matthew Wilcox wrote:
> On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
>> Generic components making use of the maple tree (such as the
>> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
>> exclusion to their users.
>>
>> While such components could inherit the concept of an external lock,
>> some users might just serialize the access to the component and hence to
>> the internal maple tree.
>>
>> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
>> indicate not to do any internal lockdep checks.
> 
> I'm really against this change.
> 
> First, we really should check that users have their locking right.
> It's bitten us so many times when they get it wrong.

In case of the DRM GPUVA manager, some users might serialize the access 
to the GPUVA manager and hence to it's maple tree instances, e.g. 
through the drm_gpu_scheduler. In such a case ensuring to hold a lock 
would be a bit pointless and I wouldn't really know how to "sell" this 
to potential users of the GPUVA manager.

> 
> Second, having a lock allows us to defragment the slab cache.  The
> patches to do that haven't gone anywhere recently, but if we drop the
> requirement now, we'll never be able to compact ranges of memory that
> have slabs allocated to them.
> 

Not sure if I get that, do you mind explaining a bit how this would 
affect other users of the maple tree, such as my use case, the GPUVA 
manager?


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-20 14:00     ` Danilo Krummrich
@ 2023-02-20 15:10       ` Matthew Wilcox
  2023-02-20 17:06         ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-20 15:10 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On Mon, Feb 20, 2023 at 03:00:59PM +0100, Danilo Krummrich wrote:
> On 2/17/23 20:38, Matthew Wilcox wrote:
> > On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
> > > Generic components making use of the maple tree (such as the
> > > DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> > > exclusion to their users.
> > > 
> > > While such components could inherit the concept of an external lock,
> > > some users might just serialize the access to the component and hence to
> > > the internal maple tree.
> > > 
> > > In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> > > indicate not to do any internal lockdep checks.
> > 
> > I'm really against this change.
> > 
> > First, we really should check that users have their locking right.
> > It's bitten us so many times when they get it wrong.
> 
> In case of the DRM GPUVA manager, some users might serialize the access to
> the GPUVA manager and hence to it's maple tree instances, e.g. through the
> drm_gpu_scheduler. In such a case ensuring to hold a lock would be a bit
> pointless and I wouldn't really know how to "sell" this to potential users
> of the GPUVA manager.

This is why we like people to use the spinlock embedded in the tree.
There's nothing for the user to care about.  If the access really is
serialised, acquiring/releasing the uncontended spinlock is a minimal
cost compared to all the other things that will happen while modifying
the tree.

> > Second, having a lock allows us to defragment the slab cache.  The
> > patches to do that haven't gone anywhere recently, but if we drop the
> > requirement now, we'll never be able to compact ranges of memory that
> > have slabs allocated to them.
> > 
> 
> Not sure if I get that, do you mind explaining a bit how this would affect
> other users of the maple tree, such as my use case, the GPUVA manager?

When we want to free a slab in order to defragment memory, we need
to relocate all the objects allocated within that slab.  To do that
for the maple tree node cache, for each node in this particular slab,
we'll need to walk up to the top of the tree and lock it.  We can then
allocate a new node from a different slab, change the parent to point
to the new node and drop the lock.  After an RCU delay, we can free the
slab and create a larger contiguous block of memory.

As I said, this is somewhat hypothetical in that there's no current
code in the tree to reclaim slabs when we're trying to defragment
memory.  And that's because it's hard to do.  The XArray and maple
tree were designed to make it possible for their slabs.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-20 15:10       ` Matthew Wilcox
@ 2023-02-20 17:06         ` Danilo Krummrich
  2023-02-20 20:33           ` Matthew Wilcox
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-20 17:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On 2/20/23 16:10, Matthew Wilcox wrote:
> On Mon, Feb 20, 2023 at 03:00:59PM +0100, Danilo Krummrich wrote:
>> On 2/17/23 20:38, Matthew Wilcox wrote:
>>> On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
>>>> Generic components making use of the maple tree (such as the
>>>> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
>>>> exclusion to their users.
>>>>
>>>> While such components could inherit the concept of an external lock,
>>>> some users might just serialize the access to the component and hence to
>>>> the internal maple tree.
>>>>
>>>> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
>>>> indicate not to do any internal lockdep checks.
>>>
>>> I'm really against this change.
>>>
>>> First, we really should check that users have their locking right.
>>> It's bitten us so many times when they get it wrong.
>>
>> In case of the DRM GPUVA manager, some users might serialize the access to
>> the GPUVA manager and hence to it's maple tree instances, e.g. through the
>> drm_gpu_scheduler. In such a case ensuring to hold a lock would be a bit
>> pointless and I wouldn't really know how to "sell" this to potential users
>> of the GPUVA manager.
> 
> This is why we like people to use the spinlock embedded in the tree.
> There's nothing for the user to care about.  If the access really is
> serialised, acquiring/releasing the uncontended spinlock is a minimal
> cost compared to all the other things that will happen while modifying
> the tree.

I think as for the users of the GPUVA manager we'd have two cases:

1) Accesses to the manager (and hence the tree) are serialized, no lock 
needed.

2) Multiple operations on the tree must be locked in order to make them 
appear atomic.

In either case the embedded spinlock wouldn't be useful, we'd either 
need an external lock or no lock at all.

If there are any internal reasons why specific tree operations must be 
mutually excluded (such as those you explain below), wouldn't it make 
more sense to always have the internal lock and, optionally, allow users 
to specify an external lock additionally?

> 
>>> Second, having a lock allows us to defragment the slab cache.  The
>>> patches to do that haven't gone anywhere recently, but if we drop the
>>> requirement now, we'll never be able to compact ranges of memory that
>>> have slabs allocated to them.
>>>
>>
>> Not sure if I get that, do you mind explaining a bit how this would affect
>> other users of the maple tree, such as my use case, the GPUVA manager?
> 
> When we want to free a slab in order to defragment memory, we need
> to relocate all the objects allocated within that slab.  To do that
> for the maple tree node cache, for each node in this particular slab,
> we'll need to walk up to the top of the tree and lock it.  We can then
> allocate a new node from a different slab, change the parent to point
> to the new node and drop the lock.  After an RCU delay, we can free the
> slab and create a larger contiguous block of memory.
> 
> As I said, this is somewhat hypothetical in that there's no current
> code in the tree to reclaim slabs when we're trying to defragment
> memory.  And that's because it's hard to do.  The XArray and maple
> tree were designed to make it possible for their slabs.
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-20 17:06         ` Danilo Krummrich
@ 2023-02-20 20:33           ` Matthew Wilcox
  2023-02-21 14:37             ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-20 20:33 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
> On 2/20/23 16:10, Matthew Wilcox wrote:
> > This is why we like people to use the spinlock embedded in the tree.
> > There's nothing for the user to care about.  If the access really is
> > serialised, acquiring/releasing the uncontended spinlock is a minimal
> > cost compared to all the other things that will happen while modifying
> > the tree.
> 
> I think as for the users of the GPUVA manager we'd have two cases:
> 
> 1) Accesses to the manager (and hence the tree) are serialized, no lock
> needed.
> 
> 2) Multiple operations on the tree must be locked in order to make them
> appear atomic.

Could you give an example here of what you'd like to do?  Ideally
something complicated so I don't say "Oh, you can just do this" when
there's a more complex example for which "this" won't work.  I'm sure
that's embedded somewhere in the next 20-odd patches, but it's probably
quicker for you to describe in terms of tree operations that have to
appear atomic than for me to try to figure it out.

> In either case the embedded spinlock wouldn't be useful, we'd either need an
> external lock or no lock at all.
> 
> If there are any internal reasons why specific tree operations must be
> mutually excluded (such as those you explain below), wouldn't it make more
> sense to always have the internal lock and, optionally, allow users to
> specify an external lock additionally?

So the way this works for the XArray, which is a little older than the
Maple tree, is that we always use the internal spinlock for
modifications (possibly BH or IRQ safe), and if someone wants to
use an external mutex to make some callers atomic with respect to each
other, they're free to do so.  In that case, the XArray doesn't check
the user's external locking at all, because it really can't know.

I'd advise taking that approach; if there's really no way to use the
internal spinlock to make your complicated updates appear atomic
then just let the maple tree use its internal spinlock, and you can
also use your external mutex however you like.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-20 20:33           ` Matthew Wilcox
@ 2023-02-21 14:37             ` Danilo Krummrich
  2023-02-21 18:31               ` Matthew Wilcox
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-21 14:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
> > On 2/20/23 16:10, Matthew Wilcox wrote:
> > > This is why we like people to use the spinlock embedded in the tree.
> > > There's nothing for the user to care about.  If the access really is
> > > serialised, acquiring/releasing the uncontended spinlock is a minimal
> > > cost compared to all the other things that will happen while modifying
> > > the tree.
> > 
> > I think as for the users of the GPUVA manager we'd have two cases:
> > 
> > 1) Accesses to the manager (and hence the tree) are serialized, no lock
> > needed.
> > 
> > 2) Multiple operations on the tree must be locked in order to make them
> > appear atomic.
> 
> Could you give an example here of what you'd like to do?  Ideally
> something complicated so I don't say "Oh, you can just do this" when
> there's a more complex example for which "this" won't work.  I'm sure
> that's embedded somewhere in the next 20-odd patches, but it's probably
> quicker for you to describe in terms of tree operations that have to
> appear atomic than for me to try to figure it out.
> 

Absolutely, not gonna ask you to read all of that. :-)

One thing the GPUVA manager does is to provide drivers the (sub-)operations
that need to be processed in order to fulfill a map or unmap request from
userspace. For instance, when userspace asks the driver to map some memory
the GPUVA manager calculates which existing mappings must be removed, split up
or can be merged with the newly requested mapping.

A driver has two ways to fetch those operations from the GPUVA manager. It can
either obtain a list of operations or receive a callback for each operation
generated by the GPUVA manager.

In both cases the GPUVA manager walks the maple tree, which keeps track of
existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
the map case, since the unmap case is a subset basically). For each mapping
found in the given range the driver, as mentioned, either receives a callback or
a list entry is added to the list of operations.

Typically, for each operation / callback one entry within the maple tree is
removed and, optionally at the beginning and end of a new mapping's range, a
new entry is inserted. An of course, as the last operation, there is the new
mapping itself to insert.

The GPUVA manager delegates locking responsibility to the drivers. Typically,
a driver either serializes access to the VA space managed by the GPUVA manager
(no lock needed) or need to lock the processing of a full set of operations
generated by the GPUVA manager.

> > In either case the embedded spinlock wouldn't be useful, we'd either need an
> > external lock or no lock at all.
> > 
> > If there are any internal reasons why specific tree operations must be
> > mutually excluded (such as those you explain below), wouldn't it make more
> > sense to always have the internal lock and, optionally, allow users to
> > specify an external lock additionally?
> 
> So the way this works for the XArray, which is a little older than the
> Maple tree, is that we always use the internal spinlock for
> modifications (possibly BH or IRQ safe), and if someone wants to
> use an external mutex to make some callers atomic with respect to each
> other, they're free to do so.  In that case, the XArray doesn't check
> the user's external locking at all, because it really can't know.
> 
> I'd advise taking that approach; if there's really no way to use the
> internal spinlock to make your complicated updates appear atomic
> then just let the maple tree use its internal spinlock, and you can
> also use your external mutex however you like.
> 

That sounds like the right thing to do.

However, I'm using the advanced API of the maple tree (and that's the reason
why the above example appears a little more detailed than needed) because I
think with the normal API I can't insert / remove tree entries while walking
the tree, right?

As by the documentation the advanced API, however, doesn't take care of locking
itself, hence just letting the maple tree use its internal spinlock doesn't
really work - I need to take care of that myself, right?

It feels a bit weird that I, as a user of the API, would need to lock certain
(or all?) mas_*() functions with the internal spinlock in order to protect
(future) internal features of the tree, such as the slab cache defragmentation
you mentioned. Because from my perspective, as the generic component that tells
it's users (the drivers) to take care of locking VA space operations (and hence
tree operations) I don't have an own purpose of this internal spinlock, right?

Also I'm a little confused how I'd know where to take the spinlock? E.g. for
inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 01/16] drm: execution context for GEM buffers
  2023-02-17 16:00   ` Christian König
@ 2023-02-21 14:56     ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-21 14:56 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, bagasdotme, jason

On 2/17/23 17:00, Christian König wrote:
> Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
>> From: Christian König <christian.koenig@amd.com>
>>
>> This adds the infrastructure for an execution context for GEM buffers
>> which is similar to the existinc TTMs execbuf util and intended to 
>> replace
>> it in the long term.
>>
>> The basic functionality is that we abstracts the necessary loop to lock
>> many different GEM buffers with automated deadlock and duplicate 
>> handling.
>>
>> v2: drop xarray and use dynamic resized array instead, the locking
>>      overhead is unecessary and measureable.
> 
> Question for Danilo and probably others: Does Nouveau make use of the 
> duplicate tracking at some point?

I'm not iterating duplicates or do something with them explicitly. 
However, I rely on drm_exec being able to deal with duplicates in general.
> 
> Background is that I only have two or three use cases for this in 
> radeon/amdgpu and would like to make it an optional feature.
> 
> Going to take a look at the rest of this series next week.
> 
> Regards,
> Christian.
> 
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   Documentation/gpu/drm-mm.rst       |  12 ++
>>   drivers/gpu/drm/Kconfig            |   6 +
>>   drivers/gpu/drm/Makefile           |   2 +
>>   drivers/gpu/drm/amd/amdgpu/Kconfig |   1 +
>>   drivers/gpu/drm/drm_exec.c         | 295 +++++++++++++++++++++++++++++
>>   include/drm/drm_exec.h             | 144 ++++++++++++++
>>   6 files changed, 460 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 include/drm/drm_exec.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index a79fd3549ff8..a52e6f4117d6 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -493,6 +493,18 @@ DRM Sync Objects
>>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>      :export:
>> +DRM Execution context
>> +=====================
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :doc: Overview
>> +
>> +.. kernel-doc:: include/drm/drm_exec.h
>> +   :internal:
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :export:
>> +
>>   GPU Scheduler
>>   =============
>> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index f42d4c6a19f2..1573d658fbb5 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -200,6 +200,12 @@ config DRM_TTM
>>         GPU memory types. Will be enabled automatically if a device 
>> driver
>>         uses it.
>> +config DRM_EXEC
>> +    tristate
>> +    depends on DRM
>> +    help
>> +      Execution context for command submissions
>> +
>>   config DRM_BUDDY
>>       tristate
>>       depends on DRM
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index ab4460fcd63f..d40defbb0347 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
>> drm_panel_orientation_quirks.o
>>   #
>>   # Memory-management helpers
>>   #
>> +#
>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
>> b/drivers/gpu/drm/amd/amdgpu/Kconfig
>> index 5341b6b242c3..279fb3bba810 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
>> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
>> @@ -11,6 +11,7 @@ config DRM_AMDGPU
>>       select DRM_SCHED
>>       select DRM_TTM
>>       select DRM_TTM_HELPER
>> +    select DRM_EXEC
>>       select POWER_SUPPLY
>>       select HWMON
>>       select I2C
>> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>> new file mode 100644
>> index 000000000000..ed2106c22786
>> --- /dev/null
>> +++ b/drivers/gpu/drm/drm_exec.c
>> @@ -0,0 +1,295 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#include <drm/drm_exec.h>
>> +#include <drm/drm_gem.h>
>> +#include <linux/dma-resv.h>
>> +
>> +/**
>> + * DOC: Overview
>> + *
>> + * This component mainly abstracts the retry loop necessary for locking
>> + * multiple GEM objects while preparing hardware operations (e.g. 
>> command
>> + * submissions, page table updates etc..).
>> + *
>> + * If a contention is detected while locking a GEM object the cleanup 
>> procedure
>> + * unlocks all previously locked GEM objects and locks the contended 
>> one first
>> + * before locking any further objects.
>> + *
>> + * After an object is locked fences slots can optionally be reserved 
>> on the
>> + * dma_resv object inside the GEM object.
>> + *
>> + * A typical usage pattern should look like this::
>> + *
>> + *    struct drm_gem_object *obj;
>> + *    struct drm_exec exec;
>> + *    unsigned long index;
>> + *    int ret;
>> + *
>> + *    drm_exec_init(&exec, true);
>> + *    drm_exec_while_not_all_locked(&exec) {
>> + *        ret = drm_exec_prepare_obj(&exec, boA, 1);
>> + *        drm_exec_continue_on_contention(&exec);
>> + *        if (ret)
>> + *            goto error;
>> + *
>> + *        ret = drm_exec_lock(&exec, boB, 1);
>> + *        drm_exec_continue_on_contention(&exec);
>> + *        if (ret)
>> + *            goto error;
>> + *    }
>> + *
>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *        ...
>> + *    }
>> + *    drm_exec_fini(&exec);
>> + *
>> + * See struct dma_exec for more details.
>> + */
>> +
>> +/* Dummy value used to initially enter the retry loop */
>> +#define DRM_EXEC_DUMMY (void*)~0
>> +
>> +/* Initialize the drm_exec_objects container */
>> +static void drm_exec_objects_init(struct drm_exec_objects *container)
>> +{
>> +    container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +
>> +    /* If allocation here fails, just delay that till the first use */
>> +    container->max_objects = container->objects ?
>> +        PAGE_SIZE / sizeof(void *) : 0;
>> +    container->num_objects = 0;
>> +}
>> +
>> +/* Cleanup the drm_exec_objects container */
>> +static void drm_exec_objects_fini(struct drm_exec_objects *container)
>> +{
>> +    kvfree(container->objects);
>> +}
>> +
>> +/* Make sure we have enough room and add an object the container */
>> +static int drm_exec_objects_add(struct drm_exec_objects *container,
>> +                struct drm_gem_object *obj)
>> +{
>> +    if (unlikely(container->num_objects == container->max_objects)) {
>> +        size_t size = container->max_objects * sizeof(void *);
>> +        void *tmp;
>> +
>> +        tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
>> +                GFP_KERNEL);
>> +        if (!tmp)
>> +            return -ENOMEM;
>> +
>> +        container->objects = tmp;
>> +        container->max_objects += PAGE_SIZE / sizeof(void *);
>> +    }
>> +    drm_gem_object_get(obj);
>> +    container->objects[container->num_objects++] = obj;
>> +    return 0;
>> +}
>> +
>> +/* Unlock all objects and drop references */
>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj;
>> +    unsigned long index;
>> +
>> +    drm_exec_for_each_duplicate_object(exec, index, obj)
>> +        drm_gem_object_put(obj);
>> +
>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>> +        dma_resv_unlock(obj->resv);
>> +        drm_gem_object_put(obj);
>> +    }
>> +}
>> +
>> +/**
>> + * drm_exec_init - initialize a drm_exec object
>> + * @exec: the drm_exec object to initialize
>> + * @interruptible: if locks should be acquired interruptible
>> + *
>> + * Initialize the object and make sure that we can track locked and 
>> duplicate
>> + * objects.
>> + */
>> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
>> +{
>> +    exec->interruptible = interruptible;
>> +    drm_exec_objects_init(&exec->locked);
>> +    drm_exec_objects_init(&exec->duplicates);
>> +    exec->contended = DRM_EXEC_DUMMY;
>> +}
>> +EXPORT_SYMBOL(drm_exec_init);
>> +
>> +/**
>> + * drm_exec_fini - finalize a drm_exec object
>> + * @exec: the drm_exec object to finilize
>> + *
>> + * Unlock all locked objects, drop the references to objects and free 
>> all memory
>> + * used for tracking the state.
>> + */
>> +void drm_exec_fini(struct drm_exec *exec)
>> +{
>> +    drm_exec_unlock_all(exec);
>> +    drm_exec_objects_fini(&exec->locked);
>> +    drm_exec_objects_fini(&exec->duplicates);
>> +    if (exec->contended != DRM_EXEC_DUMMY) {
>> +        drm_gem_object_put(exec->contended);
>> +        ww_acquire_fini(&exec->ticket);
>> +    }
>> +}
>> +EXPORT_SYMBOL(drm_exec_fini);
>> +
>> +/**
>> + * drm_exec_cleanup - cleanup when contention is detected
>> + * @exec: the drm_exec object to cleanup
>> + *
>> + * Cleanup the current state and return true if we should stay inside 
>> the retry
>> + * loop, false if there wasn't any contention detected and we can 
>> keep the
>> + * objects locked.
>> + */
>> +bool drm_exec_cleanup(struct drm_exec *exec)
>> +{
>> +    if (likely(!exec->contended)) {
>> +        ww_acquire_done(&exec->ticket);
>> +        return false;
>> +    }
>> +
>> +    if (likely(exec->contended == DRM_EXEC_DUMMY)) {
>> +        exec->contended = NULL;
>> +        ww_acquire_init(&exec->ticket, &reservation_ww_class);
>> +        return true;
>> +    }
>> +
>> +    drm_exec_unlock_all(exec);
>> +    exec->locked.num_objects = 0;
>> +    exec->duplicates.num_objects = 0;
>> +    return true;
>> +}
>> +EXPORT_SYMBOL(drm_exec_cleanup);
>> +
>> +/* Track the locked object in the xa and reserve fences */
>> +static int drm_exec_obj_locked(struct drm_exec_objects *container,
>> +                   struct drm_gem_object *obj,
>> +                   unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    if (container) {
>> +        ret = drm_exec_objects_add(container, obj);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    if (num_fences) {
>> +        ret = dma_resv_reserve_fences(obj->resv, num_fences);
>> +        if (ret)
>> +            goto error_erase;
>> +    }
>> +
>> +    return 0;
>> +
>> +error_erase:
>> +    if (container) {
>> +        --container->num_objects;
>> +        drm_gem_object_put(obj);
>> +    }
>> +    return ret;
>> +}
>> +
>> +/* Make sure the contended object is locked first */
>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj = exec->contended;
>> +    int ret;
>> +
>> +    if (likely(!obj))
>> +        return 0;
>> +
>> +    if (exec->interruptible) {
>> +        ret = dma_resv_lock_slow_interruptible(obj->resv,
>> +                               &exec->ticket);
>> +        if (unlikely(ret))
>> +            goto error_dropref;
>> +    } else {
>> +        dma_resv_lock_slow(obj->resv, &exec->ticket);
>> +    }
>> +
>> +    ret = drm_exec_obj_locked(&exec->locked, obj, 0);
>> +    if (unlikely(ret))
>> +        dma_resv_unlock(obj->resv);
>> +
>> +error_dropref:
>> +    /* Always cleanup the contention so that error handling can kick 
>> in */
>> +    drm_gem_object_put(obj);
>> +    exec->contended = NULL;
>> +    return ret;
>> +}
>> +
>> +/**
>> + * drm_exec_prepare_obj - prepare a GEM object for use
>> + * @exec: the drm_exec object with the state
>> + * @obj: the GEM object to prepare
>> + * @num_fences: how many fences to reserve
>> + *
>> + * Prepare a GEM object for use by locking it and reserving fence 
>> slots. All
>> + * succesfully locked objects are put into the locked container. 
>> Duplicates
>> + * detected as well and automatically moved into the duplicates 
>> container.
>> + *
>> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object 
>> *obj,
>> +             unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    ret = drm_exec_lock_contended(exec);
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    if (exec->interruptible)
>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>> +    else
>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>> +
>> +    if (unlikely(ret == -EDEADLK)) {
>> +        drm_gem_object_get(obj);
>> +        exec->contended = obj;
>> +        return -EDEADLK;
>> +    }
>> +
>> +    if (unlikely(ret == -EALREADY)) {
>> +        struct drm_exec_objects *container = &exec->duplicates;
>> +
>> +        /*
>> +         * If this is the first locked GEM object it was most likely
>> +         * just contended. So don't add it to the duplicates, just
>> +         * reserve the fence slots.
>> +         */
>> +        if (exec->locked.num_objects && exec->locked.objects[0] == obj)
>> +            container = NULL;
>> +
>> +        ret = drm_exec_obj_locked(container, obj, num_fences);
>> +        if (ret)
>> +            return ret;
>> +
>> +    } else if (unlikely(ret)) {
>> +        return ret;
>> +
>> +    } else {
>> +        ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
>> +        if (ret)
>> +            goto error_unlock;
>> +    }
>> +
>> +    drm_gem_object_get(obj);
>> +    return 0;
>> +
>> +error_unlock:
>> +    dma_resv_unlock(obj->resv);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>> +
>> +MODULE_DESCRIPTION("DRM execution context");
>> +MODULE_LICENSE("Dual MIT/GPL");
>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>> new file mode 100644
>> index 000000000000..f73981c6292e
>> --- /dev/null
>> +++ b/include/drm/drm_exec.h
>> @@ -0,0 +1,144 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#ifndef __DRM_EXEC_H__
>> +#define __DRM_EXEC_H__
>> +
>> +#include <linux/ww_mutex.h>
>> +
>> +struct drm_gem_object;
>> +
>> +/**
>> + * struct drm_exec_objects - Container for GEM objects in a drm_exec
>> + */
>> +struct drm_exec_objects {
>> +    unsigned int        num_objects;
>> +    unsigned int        max_objects;
>> +    struct drm_gem_object    **objects;
>> +};
>> +
>> +/**
>> + * drm_exec_objects_for_each - iterate over all the objects inside 
>> the container
>> + */
>> +#define drm_exec_objects_for_each(array, index, obj)        \
>> +    for (index = 0, obj = (array)->objects[0];        \
>> +         index < (array)->num_objects;            \
>> +         ++index, obj = (array)->objects[index])
>> +
>> +/**
>> + * struct drm_exec - Execution context
>> + */
>> +struct drm_exec {
>> +    /**
>> +     * @interruptible: If locks should be taken interruptible
>> +     */
>> +    bool            interruptible;
>> +
>> +    /**
>> +     * @ticket: WW ticket used for acquiring locks
>> +     */
>> +    struct ww_acquire_ctx    ticket;
>> +
>> +    /**
>> +     * @locked: container for the locked GEM objects
>> +     */
>> +    struct drm_exec_objects    locked;
>> +
>> +    /**
>> +     * @duplicates: container for the duplicated GEM objects
>> +     */
>> +    struct drm_exec_objects    duplicates;
>> +
>> +    /**
>> +     * @contended: contended GEM object we backet of for.
>> +     */
>> +    struct drm_gem_object    *contended;
>> +};
>> +
>> +/**
>> + * drm_exec_for_each_locked_object - iterate over all the locked objects
>> + * @exec: drm_exec object
>> + * @index: unsigned long index for the iteration
>> + * @obj: the current GEM object
>> + *
>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>> + */
>> +#define drm_exec_for_each_locked_object(exec, index, obj)    \
>> +    drm_exec_objects_for_each(&(exec)->locked, index, obj)
>> +
>> +/**
>> + * drm_exec_for_each_duplicate_object - iterate over all the 
>> duplicate objects
>> + * @exec: drm_exec object
>> + * @index: unsigned long index for the iteration
>> + * @obj: the current GEM object
>> + *
>> + * Iterate over all the duplicate GEM objects inside the drm_exec 
>> object.
>> + */
>> +#define drm_exec_for_each_duplicate_object(exec, index, obj)    \
>> +    drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
>> +
>> +/**
>> + * drm_exec_while_not_all_locked - loop until all GEM objects are 
>> prepared
>> + * @exec: drm_exec object
>> + *
>> + * Core functionality of the drm_exec object. Loops until all GEM 
>> objects are
>> + * prepared and no more contention exists.
>> + *
>> + * At the beginning of the loop it is guaranteed that no GEM object 
>> is locked.
>> + */
>> +#define drm_exec_while_not_all_locked(exec)    \
>> +    while (drm_exec_cleanup(exec))
>> +
>> +/**
>> + * drm_exec_continue_on_contention - continue the loop when we need 
>> to cleanup
>> + * @exec: drm_exec object
>> + *
>> + * Control flow helper to continue when a contention was detected and 
>> we need to
>> + * clean up and re-start the loop to prepare all GEM objects.
>> + */
>> +#define drm_exec_continue_on_contention(exec)        \
>> +    if (unlikely(drm_exec_is_contended(exec)))    \
>> +        continue
>> +
>> +/**
>> + * drm_exec_break_on_contention - break a subordinal loop on contention
>> + * @exec: drm_exec object
>> + *
>> + * Control flow helper to break a subordinal loop when a contention 
>> was detected
>> + * and we need to clean up and re-start the loop to prepare all GEM 
>> objects.
>> + */
>> +#define drm_exec_break_on_contention(exec)        \
>> +    if (unlikely(drm_exec_is_contended(exec)))    \
>> +        break
>> +
>> +/**
>> + * drm_exec_is_contended - check for contention
>> + * @exec: drm_exec object
>> + *
>> + * Returns true if the drm_exec object has run into some contention 
>> while
>> + * locking a GEM object and needs to clean up.
>> + */
>> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
>> +{
>> +    return !!exec->contended;
>> +}
>> +
>> +/**
>> + * drm_exec_has_duplicates - check for duplicated GEM object
>> + * @exec: drm_exec object
>> + *
>> + * Return true if the drm_exec object has encountered some already 
>> locked GEM
>> + * objects while trying to lock them. This can happen if multiple GEM 
>> objects
>> + * share the same underlying resv object.
>> + */
>> +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
>> +{
>> +    return exec->duplicates.num_objects > 0;
>> +}
>> +
>> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
>> +void drm_exec_fini(struct drm_exec *exec);
>> +bool drm_exec_cleanup(struct drm_exec *exec);
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object 
>> *obj,
>> +             unsigned int num_fences);
>> +
>> +#endif
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro
  2023-02-20 13:48     ` Danilo Krummrich
@ 2023-02-21 16:52       ` Liam R. Howlett
  0 siblings, 0 replies; 64+ messages in thread
From: Liam R. Howlett @ 2023-02-21 16:52 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230220 09:38]:
> On 2/17/23 19:34, Liam R. Howlett wrote:
> > * Danilo Krummrich <dakr@redhat.com> [230217 08:44]:
> > > Split up the MA_STATE() macro such that components using the maple tree
> > > can easily inherit from struct ma_state and build custom tree walk
> > > macros to hide their internals from users.
> > > 
> > > Example:
> > > 
> > > struct sample_iter {
> > > 	struct ma_state mas;
> > > 	struct sample_mgr *mgr;
> > > 	struct sample_entry *entry;
> > > };
> > > 
> > > \#define SAMPLE_ITER(name, __mgr) \
> > > 	struct sample_iter name = { \
> > > 		.mas = __MA_STATE(&(__mgr)->mt, 0, 0),
> > > 		.mgr = __mgr,
> > > 		.entry = NULL,
> > > 	}
> > 
> > I see this patch is to allow for anonymous maple states, this looks
> > good.
> > 
> > I've a lengthy comment about the iterator that I'm adding here to head
> > off anyone that may copy your example below.
> > 
> > > 
> > > \#define sample_iter_for_each_range(it__, start__, end__) \
> > > 	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
> > > 	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))
> > 
> > I see you've added something like the above in your patch set as well.
> > I'd like to point out that the index isn't the only state information
> > that needs to be altered here, and in fact, this could go very wrong.
> > 
> > The maple state has a node and an offset within that node.  If you set
> > the index to lower than the current position of your iterator and call
> > mas_find() then what happens is somewhat undefined.  I expect you will
> > get the wrong value (most likely either the current value or the very
> > next one that the iterator is already pointing to).  I believe you have
> > been using a fresh maple state for each iterator in your patches, but I
> > haven't had a deep look into your code yet.
> 
> Yes, I'm aware that I'd need to reset the whole iterator in order to re-use
> it.

Okay, good.  The way you have it written makes it unsafe to just call
without knowledge of the state and that will probably end poorly over
the long run.  If it's always starting from MAS_START then it's probably
safer to just initialize when you want to use it to the correct start
address.

> 
> Regarding the other considerations of the iterator design please see my
> answer to Matthew.
> 
> > 
> > We have methods of resetting the iterator and set the range (mas_set()
> > and mas_set_range()) which are safe for what you are doing, but they
> > will start the walk from the root node to the index again.
> > 
> > So, if you know what you are doing is safe, then the way you have
> > written it will work, but it's worth mentioning that this could occur.
> > 
> > It is also worth pointing out that it would be much safer to use a
> > function to do the above so you get type safety.. and I was asked to add
> > this to the VMA interface by Linus [1], which is on its way upstream [2].
> > 
> > 1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/
> > 2. https://lore.kernel.org/linux-mm/20230120162650.984577-1-Liam.Howlett@oracle.com/
> 
> You mean having wrappers like sample_find() instead of directly using
> mas_find()?

I'm not sure you need to go that low level, but I would ensure I have a
store/load function that ensures the correct type being put in/read from
are correct on compile - especially since you seem to have two trees to
track two different sets of things.  That iterator is probably safe
since the type is defined within itself.

> 
> > 
> > > 
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > >   include/linux/maple_tree.h | 7 +++++--
> > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > > index e594db58a0f1..ca04c900e51a 100644
> > > --- a/include/linux/maple_tree.h
> > > +++ b/include/linux/maple_tree.h
> > > @@ -424,8 +424,8 @@ struct ma_wr_state {
> > >   #define MA_ERROR(err) \
> > >   		((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
> > > -#define MA_STATE(name, mt, first, end)					\
> > > -	struct ma_state name = {					\
> > > +#define __MA_STATE(mt, first, end)					\
> > > +	{								\
> > >   		.tree = mt,						\
> > >   		.index = first,						\
> > >   		.last = end,						\
> > > @@ -435,6 +435,9 @@ struct ma_wr_state {
> > >   		.alloc = NULL,						\
> > >   	}
> > > +#define MA_STATE(name, mt, first, end)					\
> > > +	struct ma_state name = __MA_STATE(mt, first, end)
> > > +
> > >   #define MA_WR_STATE(name, ma_state, wr_entry)				\
> > >   	struct ma_wr_state name = {					\
> > >   		.mas = ma_state,					\
> > > -- 
> > > 2.39.1
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-17 13:44 ` [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
  2023-02-18  1:05   ` kernel test robot
@ 2023-02-21 18:20   ` Liam R. Howlett
  2023-02-22 18:13     ` Danilo Krummrich
  2023-02-28  2:17     ` Danilo Krummrich
  2023-02-22 10:25   ` Christian König
  2 siblings, 2 replies; 64+ messages in thread
From: Liam R. Howlett @ 2023-02-21 18:20 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
> Add infrastructure to keep track of GPU virtual address (VA) mappings
> with a decicated VA space manager implementation.
> 
> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> start implementing, allow userspace applications to request multiple and
> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> intended to serve the following purposes in this context.
> 
> 1) Provide infrastructure to track GPU VA allocations and mappings,
>    making use of the maple_tree.
> 
> 2) Generically connect GPU VA mappings to their backing buffers, in
>    particular DRM GEM objects.
> 
> 3) Provide a common implementation to perform more complex mapping
>    operations on the GPU VA space. In particular splitting and merging
>    of GPU VA mappings, e.g. for intersecting mapping requests or partial
>    unmap requests.
> 
> Suggested-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  Documentation/gpu/drm-mm.rst    |   31 +
>  drivers/gpu/drm/Makefile        |    1 +
>  drivers/gpu/drm/drm_gem.c       |    3 +
>  drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
>  include/drm/drm_drv.h           |    6 +
>  include/drm/drm_gem.h           |   75 ++
>  include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
>  7 files changed, 2534 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>  create mode 100644 include/drm/drm_gpuva_mgr.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a52e6f4117d6..c9f120cfe730 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>     :export:
>  
...

> +
> +/**
> + * drm_gpuva_remove_iter - removes the iterators current element
> + * @it: the &drm_gpuva_iterator
> + *
> + * This removes the element the iterator currently points to.
> + */
> +void
> +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
> +{
> +	mas_erase(&it->mas);
> +}
> +EXPORT_SYMBOL(drm_gpuva_iter_remove);
> +
> +/**
> + * drm_gpuva_insert - insert a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @va: the &drm_gpuva to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> +		 struct drm_gpuva *va)
> +{
> +	u64 addr = va->va.addr;
> +	u64 range = va->va.range;
> +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
> +	struct drm_gpuva_region *reg = NULL;
> +	int ret;
> +
> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> +		return -EINVAL;
> +
> +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
> +		return -EINVAL;
> +
> +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
> +		reg = drm_gpuva_in_region(mgr, addr, range);
> +		if (unlikely(!reg))
> +			return -EINVAL;
> +	}
> +

-----

> +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
> +		return -EEXIST;
> +
> +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);

mas_walk() will set the internal maple state to the limits to what it
finds.  So, instead of an iterator, you can use the walk function and
ensure there is a large enough area in the existing NULL:

/*
 * Nothing at addr, mas now points to the location where the store would
 * happen
 */
if (mas_walk(&mas))
	return -EEXIST;

/* The NULL entry ends at mas.last, make sure there is room */
if (mas.last < (addr + range - 1))
	return -EEXIST;

/* Limit the store size to the correct end address, and store */
 mas.last = addr + range - 1;
 ret = mas_store_gfp(&mas, va, GFP_KERNEL);

> +	if (unlikely(ret))
> +		return ret;
> +
> +	va->mgr = mgr;
> +	va->region = reg;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_gpuva_insert);
> +
> +/**
> + * drm_gpuva_remove - remove a &drm_gpuva
> + * @va: the &drm_gpuva to remove
> + *
> + * This removes the given &va from the underlaying tree.
> + */
> +void
> +drm_gpuva_remove(struct drm_gpuva *va)
> +{
> +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> +
> +	mas_erase(&mas);
> +}
> +EXPORT_SYMBOL(drm_gpuva_remove);
> +
...

> +/**
> + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuvas address
> + * @range: the &drm_gpuvas range
> + *
> + * Returns: the first &drm_gpuva within the given range
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> +		     u64 addr, u64 range)
> +{
> +	MA_STATE(mas, &mgr->va_mt, addr, 0);
> +
> +	return mas_find(&mas, addr + range - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_first);
> +
> +/**
> + * drm_gpuva_find - find a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuvas address
> + * @range: the &drm_gpuvas range
> + *
> + * Returns: the &drm_gpuva at a given &addr and with a given &range

Note that mas_find() will continue upwards in the address space if there
isn't anything at @addr.  This means that &drm_gpuva may not be at
&addr.  If you want to check just at &addr, use mas_walk().

> + */
> +struct drm_gpuva *
> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> +	       u64 addr, u64 range)
> +{
> +	struct drm_gpuva *va;
> +
> +	va = drm_gpuva_find_first(mgr, addr, range);
> +	if (!va)
> +		goto out;
> +
> +	if (va->va.range != range)
> +		goto out;
> +
> +	return va;
> +
> +out:
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find);
> +
> +/**
> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @start: the given GPU VA's start address
> + *
> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)

find_prev() usually continues beyond 1 less than the address.  I found
this name confusing.  You may as well use mas_walk(), it would be
faster.

> +{
> +	MA_STATE(mas, &mgr->va_mt, start, 0);
> +
> +	if (start <= mgr->mm_start ||
> +	    start > (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	return mas_prev(&mas, start - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_prev);
> +
> +/**
> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @end: the given GPU VA's end address
> + *
> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)

This name is also a bit confusing for the same reason.  Again, it seems
worth just walking to end here.

> +{
> +	MA_STATE(mas, &mgr->va_mt, end - 1, 0);
> +
> +	if (end < mgr->mm_start ||
> +	    end >= (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	return mas_next(&mas, end);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_next);
> +
> +static int
> +__drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			  struct drm_gpuva_region *reg)
> +{
> +	u64 addr = reg->va.addr;
> +	u64 range = reg->va.range;
> +	MA_STATE(mas, &mgr->region_mt, addr, addr + range - 1);
> +	int ret;
> +
> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> +		return -EINVAL;
> +
> +	ret = mas_store_gfp(&mas, reg, GFP_KERNEL);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	reg->mgr = mgr;
> +
> +	return 0;
> +}
> +
> +/**
> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @reg: the &drm_gpuva_region to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva_region with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			struct drm_gpuva_region *reg)
> +{
> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
> +		return -EINVAL;
> +
> +	return __drm_gpuva_region_insert(mgr, reg);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_insert);
> +
> +static void
> +__drm_gpuva_region_remove(struct drm_gpuva_region *reg)
> +{
> +	struct drm_gpuva_manager *mgr = reg->mgr;
> +	MA_STATE(mas, &mgr->region_mt, reg->va.addr, 0);
> +
> +	mas_erase(&mas);
> +}
> +
> +/**
> + * drm_gpuva_region_remove - remove a &drm_gpuva_region
> + * @reg: the &drm_gpuva to remove
> + *
> + * This removes the given &reg from the underlaying tree.
> + */
> +void
> +drm_gpuva_region_remove(struct drm_gpuva_region *reg)
> +{
> +	struct drm_gpuva_manager *mgr = reg->mgr;
> +
> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
> +		return;
> +
> +	if (unlikely(reg == &mgr->kernel_alloc_region)) {
> +		WARN(1, "Can't destroy kernel reserved region.\n");
> +		return;
> +	}
> +
> +	if (unlikely(!drm_gpuva_region_empty(reg)))
> +		WARN(1, "GPU VA region should be empty on destroy.\n");
> +
> +	__drm_gpuva_region_remove(reg);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_remove);
> +
> +/**
> + * drm_gpuva_region_empty - indicate whether a &drm_gpuva_region is empty
> + * @reg: the &drm_gpuva to destroy
> + *
> + * Returns: true if the &drm_gpuva_region is empty, false otherwise
> + */
> +bool
> +drm_gpuva_region_empty(struct drm_gpuva_region *reg)
> +{
> +	DRM_GPUVA_ITER(it, reg->mgr);
> +
> +	drm_gpuva_iter_for_each_range(it, reg->va.addr,
> +				      reg->va.addr +
> +				      reg->va.range)
> +		return false;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_empty);
> +
> +/**
> + * drm_gpuva_region_find_first - find the first &drm_gpuva_region in the given
> + * range
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuva_regions address
> + * @range: the &drm_gpuva_regions range
> + *
> + * Returns: the first &drm_gpuva_region within the given range
> + */
> +struct drm_gpuva_region *
> +drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
> +			    u64 addr, u64 range)
> +{
> +	MA_STATE(mas, &mgr->region_mt, addr, 0);
> +
> +	return mas_find(&mas, addr + range - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_find_first);
> +
> +/**
> + * drm_gpuva_region_find - find a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuva_regions address
> + * @range: the &drm_gpuva_regions range
> + *
> + * Returns: the &drm_gpuva_region at a given &addr and with a given &range

again, I'm not sure you want to find first or walk here.. It sounds like
you want exactly addr to addr + range VMA?

> + */
> +struct drm_gpuva_region *
> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
> +		      u64 addr, u64 range)
> +{
> +	struct drm_gpuva_region *reg;
> +
> +	reg = drm_gpuva_region_find_first(mgr, addr, range);
> +	if (!reg)
> +		goto out;
> +
> +	if (reg->va.range != range)
> +		goto out;
> +
> +	return reg;
> +
> +out:
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_find);
> +

...


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-21 14:37             ` Danilo Krummrich
@ 2023-02-21 18:31               ` Matthew Wilcox
  2023-02-22 16:11                 ` Danilo Krummrich
  2023-02-27 17:39                 ` Danilo Krummrich
  0 siblings, 2 replies; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-21 18:31 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote:
> On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
> > > On 2/20/23 16:10, Matthew Wilcox wrote:
> > > > This is why we like people to use the spinlock embedded in the tree.
> > > > There's nothing for the user to care about.  If the access really is
> > > > serialised, acquiring/releasing the uncontended spinlock is a minimal
> > > > cost compared to all the other things that will happen while modifying
> > > > the tree.
> > > 
> > > I think as for the users of the GPUVA manager we'd have two cases:
> > > 
> > > 1) Accesses to the manager (and hence the tree) are serialized, no lock
> > > needed.
> > > 
> > > 2) Multiple operations on the tree must be locked in order to make them
> > > appear atomic.
> > 
> > Could you give an example here of what you'd like to do?  Ideally
> > something complicated so I don't say "Oh, you can just do this" when
> > there's a more complex example for which "this" won't work.  I'm sure
> > that's embedded somewhere in the next 20-odd patches, but it's probably
> > quicker for you to describe in terms of tree operations that have to
> > appear atomic than for me to try to figure it out.
> > 
> 
> Absolutely, not gonna ask you to read all of that. :-)
> 
> One thing the GPUVA manager does is to provide drivers the (sub-)operations
> that need to be processed in order to fulfill a map or unmap request from
> userspace. For instance, when userspace asks the driver to map some memory
> the GPUVA manager calculates which existing mappings must be removed, split up
> or can be merged with the newly requested mapping.
> 
> A driver has two ways to fetch those operations from the GPUVA manager. It can
> either obtain a list of operations or receive a callback for each operation
> generated by the GPUVA manager.
> 
> In both cases the GPUVA manager walks the maple tree, which keeps track of
> existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
> the map case, since the unmap case is a subset basically). For each mapping
> found in the given range the driver, as mentioned, either receives a callback or
> a list entry is added to the list of operations.
> 
> Typically, for each operation / callback one entry within the maple tree is
> removed and, optionally at the beginning and end of a new mapping's range, a
> new entry is inserted. An of course, as the last operation, there is the new
> mapping itself to insert.
> 
> The GPUVA manager delegates locking responsibility to the drivers. Typically,
> a driver either serializes access to the VA space managed by the GPUVA manager
> (no lock needed) or need to lock the processing of a full set of operations
> generated by the GPUVA manager.

OK, that all makes sense.  It does make sense to have the driver use its
own mutex and then take the spinlock inside the maple tree code.  It
shouldn't ever be contended.

> > > In either case the embedded spinlock wouldn't be useful, we'd either need an
> > > external lock or no lock at all.
> > > 
> > > If there are any internal reasons why specific tree operations must be
> > > mutually excluded (such as those you explain below), wouldn't it make more
> > > sense to always have the internal lock and, optionally, allow users to
> > > specify an external lock additionally?
> > 
> > So the way this works for the XArray, which is a little older than the
> > Maple tree, is that we always use the internal spinlock for
> > modifications (possibly BH or IRQ safe), and if someone wants to
> > use an external mutex to make some callers atomic with respect to each
> > other, they're free to do so.  In that case, the XArray doesn't check
> > the user's external locking at all, because it really can't know.
> > 
> > I'd advise taking that approach; if there's really no way to use the
> > internal spinlock to make your complicated updates appear atomic
> > then just let the maple tree use its internal spinlock, and you can
> > also use your external mutex however you like.
> > 
> 
> That sounds like the right thing to do.
> 
> However, I'm using the advanced API of the maple tree (and that's the reason
> why the above example appears a little more detailed than needed) because I
> think with the normal API I can't insert / remove tree entries while walking
> the tree, right?

Right.  The normal API is for simple operations while the advanced API
is for doing compound operations.

> As by the documentation the advanced API, however, doesn't take care of locking
> itself, hence just letting the maple tree use its internal spinlock doesn't
> really work - I need to take care of that myself, right?

Yes; once you're using the advanced API, you get to compose the entire
operation yourself.

> It feels a bit weird that I, as a user of the API, would need to lock certain
> (or all?) mas_*() functions with the internal spinlock in order to protect
> (future) internal features of the tree, such as the slab cache defragmentation
> you mentioned. Because from my perspective, as the generic component that tells
> it's users (the drivers) to take care of locking VA space operations (and hence
> tree operations) I don't have an own purpose of this internal spinlock, right?

You don't ... but we can't know that.

> Also I'm a little confused how I'd know where to take the spinlock? E.g. for
> inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.

Lockdep will shout at you if you get it wrong ;-)  But you can safely
take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
mas_nomem() knows to drop the lock before doing a sleeping allocation.
Essentially you're open-coding mtree_store_range() but doing your own
thing in addition to the store.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-17 13:44 ` [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
  2023-02-18  1:05   ` kernel test robot
  2023-02-21 18:20   ` Liam R. Howlett
@ 2023-02-22 10:25   ` Christian König
  2023-02-22 15:07     ` Danilo Krummrich
  2 siblings, 1 reply; 64+ messages in thread
From: Christian König @ 2023-02-22 10:25 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason
  Cc: linux-doc, nouveau, linux-kernel, dri-devel, linux-mm, Dave Airlie

Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
> Add infrastructure to keep track of GPU virtual address (VA) mappings
> with a decicated VA space manager implementation.
>
> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> start implementing, allow userspace applications to request multiple and
> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> intended to serve the following purposes in this context.
>
> 1) Provide infrastructure to track GPU VA allocations and mappings,
>     making use of the maple_tree.
>
> 2) Generically connect GPU VA mappings to their backing buffers, in
>     particular DRM GEM objects.
>
> 3) Provide a common implementation to perform more complex mapping
>     operations on the GPU VA space. In particular splitting and merging
>     of GPU VA mappings, e.g. for intersecting mapping requests or partial
>     unmap requests.
>
> Suggested-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>   Documentation/gpu/drm-mm.rst    |   31 +
>   drivers/gpu/drm/Makefile        |    1 +
>   drivers/gpu/drm/drm_gem.c       |    3 +
>   drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
>   include/drm/drm_drv.h           |    6 +
>   include/drm/drm_gem.h           |   75 ++
>   include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
>   7 files changed, 2534 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>   create mode 100644 include/drm/drm_gpuva_mgr.h
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a52e6f4117d6..c9f120cfe730 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>      :export:
>   
> +DRM GPU VA Manager
> +==================
> +
> +Overview
> +--------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :doc: Overview
> +
> +Split and Merge
> +---------------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :doc: Split and Merge
> +
> +Locking
> +-------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :doc: Locking
> +
> +
> +DRM GPU VA Manager Function References
> +--------------------------------------
> +
> +.. kernel-doc:: include/drm/drm_gpuva_mgr.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :export:
> +
>   DRM Buddy Allocator
>   ===================
>   
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index d40defbb0347..4d098efffb98 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -45,6 +45,7 @@ drm-y := \
>   	drm_vblank.o \
>   	drm_vblank_work.o \
>   	drm_vma_manager.o \
> +	drm_gpuva_mgr.o \
>   	drm_writeback.o
>   drm-$(CONFIG_DRM_LEGACY) += \
>   	drm_agpsupport.o \
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 59a0bb5ebd85..65115fe88627 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>   	if (!obj->resv)
>   		obj->resv = &obj->_resv;
>   
> +	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
> +		drm_gem_gpuva_init(obj);
> +
>   	drm_vma_node_reset(&obj->vma_node);
>   	INIT_LIST_HEAD(&obj->lru_node);
>   }
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> new file mode 100644
> index 000000000000..19f583704562
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -0,0 +1,1704 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Red Hat.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Authors:
> + *     Danilo Krummrich <dakr@redhat.com>
> + *
> + */
> +
> +#include <drm/drm_gem.h>
> +#include <drm/drm_gpuva_mgr.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
> + * of a GPU's virtual address (VA) space and manages the corresponding virtual
> + * mappings represented by &drm_gpuva objects. It also keeps track of the
> + * mapping's backing &drm_gem_object buffers.
> + *
> + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
> + * &drm_gpuva objects representing all existent GPU VA mappings using this
> + * &drm_gem_object as backing buffer.
> + *
> + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can
> + * only be created within a previously allocated &drm_gpuva_region, which
> + * represents a reserved portion of the GPU VA space. GPU VA mappings are not
> + * allowed to span over a &drm_gpuva_region's boundary.
> + *
> + * GPU VA regions can also be flagged as sparse, which allows drivers to create
> + * sparse mappings for a whole GPU VA region in order to support Vulkan
> + * 'Sparse Resources'.

Well since we have now found that there is absolutely no technical 
reason for having those regions could we please drop them?

I don't really see a need for them any more.

Regards,
Christian.

> + *
> + * The GPU VA manager internally uses &maple_tree structures to manage the
> + * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
> + * space.
> + *
> + * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver
> + * the &drm_gpuva_manager contains a special region representing the portion of
> + * VA space reserved by the kernel. This node is initialized together with the
> + * GPU VA manager instance and removed when the GPU VA manager is destroyed.
> + *
> + * In a typical application drivers would embed struct drm_gpuva_manager,
> + * struct drm_gpuva_region and struct drm_gpuva within their own driver
> + * specific structures, there won't be any memory allocations of it's own nor
> + * memory allocations of &drm_gpuva or &drm_gpuva_region entries.
> + */
> +
> +/**
> + * DOC: Split and Merge
> + *
> + * The DRM GPU VA manager also provides an algorithm implementing splitting and
> + * merging of existent GPU VA mappings with the ones that are requested to be
> + * mapped or unmapped. This feature is required by the Vulkan API to implement
> + * Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to this as
> + * VM BIND.
> + *
> + * Drivers can call drm_gpuva_sm_map() to receive a sequence of callbacks
> + * containing map, unmap and remap operations for a given newly requested
> + * mapping. The sequence of callbacks represents the set of operations to
> + * execute in order to integrate the new mapping cleanly into the current state
> + * of the GPU VA space.
> + *
> + * Depending on how the new GPU VA mapping intersects with the existent mappings
> + * of the GPU VA space the &drm_gpuva_fn_ops callbacks contain an arbitrary
> + * amount of unmap operations, a maximum of two remap operations and a single
> + * map operation. The caller might receive no callback at all if no operation is
> + * required, e.g. if the requested mapping already exists in the exact same way.
> + *
> + * The single map operation, if existent, represents the original map operation
> + * requested by the caller. Please note that this operation might be altered
> + * comparing it with the original map operation, e.g. because it was merged with
> + * an already  existent mapping. Hence, drivers must execute this map operation
> + * instead of the original one passed to drm_gpuva_sm_map().
> + *
> + * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
> + * &drm_gpuva to unmap is physically contiguous with the original mapping
> + * request. Optionally, if 'keep' is set, drivers may keep the actual page table
> + * entries for this &drm_gpuva, adding the missing page table entries only and
> + * update the &drm_gpuva_manager's view of things accordingly.
> + *
> + * Drivers may do the same optimization, namely delta page table updates, also
> + * for remap operations. This is possible since &drm_gpuva_op_remap consists of
> + * one unmap operation and one or two map operations, such that drivers can
> + * derive the page table update delta accordingly.
> + *
> + * Note that there can't be more than two existent mappings to split up, one at
> + * the beginning and one at the end of the new mapping, hence there is a
> + * maximum of two remap operations.
> + *
> + * Generally, the DRM GPU VA manager never merges mappings across the
> + * boundaries of &drm_gpuva_regions. This is the case since merging between
> + * GPU VA regions would result into unmap and map operations to be issued for
> + * both regions involved although the original mapping request was referred to
> + * one specific GPU VA region only. Since the other GPU VA region, the one not
> + * explicitly requested to be altered, might be in use by the GPU, we are not
> + * allowed to issue any map/unmap operations for this region.
> + *
> + * To update the &drm_gpuva_manager's view of the GPU VA space
> + * drm_gpuva_insert() and drm_gpuva_remove() should be used.
> + *
> + * Analogous to drm_gpuva_sm_map() drm_gpuva_sm_unmap() uses &drm_gpuva_fn_ops
> + * to call back into the driver in order to unmap a range of GPU VA space. The
> + * logic behind this function is way simpler though: For all existent mappings
> + * enclosed by the given range unmap operations are created. For mappings which
> + * are only partically located within the given range, remap operations are
> + * created such that those mappings are split up and re-mapped partically.
> + *
> + * The following diagram depicts the basic relationships of existent GPU VA
> + * mappings, a newly requested mapping and the resulting mappings as implemented
> + * by drm_gpuva_sm_map() - it doesn't cover any arbitrary combinations of these.
> + *
> + * 1) Requested mapping is identical, hence noop.
> + *
> + *    ::
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + * 2) Requested mapping is identical, except for the BO offset, hence replace
> + *    the mapping.
> + *
> + *    ::
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	req: |-----------| (bo_offset=m)
> + *
> + *	     0     a     1
> + *	new: |-----------| (bo_offset=m)
> + *
> + *
> + * 3) Requested mapping is identical, except for the backing BO, hence replace
> + *    the mapping.
> + *
> + *    ::
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     b     1
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     b     1
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + * 4) Existent mapping is a left aligned subset of the requested one, hence
> + *    replace the existent one.
> + *
> + *    ::
> + *
> + *	     0  a  1
> + *	old: |-----|       (bo_offset=n)
> + *
> + *	     0     a     2
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *    .. note::
> + *       We expect to see the same result for a request with a different BO
> + *       and/or non-contiguous BO offset.
> + *
> + *
> + * 5) Requested mapping's range is a left aligned subset of the existent one,
> + *    but backed by a different BO. Hence, map the requested mapping and split
> + *    the existent one adjusting it's BO offset.
> + *
> + *    ::
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0  b  1
> + *	req: |-----|       (bo_offset=n)
> + *
> + *	     0  b  1  a' 2
> + *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
> + *
> + *    .. note::
> + *       We expect to see the same result for a request with a different BO
> + *       and/or non-contiguous BO offset.
> + *
> + *
> + * 6) Existent mapping is a superset of the requested mapping, hence noop.
> + *
> + *    ::
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0  a  1
> + *	req: |-----|       (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + * 7) Requested mapping's range is a right aligned subset of the existent one,
> + *    but backed by a different BO. Hence, map the requested mapping and split
> + *    the existent one, without adjusting the BO offset.
> + *
> + *    ::
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	           1  b  2
> + *	req:       |-----| (bo_offset=m)
> + *
> + *	     0  a  1  b  2
> + *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + * 8) Existent mapping is a superset of the requested mapping, hence noop.
> + *
> + *    ::
> + *
> + *	      0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	           1  a  2
> + *	req:       |-----| (bo_offset=n+1)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + * 9) Existent mapping is overlapped at the end by the requested mapping backed
> + *    by a different BO. Hence, map the requested mapping and split up the
> + *    existent one, without adjusting the BO offset.
> + *
> + *    ::
> + *
> + *	     0     a     2
> + *	old: |-----------|       (bo_offset=n)
> + *
> + *	           1     b     3
> + *	req:       |-----------| (bo_offset=m)
> + *
> + *	     0  a  1     b     3
> + *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + * 10) Existent mapping is overlapped by the requested mapping, both having the
> + *     same backing BO with a contiguous offset. Hence, merge both mappings.
> + *
> + *     ::
> + *
> + *	      0     a     2
> + *	 old: |-----------|       (bo_offset=n)
> + *
> + *	            1     a     3
> + *	 req:       |-----------| (bo_offset=n+1)
> + *
> + *	      0        a        3
> + *	 new: |-----------------| (bo_offset=n)
> + *
> + *
> + * 11) Requested mapping's range is a centered subset of the existent one
> + *     having a different backing BO. Hence, map the requested mapping and split
> + *     up the existent one in two mappings, adjusting the BO offset of the right
> + *     one accordingly.
> + *
> + *     ::
> + *
> + *	      0        a        3
> + *	 old: |-----------------| (bo_offset=n)
> + *
> + *	            1  b  2
> + *	 req:       |-----|       (bo_offset=m)
> + *
> + *	      0  a  1  b  2  a' 3
> + *	 new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
> + *
> + *
> + * 12) Requested mapping is a contiguous subset of the existent one, hence noop.
> + *
> + *     ::
> + *
> + *	      0        a        3
> + *	 old: |-----------------| (bo_offset=n)
> + *
> + *	            1  a  2
> + *	 req:       |-----|       (bo_offset=n+1)
> + *
> + *	      0        a        3
> + *	 old: |-----------------| (bo_offset=n)
> + *
> + *
> + * 13) Existent mapping is a right aligned subset of the requested one, hence
> + *     replace the existent one.
> + *
> + *     ::
> + *
> + *	            1  a  2
> + *	 old:       |-----| (bo_offset=n+1)
> + *
> + *	      0     a     2
> + *	 req: |-----------| (bo_offset=n)
> + *
> + *	      0     a     2
> + *	 new: |-----------| (bo_offset=n)
> + *
> + *     .. note::
> + *        We expect to see the same result for a request with a different bo
> + *        and/or non-contiguous bo_offset.
> + *
> + *
> + * 14) Existent mapping is a centered subset of the requested one, hence
> + *     replace the existent one.
> + *
> + *     ::
> + *
> + *	            1  a  2
> + *	 old:       |-----| (bo_offset=n+1)
> + *
> + *	      0        a       3
> + *	 req: |----------------| (bo_offset=n)
> + *
> + *	      0        a       3
> + *	 new: |----------------| (bo_offset=n)
> + *
> + *     .. note::
> + *        We expect to see the same result for a request with a different bo
> + *        and/or non-contiguous bo_offset.
> + *
> + *
> + * 15) Existent mappings is overlapped at the beginning by the requested mapping
> + *     backed by a different BO. Hence, map the requested mapping and split up
> + *     the existent one, adjusting it's BO offset accordingly.
> + *
> + *     ::
> + *
> + *	            1     a     3
> + *	 old:       |-----------| (bo_offset=n)
> + *
> + *	      0     b     2
> + *	 req: |-----------|       (bo_offset=m)
> + *
> + *	      0     b     2  a' 3
> + *	 new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> + *
> + *
> + * 16) Requested mapping fills the gap between two existent mappings all having
> + *     the same backing BO, such that all three have a contiguous BO offset.
> + *     Hence, merge all mappings.
> + *
> + *     ::
> + *
> + *	      0     a     1
> + *	 old: |-----------|                        (bo_offset=n)
> + *
> + *	                             2     a     3
> + *	 old':                       |-----------| (bo_offset=n+2)
> + *
> + *	                 1     a     2
> + *	 req:            |-----------|             (bo_offset=n+1)
> + *
> + *	                       a
> + *	 new: |----------------------------------| (bo_offset=n)
> + */
> +
> +/**
> + * DOC: Locking
> + *
> + * Generally, the GPU VA manager does not take care of locking itself, it is
> + * the drivers responsibility to take care about locking. Drivers might want to
> + * protect the following operations: inserting, removing and iterating
> + * &drm_gpuva and &drm_gpuva_region objects as well as generating all kinds of
> + * operations, such as split / merge or prefetch.
> + *
> + * The GPU VA manager also does not take care of the locking of the backing
> + * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to
> + * enforce mutual exclusion.
> + */
> +
> +
> +static int __drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +				     struct drm_gpuva_region *reg);
> +static void __drm_gpuva_region_remove(struct drm_gpuva_region *reg);
> +
> +/**
> + * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
> + * @mgr: pointer to the &drm_gpuva_manager to initialize
> + * @name: the name of the GPU VA space
> + * @start_offset: the start offset of the GPU VA space
> + * @range: the size of the GPU VA space
> + * @reserve_offset: the start of the kernel reserved GPU VA area
> + * @reserve_range: the size of the kernel reserved GPU VA area
> + * @ops: &drm_gpuva_fn_ops called on &drm_gpuva_sm_map / &drm_gpuva_sm_unmap
> + * @flags: the feature flags of the &drm_gpuva_manager
> + *
> + * The &drm_gpuva_manager must be initialized with this function before use.
> + *
> + * Note that @mgr must be cleared to 0 before calling this function. The given
> + * &name is expected to be managed by the surrounding driver structures.
> + */
> +void
> +drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
> +		       const char *name,
> +		       u64 start_offset, u64 range,
> +		       u64 reserve_offset, u64 reserve_range,
> +		       struct drm_gpuva_fn_ops *ops,
> +		       enum drm_gpuva_mgr_flags flags)
> +{
> +	mt_init_flags(&mgr->region_mt, MT_FLAGS_LOCK_NONE);
> +	mt_init_flags(&mgr->va_mt, MT_FLAGS_LOCK_NONE);
> +
> +	mgr->mm_start = start_offset;
> +	mgr->mm_range = range;
> +
> +	mgr->name = name ? name : "unknown";
> +	mgr->ops = ops;
> +	mgr->flags = flags;
> +
> +	memset(&mgr->kernel_alloc_region, 0, sizeof(struct drm_gpuva_region));
> +	mgr->kernel_alloc_region.va.addr = reserve_offset;
> +	mgr->kernel_alloc_region.va.range = reserve_range;
> +
> +	__drm_gpuva_region_insert(mgr, &mgr->kernel_alloc_region);
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_init);
> +
> +/**
> + * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
> + * @mgr: pointer to the &drm_gpuva_manager to clean up
> + *
> + * Note that it is a bug to call this function on a manager that still
> + * holds GPU VA mappings.
> + */
> +void
> +drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
> +{
> +	mgr->name = NULL;
> +	__drm_gpuva_region_remove(&mgr->kernel_alloc_region);
> +
> +	WARN(!mtree_empty(&mgr->va_mt),
> +	     "GPUVA tree is not empty, potentially leaking memory.");
> +	__mt_destroy(&mgr->va_mt);
> +
> +	WARN(!mtree_empty(&mgr->region_mt),
> +	     "GPUVA region tree is not empty, potentially leaking memory.");
> +	__mt_destroy(&mgr->region_mt);
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_destroy);
> +
> +static inline bool
> +drm_gpuva_in_mm_range(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
> +{
> +	u64 end = addr + range;
> +	u64 mm_start = mgr->mm_start;
> +	u64 mm_end = mm_start + mgr->mm_range;
> +
> +	return addr < mm_end && mm_start < end;
> +}
> +
> +static inline bool
> +drm_gpuva_in_kernel_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
> +{
> +	u64 end = addr + range;
> +	u64 kstart = mgr->kernel_alloc_region.va.addr;
> +	u64 kend = kstart + mgr->kernel_alloc_region.va.range;
> +
> +	return addr < kend && kstart < end;
> +}
> +
> +static struct drm_gpuva_region *
> +drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
> +{
> +	DRM_GPUVA_REGION_ITER(it, mgr);
> +
> +	/* Find the VA region the requested range is strictly enclosed by. */
> +	drm_gpuva_iter_for_each_range(it, addr, addr + range) {
> +		struct drm_gpuva_region *reg = it.reg;
> +
> +		if (reg->va.addr <= addr &&
> +		    reg->va.addr + reg->va.range >= addr + range &&
> +		    reg != &mgr->kernel_alloc_region)
> +			return reg;
> +	}
> +
> +	return NULL;
> +}
> +
> +static bool
> +drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
> +{
> +	return !!drm_gpuva_in_region(mgr, addr, range);
> +}
> +
> +/**
> + * drm_gpuva_remove_iter - removes the iterators current element
> + * @it: the &drm_gpuva_iterator
> + *
> + * This removes the element the iterator currently points to.
> + */
> +void
> +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
> +{
> +	mas_erase(&it->mas);
> +}
> +EXPORT_SYMBOL(drm_gpuva_iter_remove);
> +
> +/**
> + * drm_gpuva_insert - insert a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @va: the &drm_gpuva to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> +		 struct drm_gpuva *va)
> +{
> +	u64 addr = va->va.addr;
> +	u64 range = va->va.range;
> +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
> +	struct drm_gpuva_region *reg = NULL;
> +	int ret;
> +
> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> +		return -EINVAL;
> +
> +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
> +		return -EINVAL;
> +
> +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
> +		reg = drm_gpuva_in_region(mgr, addr, range);
> +		if (unlikely(!reg))
> +			return -EINVAL;
> +	}
> +
> +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
> +		return -EEXIST;
> +
> +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	va->mgr = mgr;
> +	va->region = reg;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_gpuva_insert);
> +
> +/**
> + * drm_gpuva_remove - remove a &drm_gpuva
> + * @va: the &drm_gpuva to remove
> + *
> + * This removes the given &va from the underlaying tree.
> + */
> +void
> +drm_gpuva_remove(struct drm_gpuva *va)
> +{
> +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> +
> +	mas_erase(&mas);
> +}
> +EXPORT_SYMBOL(drm_gpuva_remove);
> +
> +/**
> + * drm_gpuva_link - link a &drm_gpuva
> + * @va: the &drm_gpuva to link
> + *
> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
> + * associated with.
> + *
> + * This function expects the caller to protect the GEM's GPUVA list against
> + * concurrent access.
> + */
> +void
> +drm_gpuva_link(struct drm_gpuva *va)
> +{
> +	if (likely(va->gem.obj))
> +		list_add_tail(&va->head, &va->gem.obj->gpuva.list);
> +}
> +EXPORT_SYMBOL(drm_gpuva_link);
> +
> +/**
> + * drm_gpuva_unlink - unlink a &drm_gpuva
> + * @va: the &drm_gpuva to unlink
> + *
> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
> + * associated with.
> + *
> + * This function expects the caller to protect the GEM's GPUVA list against
> + * concurrent access.
> + */
> +void
> +drm_gpuva_unlink(struct drm_gpuva *va)
> +{
> +	if (likely(va->gem.obj))
> +		list_del_init(&va->head);
> +}
> +EXPORT_SYMBOL(drm_gpuva_unlink);
> +
> +/**
> + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuvas address
> + * @range: the &drm_gpuvas range
> + *
> + * Returns: the first &drm_gpuva within the given range
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> +		     u64 addr, u64 range)
> +{
> +	MA_STATE(mas, &mgr->va_mt, addr, 0);
> +
> +	return mas_find(&mas, addr + range - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_first);
> +
> +/**
> + * drm_gpuva_find - find a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuvas address
> + * @range: the &drm_gpuvas range
> + *
> + * Returns: the &drm_gpuva at a given &addr and with a given &range
> + */
> +struct drm_gpuva *
> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> +	       u64 addr, u64 range)
> +{
> +	struct drm_gpuva *va;
> +
> +	va = drm_gpuva_find_first(mgr, addr, range);
> +	if (!va)
> +		goto out;
> +
> +	if (va->va.range != range)
> +		goto out;
> +
> +	return va;
> +
> +out:
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find);
> +
> +/**
> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @start: the given GPU VA's start address
> + *
> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> +{
> +	MA_STATE(mas, &mgr->va_mt, start, 0);
> +
> +	if (start <= mgr->mm_start ||
> +	    start > (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	return mas_prev(&mas, start - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_prev);
> +
> +/**
> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @end: the given GPU VA's end address
> + *
> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
> +{
> +	MA_STATE(mas, &mgr->va_mt, end - 1, 0);
> +
> +	if (end < mgr->mm_start ||
> +	    end >= (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	return mas_next(&mas, end);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_next);
> +
> +static int
> +__drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			  struct drm_gpuva_region *reg)
> +{
> +	u64 addr = reg->va.addr;
> +	u64 range = reg->va.range;
> +	MA_STATE(mas, &mgr->region_mt, addr, addr + range - 1);
> +	int ret;
> +
> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> +		return -EINVAL;
> +
> +	ret = mas_store_gfp(&mas, reg, GFP_KERNEL);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	reg->mgr = mgr;
> +
> +	return 0;
> +}
> +
> +/**
> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @reg: the &drm_gpuva_region to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva_region with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			struct drm_gpuva_region *reg)
> +{
> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
> +		return -EINVAL;
> +
> +	return __drm_gpuva_region_insert(mgr, reg);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_insert);
> +
> +static void
> +__drm_gpuva_region_remove(struct drm_gpuva_region *reg)
> +{
> +	struct drm_gpuva_manager *mgr = reg->mgr;
> +	MA_STATE(mas, &mgr->region_mt, reg->va.addr, 0);
> +
> +	mas_erase(&mas);
> +}
> +
> +/**
> + * drm_gpuva_region_remove - remove a &drm_gpuva_region
> + * @reg: the &drm_gpuva to remove
> + *
> + * This removes the given &reg from the underlaying tree.
> + */
> +void
> +drm_gpuva_region_remove(struct drm_gpuva_region *reg)
> +{
> +	struct drm_gpuva_manager *mgr = reg->mgr;
> +
> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
> +		return;
> +
> +	if (unlikely(reg == &mgr->kernel_alloc_region)) {
> +		WARN(1, "Can't destroy kernel reserved region.\n");
> +		return;
> +	}
> +
> +	if (unlikely(!drm_gpuva_region_empty(reg)))
> +		WARN(1, "GPU VA region should be empty on destroy.\n");
> +
> +	__drm_gpuva_region_remove(reg);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_remove);
> +
> +/**
> + * drm_gpuva_region_empty - indicate whether a &drm_gpuva_region is empty
> + * @reg: the &drm_gpuva to destroy
> + *
> + * Returns: true if the &drm_gpuva_region is empty, false otherwise
> + */
> +bool
> +drm_gpuva_region_empty(struct drm_gpuva_region *reg)
> +{
> +	DRM_GPUVA_ITER(it, reg->mgr);
> +
> +	drm_gpuva_iter_for_each_range(it, reg->va.addr,
> +				      reg->va.addr +
> +				      reg->va.range)
> +		return false;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_empty);
> +
> +/**
> + * drm_gpuva_region_find_first - find the first &drm_gpuva_region in the given
> + * range
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuva_regions address
> + * @range: the &drm_gpuva_regions range
> + *
> + * Returns: the first &drm_gpuva_region within the given range
> + */
> +struct drm_gpuva_region *
> +drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
> +			    u64 addr, u64 range)
> +{
> +	MA_STATE(mas, &mgr->region_mt, addr, 0);
> +
> +	return mas_find(&mas, addr + range - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_find_first);
> +
> +/**
> + * drm_gpuva_region_find - find a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuva_regions address
> + * @range: the &drm_gpuva_regions range
> + *
> + * Returns: the &drm_gpuva_region at a given &addr and with a given &range
> + */
> +struct drm_gpuva_region *
> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
> +		      u64 addr, u64 range)
> +{
> +	struct drm_gpuva_region *reg;
> +
> +	reg = drm_gpuva_region_find_first(mgr, addr, range);
> +	if (!reg)
> +		goto out;
> +
> +	if (reg->va.range != range)
> +		goto out;
> +
> +	return reg;
> +
> +out:
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_find);
> +
> +static int
> +op_map_cb(int (*step)(struct drm_gpuva_op *, void *),
> +	  void *priv,
> +	  u64 addr, u64 range,
> +	  struct drm_gem_object *obj, u64 offset)
> +{
> +	struct drm_gpuva_op op = {};
> +
> +	op.op = DRM_GPUVA_OP_MAP;
> +	op.map.va.addr = addr;
> +	op.map.va.range = range;
> +	op.map.gem.obj = obj;
> +	op.map.gem.offset = offset;
> +
> +	return step(&op, priv);
> +}
> +
> +static int
> +op_remap_cb(int (*step)(struct drm_gpuva_op *, void *),
> +	    void *priv,
> +	    struct drm_gpuva_op_map *prev,
> +	    struct drm_gpuva_op_map *next,
> +	    struct drm_gpuva_op_unmap *unmap)
> +{
> +	struct drm_gpuva_op op = {};
> +	struct drm_gpuva_op_remap *r;
> +
> +	op.op = DRM_GPUVA_OP_REMAP;
> +	r = &op.remap;
> +	r->prev = prev;
> +	r->next = next;
> +	r->unmap = unmap;
> +
> +	return step(&op, priv);
> +}
> +
> +static int
> +op_unmap_cb(int (*step)(struct drm_gpuva_op *, void *),
> +	    void *priv,
> +	    struct drm_gpuva *va, bool merge)
> +{
> +	struct drm_gpuva_op op = {};
> +
> +	op.op = DRM_GPUVA_OP_UNMAP;
> +	op.unmap.va = va;
> +	op.unmap.keep = merge;
> +
> +	return step(&op, priv);
> +}
> +
> +static inline bool
> +gpuva_should_merge(struct drm_gpuva *va)
> +{
> +	/* Never merge mappings with NULL GEMs. */
> +	return !!va->gem.obj;
> +}
> +
> +static int
> +__drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
> +		   struct drm_gpuva_fn_ops *ops, void *priv,
> +		   u64 req_addr, u64 req_range,
> +		   struct drm_gem_object *req_obj, u64 req_offset)
> +{
> +	DRM_GPUVA_ITER(it, mgr);
> +	int (*step)(struct drm_gpuva_op *, void *);
> +	struct drm_gpuva *va, *prev = NULL;
> +	u64 req_end = req_addr + req_range;
> +	bool skip_pmerge = false, skip_nmerge = false;
> +	int ret;
> +
> +	step = ops->sm_map_step;
> +
> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, req_addr, req_range)))
> +		return -EINVAL;
> +
> +	if (unlikely(drm_gpuva_in_kernel_region(mgr, req_addr, req_range)))
> +		return -EINVAL;
> +
> +	if ((mgr->flags & DRM_GPUVA_MANAGER_REGIONS) &&
> +	    !drm_gpuva_in_any_region(mgr, req_addr, req_range))
> +		return -EINVAL;
> +
> +	drm_gpuva_iter_for_each_range(it, req_addr, req_end) {
> +		struct drm_gpuva *va = it.va;
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->va.addr;
> +		u64 range = va->va.range;
> +		u64 end = addr + range;
> +		bool merge = gpuva_should_merge(va);
> +
> +		/* Generally, we want to skip merging with potential mappings
> +		 * left and right of the requested one when we found a
> +		 * collision, since merging happens in this loop already.
> +		 *
> +		 * However, there is one exception when the requested mapping
> +		 * spans into a free VM area. If this is the case we might
> +		 * still hit the boundary of another mapping before and/or
> +		 * after the free VM area.
> +		 */
> +		skip_pmerge = true;
> +		skip_nmerge = true;
> +
> +		if (addr == req_addr) {
> +			merge &= obj == req_obj &&
> +				 offset == req_offset;
> +
> +			if (end == req_end) {
> +				if (merge)
> +					goto done;
> +
> +				ret = op_unmap_cb(step, priv, va, false);
> +				if (ret)
> +					return ret;
> +				break;
> +			}
> +
> +			if (end < req_end) {
> +				skip_nmerge = false;
> +				ret = op_unmap_cb(step, priv, va, merge);
> +				if (ret)
> +					return ret;
> +				goto next;
> +			}
> +
> +			if (end > req_end) {
> +				struct drm_gpuva_op_map n = {
> +					.va.addr = req_end,
> +					.va.range = range - req_range,
> +					.gem.obj = obj,
> +					.gem.offset = offset + req_range,
> +				};
> +				struct drm_gpuva_op_unmap u = { .va = va };
> +
> +				if (merge)
> +					goto done;
> +
> +				ret = op_remap_cb(step, priv, NULL, &n, &u);
> +				if (ret)
> +					return ret;
> +				break;
> +			}
> +		} else if (addr < req_addr) {
> +			u64 ls_range = req_addr - addr;
> +			struct drm_gpuva_op_map p = {
> +				.va.addr = addr,
> +				.va.range = ls_range,
> +				.gem.obj = obj,
> +				.gem.offset = offset,
> +			};
> +			struct drm_gpuva_op_unmap u = { .va = va };
> +
> +			merge &= obj == req_obj &&
> +				 offset + ls_range == req_offset;
> +
> +			if (end == req_end) {
> +				if (merge)
> +					goto done;
> +
> +				ret = op_remap_cb(step, priv, &p, NULL, &u);
> +				if (ret)
> +					return ret;
> +				break;
> +			}
> +
> +			if (end < req_end) {
> +				u64 new_addr = addr;
> +				u64 new_range = req_range + ls_range;
> +				u64 new_offset = offset;
> +
> +				/* We validated that the requested mapping is
> +				 * within a single VA region already.
> +				 * Since it overlaps the current mapping (which
> +				 * can't cross a VA region boundary) we can be
> +				 * sure that we're still within the boundaries
> +				 * of the same VA region after merging.
> +				 */
> +				if (merge) {
> +					req_offset = new_offset;
> +					req_addr = new_addr;
> +					req_range = new_range;
> +					ret = op_unmap_cb(step, priv, va, true);
> +					if (ret)
> +						return ret;
> +					goto next;
> +				}
> +
> +				ret = op_remap_cb(step, priv, &p, NULL, &u);
> +				if (ret)
> +					return ret;
> +				goto next;
> +			}
> +
> +			if (end > req_end) {
> +				struct drm_gpuva_op_map n = {
> +					.va.addr = req_end,
> +					.va.range = end - req_end,
> +					.gem.obj = obj,
> +					.gem.offset = offset + ls_range +
> +						      req_range,
> +				};
> +
> +				if (merge)
> +					goto done;
> +
> +				ret = op_remap_cb(step, priv, &p, &n, &u);
> +				if (ret)
> +					return ret;
> +				break;
> +			}
> +		} else if (addr > req_addr) {
> +			merge &= obj == req_obj &&
> +				 offset == req_offset +
> +					   (addr - req_addr);
> +
> +			if (!prev)
> +				skip_pmerge = false;
> +
> +			if (end == req_end) {
> +				ret = op_unmap_cb(step, priv, va, merge);
> +				if (ret)
> +					return ret;
> +				break;
> +			}
> +
> +			if (end < req_end) {
> +				skip_nmerge = false;
> +				ret = op_unmap_cb(step, priv, va, merge);
> +				if (ret)
> +					return ret;
> +				goto next;
> +			}
> +
> +			if (end > req_end) {
> +				struct drm_gpuva_op_map n = {
> +					.va.addr = req_end,
> +					.va.range = end - req_end,
> +					.gem.obj = obj,
> +					.gem.offset = offset + req_end - addr,
> +				};
> +				struct drm_gpuva_op_unmap u = { .va = va };
> +				u64 new_end = end;
> +				u64 new_range = new_end - req_addr;
> +
> +				/* We validated that the requested mapping is
> +				 * within a single VA region already.
> +				 * Since it overlaps the current mapping (which
> +				 * can't cross a VA region boundary) we can be
> +				 * sure that we're still within the boundaries
> +				 * of the same VA region after merging.
> +				 */
> +				if (merge) {
> +					req_end = new_end;
> +					req_range = new_range;
> +					ret = op_unmap_cb(step, priv, va, true);
> +					if (ret)
> +						return ret;
> +					break;
> +				}
> +
> +				ret = op_remap_cb(step, priv, NULL, &n, &u);
> +				if (ret)
> +					return ret;
> +				break;
> +			}
> +		}
> +next:
> +		prev = va;
> +	}
> +
> +	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
> +	if (va) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->va.addr;
> +		u64 range = va->va.range;
> +		u64 new_offset = offset;
> +		u64 new_addr = addr;
> +		u64 new_range = req_range + range;
> +		bool merge = gpuva_should_merge(va) &&
> +			     obj == req_obj &&
> +			     offset + range == req_offset;
> +
> +		if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS)
> +			merge &= drm_gpuva_in_any_region(mgr, new_addr,
> +							 new_range);
> +
> +		if (merge) {
> +			ret = op_unmap_cb(step, priv, va, true);
> +			if (ret)
> +				return ret;
> +
> +			req_offset = new_offset;
> +			req_addr = new_addr;
> +			req_range = new_range;
> +		}
> +	}
> +
> +	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
> +	if (va) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->va.addr;
> +		u64 range = va->va.range;
> +		u64 end = addr + range;
> +		u64 new_range = req_range + range;
> +		u64 new_end = end;
> +		bool merge = gpuva_should_merge(va) &&
> +			     obj == req_obj &&
> +			     offset == req_offset + req_range;
> +
> +		if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS)
> +			merge &= drm_gpuva_in_any_region(mgr, req_addr,
> +							 new_range);
> +
> +		if (merge) {
> +			ret = op_unmap_cb(step, priv, va, true);
> +			if (ret)
> +				return ret;
> +
> +			req_range = new_range;
> +			req_end = new_end;
> +		}
> +	}
> +
> +	ret = op_map_cb(step, priv,
> +			req_addr, req_range,
> +			req_obj, req_offset);
> +	if (ret)
> +		return ret;
> +
> +done:
> +	return 0;
> +}
> +
> +static int
> +__drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr,
> +		     struct drm_gpuva_fn_ops *ops, void *priv,
> +		     u64 req_addr, u64 req_range)
> +{
> +	DRM_GPUVA_ITER(it, mgr);
> +	int (*step)(struct drm_gpuva_op *, void *);
> +	u64 req_end = req_addr + req_range;
> +	int ret;
> +
> +	step = ops->sm_unmap_step;
> +
> +	drm_gpuva_iter_for_each_range(it, req_addr, req_end) {
> +		struct drm_gpuva *va = it.va;
> +		struct drm_gpuva_op_map prev = {}, next = {};
> +		bool prev_split = false, next_split = false;
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->va.addr;
> +		u64 range = va->va.range;
> +		u64 end = addr + range;
> +
> +		if (addr < req_addr) {
> +			prev.va.addr = addr;
> +			prev.va.range = req_addr - addr;
> +			prev.gem.obj = obj;
> +			prev.gem.offset = offset;
> +
> +			prev_split = true;
> +		}
> +
> +		if (end > req_end) {
> +			next.va.addr = req_end;
> +			next.va.range = end - req_end;
> +			next.gem.obj = obj;
> +			next.gem.offset = offset + (req_end - addr);
> +
> +			next_split = true;
> +		}
> +
> +		if (prev_split || next_split) {
> +			struct drm_gpuva_op_unmap unmap = { .va = va };
> +
> +			ret = op_remap_cb(step, priv, &prev, &next, &unmap);
> +			if (ret)
> +				return ret;
> +		} else {
> +			ret = op_unmap_cb(step, priv, va, false);
> +			if (ret)
> +				return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * drm_gpuva_sm_map - creates the &drm_gpuva_op split/merge steps
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the new mapping
> + * @req_range: the range of the new mapping
> + * @req_obj: the &drm_gem_object to map
> + * @req_offset: the offset within the &drm_gem_object
> + * @priv: pointer to a driver private data structure
> + *
> + * This function iterates the given range of the GPU VA space. It utilizes the
> + * &drm_gpuva_fn_ops to call back into the driver providing the split and merge
> + * steps.
> + *
> + * Drivers may use these callbacks to update the GPU VA space right away within
> + * the callback. In case the driver decides to copy and store the operations for
> + * later processing neither this function nor &drm_gpuva_sm_unmap is allowed to
> + * be called before the &drm_gpuva_manager's view of the GPU VA space was
> + * updated with the previous set of operations. To update the
> + * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
> + * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
> + * used.
> + *
> + * A sequence of callbacks can contain map, unmap and remap operations, but
> + * the sequence of callbacks might also be empty if no operation is required,
> + * e.g. if the requested mapping already exists in the exact same way.
> + *
> + * There can be an arbitrary amount of unmap operations, a maximum of two remap
> + * operations and a single map operation. The latter one, if existent,
> + * represents the original map operation requested by the caller. Please note
> + * that the map operation might has been modified, e.g. if it was merged with
> + * an existent mapping.
> + *
> + * Returns: 0 on success or a negative error code
> + */
> +int
> +drm_gpuva_sm_map(struct drm_gpuva_manager *mgr, void *priv,
> +		 u64 req_addr, u64 req_range,
> +		 struct drm_gem_object *req_obj, u64 req_offset)
> +{
> +	if (!mgr->ops || !mgr->ops->sm_map_step)
> +		return -EINVAL;
> +
> +	return __drm_gpuva_sm_map(mgr, mgr->ops, priv,
> +				  req_addr, req_range,
> +				  req_obj, req_offset);
> +}
> +EXPORT_SYMBOL(drm_gpuva_sm_map);
> +
> +/**
> + * drm_gpuva_sm_unmap - creates the &drm_gpuva_ops to split on unmap
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the range to unmap
> + * @req_range: the range of the mappings to unmap
> + * @ops: the &drm_gpuva_fn_ops callbacks to provide the split/merge steps
> + * @priv: pointer to a driver private data structure
> + *
> + * This function iterates the given range of the GPU VA space. It utilizes the
> + * &drm_gpuva_fn_ops to call back into the driver providing the operations to
> + * unmap and, if required, split existent mappings.
> + *
> + * Drivers may use these callbacks to update the GPU VA space right away within
> + * the callback. In case the driver decides to copy and store the operations for
> + * later processing neither this function nor &drm_gpuva_sm_map is allowed to be
> + * called before the &drm_gpuva_manager's view of the GPU VA space was updated
> + * with the previous set of operations. To update the &drm_gpuva_manager's view
> + * of the GPU VA space drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
> + * drm_gpuva_destroy_unlocked() should be used.
> + *
> + * A sequence of callbacks can contain unmap and remap operations, depending on
> + * whether there are actual overlapping mappings to split.
> + *
> + * There can be an arbitrary amount of unmap operations and a maximum of two
> + * remap operations.
> + *
> + * Returns: 0 on success or a negative error code
> + */
> +int
> +drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr, void *priv,
> +		   u64 req_addr, u64 req_range)
> +{
> +	if (!mgr->ops || !mgr->ops->sm_unmap_step)
> +		return -EINVAL;
> +
> +	return __drm_gpuva_sm_unmap(mgr, mgr->ops, priv,
> +				    req_addr, req_range);
> +}
> +EXPORT_SYMBOL(drm_gpuva_sm_unmap);
> +
> +static struct drm_gpuva_op *
> +gpuva_op_alloc(struct drm_gpuva_manager *mgr)
> +{
> +	struct drm_gpuva_fn_ops *fn = mgr->ops;
> +	struct drm_gpuva_op *op;
> +
> +	if (fn && fn->op_alloc)
> +		op = fn->op_alloc();
> +	else
> +		op = kzalloc(sizeof(*op), GFP_KERNEL);
> +
> +	if (unlikely(!op))
> +		return NULL;
> +
> +	return op;
> +}
> +
> +static void
> +gpuva_op_free(struct drm_gpuva_manager *mgr,
> +	      struct drm_gpuva_op *op)
> +{
> +	struct drm_gpuva_fn_ops *fn = mgr->ops;
> +
> +	if (fn && fn->op_free)
> +		fn->op_free(op);
> +	else
> +		kfree(op);
> +}
> +
> +int drm_gpuva_sm_step(struct drm_gpuva_op *__op, void *priv)
> +{
> +	struct {
> +		struct drm_gpuva_manager *mgr;
> +		struct drm_gpuva_ops *ops;
> +	} *args = priv;
> +	struct drm_gpuva_manager *mgr = args->mgr;
> +	struct drm_gpuva_ops *ops = args->ops;
> +	struct drm_gpuva_op *op;
> +
> +	op = gpuva_op_alloc(mgr);
> +	if (unlikely(!op))
> +		goto err;
> +
> +	memcpy(op, __op, sizeof(*op));
> +
> +	if (op->op == DRM_GPUVA_OP_REMAP) {
> +		struct drm_gpuva_op_remap *__r = &__op->remap;
> +		struct drm_gpuva_op_remap *r = &op->remap;
> +
> +		r->unmap = kmemdup(__r->unmap, sizeof(*r->unmap),
> +				   GFP_KERNEL);
> +		if (unlikely(!r->unmap))
> +			goto err_free_op;
> +
> +		if (__r->prev) {
> +			r->prev = kmemdup(__r->prev, sizeof(*r->prev),
> +					  GFP_KERNEL);
> +			if (unlikely(!r->prev))
> +				goto err_free_unmap;
> +		}
> +
> +		if (__r->next) {
> +			r->next = kmemdup(__r->next, sizeof(*r->next),
> +					  GFP_KERNEL);
> +			if (unlikely(!r->next))
> +				goto err_free_prev;
> +		}
> +	}
> +
> +	list_add_tail(&op->entry, &ops->list);
> +
> +	return 0;
> +
> +err_free_unmap:
> +	kfree(op->remap.unmap);
> +err_free_prev:
> +	kfree(op->remap.prev);
> +err_free_op:
> +	gpuva_op_free(mgr, op);
> +err:
> +	return -ENOMEM;
> +}
> +
> +static struct drm_gpuva_fn_ops gpuva_list_ops = {
> +	.sm_map_step = drm_gpuva_sm_step,
> +	.sm_unmap_step = drm_gpuva_sm_step,
> +};
> +
> +/**
> + * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the new mapping
> + * @req_range: the range of the new mapping
> + * @req_obj: the &drm_gem_object to map
> + * @req_offset: the offset within the &drm_gem_object
> + *
> + * This function creates a list of operations to perform splitting and merging
> + * of existent mapping(s) with the newly requested one.
> + *
> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
> + * in the given order. It can contain map, unmap and remap operations, but it
> + * also can be empty if no operation is required, e.g. if the requested mapping
> + * already exists is the exact same way.
> + *
> + * There can be an arbitrary amount of unmap operations, a maximum of two remap
> + * operations and a single map operation. The latter one, if existent,
> + * represents the original map operation requested by the caller. Please note
> + * that the map operation might has been modified, e.g. if it was merged with an
> + * existent mapping.
> + *
> + * Note that before calling this function again with another mapping request it
> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space. The
> + * previously obtained operations must be either processed or abandoned. To
> + * update the &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
> + * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
> + * used.
> + *
> + * After the caller finished processing the returned &drm_gpuva_ops, they must
> + * be freed with &drm_gpuva_ops_free.
> + *
> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
> + */
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
> +			    u64 req_addr, u64 req_range,
> +			    struct drm_gem_object *req_obj, u64 req_offset)
> +{
> +	struct drm_gpuva_ops *ops;
> +	struct {
> +		struct drm_gpuva_manager *mgr;
> +		struct drm_gpuva_ops *ops;
> +	} args;
> +	int ret;
> +
> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	if (unlikely(!ops))
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&ops->list);
> +
> +	args.mgr = mgr;
> +	args.ops = ops;
> +
> +	ret = __drm_gpuva_sm_map(mgr, &gpuva_list_ops, &args,
> +				 req_addr, req_range,
> +				 req_obj, req_offset);
> +	if (ret) {
> +		kfree(ops);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return ops;
> +}
> +EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
> +
> +/**
> + * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the range to unmap
> + * @req_range: the range of the mappings to unmap
> + *
> + * This function creates a list of operations to perform unmapping and, if
> + * required, splitting of the mappings overlapping the unmap range.
> + *
> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
> + * in the given order. It can contain unmap and remap operations, depending on
> + * whether there are actual overlapping mappings to split.
> + *
> + * There can be an arbitrary amount of unmap operations and a maximum of two
> + * remap operations.
> + *
> + * Note that before calling this function again with another range to unmap it
> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space. The
> + * previously obtained operations must be processed or abandoned. To update the
> + * &drm_gpuva_manager's view of the GPU VA space drm_gpuva_insert(),
> + * drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
> + * used.
> + *
> + * After the caller finished processing the returned &drm_gpuva_ops, they must
> + * be freed with &drm_gpuva_ops_free.
> + *
> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
> + */
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
> +			      u64 req_addr, u64 req_range)
> +{
> +	struct drm_gpuva_ops *ops;
> +	struct {
> +		struct drm_gpuva_manager *mgr;
> +		struct drm_gpuva_ops *ops;
> +	} args;
> +	int ret;
> +
> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	if (unlikely(!ops))
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&ops->list);
> +
> +	args.mgr = mgr;
> +	args.ops = ops;
> +
> +	ret = __drm_gpuva_sm_unmap(mgr, &gpuva_list_ops, &args,
> +				   req_addr, req_range);
> +	if (ret) {
> +		kfree(ops);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return ops;
> +}
> +EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
> +
> +/**
> + * drm_gpuva_prefetch_ops_create - creates the &drm_gpuva_ops to prefetch
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the range to prefetch
> + * @req_range: the range of the mappings to prefetch
> + *
> + * This function creates a list of operations to perform prefetching.
> + *
> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
> + * in the given order. It can contain prefetch operations.
> + *
> + * There can be an arbitrary amount of prefetch operations.
> + *
> + * After the caller finished processing the returned &drm_gpuva_ops, they must
> + * be freed with &drm_gpuva_ops_free.
> + *
> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
> + */
> +struct drm_gpuva_ops *
> +drm_gpuva_prefetch_ops_create(struct drm_gpuva_manager *mgr,
> +			      u64 addr, u64 range)
> +{
> +	DRM_GPUVA_ITER(it, mgr);
> +	struct drm_gpuva_ops *ops;
> +	struct drm_gpuva_op *op;
> +	int ret;
> +
> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	if (!ops)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&ops->list);
> +
> +	drm_gpuva_iter_for_each_range(it, addr, addr + range) {
> +		op = gpuva_op_alloc(mgr);
> +		if (!op) {
> +			ret = -ENOMEM;
> +			goto err_free_ops;
> +		}
> +
> +		op->op = DRM_GPUVA_OP_PREFETCH;
> +		op->prefetch.va = it.va;
> +		list_add_tail(&op->entry, &ops->list);
> +	}
> +
> +	return ops;
> +
> +err_free_ops:
> +	drm_gpuva_ops_free(mgr, ops);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(drm_gpuva_prefetch_ops_create);
> +
> +/**
> + * drm_gpuva_gem_unmap_ops_create - creates the &drm_gpuva_ops to unmap a GEM
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @obj: the &drm_gem_object to unmap
> + *
> + * This function creates a list of operations to perform unmapping for every
> + * GPUVA attached to a GEM.
> + *
> + * The list can be iterated with &drm_gpuva_for_each_op and consists out of an
> + * arbitrary amount of unmap operations.
> + *
> + * After the caller finished processing the returned &drm_gpuva_ops, they must
> + * be freed with &drm_gpuva_ops_free.
> + *
> + * It is the callers responsibility to protect the GEMs GPUVA list against
> + * concurrent access.
> + *
> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
> + */
> +struct drm_gpuva_ops *
> +drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager *mgr,
> +			       struct drm_gem_object *obj)
> +{
> +	struct drm_gpuva_ops *ops;
> +	struct drm_gpuva_op *op;
> +	struct drm_gpuva *va;
> +	int ret;
> +
> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	if (!ops)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&ops->list);
> +
> +	drm_gem_for_each_gpuva(va, obj) {
> +		op = gpuva_op_alloc(mgr);
> +		if (!op) {
> +			ret = -ENOMEM;
> +			goto err_free_ops;
> +		}
> +
> +		op->op = DRM_GPUVA_OP_UNMAP;
> +		op->unmap.va = va;
> +		list_add_tail(&op->entry, &ops->list);
> +	}
> +
> +	return ops;
> +
> +err_free_ops:
> +	drm_gpuva_ops_free(mgr, ops);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(drm_gpuva_gem_unmap_ops_create);
> +
> +
> +/**
> + * drm_gpuva_ops_free - free the given &drm_gpuva_ops
> + * @mgr: the &drm_gpuva_manager the ops were created for
> + * @ops: the &drm_gpuva_ops to free
> + *
> + * Frees the given &drm_gpuva_ops structure including all the ops associated
> + * with it.
> + */
> +void
> +drm_gpuva_ops_free(struct drm_gpuva_manager *mgr,
> +		   struct drm_gpuva_ops *ops)
> +{
> +	struct drm_gpuva_op *op, *next;
> +
> +	drm_gpuva_for_each_op_safe(op, next, ops) {
> +		list_del(&op->entry);
> +
> +		if (op->op == DRM_GPUVA_OP_REMAP) {
> +			kfree(op->remap.prev);
> +			kfree(op->remap.next);
> +			kfree(op->remap.unmap);
> +		}
> +
> +		gpuva_op_free(mgr, op);
> +	}
> +
> +	kfree(ops);
> +}
> +EXPORT_SYMBOL(drm_gpuva_ops_free);
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 1d76d0686b03..4fe4a1552948 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -104,6 +104,12 @@ enum drm_driver_feature {
>   	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
>   	 */
>   	DRIVER_COMPUTE_ACCEL            = BIT(7),
> +	/**
> +	 * @DRIVER_GEM_GPUVA:
> +	 *
> +	 * Driver supports user defined GPU VA bindings for GEM objects.
> +	 */
> +	DRIVER_GEM_GPUVA		= BIT(8),
>   
>   	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
>   
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 772a4adf5287..4a3679034966 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -36,6 +36,8 @@
>   
>   #include <linux/kref.h>
>   #include <linux/dma-resv.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
>   
>   #include <drm/drm_vma_manager.h>
>   
> @@ -337,6 +339,17 @@ struct drm_gem_object {
>   	 */
>   	struct dma_resv _resv;
>   
> +	/**
> +	 * @gpuva:
> +	 *
> +	 * Provides the list and list mutex of GPU VAs attached to this
> +	 * GEM object.
> +	 */
> +	struct {
> +		struct list_head list;
> +		struct mutex mutex;
> +	} gpuva;
> +
>   	/**
>   	 * @funcs:
>   	 *
> @@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
>   unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
>   			       bool (*shrink)(struct drm_gem_object *obj));
>   
> +/**
> + * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
> + * @obj: the &drm_gem_object
> + *
> + * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
> + * protecting it.
> + *
> + * Calling this function is only necessary for drivers intending to support the
> + * &drm_driver_feature DRIVER_GEM_GPUVA.
> + */
> +static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
> +{
> +	INIT_LIST_HEAD(&obj->gpuva.list);
> +	mutex_init(&obj->gpuva.mutex);
> +}
> +
> +/**
> + * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
> + * @obj: the &drm_gem_object
> + *
> + * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
> + */
> +static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
> +{
> +	mutex_lock(&obj->gpuva.mutex);
> +}
> +
> +/**
> + * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
> + * @obj: the &drm_gem_object
> + *
> + * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
> + */
> +static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
> +{
> +	mutex_unlock(&obj->gpuva.mutex);
> +}
> +
> +/**
> + * drm_gem_for_each_gpuva - iternator to walk over a list of gpuvas
> + * @entry: &drm_gpuva structure to assign to in each iteration step
> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
> + *
> + * This iterator walks over all &drm_gpuva structures associated with the
> + * &drm_gpuva_manager.
> + */
> +#define drm_gem_for_each_gpuva(entry, obj) \
> +	list_for_each_entry(entry, &obj->gpuva.list, head)
> +
> +/**
> + * drm_gem_for_each_gpuva_safe - iternator to safely walk over a list of gpuvas
> + * @entry: &drm_gpuva structure to assign to in each iteration step
> + * @next: &next &drm_gpuva to store the next step
> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
> + *
> + * This iterator walks over all &drm_gpuva structures associated with the
> + * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
> + * it is save against removal of elements.
> + */
> +#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
> +	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
> +
>   #endif /* __DRM_GEM_H__ */
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> new file mode 100644
> index 000000000000..d245d01e37a9
> --- /dev/null
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -0,0 +1,714 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __DRM_GPUVA_MGR_H__
> +#define __DRM_GPUVA_MGR_H__
> +
> +/*
> + * Copyright (c) 2022 Red Hat.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/maple_tree.h>
> +#include <linux/mm.h>
> +#include <linux/rbtree.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +struct drm_gpuva_manager;
> +struct drm_gpuva_fn_ops;
> +
> +/**
> + * struct drm_gpuva_region - structure to track a portion of GPU VA space
> + *
> + * This structure represents a portion of a GPUs VA space and is associated
> + * with a &drm_gpuva_manager.
> + *
> + * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
> + * placed within a &drm_gpuva_region.
> + */
> +struct drm_gpuva_region {
> +	/**
> +	 * @mgr: the &drm_gpuva_manager this object is associated with
> +	 */
> +	struct drm_gpuva_manager *mgr;
> +
> +	/**
> +	 * @va: structure containing the address and range of the &drm_gpuva_region
> +	 */
> +	struct {
> +		/**
> +		 * @addr: the start address
> +		 */
> +		u64 addr;
> +
> +		/*
> +		 * @range: the range
> +		 */
> +		u64 range;
> +	} va;
> +
> +	/**
> +	 * @sparse: indicates whether this region is sparse
> +	 */
> +	bool sparse;
> +};
> +
> +int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			    struct drm_gpuva_region *reg);
> +void drm_gpuva_region_remove(struct drm_gpuva_region *reg);
> +
> +bool
> +drm_gpuva_region_empty(struct drm_gpuva_region *reg);
> +
> +struct drm_gpuva_region *
> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
> +		      u64 addr, u64 range);
> +struct drm_gpuva_region *
> +drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
> +			    u64 addr, u64 range);
> +
> +/**
> + * enum drm_gpuva_flags - flags for struct drm_gpuva
> + */
> +enum drm_gpuva_flags {
> +	/**
> +	 * @DRM_GPUVA_EVICTED:
> +	 *
> +	 * Flag indicating that the &drm_gpuva's backing GEM is evicted.
> +	 */
> +	DRM_GPUVA_EVICTED = (1 << 0),
> +
> +	/**
> +	 * @DRM_GPUVA_USERBITS: user defined bits
> +	 */
> +	DRM_GPUVA_USERBITS = (1 << 1),
> +};
> +
> +/**
> + * struct drm_gpuva - structure to track a GPU VA mapping
> + *
> + * This structure represents a GPU VA mapping and is associated with a
> + * &drm_gpuva_manager.
> + *
> + * Typically, this structure is embedded in bigger driver structures.
> + */
> +struct drm_gpuva {
> +	/**
> +	 * @mgr: the &drm_gpuva_manager this object is associated with
> +	 */
> +	struct drm_gpuva_manager *mgr;
> +
> +	/**
> +	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
> +	 */
> +	struct drm_gpuva_region *region;
> +
> +	/**
> +	 * @head: the &list_head to attach this object to a &drm_gem_object
> +	 */
> +	struct list_head head;
> +
> +	/**
> +	 * @flags: the &drm_gpuva_flags for this mapping
> +	 */
> +	enum drm_gpuva_flags flags;
> +
> +	/**
> +	 * @va: structure containing the address and range of the &drm_gpuva
> +	 */
> +	struct {
> +		/**
> +		 * @addr: the start address
> +		 */
> +		u64 addr;
> +
> +		/*
> +		 * @range: the range
> +		 */
> +		u64 range;
> +	} va;
> +
> +	/**
> +	 * @gem: structure containing the &drm_gem_object and it's offset
> +	 */
> +	struct {
> +		/**
> +		 * @offset: the offset within the &drm_gem_object
> +		 */
> +		u64 offset;
> +
> +		/**
> +		 * @obj: the mapped &drm_gem_object
> +		 */
> +		struct drm_gem_object *obj;
> +	} gem;
> +};
> +
> +void drm_gpuva_link(struct drm_gpuva *va);
> +void drm_gpuva_unlink(struct drm_gpuva *va);
> +
> +int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> +		     struct drm_gpuva *va);
> +void drm_gpuva_remove(struct drm_gpuva *va);
> +
> +struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
> +				 u64 addr, u64 range);
> +struct drm_gpuva *drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> +				       u64 addr, u64 range);
> +struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
> +struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
> +
> +/**
> + * drm_gpuva_evict - sets whether the backing GEM of this &drm_gpuva is evicted
> + * @va: the &drm_gpuva to set the evict flag for
> + * @evict: indicates whether the &drm_gpuva is evicted
> + */
> +static inline void drm_gpuva_evict(struct drm_gpuva *va, bool evict)
> +{
> +	if (evict)
> +		va->flags |= DRM_GPUVA_EVICTED;
> +	else
> +		va->flags &= ~DRM_GPUVA_EVICTED;
> +}
> +
> +/**
> + * drm_gpuva_evicted - indicates whether the backing BO of this &drm_gpuva
> + * is evicted
> + * @va: the &drm_gpuva to check
> + */
> +static inline bool drm_gpuva_evicted(struct drm_gpuva *va)
> +{
> +	return va->flags & DRM_GPUVA_EVICTED;
> +}
> +
> +/**
> + * enum drm_gpuva_mgr_flags - the feature flags for the &drm_gpuva_manager
> + */
> +enum drm_gpuva_mgr_flags {
> +	/**
> +	 * @DRM_GPUVA_MANAGER_REGIONS:
> +	 *
> +	 * Enable the &drm_gpuva_manager to separately track &drm_gpuva_regions.
> +	 *
> +	 * &drm_gpuva_regions represent a reserved portion of VA space drivers
> +	 * can create mappings in. If regions are enabled, &drm_gpuvas can be
> +	 * created within an existing &drm_gpuva_region only and merge
> +	 * operations never indicate merging over region boundaries.
> +	 */
> +	DRM_GPUVA_MANAGER_REGIONS = (1 << 0),
> +};
> +
> +/**
> + * struct drm_gpuva_manager - DRM GPU VA Manager
> + *
> + * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
> + * &maple_tree structures. Typically, this structure is embedded in bigger
> + * driver structures.
> + *
> + * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
> + * pages.
> + *
> + * There should be one manager instance per GPU virtual address space.
> + */
> +struct drm_gpuva_manager {
> +	/**
> +	 * @name: the name of the DRM GPU VA space
> +	 */
> +	const char *name;
> +
> +	/**
> +	 * @mm_start: start of the VA space
> +	 */
> +	u64 mm_start;
> +
> +	/**
> +	 * @mm_range: length of the VA space
> +	 */
> +	u64 mm_range;
> +
> +	/**
> +	 * @region_mt: the &maple_tree to track GPU VA regions
> +	 */
> +	struct maple_tree region_mt;
> +
> +	/**
> +	 * @va_mt: the &maple_tree to track GPU VA mappings
> +	 */
> +	struct maple_tree va_mt;
> +
> +	/**
> +	 * @kernel_alloc_region:
> +	 *
> +	 * &drm_gpuva_region representing the address space cutout reserved for
> +	 * the kernel
> +	 */
> +	struct drm_gpuva_region kernel_alloc_region;
> +
> +	/**
> +	 * @ops: &drm_gpuva_fn_ops providing the split/merge steps to drivers
> +	 */
> +	struct drm_gpuva_fn_ops *ops;
> +
> +	/**
> +	 * @flags: the feature flags of the &drm_gpuva_manager
> +	 */
> +	enum drm_gpuva_mgr_flags flags;
> +};
> +
> +void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
> +			    const char *name,
> +			    u64 start_offset, u64 range,
> +			    u64 reserve_offset, u64 reserve_range,
> +			    struct drm_gpuva_fn_ops *ops,
> +			    enum drm_gpuva_mgr_flags flags);
> +void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
> +
> +/**
> + * struct drm_gpuva_iterator - iterator for walking the internal (maple) tree
> + */
> +struct drm_gpuva_iterator {
> +	/**
> +	 * @mas: the maple tree iterator (maple advanced state)
> +	 */
> +	struct ma_state mas;
> +
> +	/**
> +	 * @mgr: the &drm_gpuva_manager to iterate
> +	 */
> +	struct drm_gpuva_manager *mgr;
> +
> +	union {
> +		/**
> +		 * @va: the current &drm_gpuva entry
> +		 */
> +		struct drm_gpuva *va;
> +
> +		/**
> +		 * @reg: the current &drm_gpuva_region entry
> +		 */
> +		struct drm_gpuva_region *reg;
> +
> +		/**
> +		 * @entry: the current entry
> +		 */
> +		void *entry;
> +	};
> +};
> +
> +void drm_gpuva_iter_remove(struct drm_gpuva_iterator *it);
> +
> +/**
> + * DRM_GPUVA_ITER - create an iterator structure to iterate the &drm_gpuva tree
> + * @name: the name of the &drm_gpuva_iterator to create
> + * @mgr: the &drm_gpuva_manager to iterate
> + */
> +#define DRM_GPUVA_ITER(name, mgr__)				\
> +	struct drm_gpuva_iterator name = {			\
> +		.mas = __MA_STATE(&(mgr__)->va_mt, 0, 0),	\
> +		.mgr = mgr__,					\
> +		.va = NULL,					\
> +	}
> +
> +/**
> + * DRM_GPUVA_REGION_ITER - create an iterator structure to iterate the
> + * &drm_gpuva_region tree
> + * @name: the name of the &drm_gpuva_iterator to create
> + * @mgr: the &drm_gpuva_manager to iterate
> + */
> +#define DRM_GPUVA_REGION_ITER(name, mgr__)			\
> +	struct drm_gpuva_iterator name = {			\
> +		.mas = __MA_STATE(&(mgr__)->region_mt, 0, 0),	\
> +		.mgr = mgr__,					\
> +		.reg = NULL,					\
> +	}
> +
> +/**
> + * drm_gpuva_iter_for_each_range - iternator to walk over a range of entries
> + * @it__: &drm_gpuva_iterator structure to assign to in each iteration step
> + * @start__: starting offset, the first entry will overlap this
> + * @end__: ending offset, the last entry will start before this (but may overlap)
> + *
> + * This function can be used to iterate both &drm_gpuva objects and
> + * &drm_gpuva_region objects.
> + *
> + * It is safe against the removal of elements using &drm_gpuva_iter_remove,
> + * however it is not safe against the removal of elements using
> + * &drm_gpuva_remove and &drm_gpuva_region_remove.
> + */
> +#define drm_gpuva_iter_for_each_range(it__, start__, end__) \
> +	for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, end__ - 1); \
> +	     (it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))
> +
> +/**
> + * drm_gpuva_iter_for_each - iternator to walk over all existing entries
> + * @it__: &drm_gpuva_iterator structure to assign to in each iteration step
> + *
> + * This function can be used to iterate both &drm_gpuva objects and
> + * &drm_gpuva_region objects.
> + *
> + * It is safe against the removal of elements using &drm_gpuva_iter_remove,
> + * however it is not safe against the removal of elements using
> + * &drm_gpuva_remove and &drm_gpuva_region_remove.
> + */
> +#define drm_gpuva_iter_for_each(it__) \
> +	drm_gpuva_iter_for_each_range(it__, (it__).mgr->mm_start, \
> +				      (it__).mgr->mm_start + (it__).mgr->mm_range)
> +
> +/**
> + * enum drm_gpuva_op_type - GPU VA operation type
> + *
> + * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager.
> + */
> +enum drm_gpuva_op_type {
> +	/**
> +	 * @DRM_GPUVA_OP_MAP: the map op type
> +	 */
> +	DRM_GPUVA_OP_MAP,
> +
> +	/**
> +	 * @DRM_GPUVA_OP_REMAP: the remap op type
> +	 */
> +	DRM_GPUVA_OP_REMAP,
> +
> +	/**
> +	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
> +	 */
> +	DRM_GPUVA_OP_UNMAP,
> +
> +	/**
> +	 * @DRM_GPUVA_OP_PREFETCH: the prefetch op type
> +	 */
> +	DRM_GPUVA_OP_PREFETCH,
> +};
> +
> +/**
> + * struct drm_gpuva_op_map - GPU VA map operation
> + *
> + * This structure represents a single map operation generated by the
> + * DRM GPU VA manager.
> + */
> +struct drm_gpuva_op_map {
> +	/**
> +	 * @va: structure containing address and range of a map
> +	 * operation
> +	 */
> +	struct {
> +		/**
> +		 * @addr: the base address of the new mapping
> +		 */
> +		u64 addr;
> +
> +		/**
> +		 * @range: the range of the new mapping
> +		 */
> +		u64 range;
> +	} va;
> +
> +	/**
> +	 * @gem: structure containing the &drm_gem_object and it's offset
> +	 */
> +	struct {
> +		/**
> +		 * @offset: the offset within the &drm_gem_object
> +		 */
> +		u64 offset;
> +
> +		/**
> +		 * @obj: the &drm_gem_object to map
> +		 */
> +		struct drm_gem_object *obj;
> +	} gem;
> +};
> +
> +/**
> + * struct drm_gpuva_op_unmap - GPU VA unmap operation
> + *
> + * This structure represents a single unmap operation generated by the
> + * DRM GPU VA manager.
> + */
> +struct drm_gpuva_op_unmap {
> +	/**
> +	 * @va: the &drm_gpuva to unmap
> +	 */
> +	struct drm_gpuva *va;
> +
> +	/**
> +	 * @keep:
> +	 *
> +	 * Indicates whether this &drm_gpuva is physically contiguous with the
> +	 * original mapping request.
> +	 *
> +	 * Optionally, if &keep is set, drivers may keep the actual page table
> +	 * mappings for this &drm_gpuva, adding the missing page table entries
> +	 * only and update the &drm_gpuva_manager accordingly.
> +	 */
> +	bool keep;
> +};
> +
> +/**
> + * struct drm_gpuva_op_remap - GPU VA remap operation
> + *
> + * This represents a single remap operation generated by the DRM GPU VA manager.
> + *
> + * A remap operation is generated when an existing GPU VA mmapping is split up
> + * by inserting a new GPU VA mapping or by partially unmapping existent
> + * mapping(s), hence it consists of a maximum of two map and one unmap
> + * operation.
> + *
> + * The @unmap operation takes care of removing the original existing mapping.
> + * @prev is used to remap the preceding part, @next the subsequent part.
> + *
> + * If either a new mapping's start address is aligned with the start address
> + * of the old mapping or the new mapping's end address is aligned with the
> + * end address of the old mapping, either @prev or @next is NULL.
> + *
> + * Note, the reason for a dedicated remap operation, rather than arbitrary
> + * unmap and map operations, is to give drivers the chance of extracting driver
> + * specific data for creating the new mappings from the unmap operations's
> + * &drm_gpuva structure which typically is embedded in larger driver specific
> + * structures.
> + */
> +struct drm_gpuva_op_remap {
> +	/**
> +	 * @prev: the preceding part of a split mapping
> +	 */
> +	struct drm_gpuva_op_map *prev;
> +
> +	/**
> +	 * @next: the subsequent part of a split mapping
> +	 */
> +	struct drm_gpuva_op_map *next;
> +
> +	/**
> +	 * @unmap: the unmap operation for the original existing mapping
> +	 */
> +	struct drm_gpuva_op_unmap *unmap;
> +};
> +
> +/**
> + * struct drm_gpuva_op_prefetch - GPU VA prefetch operation
> + *
> + * This structure represents a single prefetch operation generated by the
> + * DRM GPU VA manager.
> + */
> +struct drm_gpuva_op_prefetch {
> +	/**
> +	 * @va: the &drm_gpuva to prefetch
> +	 */
> +	struct drm_gpuva *va;
> +};
> +
> +/**
> + * struct drm_gpuva_op - GPU VA operation
> + *
> + * This structure represents a single generic operation.
> + *
> + * The particular type of the operation is defined by @op.
> + */
> +struct drm_gpuva_op {
> +	/**
> +	 * @entry:
> +	 *
> +	 * The &list_head used to distribute instances of this struct within
> +	 * &drm_gpuva_ops.
> +	 */
> +	struct list_head entry;
> +
> +	/**
> +	 * @op: the type of the operation
> +	 */
> +	enum drm_gpuva_op_type op;
> +
> +	union {
> +		/**
> +		 * @map: the map operation
> +		 */
> +		struct drm_gpuva_op_map map;
> +
> +		/**
> +		 * @remap: the remap operation
> +		 */
> +		struct drm_gpuva_op_remap remap;
> +
> +		/**
> +		 * @unmap: the unmap operation
> +		 */
> +		struct drm_gpuva_op_unmap unmap;
> +
> +		/**
> +		 * @prefetch: the prefetch operation
> +		 */
> +		struct drm_gpuva_op_prefetch prefetch;
> +	};
> +};
> +
> +/**
> + * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
> + */
> +struct drm_gpuva_ops {
> +	/**
> +	 * @list: the &list_head
> +	 */
> +	struct list_head list;
> +};
> +
> +/**
> + * drm_gpuva_for_each_op - iterator to walk over &drm_gpuva_ops
> + * @op: &drm_gpuva_op to assign in each iteration step
> + * @ops: &drm_gpuva_ops to walk
> + *
> + * This iterator walks over all ops within a given list of operations.
> + */
> +#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
> +
> +/**
> + * drm_gpuva_for_each_op_safe - iterator to safely walk over &drm_gpuva_ops
> + * @op: &drm_gpuva_op to assign in each iteration step
> + * @next: &next &drm_gpuva_op to store the next step
> + * @ops: &drm_gpuva_ops to walk
> + *
> + * This iterator walks over all ops within a given list of operations. It is
> + * implemented with list_for_each_safe(), so save against removal of elements.
> + */
> +#define drm_gpuva_for_each_op_safe(op, next, ops) \
> +	list_for_each_entry_safe(op, next, &(ops)->list, entry)
> +
> +/**
> + * drm_gpuva_for_each_op_from_reverse - iterate backwards from the given point
> + * @op: &drm_gpuva_op to assign in each iteration step
> + * @ops: &drm_gpuva_ops to walk
> + *
> + * This iterator walks over all ops within a given list of operations beginning
> + * from the given operation in reverse order.
> + */
> +#define drm_gpuva_for_each_op_from_reverse(op, ops) \
> +	list_for_each_entry_from_reverse(op, &(ops)->list, entry)
> +
> +/**
> + * drm_gpuva_first_op - returns the first &drm_gpuva_op from &drm_gpuva_ops
> + * @ops: the &drm_gpuva_ops to get the fist &drm_gpuva_op from
> + */
> +#define drm_gpuva_first_op(ops) \
> +	list_first_entry(&(ops)->list, struct drm_gpuva_op, entry)
> +
> +/**
> + * drm_gpuva_last_op - returns the last &drm_gpuva_op from &drm_gpuva_ops
> + * @ops: the &drm_gpuva_ops to get the last &drm_gpuva_op from
> + */
> +#define drm_gpuva_last_op(ops) \
> +	list_last_entry(&(ops)->list, struct drm_gpuva_op, entry)
> +
> +/**
> + * drm_gpuva_prev_op - previous &drm_gpuva_op in the list
> + * @op: the current &drm_gpuva_op
> + */
> +#define drm_gpuva_prev_op(op) list_prev_entry(op, entry)
> +
> +/**
> + * drm_gpuva_next_op - next &drm_gpuva_op in the list
> + * @op: the current &drm_gpuva_op
> + */
> +#define drm_gpuva_next_op(op) list_next_entry(op, entry)
> +
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
> +			    u64 addr, u64 range,
> +			    struct drm_gem_object *obj, u64 offset);
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
> +			      u64 addr, u64 range);
> +
> +struct drm_gpuva_ops *
> +drm_gpuva_prefetch_ops_create(struct drm_gpuva_manager *mgr,
> +				 u64 addr, u64 range);
> +
> +struct drm_gpuva_ops *
> +drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager *mgr,
> +			       struct drm_gem_object *obj);
> +
> +void drm_gpuva_ops_free(struct drm_gpuva_manager *mgr,
> +			struct drm_gpuva_ops *ops);
> +
> +/**
> + * struct drm_gpuva_fn_ops - callbacks for split/merge steps
> + *
> + * This structure defines the callbacks used by &drm_gpuva_sm_map and
> + * &drm_gpuva_sm_unmap to provide the split/merge steps for map and unmap
> + * operations to drivers.
> + */
> +struct drm_gpuva_fn_ops {
> +	/**
> +	 * @op_alloc: called when the &drm_gpuva_manager allocates
> +	 * a struct drm_gpuva_op
> +	 *
> +	 * Some drivers may want to embed struct drm_gpuva_op into driver
> +	 * specific structures. By implementing this callback drivers can
> +	 * allocate memory accordingly.
> +	 *
> +	 * This callback is optional.
> +	 */
> +	struct drm_gpuva_op *(*op_alloc)(void);
> +
> +	/**
> +	 * @op_free: called when the &drm_gpuva_manager frees a
> +	 * struct drm_gpuva_op
> +	 *
> +	 * Some drivers may want to embed struct drm_gpuva_op into driver
> +	 * specific structures. By implementing this callback drivers can
> +	 * free the previously allocated memory accordingly.
> +	 *
> +	 * This callback is optional.
> +	 */
> +	void (*op_free)(struct drm_gpuva_op *op);
> +
> +	/**
> +	 * @sm_map_step: called from &drm_gpuva_sm_map providing the split and
> +	 * merge steps
> +	 *
> +	 * This callback provides a single split / merge step or, if no split
> +	 * and merge is indicated, the original map operation.
> +	 *
> +	 * The &priv pointer is equal to the one drivers pass to
> +	 * &drm_gpuva_sm_map.
> +	 */
> +	int (*sm_map_step)(struct drm_gpuva_op *op, void *priv);
> +
> +	/**
> +	 * @sm_unmap_step: called from &drm_gpuva_sm_map providing the split and
> +	 * merge steps
> +	 *
> +	 * This callback provides a single split step or, if no split is
> +	 * indicated, the plain unmap operations of the corresponding unmap
> +	 * range originally passed to &drm_gpuva_sm_unmap.
> +	 *
> +	 * The &priv pointer is equal to the one drivers pass to
> +	 * &drm_gpuva_sm_unmap.
> +	 */
> +	int (*sm_unmap_step)(struct drm_gpuva_op *op, void *priv);
> +};
> +
> +int drm_gpuva_sm_map(struct drm_gpuva_manager *mgr, void *priv,
> +		     u64 addr, u64 range,
> +		     struct drm_gem_object *obj, u64 offset);
> +
> +int drm_gpuva_sm_unmap(struct drm_gpuva_manager *mgr, void *priv,
> +		       u64 addr, u64 range);
> +
> +#endif /* __DRM_GPUVA_MGR_H__ */


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-22 10:25   ` Christian König
@ 2023-02-22 15:07     ` Danilo Krummrich
  2023-02-22 15:14       ` Christian König
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-22 15:07 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, Dave Airlie, bagasdotme, jason

On 2/22/23 11:25, Christian König wrote:
> Am 17.02.23 um 14:44 schrieb Danilo Krummrich:

<snip>

>> +/**
>> + * DOC: Overview
>> + *
>> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager 
>> keeps track
>> + * of a GPU's virtual address (VA) space and manages the 
>> corresponding virtual
>> + * mappings represented by &drm_gpuva objects. It also keeps track of 
>> the
>> + * mapping's backing &drm_gem_object buffers.
>> + *
>> + * &drm_gem_object buffers maintain a list (and a corresponding list 
>> lock) of
>> + * &drm_gpuva objects representing all existent GPU VA mappings using 
>> this
>> + * &drm_gem_object as backing buffer.
>> + *
>> + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA 
>> mapping can
>> + * only be created within a previously allocated &drm_gpuva_region, 
>> which
>> + * represents a reserved portion of the GPU VA space. GPU VA mappings 
>> are not
>> + * allowed to span over a &drm_gpuva_region's boundary.
>> + *
>> + * GPU VA regions can also be flagged as sparse, which allows drivers 
>> to create
>> + * sparse mappings for a whole GPU VA region in order to support Vulkan
>> + * 'Sparse Resources'.
> 
> Well since we have now found that there is absolutely no technical 
> reason for having those regions could we please drop them?

I disagree this was the outcome of our previous discussion.

In nouveau I still need them to track the separate sparse page tables 
and, as you confirmed previously, Nvidia cards are not the only cards 
supporting this feature.

The second reason is that with regions we can avoid merging between 
buffers, which saves some effort. However, I agree that this argument by 
itself probably doesn't hold too much, since you've pointed out in a 
previous mail that:

<cite>
1) If we merge and decide to only do that inside certain boundaries then 
those boundaries needs to be provided and checked against. This burns 
quite some CPU cycles

2) If we just merge what we can we might have extra page table updates 
which cost time and could result in undesired side effects.

3) If we don't merge at all we have additional housekeeping for the 
mappings and maybe hw restrictions.
</cite>

However, if a driver uses regions to track its separate sparse page 
tables anyway it gets 1) for free, which is a nice synergy.

I totally agree that regions aren't for everyone though. Hence, I made 
them an optional feature and by default regions are disabled. In order 
to use them drm_gpuva_manager_init() must be called with the 
DRM_GPUVA_MANAGER_REGIONS feature flag.

I really would not want to open code regions or have two GPUVA manager 
instances in nouveau to track sparse page tables. That would be really 
messy, hence I hope we can agree on this to be an optional feature.

> 
> I don't really see a need for them any more.
> 
> Regards,
> Christian.
>  


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-22 15:07     ` Danilo Krummrich
@ 2023-02-22 15:14       ` Christian König
  2023-02-22 16:40         ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Christian König @ 2023-02-22 15:14 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, Dave Airlie, bagasdotme, jason

Am 22.02.23 um 16:07 schrieb Danilo Krummrich:
> On 2/22/23 11:25, Christian König wrote:
>> Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
>
> <snip>
>
>>> +/**
>>> + * DOC: Overview
>>> + *
>>> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager 
>>> keeps track
>>> + * of a GPU's virtual address (VA) space and manages the 
>>> corresponding virtual
>>> + * mappings represented by &drm_gpuva objects. It also keeps track 
>>> of the
>>> + * mapping's backing &drm_gem_object buffers.
>>> + *
>>> + * &drm_gem_object buffers maintain a list (and a corresponding 
>>> list lock) of
>>> + * &drm_gpuva objects representing all existent GPU VA mappings 
>>> using this
>>> + * &drm_gem_object as backing buffer.
>>> + *
>>> + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA 
>>> mapping can
>>> + * only be created within a previously allocated &drm_gpuva_region, 
>>> which
>>> + * represents a reserved portion of the GPU VA space. GPU VA 
>>> mappings are not
>>> + * allowed to span over a &drm_gpuva_region's boundary.
>>> + *
>>> + * GPU VA regions can also be flagged as sparse, which allows 
>>> drivers to create
>>> + * sparse mappings for a whole GPU VA region in order to support 
>>> Vulkan
>>> + * 'Sparse Resources'.
>>
>> Well since we have now found that there is absolutely no technical 
>> reason for having those regions could we please drop them?
>
> I disagree this was the outcome of our previous discussion.
>
> In nouveau I still need them to track the separate sparse page tables 
> and, as you confirmed previously, Nvidia cards are not the only cards 
> supporting this feature.
>
> The second reason is that with regions we can avoid merging between 
> buffers, which saves some effort. However, I agree that this argument 
> by itself probably doesn't hold too much, since you've pointed out in 
> a previous mail that:
>
> <cite>
> 1) If we merge and decide to only do that inside certain boundaries 
> then those boundaries needs to be provided and checked against. This 
> burns quite some CPU cycles
>
> 2) If we just merge what we can we might have extra page table updates 
> which cost time and could result in undesired side effects.
>
> 3) If we don't merge at all we have additional housekeeping for the 
> mappings and maybe hw restrictions.
> </cite>
>
> However, if a driver uses regions to track its separate sparse page 
> tables anyway it gets 1) for free, which is a nice synergy.
>
> I totally agree that regions aren't for everyone though. Hence, I made 
> them an optional feature and by default regions are disabled. In order 
> to use them drm_gpuva_manager_init() must be called with the 
> DRM_GPUVA_MANAGER_REGIONS feature flag.
>
> I really would not want to open code regions or have two GPUVA manager 
> instances in nouveau to track sparse page tables. That would be really 
> messy, hence I hope we can agree on this to be an optional feature.

I absolutely don't think that this is a good idea then. This separate 
handling of sparse page tables is completely Nouveau specific.

Even when it's optional feature mixing this into the common handling is 
exactly what I pointed out as not properly separating between hardware 
specific and hardware agnostic functionality.

This is exactly the problem we ran into with TTM as well and I've spend 
a massive amount of time to clean that up again.

Regards,
Christian.

>
>>
>> I don't really see a need for them any more.
>>
>> Regards,
>> Christian.
>>
>


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-21 18:31               ` Matthew Wilcox
@ 2023-02-22 16:11                 ` Danilo Krummrich
  2023-02-22 16:32                   ` Matthew Wilcox
  2023-02-27 17:39                 ` Danilo Krummrich
  1 sibling, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-22 16:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On 2/21/23 19:31, Matthew Wilcox wrote:
> On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote:
>> On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
>>> On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
>>>> On 2/20/23 16:10, Matthew Wilcox wrote:
>>>>> This is why we like people to use the spinlock embedded in the tree.
>>>>> There's nothing for the user to care about.  If the access really is
>>>>> serialised, acquiring/releasing the uncontended spinlock is a minimal
>>>>> cost compared to all the other things that will happen while modifying
>>>>> the tree.
>>>>
>>>> I think as for the users of the GPUVA manager we'd have two cases:
>>>>
>>>> 1) Accesses to the manager (and hence the tree) are serialized, no lock
>>>> needed.
>>>>
>>>> 2) Multiple operations on the tree must be locked in order to make them
>>>> appear atomic.
>>>
>>> Could you give an example here of what you'd like to do?  Ideally
>>> something complicated so I don't say "Oh, you can just do this" when
>>> there's a more complex example for which "this" won't work.  I'm sure
>>> that's embedded somewhere in the next 20-odd patches, but it's probably
>>> quicker for you to describe in terms of tree operations that have to
>>> appear atomic than for me to try to figure it out.
>>>
>>
>> Absolutely, not gonna ask you to read all of that. :-)
>>
>> One thing the GPUVA manager does is to provide drivers the (sub-)operations
>> that need to be processed in order to fulfill a map or unmap request from
>> userspace. For instance, when userspace asks the driver to map some memory
>> the GPUVA manager calculates which existing mappings must be removed, split up
>> or can be merged with the newly requested mapping.
>>
>> A driver has two ways to fetch those operations from the GPUVA manager. It can
>> either obtain a list of operations or receive a callback for each operation
>> generated by the GPUVA manager.
>>
>> In both cases the GPUVA manager walks the maple tree, which keeps track of
>> existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
>> the map case, since the unmap case is a subset basically). For each mapping
>> found in the given range the driver, as mentioned, either receives a callback or
>> a list entry is added to the list of operations.
>>
>> Typically, for each operation / callback one entry within the maple tree is
>> removed and, optionally at the beginning and end of a new mapping's range, a
>> new entry is inserted. An of course, as the last operation, there is the new
>> mapping itself to insert.
>>
>> The GPUVA manager delegates locking responsibility to the drivers. Typically,
>> a driver either serializes access to the VA space managed by the GPUVA manager
>> (no lock needed) or need to lock the processing of a full set of operations
>> generated by the GPUVA manager.
> 
> OK, that all makes sense.  It does make sense to have the driver use its
> own mutex and then take the spinlock inside the maple tree code.  It
> shouldn't ever be contended.
> 
>>>> In either case the embedded spinlock wouldn't be useful, we'd either need an
>>>> external lock or no lock at all.
>>>>
>>>> If there are any internal reasons why specific tree operations must be
>>>> mutually excluded (such as those you explain below), wouldn't it make more
>>>> sense to always have the internal lock and, optionally, allow users to
>>>> specify an external lock additionally?
>>>
>>> So the way this works for the XArray, which is a little older than the
>>> Maple tree, is that we always use the internal spinlock for
>>> modifications (possibly BH or IRQ safe), and if someone wants to
>>> use an external mutex to make some callers atomic with respect to each
>>> other, they're free to do so.  In that case, the XArray doesn't check
>>> the user's external locking at all, because it really can't know.
>>>
>>> I'd advise taking that approach; if there's really no way to use the
>>> internal spinlock to make your complicated updates appear atomic
>>> then just let the maple tree use its internal spinlock, and you can
>>> also use your external mutex however you like.
>>>
>>
>> That sounds like the right thing to do.
>>
>> However, I'm using the advanced API of the maple tree (and that's the reason
>> why the above example appears a little more detailed than needed) because I
>> think with the normal API I can't insert / remove tree entries while walking
>> the tree, right?
> 
> Right.  The normal API is for simple operations while the advanced API
> is for doing compound operations.
> 
>> As by the documentation the advanced API, however, doesn't take care of locking
>> itself, hence just letting the maple tree use its internal spinlock doesn't
>> really work - I need to take care of that myself, right?
> 
> Yes; once you're using the advanced API, you get to compose the entire
> operation yourself.
> 
>> It feels a bit weird that I, as a user of the API, would need to lock certain
>> (or all?) mas_*() functions with the internal spinlock in order to protect
>> (future) internal features of the tree, such as the slab cache defragmentation
>> you mentioned. Because from my perspective, as the generic component that tells
>> it's users (the drivers) to take care of locking VA space operations (and hence
>> tree operations) I don't have an own purpose of this internal spinlock, right?
> 
> You don't ... but we can't know that.

Thanks for the clarification. I think I should now know what to for the 
GPUVA manager in terms of locking the maple tree in general.

Though I still have very limited insights on the maple tree I want to 
share some further thoughts.

 From what I got so far it really seems to me that it would be better to 
just take the internal spinlock for both APIs (normal and advanced) 
whenever you need to internally.

This way users would not need to take care of locking maple tree 
internals, which I still think is a little odd.

Another plus would probably be maintainability. Once you got quite a few 
maple tree users using external locks (either in the sense of calling 
mt_set_external_lock() or in the way I'll potentially do it by using the 
internal lock with the advanced API and an additional external lock) it 
might be hard to apply any changes to the locking requirements, because 
you would either need to check every users implementation by hand or be 
able to run it in order to check it with lockdep.

If I got this correctly (please tell me if I don't) the only reason the 
internal lock is not managed by the advanced API internally is to let 
users do more complex transactions, without the need of having a 
separate external lock, as long as they fulfill the locking requirements 
of the maple tree, which are enforced by lockdep.

However, you already mentioned that "acquiring/releasing the uncontended 
spinlock is a minimal cost compared to all the other things that will 
happen while modifying the tree".

Do I miss something?

> 
>> Also I'm a little confused how I'd know where to take the spinlock? E.g. for
>> inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.
> 
> Lockdep will shout at you if you get it wrong ;-)  But you can safely
> take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
> mas_nomem() knows to drop the lock before doing a sleeping allocation.
> Essentially you're open-coding mtree_store_range() but doing your own
> thing in addition to the store.
> 
Just asking lockdep was my plan already, however I thought I still 
better ask. :D

If you will keep the current approach of handling the internal lock I 
think its necessary to somewhere document where users need to take the 
lock and, even more important, where the maple tree implementation will 
drop the lock.

For instance, if I would rely on using the internal spinlock for locking 
sets of transactions to the maple tree this would cause nasty bugs if I 
use functions like mas_store_gfp() dropping the lock. Even though I must 
admit that this is not a great example, since it should raise some red 
flags if a user would expect a spinlock is held for a sleeping 
allocation without questioning it. However, you get the point I guess.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-22 16:11                 ` Danilo Krummrich
@ 2023-02-22 16:32                   ` Matthew Wilcox
  2023-02-22 17:28                     ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-22 16:32 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote:
> On 2/21/23 19:31, Matthew Wilcox wrote:
> > on tue, feb 21, 2023 at 03:37:49pm +0100, danilo krummrich wrote:
> > > It feels a bit weird that I, as a user of the API, would need to lock certain
> > > (or all?) mas_*() functions with the internal spinlock in order to protect
> > > (future) internal features of the tree, such as the slab cache defragmentation
> > > you mentioned. Because from my perspective, as the generic component that tells
> > > it's users (the drivers) to take care of locking VA space operations (and hence
> > > tree operations) I don't have an own purpose of this internal spinlock, right?
> > 
> > You don't ... but we can't know that.
> 
> Thanks for the clarification. I think I should now know what to for the
> GPUVA manager in terms of locking the maple tree in general.
> 
> Though I still have very limited insights on the maple tree I want to share
> some further thoughts.
> 
> From what I got so far it really seems to me that it would be better to just
> take the internal spinlock for both APIs (normal and advanced) whenever you
> need to internally.

No.  Really, no.  The point of the advanced API is that it's a toolbox
for doing the operation you want, but isn't a generic enough operation
to be part of the normal API.  To take an example from the radix
tree days, in the page cache, we need to walk a range of the tree,
looking for any entries that are marked as DIRTY, clear the DIRTY
mark and set the TOWRITE mark.  There was a horrendous function called
radix_tree_range_tag_if_tagged() which did exactly this.  Now look at
the implementation of tag_pages_for_writeback(); it's a simple loop over
a range with an occasional pause to check whether we need to reschedule.

But that means you need to know how to use the toolbox.  Some of the
tools are dangerous and you can cut yourself on them.

> Another plus would probably be maintainability. Once you got quite a few
> maple tree users using external locks (either in the sense of calling

I don't want maple tree users using external locks.  That exists
because it was the only reasonable way of converting the VMA tree
from the rbtree to the maple tree.  I intend to get rid of
mt_set_external_lock().  The VMAs are eventually going to be protected
by the internal spinlock.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-22 15:14       ` Christian König
@ 2023-02-22 16:40         ` Danilo Krummrich
  2023-02-23  7:06           ` Christian König
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-22 16:40 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, Dave Airlie, bagasdotme, jason

On 2/22/23 16:14, Christian König wrote:
> Am 22.02.23 um 16:07 schrieb Danilo Krummrich:
>> On 2/22/23 11:25, Christian König wrote:
>>> Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
>>
>> <snip>
>>
>>>> +/**
>>>> + * DOC: Overview
>>>> + *
>>>> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager 
>>>> keeps track
>>>> + * of a GPU's virtual address (VA) space and manages the 
>>>> corresponding virtual
>>>> + * mappings represented by &drm_gpuva objects. It also keeps track 
>>>> of the
>>>> + * mapping's backing &drm_gem_object buffers.
>>>> + *
>>>> + * &drm_gem_object buffers maintain a list (and a corresponding 
>>>> list lock) of
>>>> + * &drm_gpuva objects representing all existent GPU VA mappings 
>>>> using this
>>>> + * &drm_gem_object as backing buffer.
>>>> + *
>>>> + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA 
>>>> mapping can
>>>> + * only be created within a previously allocated &drm_gpuva_region, 
>>>> which
>>>> + * represents a reserved portion of the GPU VA space. GPU VA 
>>>> mappings are not
>>>> + * allowed to span over a &drm_gpuva_region's boundary.
>>>> + *
>>>> + * GPU VA regions can also be flagged as sparse, which allows 
>>>> drivers to create
>>>> + * sparse mappings for a whole GPU VA region in order to support 
>>>> Vulkan
>>>> + * 'Sparse Resources'.
>>>
>>> Well since we have now found that there is absolutely no technical 
>>> reason for having those regions could we please drop them?
>>
>> I disagree this was the outcome of our previous discussion.
>>
>> In nouveau I still need them to track the separate sparse page tables 
>> and, as you confirmed previously, Nvidia cards are not the only cards 
>> supporting this feature.
>>
>> The second reason is that with regions we can avoid merging between 
>> buffers, which saves some effort. However, I agree that this argument 
>> by itself probably doesn't hold too much, since you've pointed out in 
>> a previous mail that:
>>
>> <cite>
>> 1) If we merge and decide to only do that inside certain boundaries 
>> then those boundaries needs to be provided and checked against. This 
>> burns quite some CPU cycles
>>
>> 2) If we just merge what we can we might have extra page table updates 
>> which cost time and could result in undesired side effects.
>>
>> 3) If we don't merge at all we have additional housekeeping for the 
>> mappings and maybe hw restrictions.
>> </cite>
>>
>> However, if a driver uses regions to track its separate sparse page 
>> tables anyway it gets 1) for free, which is a nice synergy.
>>
>> I totally agree that regions aren't for everyone though. Hence, I made 
>> them an optional feature and by default regions are disabled. In order 
>> to use them drm_gpuva_manager_init() must be called with the 
>> DRM_GPUVA_MANAGER_REGIONS feature flag.
>>
>> I really would not want to open code regions or have two GPUVA manager 
>> instances in nouveau to track sparse page tables. That would be really 
>> messy, hence I hope we can agree on this to be an optional feature.
> 
> I absolutely don't think that this is a good idea then. This separate 
> handling of sparse page tables is completely Nouveau specific.

Actually, I rely on what you said in a previous mail when I say it's, 
potentially, not specific to nouveau.

<cite>
This sounds similar to what AMD hw used to have up until gfx8 (I think), 
basically sparse resources where defined through a separate mechanism to 
the address resolution of the page tables. I won't rule out that other 
hardware has similar approaches.
</cite>

> 
> Even when it's optional feature mixing this into the common handling is 
> exactly what I pointed out as not properly separating between hardware 
> specific and hardware agnostic functionality.

Optionally having regions is *not* a hardware specific concept, drivers 
might use it for a hardware specific purpose though. Which potentially 
is is the case for almost every DRM helper.

Drivers can use regions only for the sake of not merging between buffer 
boundaries as well. Some drivers might prefer this over "never merge" or 
"always merge", depending on the cost of re-organizing page tables for 
unnecessary splits/merges, without having the need of tracking separate 
sparse page tables.

Its just that I think *if* a driver needs to track separate sparse page 
tables anyways its a nice synergy since then there is no extra cost for 
getting this optimization.

> 
> This is exactly the problem we ran into with TTM as well and I've spend 
> a massive amount of time to clean that up again. >

Admittedly, I don't know what problems you are referring to. However, I 
don't see which kind of trouble it could cause by allowing drivers to 
track regions optionally.

> Regards,
> Christian.
> 
>>
>>>
>>> I don't really see a need for them any more.
>>>
>>> Regards,
>>> Christian.
>>>
>>
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-22 16:32                   ` Matthew Wilcox
@ 2023-02-22 17:28                     ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-22 17:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On 2/22/23 17:32, Matthew Wilcox wrote:
> On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote:
>> On 2/21/23 19:31, Matthew Wilcox wrote:
>>> on tue, feb 21, 2023 at 03:37:49pm +0100, danilo krummrich wrote:
>>>> It feels a bit weird that I, as a user of the API, would need to lock certain
>>>> (or all?) mas_*() functions with the internal spinlock in order to protect
>>>> (future) internal features of the tree, such as the slab cache defragmentation
>>>> you mentioned. Because from my perspective, as the generic component that tells
>>>> it's users (the drivers) to take care of locking VA space operations (and hence
>>>> tree operations) I don't have an own purpose of this internal spinlock, right?
>>>
>>> You don't ... but we can't know that.
>>
>> Thanks for the clarification. I think I should now know what to for the
>> GPUVA manager in terms of locking the maple tree in general.
>>
>> Though I still have very limited insights on the maple tree I want to share
>> some further thoughts.
>>
>>  From what I got so far it really seems to me that it would be better to just
>> take the internal spinlock for both APIs (normal and advanced) whenever you
>> need to internally.
> 
> No.  Really, no.  The point of the advanced API is that it's a toolbox
> for doing the operation you want, but isn't a generic enough operation
> to be part of the normal API.

Again the disclaimer, I'm just sharing my thoughts from the perspective 
of a user from a generic tree API.

For me it feels like - and this purely is an assumption, hence please 
correct me if I'm wrong on that - you consider the advanced API to be 
more of a collection of internal functions not *really* being meant to 
be used by arbitrary users and maybe even being slightly tied to mm 
since it originated there?

However, from my external perspective I see it the following way.

Even if an operation is not part of the 'normal API', but an API called 
'advanced API', it still is a generic API operation being exposed to 
arbitrary users. However, my point is not (at least not exclusively) 
that I do not consider this to be safe enough or something.

Its just that I think that when the API *enforces* the user to take an 
internal lock at certain places it can also just take the lock itself no 
matter what the API is being called. Especially when one can't rely on 
this lock at all for other (external) purposes anyways because the 
implementation behind the API is free to drop the lock whenever it needs to.

> To take an example from the radix
> tree days, in the page cache, we need to walk a range of the tree,
> looking for any entries that are marked as DIRTY, clear the DIRTY
> mark and set the TOWRITE mark.  There was a horrendous function called
> radix_tree_range_tag_if_tagged() which did exactly this.  Now look at
> the implementation of tag_pages_for_writeback(); it's a simple loop over
> a range with an occasional pause to check whether we need to reschedule.
> 
> But that means you need to know how to use the toolbox.  Some of the
> tools are dangerous and you can cut yourself on them.
> 
>> Another plus would probably be maintainability. Once you got quite a few
>> maple tree users using external locks (either in the sense of calling
> 
> I don't want maple tree users using external locks.  That exists
> because it was the only reasonable way of converting the VMA tree
> from the rbtree to the maple tree.  I intend to get rid of
> mt_set_external_lock().  The VMAs are eventually going to be protected
> by the internal spinlock.
> 

But the argument also holds for the case of using the advanced API and 
using the internal spinlock. If your requirements on locking change in 
the future every user implementation must be re-validated.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-21 18:20   ` Liam R. Howlett
@ 2023-02-22 18:13     ` Danilo Krummrich
  2023-02-23 19:09       ` Liam R. Howlett
  2023-02-28  2:17     ` Danilo Krummrich
  1 sibling, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-22 18:13 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

On 2/21/23 19:20, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
>> Add infrastructure to keep track of GPU virtual address (VA) mappings
>> with a decicated VA space manager implementation.
>>
>> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
>> start implementing, allow userspace applications to request multiple and
>> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>> intended to serve the following purposes in this context.
>>
>> 1) Provide infrastructure to track GPU VA allocations and mappings,
>>     making use of the maple_tree.
>>
>> 2) Generically connect GPU VA mappings to their backing buffers, in
>>     particular DRM GEM objects.
>>
>> 3) Provide a common implementation to perform more complex mapping
>>     operations on the GPU VA space. In particular splitting and merging
>>     of GPU VA mappings, e.g. for intersecting mapping requests or partial
>>     unmap requests.
>>
>> Suggested-by: Dave Airlie <airlied@redhat.com>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   Documentation/gpu/drm-mm.rst    |   31 +
>>   drivers/gpu/drm/Makefile        |    1 +
>>   drivers/gpu/drm/drm_gem.c       |    3 +
>>   drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
>>   include/drm/drm_drv.h           |    6 +
>>   include/drm/drm_gem.h           |   75 ++
>>   include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
>>   7 files changed, 2534 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index a52e6f4117d6..c9f120cfe730 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>>   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>>      :export:
>>   
> ...
> 
>> +
>> +/**
>> + * drm_gpuva_remove_iter - removes the iterators current element
>> + * @it: the &drm_gpuva_iterator
>> + *
>> + * This removes the element the iterator currently points to.
>> + */
>> +void
>> +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
>> +{
>> +	mas_erase(&it->mas);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_iter_remove);
>> +
>> +/**
>> + * drm_gpuva_insert - insert a &drm_gpuva
>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>> + * @va: the &drm_gpuva to insert
>> + * @addr: the start address of the GPU VA
>> + * @range: the range of the GPU VA
>> + *
>> + * Insert a &drm_gpuva with a given address and range into a
>> + * &drm_gpuva_manager.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>> +		 struct drm_gpuva *va)
>> +{
>> +	u64 addr = va->va.addr;
>> +	u64 range = va->va.range;
>> +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
>> +	struct drm_gpuva_region *reg = NULL;
>> +	int ret;
>> +
>> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
>> +		return -EINVAL;
>> +
>> +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
>> +		return -EINVAL;
>> +
>> +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
>> +		reg = drm_gpuva_in_region(mgr, addr, range);
>> +		if (unlikely(!reg))
>> +			return -EINVAL;
>> +	}
>> +
> 
> -----
> 
>> +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
>> +		return -EEXIST;
>> +
>> +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> 
> mas_walk() will set the internal maple state to the limits to what it
> finds.  So, instead of an iterator, you can use the walk function and
> ensure there is a large enough area in the existing NULL:
> 
> /*
>   * Nothing at addr, mas now points to the location where the store would
>   * happen
>   */
> if (mas_walk(&mas))
> 	return -EEXIST;
> 
> /* The NULL entry ends at mas.last, make sure there is room */
> if (mas.last < (addr + range - 1))
> 	return -EEXIST;
> 
> /* Limit the store size to the correct end address, and store */
>   mas.last = addr + range - 1;
>   ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> 

Would this variant be significantly more efficient?

Also, would this also work while already walking the tree?

To remove an entry while walking the tree I have a separate function 
drm_gpuva_iter_remove(). Would I need something similar for inserting 
entries?

I already provided this example in a separate mail thread, but it may 
makes sense to move this to the mailing list:

In __drm_gpuva_sm_map() we're iterating a given range of the tree, where 
the given range is the size of the newly requested mapping. 
__drm_gpuva_sm_map() invokes a callback for each sub-operation that 
needs to be taken in order to fulfill this mapping request. In most 
cases such a callback just creates a drm_gpuva_op object and stores it 
in a list.

However, drivers can also implement the callback, such that they 
directly execute this operation within the callback.

Let's have a look at the following example:

      0     a     2
old: |-----------|       (bo_offset=n)

            1     b     3
req:       |-----------| (bo_offset=m)

      0  a' 1     b     3
new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)

This would result in the following operations.

__drm_gpuva_sm_map() finds entry "a" and calls back into the driver 
suggesting to re-map "a" with the new size. The driver removes entry "a" 
from the tree and adds "a'"

__drm_gpuva_sm_map(), ideally, continues the loop searching for nodes 
starting from the end of "a" (which is 2) till the end of the requested 
mapping "b" (which is 3). Since it doesn't find any other mapping within 
this range it calls back into the driver suggesting to finally map "b".

If there would have been another mapping between 2 and 3 it would have 
called back into the driver asking to unmap this mapping beforehand.

So, it boils down to re-mapping as described at the beginning (and 
analogously at the end) of a new mapping range and removing of entries 
that are enclosed by the new mapping range.

>> +	if (unlikely(ret))
>> +		return ret;
>> +
>> +	va->mgr = mgr;
>> +	va->region = reg;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_insert);
>> +
>> +/**
>> + * drm_gpuva_remove - remove a &drm_gpuva
>> + * @va: the &drm_gpuva to remove
>> + *
>> + * This removes the given &va from the underlaying tree.
>> + */
>> +void
>> +drm_gpuva_remove(struct drm_gpuva *va)
>> +{
>> +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
>> +
>> +	mas_erase(&mas);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_remove);
>> +
> ...
> 
>> +/**
>> + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuvas address
>> + * @range: the &drm_gpuvas range
>> + *
>> + * Returns: the first &drm_gpuva within the given range
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
>> +		     u64 addr, u64 range)
>> +{
>> +	MA_STATE(mas, &mgr->va_mt, addr, 0);
>> +
>> +	return mas_find(&mas, addr + range - 1);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_first);
>> +
>> +/**
>> + * drm_gpuva_find - find a &drm_gpuva
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuvas address
>> + * @range: the &drm_gpuvas range
>> + *
>> + * Returns: the &drm_gpuva at a given &addr and with a given &range
> 
> Note that mas_find() will continue upwards in the address space if there
> isn't anything at @addr.  This means that &drm_gpuva may not be at
> &addr.  If you want to check just at &addr, use mas_walk().

Good catch. drm_gpuva_find() should then either also check for 
'va->va.addr == addr' as well or, alternatively, use mas_walk(). As 
above, any reason to prefer mas_walk()?

> 
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
>> +	       u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva *va;
>> +
>> +	va = drm_gpuva_find_first(mgr, addr, range);
>> +	if (!va)
>> +		goto out;
>> +
>> +	if (va->va.range != range)
>> +		goto out;
>> +
>> +	return va;
>> +
>> +out:
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find);
>> +
>> +/**
>> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @start: the given GPU VA's start address
>> + *
>> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
>> + *
>> + * Note that if there is any free space between the GPU VA mappings no mapping
>> + * is returned.
>> + *
>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> 
> find_prev() usually continues beyond 1 less than the address. I found
> this name confusing. 

Don't really get that, mind explaining?

> You may as well use mas_walk(), it would be faster.

How would I use mas_walk() for that? If I understand it correctly, 
mas_walk() requires me to know that start address, which I don't know 
for the previous entry.

However, mas_walk() seems to be a good alternative to use for 
drm_gpuva_find_next().

>> +{
>> +	MA_STATE(mas, &mgr->va_mt, start, 0);
>> +
>> +	if (start <= mgr->mm_start ||
>> +	    start > (mgr->mm_start + mgr->mm_range))
>> +		return NULL;
>> +
>> +	return mas_prev(&mas, start - 1);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_prev);
>> +
>> +/**
>> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @end: the given GPU VA's end address
>> + *
>> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
>> + *
>> + * Note that if there is any free space between the GPU VA mappings no mapping
>> + * is returned.
>> + *
>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
> 
> This name is also a bit confusing for the same reason.  Again, it seems
> worth just walking to end here.
> 
>> +{
>> +	MA_STATE(mas, &mgr->va_mt, end - 1, 0);
>> +
>> +	if (end < mgr->mm_start ||
>> +	    end >= (mgr->mm_start + mgr->mm_range))
>> +		return NULL;
>> +
>> +	return mas_next(&mas, end);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_next);
>> +
>> +static int
>> +__drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>> +			  struct drm_gpuva_region *reg)
>> +{
>> +	u64 addr = reg->va.addr;
>> +	u64 range = reg->va.range;
>> +	MA_STATE(mas, &mgr->region_mt, addr, addr + range - 1);
>> +	int ret;
>> +
>> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
>> +		return -EINVAL;
>> +
>> +	ret = mas_store_gfp(&mas, reg, GFP_KERNEL);
>> +	if (unlikely(ret))
>> +		return ret;
>> +
>> +	reg->mgr = mgr;
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>> + * @reg: the &drm_gpuva_region to insert
>> + * @addr: the start address of the GPU VA
>> + * @range: the range of the GPU VA
>> + *
>> + * Insert a &drm_gpuva_region with a given address and range into a
>> + * &drm_gpuva_manager.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>> +			struct drm_gpuva_region *reg)
>> +{
>> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
>> +		return -EINVAL;
>> +
>> +	return __drm_gpuva_region_insert(mgr, reg);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_insert);
>> +
>> +static void
>> +__drm_gpuva_region_remove(struct drm_gpuva_region *reg)
>> +{
>> +	struct drm_gpuva_manager *mgr = reg->mgr;
>> +	MA_STATE(mas, &mgr->region_mt, reg->va.addr, 0);
>> +
>> +	mas_erase(&mas);
>> +}
>> +
>> +/**
>> + * drm_gpuva_region_remove - remove a &drm_gpuva_region
>> + * @reg: the &drm_gpuva to remove
>> + *
>> + * This removes the given &reg from the underlaying tree.
>> + */
>> +void
>> +drm_gpuva_region_remove(struct drm_gpuva_region *reg)
>> +{
>> +	struct drm_gpuva_manager *mgr = reg->mgr;
>> +
>> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
>> +		return;
>> +
>> +	if (unlikely(reg == &mgr->kernel_alloc_region)) {
>> +		WARN(1, "Can't destroy kernel reserved region.\n");
>> +		return;
>> +	}
>> +
>> +	if (unlikely(!drm_gpuva_region_empty(reg)))
>> +		WARN(1, "GPU VA region should be empty on destroy.\n");
>> +
>> +	__drm_gpuva_region_remove(reg);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_remove);
>> +
>> +/**
>> + * drm_gpuva_region_empty - indicate whether a &drm_gpuva_region is empty
>> + * @reg: the &drm_gpuva to destroy
>> + *
>> + * Returns: true if the &drm_gpuva_region is empty, false otherwise
>> + */
>> +bool
>> +drm_gpuva_region_empty(struct drm_gpuva_region *reg)
>> +{
>> +	DRM_GPUVA_ITER(it, reg->mgr);
>> +
>> +	drm_gpuva_iter_for_each_range(it, reg->va.addr,
>> +				      reg->va.addr +
>> +				      reg->va.range)
>> +		return false;
>> +
>> +	return true;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_empty);
>> +
>> +/**
>> + * drm_gpuva_region_find_first - find the first &drm_gpuva_region in the given
>> + * range
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuva_regions address
>> + * @range: the &drm_gpuva_regions range
>> + *
>> + * Returns: the first &drm_gpuva_region within the given range
>> + */
>> +struct drm_gpuva_region *
>> +drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
>> +			    u64 addr, u64 range)
>> +{
>> +	MA_STATE(mas, &mgr->region_mt, addr, 0);
>> +
>> +	return mas_find(&mas, addr + range - 1);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_find_first);
>> +
>> +/**
>> + * drm_gpuva_region_find - find a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuva_regions address
>> + * @range: the &drm_gpuva_regions range
>> + *
>> + * Returns: the &drm_gpuva_region at a given &addr and with a given &range
> 
> again, I'm not sure you want to find first or walk here.. It sounds like
> you want exactly addr to addr + range VMA?

Exactly, same as above.

> 
>> + */
>> +struct drm_gpuva_region *
>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>> +		      u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +
>> +	reg = drm_gpuva_region_find_first(mgr, addr, range);
>> +	if (!reg)
>> +		goto out;
>> +
>> +	if (reg->va.range != range)
>> +		goto out;
>> +
>> +	return reg;
>> +
>> +out:
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_find);
>> +
> 
> ...
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-22 16:40         ` Danilo Krummrich
@ 2023-02-23  7:06           ` Christian König
  2023-02-23 14:12             ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Christian König @ 2023-02-23  7:06 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, Dave Airlie, bagasdotme, jason

Am 22.02.23 um 17:40 schrieb Danilo Krummrich:
> On 2/22/23 16:14, Christian König wrote:
>> Am 22.02.23 um 16:07 schrieb Danilo Krummrich:
>>> On 2/22/23 11:25, Christian König wrote:
>>>> Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
>>>
>>> <snip>
>>>
>>>>> +/**
>>>>> + * DOC: Overview
>>>>> + *
>>>>> + * The DRM GPU VA Manager, represented by struct 
>>>>> drm_gpuva_manager keeps track
>>>>> + * of a GPU's virtual address (VA) space and manages the 
>>>>> corresponding virtual
>>>>> + * mappings represented by &drm_gpuva objects. It also keeps 
>>>>> track of the
>>>>> + * mapping's backing &drm_gem_object buffers.
>>>>> + *
>>>>> + * &drm_gem_object buffers maintain a list (and a corresponding 
>>>>> list lock) of
>>>>> + * &drm_gpuva objects representing all existent GPU VA mappings 
>>>>> using this
>>>>> + * &drm_gem_object as backing buffer.
>>>>> + *
>>>>> + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA 
>>>>> mapping can
>>>>> + * only be created within a previously allocated 
>>>>> &drm_gpuva_region, which
>>>>> + * represents a reserved portion of the GPU VA space. GPU VA 
>>>>> mappings are not
>>>>> + * allowed to span over a &drm_gpuva_region's boundary.
>>>>> + *
>>>>> + * GPU VA regions can also be flagged as sparse, which allows 
>>>>> drivers to create
>>>>> + * sparse mappings for a whole GPU VA region in order to support 
>>>>> Vulkan
>>>>> + * 'Sparse Resources'.
>>>>
>>>> Well since we have now found that there is absolutely no technical 
>>>> reason for having those regions could we please drop them?
>>>
>>> I disagree this was the outcome of our previous discussion.
>>>
>>> In nouveau I still need them to track the separate sparse page 
>>> tables and, as you confirmed previously, Nvidia cards are not the 
>>> only cards supporting this feature.
>>>
>>> The second reason is that with regions we can avoid merging between 
>>> buffers, which saves some effort. However, I agree that this 
>>> argument by itself probably doesn't hold too much, since you've 
>>> pointed out in a previous mail that:
>>>
>>> <cite>
>>> 1) If we merge and decide to only do that inside certain boundaries 
>>> then those boundaries needs to be provided and checked against. This 
>>> burns quite some CPU cycles
>>>
>>> 2) If we just merge what we can we might have extra page table 
>>> updates which cost time and could result in undesired side effects.
>>>
>>> 3) If we don't merge at all we have additional housekeeping for the 
>>> mappings and maybe hw restrictions.
>>> </cite>
>>>
>>> However, if a driver uses regions to track its separate sparse page 
>>> tables anyway it gets 1) for free, which is a nice synergy.
>>>
>>> I totally agree that regions aren't for everyone though. Hence, I 
>>> made them an optional feature and by default regions are disabled. 
>>> In order to use them drm_gpuva_manager_init() must be called with 
>>> the DRM_GPUVA_MANAGER_REGIONS feature flag.
>>>
>>> I really would not want to open code regions or have two GPUVA 
>>> manager instances in nouveau to track sparse page tables. That would 
>>> be really messy, hence I hope we can agree on this to be an optional 
>>> feature.
>>
>> I absolutely don't think that this is a good idea then. This separate 
>> handling of sparse page tables is completely Nouveau specific.
>
> Actually, I rely on what you said in a previous mail when I say it's, 
> potentially, not specific to nouveau.
>
> <cite>
> This sounds similar to what AMD hw used to have up until gfx8 (I 
> think), basically sparse resources where defined through a separate 
> mechanism to the address resolution of the page tables. I won't rule 
> out that other hardware has similar approaches.
> </cite>

Ok, sounds like I didn't made my point here clear: AMD does have that 
same mechanism for older hw you try to implement here for Nouveau, but 
we have *abandoned* it because it is to much trouble and especially 
overhead to support! In other words we have said "Ok we would need two 
separate components to cleanly handle that, one for newer hw and one for 
older hw.".

What you now try to do is to write one component which works for both. 
We have already exercised this idea and came to the conclusion that it's 
not a good path to go down. So you're basically just repeating our mistake.

I mean if it's just for Nouveau then I would say feel free to do 
whatever you want, but since this component is supposed to be used by 
more drivers then I strongly think we need to tackle this from a 
different side.

>> Even when it's optional feature mixing this into the common handling 
>> is exactly what I pointed out as not properly separating between 
>> hardware specific and hardware agnostic functionality.
>
> Optionally having regions is *not* a hardware specific concept, 
> drivers might use it for a hardware specific purpose though. Which 
> potentially is is the case for almost every DRM helper.
>
> Drivers can use regions only for the sake of not merging between 
> buffer boundaries as well. Some drivers might prefer this over "never 
> merge" or "always merge", depending on the cost of re-organizing page 
> tables for unnecessary splits/merges, without having the need of 
> tracking separate sparse page tables.
>
> Its just that I think *if* a driver needs to track separate sparse 
> page tables anyways its a nice synergy since then there is no extra 
> cost for getting this optimization.

Well exactly that's the point: I really don't believe that this comes 
without extra costs.

What we could maybe do is to have an two separate functions, one for 
updating the data structures and one for merging. When you now call the 
merging function with a limit you don't get mappings merged over that 
limit and if you don't call the merging function at all you don't get 
merges.

But we should have definitely not have the tracking of the ranges inside 
the common component. This is something separated.

>> This is exactly the problem we ran into with TTM as well and I've 
>> spend a massive amount of time to clean that up again. >
>
> Admittedly, I don't know what problems you are referring to. However, 
> I don't see which kind of trouble it could cause by allowing drivers 
> to track regions optionally.

Take a look at my 2020 presentation about TTM on FOSDEM.

Regards,
Christian.

>
>> Regards,
>> Christian.
>>
>>>
>>>>
>>>> I don't really see a need for them any more.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-23  7:06           ` Christian König
@ 2023-02-23 14:12             ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-23 14:12 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, Dave Airlie, bagasdotme, jason

On 2/23/23 08:06, Christian König wrote:
> Am 22.02.23 um 17:40 schrieb Danilo Krummrich:
>> On 2/22/23 16:14, Christian König wrote:
>>> Am 22.02.23 um 16:07 schrieb Danilo Krummrich:
>>>> On 2/22/23 11:25, Christian König wrote:
>>>>> Am 17.02.23 um 14:44 schrieb Danilo Krummrich:
>>>>
>>>> <snip>
>>>>
>>>>>> +/**
>>>>>> + * DOC: Overview
>>>>>> + *
>>>>>> + * The DRM GPU VA Manager, represented by struct 
>>>>>> drm_gpuva_manager keeps track
>>>>>> + * of a GPU's virtual address (VA) space and manages the 
>>>>>> corresponding virtual
>>>>>> + * mappings represented by &drm_gpuva objects. It also keeps 
>>>>>> track of the
>>>>>> + * mapping's backing &drm_gem_object buffers.
>>>>>> + *
>>>>>> + * &drm_gem_object buffers maintain a list (and a corresponding 
>>>>>> list lock) of
>>>>>> + * &drm_gpuva objects representing all existent GPU VA mappings 
>>>>>> using this
>>>>>> + * &drm_gem_object as backing buffer.
>>>>>> + *
>>>>>> + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA 
>>>>>> mapping can
>>>>>> + * only be created within a previously allocated 
>>>>>> &drm_gpuva_region, which
>>>>>> + * represents a reserved portion of the GPU VA space. GPU VA 
>>>>>> mappings are not
>>>>>> + * allowed to span over a &drm_gpuva_region's boundary.
>>>>>> + *
>>>>>> + * GPU VA regions can also be flagged as sparse, which allows 
>>>>>> drivers to create
>>>>>> + * sparse mappings for a whole GPU VA region in order to support 
>>>>>> Vulkan
>>>>>> + * 'Sparse Resources'.
>>>>>
>>>>> Well since we have now found that there is absolutely no technical 
>>>>> reason for having those regions could we please drop them?
>>>>
>>>> I disagree this was the outcome of our previous discussion.
>>>>
>>>> In nouveau I still need them to track the separate sparse page 
>>>> tables and, as you confirmed previously, Nvidia cards are not the 
>>>> only cards supporting this feature.
>>>>
>>>> The second reason is that with regions we can avoid merging between 
>>>> buffers, which saves some effort. However, I agree that this 
>>>> argument by itself probably doesn't hold too much, since you've 
>>>> pointed out in a previous mail that:
>>>>
>>>> <cite>
>>>> 1) If we merge and decide to only do that inside certain boundaries 
>>>> then those boundaries needs to be provided and checked against. This 
>>>> burns quite some CPU cycles
>>>>
>>>> 2) If we just merge what we can we might have extra page table 
>>>> updates which cost time and could result in undesired side effects.
>>>>
>>>> 3) If we don't merge at all we have additional housekeeping for the 
>>>> mappings and maybe hw restrictions.
>>>> </cite>
>>>>
>>>> However, if a driver uses regions to track its separate sparse page 
>>>> tables anyway it gets 1) for free, which is a nice synergy.
>>>>
>>>> I totally agree that regions aren't for everyone though. Hence, I 
>>>> made them an optional feature and by default regions are disabled. 
>>>> In order to use them drm_gpuva_manager_init() must be called with 
>>>> the DRM_GPUVA_MANAGER_REGIONS feature flag.
>>>>
>>>> I really would not want to open code regions or have two GPUVA 
>>>> manager instances in nouveau to track sparse page tables. That would 
>>>> be really messy, hence I hope we can agree on this to be an optional 
>>>> feature.
>>>
>>> I absolutely don't think that this is a good idea then. This separate 
>>> handling of sparse page tables is completely Nouveau specific.
>>
>> Actually, I rely on what you said in a previous mail when I say it's, 
>> potentially, not specific to nouveau.
>>
>> <cite>
>> This sounds similar to what AMD hw used to have up until gfx8 (I 
>> think), basically sparse resources where defined through a separate 
>> mechanism to the address resolution of the page tables. I won't rule 
>> out that other hardware has similar approaches.
>> </cite>
> 
> Ok, sounds like I didn't made my point here clear: AMD does have that 
> same mechanism for older hw you try to implement here for Nouveau, but 
> we have *abandoned* it because it is to much trouble and especially 
> overhead to support! In other words we have said "Ok we would need two 
> separate components to cleanly handle that, one for newer hw and one for 
> older hw.".

My point was more about the potential existence of other hardware having 
similar concepts.

I, personally, can't judge whether actually making use of having 
separate sparse page tables (or similar concepts) makes sense for other 
drivers or not. I think it depends on how the hardware works, which 
limitations it has in handling page tables, etc.

I definitely recognize your experience and that for AMD you decided its 
not worth using a similar mechanism. I would definitely be interested in 
the details. Do you mind sharing them?

However, I think we need to differentiate between whether for AMD 
hardware you just found an approach that worked out better for your 
specific hardware or whether something is fundamentally broken with 
separate sparse page tables (or similar concepts) in general.

Do you think there is something fundamentally broken with such an 
approach? And if so, why?

> 
> What you now try to do is to write one component which works for both. 
> We have already exercised this idea and came to the conclusion that it's 
> not a good path to go down. So you're basically just repeating our mistake.
> 
> I mean if it's just for Nouveau then I would say feel free to do 
> whatever you want, but since this component is supposed to be used by 
> more drivers then I strongly think we need to tackle this from a 
> different side.
> 
>>> Even when it's optional feature mixing this into the common handling 
>>> is exactly what I pointed out as not properly separating between 
>>> hardware specific and hardware agnostic functionality.
>>
>> Optionally having regions is *not* a hardware specific concept, 
>> drivers might use it for a hardware specific purpose though. Which 
>> potentially is is the case for almost every DRM helper.
>>
>> Drivers can use regions only for the sake of not merging between 
>> buffer boundaries as well. Some drivers might prefer this over "never 
>> merge" or "always merge", depending on the cost of re-organizing page 
>> tables for unnecessary splits/merges, without having the need of 
>> tracking separate sparse page tables.
>>
>> Its just that I think *if* a driver needs to track separate sparse 
>> page tables anyways its a nice synergy since then there is no extra 
>> cost for getting this optimization.
> 
> Well exactly that's the point: I really don't believe that this comes 
> without extra costs.

If you already have to store some information for purpose A and an 
optional purpose B requires the exact same information you would get B 
for free.

Which other costs would you see here?

> 
> What we could maybe do is to have an two separate functions, one for 
> updating the data structures and one for merging. When you now call the 
> merging function with a limit you don't get mappings merged over that 
> limit and if you don't call the merging function at all you don't get 
> merges.

Having a separate merging function would work. However, I am against an 
interface that takes limit parameters. Having such an interface signals 
general compliance with tracking regions to drivers, but without the 
offer to do this job in a generic way.

This sounds like a bad compromise to me. I think we should either accept 
that some drivers might have a purpose of tracking regions and hence 
*optionally* support them or have clear evidence that tracking regions 
never ever make sense at all regardless of how a specific hardware 
handles it's page tables.

Allowing drivers to set the merge strategy, however, is a good idea. I 
could also just add corresponding feature flags to let the driver pick.

> 
> But we should have definitely not have the tracking of the ranges inside 
> the common component. This is something separated.
> 
>>> This is exactly the problem we ran into with TTM as well and I've 
>>> spend a massive amount of time to clean that up again. >
>>
>> Admittedly, I don't know what problems you are referring to. However, 
>> I don't see which kind of trouble it could cause by allowing drivers 
>> to track regions optionally.
> 
> Take a look at my 2020 presentation about TTM on FOSDEM.
> 
> Regards,
> Christian.
> 
>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>>
>>>>> I don't really see a need for them any more.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-22 18:13     ` Danilo Krummrich
@ 2023-02-23 19:09       ` Liam R. Howlett
  2023-02-27 12:23         ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Liam R. Howlett @ 2023-02-23 19:09 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230222 13:13]:
> On 2/21/23 19:20, Liam R. Howlett wrote:
> > * Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
> > > Add infrastructure to keep track of GPU virtual address (VA) mappings
> > > with a decicated VA space manager implementation.
> > > 
> > > New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> > > start implementing, allow userspace applications to request multiple and
> > > arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> > > intended to serve the following purposes in this context.
> > > 
> > > 1) Provide infrastructure to track GPU VA allocations and mappings,
> > >     making use of the maple_tree.
> > > 
> > > 2) Generically connect GPU VA mappings to their backing buffers, in
> > >     particular DRM GEM objects.
> > > 
> > > 3) Provide a common implementation to perform more complex mapping
> > >     operations on the GPU VA space. In particular splitting and merging
> > >     of GPU VA mappings, e.g. for intersecting mapping requests or partial
> > >     unmap requests.
> > > 
> > > Suggested-by: Dave Airlie <airlied@redhat.com>
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > >   Documentation/gpu/drm-mm.rst    |   31 +
> > >   drivers/gpu/drm/Makefile        |    1 +
> > >   drivers/gpu/drm/drm_gem.c       |    3 +
> > >   drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
> > >   include/drm/drm_drv.h           |    6 +
> > >   include/drm/drm_gem.h           |   75 ++
> > >   include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
> > >   7 files changed, 2534 insertions(+)
> > >   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> > >   create mode 100644 include/drm/drm_gpuva_mgr.h
> > > 
> > > diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> > > index a52e6f4117d6..c9f120cfe730 100644
> > > --- a/Documentation/gpu/drm-mm.rst
> > > +++ b/Documentation/gpu/drm-mm.rst
> > > @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
> > >   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
> > >      :export:
> > ...
> > 
> > > +
> > > +/**
> > > + * drm_gpuva_remove_iter - removes the iterators current element
> > > + * @it: the &drm_gpuva_iterator
> > > + *
> > > + * This removes the element the iterator currently points to.
> > > + */
> > > +void
> > > +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
> > > +{
> > > +	mas_erase(&it->mas);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_iter_remove);
> > > +
> > > +/**
> > > + * drm_gpuva_insert - insert a &drm_gpuva
> > > + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> > > + * @va: the &drm_gpuva to insert
> > > + * @addr: the start address of the GPU VA
> > > + * @range: the range of the GPU VA
> > > + *
> > > + * Insert a &drm_gpuva with a given address and range into a
> > > + * &drm_gpuva_manager.
> > > + *
> > > + * Returns: 0 on success, negative error code on failure.
> > > + */
> > > +int
> > > +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> > > +		 struct drm_gpuva *va)
> > > +{
> > > +	u64 addr = va->va.addr;
> > > +	u64 range = va->va.range;
> > > +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
> > > +	struct drm_gpuva_region *reg = NULL;
> > > +	int ret;
> > > +
> > > +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> > > +		return -EINVAL;
> > > +
> > > +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
> > > +		return -EINVAL;
> > > +
> > > +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
> > > +		reg = drm_gpuva_in_region(mgr, addr, range);
> > > +		if (unlikely(!reg))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > 
> > -----
> > 
> > > +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
> > > +		return -EEXIST;
> > > +
> > > +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> > 
> > mas_walk() will set the internal maple state to the limits to what it
> > finds.  So, instead of an iterator, you can use the walk function and
> > ensure there is a large enough area in the existing NULL:
> > 
> > /*
> >   * Nothing at addr, mas now points to the location where the store would
> >   * happen
> >   */
> > if (mas_walk(&mas))
> > 	return -EEXIST;
> > 
> > /* The NULL entry ends at mas.last, make sure there is room */
> > if (mas.last < (addr + range - 1))
> > 	return -EEXIST;
> > 
> > /* Limit the store size to the correct end address, and store */
> >   mas.last = addr + range - 1;
> >   ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> > 
> 
> Would this variant be significantly more efficient?

Well, what you are doing is walking the tree to see if there's anything
there... then re-walking the tree to store it.  So, yes, it's much more
efficient..  However, writing is heavier.  How much of the time is spent
walking vs writing depends on the size of the tree, but it's rather easy
to do this in a single walk of the tree so why wouldn't you?

> 
> Also, would this also work while already walking the tree?

Yes, to an extent.  If you are at the correct location in the tree, you
can write to that location.  If you are not in the correct location and
try to write to the tree then things will go poorly..  In this scenario,
we are very much walking the tree and writing to it in two steps.

> 
> To remove an entry while walking the tree I have a separate function
> drm_gpuva_iter_remove(). Would I need something similar for inserting
> entries?

I saw that.  Your remove function uses the erase operation which is
implemented as a walk to that location and a store of a null over the
range that is returned.  You do not need a function to insert an entry
if the maple state is at the correct location, and that doesn't just
mean setting mas.index/mas.last to the correct value.  There is a node &
offset saved in the maple state that needs to be in the correct
location.  If you store to that node then the node may be replaced, so
other iterators that you have may become stale, but the one you used
execute the store operation will now point to the new node with the new
entry.

> 
> I already provided this example in a separate mail thread, but it may makes
> sense to move this to the mailing list:
> 
> In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
> given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
> invokes a callback for each sub-operation that needs to be taken in order to
> fulfill this mapping request. In most cases such a callback just creates a
> drm_gpuva_op object and stores it in a list.
> 
> However, drivers can also implement the callback, such that they directly
> execute this operation within the callback.
> 
> Let's have a look at the following example:
> 
>      0     a     2
> old: |-----------|       (bo_offset=n)
> 
>            1     b     3
> req:       |-----------| (bo_offset=m)
> 
>      0  a' 1     b     3
> new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> 
> This would result in the following operations.
> 
> __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> suggesting to re-map "a" with the new size. The driver removes entry "a"
> from the tree and adds "a'"

What you have here won't work.  The driver will cause your iterators
maple state to point to memory that is freed.  You will either need to
pass through your iterator so that the modifications can occur with that
maple state so it remains valid, or you will need to invalidate the
iterator on every modification by the driver.

I'm sure the first idea you have will be to invalidate the iterator, but
that is probably not the way to proceed.  Even ignoring the unclear
locking of two maple states trying to modify the tree, this is rather
inefficient - each invalidation means a re-walk of the tree.  You may as
well not use an iterator in this case.

Depending on how/when the lookups occur, you could still iterate over
the tree and let the driver modify the ending of "a", but leave the tree
alone and just store b over whatever - but the failure scenarios may
cause you grief.

If you pass the iterator through, then you can just use it to do your
writes and keep iterating as if nothing changed.

> 
> __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
> starting from the end of "a" (which is 2) till the end of the requested
> mapping "b" (which is 3). Since it doesn't find any other mapping within
> this range it calls back into the driver suggesting to finally map "b".
> 
> If there would have been another mapping between 2 and 3 it would have
> called back into the driver asking to unmap this mapping beforehand.
> 
> So, it boils down to re-mapping as described at the beginning (and
> analogously at the end) of a new mapping range and removing of entries that
> are enclosed by the new mapping range.

I assume the unmapped area is no longer needed, and the 're-map' is
really a removal of information?  Otherwise I'd suggest searching for a
gap which fits your request.  What you have here is a lot like
"MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
be like your __drm_gpuva_sm_map() and the drm mm range allocator with
DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.

Why can these split/unmappings fail?  Is it because they are still
needed?

> 
> > > +	if (unlikely(ret))
> > > +		return ret;
> > > +
> > > +	va->mgr = mgr;
> > > +	va->region = reg;
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_insert);
> > > +
> > > +/**
> > > + * drm_gpuva_remove - remove a &drm_gpuva
> > > + * @va: the &drm_gpuva to remove
> > > + *
> > > + * This removes the given &va from the underlaying tree.
> > > + */
> > > +void
> > > +drm_gpuva_remove(struct drm_gpuva *va)
> > > +{
> > > +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> > > +
> > > +	mas_erase(&mas);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_remove);
> > > +
> > ...
> > 
> > > +/**
> > > + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> > > + * @mgr: the &drm_gpuva_manager to search in
> > > + * @addr: the &drm_gpuvas address
> > > + * @range: the &drm_gpuvas range
> > > + *
> > > + * Returns: the first &drm_gpuva within the given range
> > > + */
> > > +struct drm_gpuva *
> > > +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> > > +		     u64 addr, u64 range)
> > > +{
> > > +	MA_STATE(mas, &mgr->va_mt, addr, 0);
> > > +
> > > +	return mas_find(&mas, addr + range - 1);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_find_first);
> > > +
> > > +/**
> > > + * drm_gpuva_find - find a &drm_gpuva
> > > + * @mgr: the &drm_gpuva_manager to search in
> > > + * @addr: the &drm_gpuvas address
> > > + * @range: the &drm_gpuvas range
> > > + *
> > > + * Returns: the &drm_gpuva at a given &addr and with a given &range
> > 
> > Note that mas_find() will continue upwards in the address space if there
> > isn't anything at @addr.  This means that &drm_gpuva may not be at
> > &addr.  If you want to check just at &addr, use mas_walk().
> 
> Good catch. drm_gpuva_find() should then either also check for 'va->va.addr
> == addr' as well or, alternatively, use mas_walk(). As above, any reason to
> prefer mas_walk()?
> 
> > 
> > > + */
> > > +struct drm_gpuva *
> > > +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> > > +	       u64 addr, u64 range)
> > > +{
> > > +	struct drm_gpuva *va;
> > > +
> > > +	va = drm_gpuva_find_first(mgr, addr, range);
> > > +	if (!va)
> > > +		goto out;
> > > +
> > > +	if (va->va.range != range)
> > > +		goto out;
> > > +
> > > +	return va;
> > > +
> > > +out:
> > > +	return NULL;
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_find);
> > > +
> > > +/**
> > > + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> > > + * @mgr: the &drm_gpuva_manager to search in
> > > + * @start: the given GPU VA's start address
> > > + *
> > > + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> > > + *
> > > + * Note that if there is any free space between the GPU VA mappings no mapping
> > > + * is returned.
> > > + *
> > > + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> > > + */
> > > +struct drm_gpuva *
> > > +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> > 
> > find_prev() usually continues beyond 1 less than the address. I found
> > this name confusing.
> 
> Don't really get that, mind explaining?

When I ask for the previous one in a list or tree, I think the one
before.. but since you are limiting your search from start to start - 1,
you may as well walk to start - 1 and see if one exists.

Is that what you meant to do here?

> 
> > You may as well use mas_walk(), it would be faster.
> 
> How would I use mas_walk() for that? If I understand it correctly,
> mas_walk() requires me to know that start address, which I don't know for
> the previous entry.

mas_walk() walks to the value you specify and returns the entry at that
address, not necessarily the start address, but any address in the
range.

If you have a tree and store A = [0x1000 - 0x2000] and set your maple
state to walk to 0x1500, mas_walk() will return A, and the maple state
will have mas.index = 0x1000 and mas.last = 0x2000.

You have set the maple state to start at "start" and called
mas_prev(&mas, start - 1).  start - 1 is the lower limit, so the
internal implementation will walk to start then go to the previous entry
until start - 1.. it will stop at start - 1 and return NULL if there
isn't one there.

> 
> However, mas_walk() seems to be a good alternative to use for
> drm_gpuva_find_next().
> 
> > > +{
> > > +	MA_STATE(mas, &mgr->va_mt, start, 0);
> > > +
> > > +	if (start <= mgr->mm_start ||
> > > +	    start > (mgr->mm_start + mgr->mm_range))
> > > +		return NULL;
> > > +
> > > +	return mas_prev(&mas, start - 1);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_find_prev);
> > > +
> > > +/**
> > > + * drm_gpuva_find_next - find the &drm_gpuva after the given address
> > > + * @mgr: the &drm_gpuva_manager to search in
> > > + * @end: the given GPU VA's end address
> > > + *
> > > + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
> > > + *
> > > + * Note that if there is any free space between the GPU VA mappings no mapping
> > > + * is returned.
> > > + *
> > > + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> > > + */
> > > +struct drm_gpuva *
> > > +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
> > 
> > This name is also a bit confusing for the same reason.  Again, it seems
> > worth just walking to end here.
> > 
> > > +{
> > > +	MA_STATE(mas, &mgr->va_mt, end - 1, 0);
> > > +
> > > +	if (end < mgr->mm_start ||
> > > +	    end >= (mgr->mm_start + mgr->mm_range))
> > > +		return NULL;
> > > +
> > > +	return mas_next(&mas, end);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_find_next);
> > > +
> > > +static int
> > > +__drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> > > +			  struct drm_gpuva_region *reg)
> > > +{
> > > +	u64 addr = reg->va.addr;
> > > +	u64 range = reg->va.range;
> > > +	MA_STATE(mas, &mgr->region_mt, addr, addr + range - 1);
> > > +	int ret;
> > > +
> > > +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> > > +		return -EINVAL;
> > > +
> > > +	ret = mas_store_gfp(&mas, reg, GFP_KERNEL);
> > > +	if (unlikely(ret))
> > > +		return ret;
> > > +
> > > +	reg->mgr = mgr;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * drm_gpuva_region_insert - insert a &drm_gpuva_region
> > > + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> > > + * @reg: the &drm_gpuva_region to insert
> > > + * @addr: the start address of the GPU VA
> > > + * @range: the range of the GPU VA
> > > + *
> > > + * Insert a &drm_gpuva_region with a given address and range into a
> > > + * &drm_gpuva_manager.
> > > + *
> > > + * Returns: 0 on success, negative error code on failure.
> > > + */
> > > +int
> > > +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> > > +			struct drm_gpuva_region *reg)
> > > +{
> > > +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
> > > +		return -EINVAL;
> > > +
> > > +	return __drm_gpuva_region_insert(mgr, reg);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_region_insert);
> > > +
> > > +static void
> > > +__drm_gpuva_region_remove(struct drm_gpuva_region *reg)
> > > +{
> > > +	struct drm_gpuva_manager *mgr = reg->mgr;
> > > +	MA_STATE(mas, &mgr->region_mt, reg->va.addr, 0);
> > > +
> > > +	mas_erase(&mas);
> > > +}
> > > +
> > > +/**
> > > + * drm_gpuva_region_remove - remove a &drm_gpuva_region
> > > + * @reg: the &drm_gpuva to remove
> > > + *
> > > + * This removes the given &reg from the underlaying tree.
> > > + */
> > > +void
> > > +drm_gpuva_region_remove(struct drm_gpuva_region *reg)
> > > +{
> > > +	struct drm_gpuva_manager *mgr = reg->mgr;
> > > +
> > > +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
> > > +		return;
> > > +
> > > +	if (unlikely(reg == &mgr->kernel_alloc_region)) {
> > > +		WARN(1, "Can't destroy kernel reserved region.\n");
> > > +		return;
> > > +	}
> > > +
> > > +	if (unlikely(!drm_gpuva_region_empty(reg)))
> > > +		WARN(1, "GPU VA region should be empty on destroy.\n");
> > > +
> > > +	__drm_gpuva_region_remove(reg);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_region_remove);
> > > +
> > > +/**
> > > + * drm_gpuva_region_empty - indicate whether a &drm_gpuva_region is empty
> > > + * @reg: the &drm_gpuva to destroy
> > > + *
> > > + * Returns: true if the &drm_gpuva_region is empty, false otherwise
> > > + */
> > > +bool
> > > +drm_gpuva_region_empty(struct drm_gpuva_region *reg)
> > > +{
> > > +	DRM_GPUVA_ITER(it, reg->mgr);
> > > +
> > > +	drm_gpuva_iter_for_each_range(it, reg->va.addr,
> > > +				      reg->va.addr +
> > > +				      reg->va.range)
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_region_empty);
> > > +
> > > +/**
> > > + * drm_gpuva_region_find_first - find the first &drm_gpuva_region in the given
> > > + * range
> > > + * @mgr: the &drm_gpuva_manager to search in
> > > + * @addr: the &drm_gpuva_regions address
> > > + * @range: the &drm_gpuva_regions range
> > > + *
> > > + * Returns: the first &drm_gpuva_region within the given range
> > > + */
> > > +struct drm_gpuva_region *
> > > +drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
> > > +			    u64 addr, u64 range)
> > > +{
> > > +	MA_STATE(mas, &mgr->region_mt, addr, 0);
> > > +
> > > +	return mas_find(&mas, addr + range - 1);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_region_find_first);
> > > +
> > > +/**
> > > + * drm_gpuva_region_find - find a &drm_gpuva_region
> > > + * @mgr: the &drm_gpuva_manager to search in
> > > + * @addr: the &drm_gpuva_regions address
> > > + * @range: the &drm_gpuva_regions range
> > > + *
> > > + * Returns: the &drm_gpuva_region at a given &addr and with a given &range
> > 
> > again, I'm not sure you want to find first or walk here.. It sounds like
> > you want exactly addr to addr + range VMA?
> 
> Exactly, same as above.

MA_STATE(mas, &mgr->region_mt, addr, addr);

reg = mas_walk(&mas);
if (!reg)
	return reg;

if ((mas.index != addr) | (mas.last != range - 1))
	return NULL;

return reg;

> 
> > 
> > > + */
> > > +struct drm_gpuva_region *
> > > +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
> > > +		      u64 addr, u64 range)
> > > +{
> > > +	struct drm_gpuva_region *reg;
> > > +
> > > +	reg = drm_gpuva_region_find_first(mgr, addr, range);

mas_find() will keep searching, so you may get a VMA that starts higher
than addr.

> > > +	if (!reg)
> > > +		goto out;
> > > +
> > > +	if (reg->va.range != range)
> > > +		goto out;
> > > +
> > > +	return reg;
> > > +
> > > +out:
> > > +	return NULL;
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_region_find);
> > > +
> > 
> > ...
> > 
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-23 19:09       ` Liam R. Howlett
@ 2023-02-27 12:23         ` Danilo Krummrich
  2023-03-02  2:38           ` Liam R. Howlett
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-27 12:23 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

On 2/23/23 20:09, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230222 13:13]:
>> On 2/21/23 19:20, Liam R. Howlett wrote:
>>> * Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
>>>> Add infrastructure to keep track of GPU virtual address (VA) mappings
>>>> with a decicated VA space manager implementation.
>>>>
>>>> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
>>>> start implementing, allow userspace applications to request multiple and
>>>> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>>>> intended to serve the following purposes in this context.
>>>>
>>>> 1) Provide infrastructure to track GPU VA allocations and mappings,
>>>>      making use of the maple_tree.
>>>>
>>>> 2) Generically connect GPU VA mappings to their backing buffers, in
>>>>      particular DRM GEM objects.
>>>>
>>>> 3) Provide a common implementation to perform more complex mapping
>>>>      operations on the GPU VA space. In particular splitting and merging
>>>>      of GPU VA mappings, e.g. for intersecting mapping requests or partial
>>>>      unmap requests.
>>>>
>>>> Suggested-by: Dave Airlie <airlied@redhat.com>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>> ---
>>>>    Documentation/gpu/drm-mm.rst    |   31 +
>>>>    drivers/gpu/drm/Makefile        |    1 +
>>>>    drivers/gpu/drm/drm_gem.c       |    3 +
>>>>    drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
>>>>    include/drm/drm_drv.h           |    6 +
>>>>    include/drm/drm_gem.h           |   75 ++
>>>>    include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
>>>>    7 files changed, 2534 insertions(+)
>>>>    create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>>    create mode 100644 include/drm/drm_gpuva_mgr.h
>>>>
>>>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>>>> index a52e6f4117d6..c9f120cfe730 100644
>>>> --- a/Documentation/gpu/drm-mm.rst
>>>> +++ b/Documentation/gpu/drm-mm.rst
>>>> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>>>>    .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>>>>       :export:
>>> ...
>>>
>>>> +
>>>> +/**
>>>> + * drm_gpuva_remove_iter - removes the iterators current element
>>>> + * @it: the &drm_gpuva_iterator
>>>> + *
>>>> + * This removes the element the iterator currently points to.
>>>> + */
>>>> +void
>>>> +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
>>>> +{
>>>> +	mas_erase(&it->mas);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_iter_remove);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_insert - insert a &drm_gpuva
>>>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>>>> + * @va: the &drm_gpuva to insert
>>>> + * @addr: the start address of the GPU VA
>>>> + * @range: the range of the GPU VA
>>>> + *
>>>> + * Insert a &drm_gpuva with a given address and range into a
>>>> + * &drm_gpuva_manager.
>>>> + *
>>>> + * Returns: 0 on success, negative error code on failure.
>>>> + */
>>>> +int
>>>> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>>>> +		 struct drm_gpuva *va)
>>>> +{
>>>> +	u64 addr = va->va.addr;
>>>> +	u64 range = va->va.range;
>>>> +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
>>>> +	struct drm_gpuva_region *reg = NULL;
>>>> +	int ret;
>>>> +
>>>> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
>>>> +		reg = drm_gpuva_in_region(mgr, addr, range);
>>>> +		if (unlikely(!reg))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>
>>> -----
>>>
>>>> +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
>>>> +		return -EEXIST;
>>>> +
>>>> +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
>>>
>>> mas_walk() will set the internal maple state to the limits to what it
>>> finds.  So, instead of an iterator, you can use the walk function and
>>> ensure there is a large enough area in the existing NULL:
>>>
>>> /*
>>>    * Nothing at addr, mas now points to the location where the store would
>>>    * happen
>>>    */
>>> if (mas_walk(&mas))
>>> 	return -EEXIST;
>>>
>>> /* The NULL entry ends at mas.last, make sure there is room */
>>> if (mas.last < (addr + range - 1))
>>> 	return -EEXIST;
>>>
>>> /* Limit the store size to the correct end address, and store */
>>>    mas.last = addr + range - 1;
>>>    ret = mas_store_gfp(&mas, va, GFP_KERNEL);
>>>
>>
>> Would this variant be significantly more efficient?
> 
> Well, what you are doing is walking the tree to see if there's anything
> there... then re-walking the tree to store it.  So, yes, it's much more
> efficient..  However, writing is heavier.  How much of the time is spent
> walking vs writing depends on the size of the tree, but it's rather easy
> to do this in a single walk of the tree so why wouldn't you?

I will, I was just curious about how much of an impact it has.

> 
>>
>> Also, would this also work while already walking the tree?
> 
> Yes, to an extent.  If you are at the correct location in the tree, you
> can write to that location.  If you are not in the correct location and
> try to write to the tree then things will go poorly..  In this scenario,
> we are very much walking the tree and writing to it in two steps.
> 
>>
>> To remove an entry while walking the tree I have a separate function
>> drm_gpuva_iter_remove(). Would I need something similar for inserting
>> entries?
> 
> I saw that.  Your remove function uses the erase operation which is
> implemented as a walk to that location and a store of a null over the
> range that is returned.  You do not need a function to insert an entry
> if the maple state is at the correct location, and that doesn't just
> mean setting mas.index/mas.last to the correct value.  There is a node &
> offset saved in the maple state that needs to be in the correct
> location.  If you store to that node then the node may be replaced, so
> other iterators that you have may become stale, but the one you used
> execute the store operation will now point to the new node with the new
> entry.
> 
>>
>> I already provided this example in a separate mail thread, but it may makes
>> sense to move this to the mailing list:
>>
>> In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
>> given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
>> invokes a callback for each sub-operation that needs to be taken in order to
>> fulfill this mapping request. In most cases such a callback just creates a
>> drm_gpuva_op object and stores it in a list.
>>
>> However, drivers can also implement the callback, such that they directly
>> execute this operation within the callback.
>>
>> Let's have a look at the following example:
>>
>>       0     a     2
>> old: |-----------|       (bo_offset=n)
>>
>>             1     b     3
>> req:       |-----------| (bo_offset=m)
>>
>>       0  a' 1     b     3
>> new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>>
>> This would result in the following operations.
>>
>> __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
>> suggesting to re-map "a" with the new size. The driver removes entry "a"
>> from the tree and adds "a'"
> 
> What you have here won't work.  The driver will cause your iterators
> maple state to point to memory that is freed.  You will either need to
> pass through your iterator so that the modifications can occur with that
> maple state so it remains valid, or you will need to invalidate the
> iterator on every modification by the driver.
> 
> I'm sure the first idea you have will be to invalidate the iterator, but
> that is probably not the way to proceed.  Even ignoring the unclear
> locking of two maple states trying to modify the tree, this is rather
> inefficient - each invalidation means a re-walk of the tree.  You may as
> well not use an iterator in this case.
> 
> Depending on how/when the lookups occur, you could still iterate over
> the tree and let the driver modify the ending of "a", but leave the tree
> alone and just store b over whatever - but the failure scenarios may
> cause you grief.
> 
> If you pass the iterator through, then you can just use it to do your
> writes and keep iterating as if nothing changed.

Passing through the iterater clearly seems to be the way to go.

I assume that if the entry to insert isn't at the location of the 
iterator (as in the following example) we can just keep walking to this 
location my changing the index of the mas and calling mas_walk()? This 
would also imply that the "outer" tree walk continues after the entry we 
just inserted, right?

            1     a     3
old:       |-----------| (bo_offset=n)

      0     b     2
req: |-----------|       (bo_offset=m)

      0     b     2  a' 3
new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)

Again, after finding "a", we want to remove it and insert "a'" instead.

> 
>>
>> __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
>> starting from the end of "a" (which is 2) till the end of the requested
>> mapping "b" (which is 3). Since it doesn't find any other mapping within
>> this range it calls back into the driver suggesting to finally map "b".
>>
>> If there would have been another mapping between 2 and 3 it would have
>> called back into the driver asking to unmap this mapping beforehand.
>>
>> So, it boils down to re-mapping as described at the beginning (and
>> analogously at the end) of a new mapping range and removing of entries that
>> are enclosed by the new mapping range.
> 
> I assume the unmapped area is no longer needed, and the 're-map' is
> really a removal of information?  Otherwise I'd suggest searching for a
> gap which fits your request.  What you have here is a lot like
> "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
> be like your __drm_gpuva_sm_map() and the drm mm range allocator with
> DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
> 
> Why can these split/unmappings fail?  Is it because they are still
> needed?
> 

You mean the check before the mas_*() operations in drm_gpuva_insert()?

Removing entries should never fail, inserting entries should fail when 
the caller tries to store to an area outside of the VA space (it doesn't 
necessarily span the whole 64-bit space), a kernel reserved area of the 
VA space, is not in any pre-allocated range of the VA space (if regions 
are enabled) or an entry already exists at that location.

>>
>>>> +	if (unlikely(ret))
>>>> +		return ret;
>>>> +
>>>> +	va->mgr = mgr;
>>>> +	va->region = reg;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_insert);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_remove - remove a &drm_gpuva
>>>> + * @va: the &drm_gpuva to remove
>>>> + *
>>>> + * This removes the given &va from the underlaying tree.
>>>> + */
>>>> +void
>>>> +drm_gpuva_remove(struct drm_gpuva *va)
>>>> +{
>>>> +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
>>>> +
>>>> +	mas_erase(&mas);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_remove);
>>>> +
>>> ...
>>>
>>>> +/**
>>>> + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>> + * @addr: the &drm_gpuvas address
>>>> + * @range: the &drm_gpuvas range
>>>> + *
>>>> + * Returns: the first &drm_gpuva within the given range
>>>> + */
>>>> +struct drm_gpuva *
>>>> +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
>>>> +		     u64 addr, u64 range)
>>>> +{
>>>> +	MA_STATE(mas, &mgr->va_mt, addr, 0);
>>>> +
>>>> +	return mas_find(&mas, addr + range - 1);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_find_first);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_find - find a &drm_gpuva
>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>> + * @addr: the &drm_gpuvas address
>>>> + * @range: the &drm_gpuvas range
>>>> + *
>>>> + * Returns: the &drm_gpuva at a given &addr and with a given &range
>>>
>>> Note that mas_find() will continue upwards in the address space if there
>>> isn't anything at @addr.  This means that &drm_gpuva may not be at
>>> &addr.  If you want to check just at &addr, use mas_walk().
>>
>> Good catch. drm_gpuva_find() should then either also check for 'va->va.addr
>> == addr' as well or, alternatively, use mas_walk(). As above, any reason to
>> prefer mas_walk()?
>>
>>>
>>>> + */
>>>> +struct drm_gpuva *
>>>> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
>>>> +	       u64 addr, u64 range)
>>>> +{
>>>> +	struct drm_gpuva *va;
>>>> +
>>>> +	va = drm_gpuva_find_first(mgr, addr, range);
>>>> +	if (!va)
>>>> +		goto out;
>>>> +
>>>> +	if (va->va.range != range)
>>>> +		goto out;
>>>> +
>>>> +	return va;
>>>> +
>>>> +out:
>>>> +	return NULL;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_find);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>> + * @start: the given GPU VA's start address
>>>> + *
>>>> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
>>>> + *
>>>> + * Note that if there is any free space between the GPU VA mappings no mapping
>>>> + * is returned.
>>>> + *
>>>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>>>> + */
>>>> +struct drm_gpuva *
>>>> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
>>>
>>> find_prev() usually continues beyond 1 less than the address. I found
>>> this name confusing.
>>
>> Don't really get that, mind explaining?
> 
> When I ask for the previous one in a list or tree, I think the one
> before.. but since you are limiting your search from start to start - 1,
> you may as well walk to start - 1 and see if one exists.
> 
> Is that what you meant to do here?

Yes, I want to know whether there is a previous entry which ends right 
before the current entry, without a gap between the two.

> 
>>
>>> You may as well use mas_walk(), it would be faster.
>>
>> How would I use mas_walk() for that? If I understand it correctly,
>> mas_walk() requires me to know that start address, which I don't know for
>> the previous entry.
> 
> mas_walk() walks to the value you specify and returns the entry at that
> address, not necessarily the start address, but any address in the
> range.
> 
> If you have a tree and store A = [0x1000 - 0x2000] and set your maple
> state to walk to 0x1500, mas_walk() will return A, and the maple state
> will have mas.index = 0x1000 and mas.last = 0x2000.
> 
> You have set the maple state to start at "start" and called
> mas_prev(&mas, start - 1).  start - 1 is the lower limit, so the
> internal implementation will walk to start then go to the previous entry
> until start - 1.. it will stop at start - 1 and return NULL if there
> isn't one there.

Thanks for the clarification and all the other very helpful comments and 
explanations!

- Danilo

> 
>>
>> However, mas_walk() seems to be a good alternative to use for
>> drm_gpuva_find_next().
>>
>>>> +{
>>>> +	MA_STATE(mas, &mgr->va_mt, start, 0);
>>>> +
>>>> +	if (start <= mgr->mm_start ||
>>>> +	    start > (mgr->mm_start + mgr->mm_range))
>>>> +		return NULL;
>>>> +
>>>> +	return mas_prev(&mas, start - 1);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_find_prev);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>> + * @end: the given GPU VA's end address
>>>> + *
>>>> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
>>>> + *
>>>> + * Note that if there is any free space between the GPU VA mappings no mapping
>>>> + * is returned.
>>>> + *
>>>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>>>> + */
>>>> +struct drm_gpuva *
>>>> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
>>>
>>> This name is also a bit confusing for the same reason.  Again, it seems
>>> worth just walking to end here.
>>>
>>>> +{
>>>> +	MA_STATE(mas, &mgr->va_mt, end - 1, 0);
>>>> +
>>>> +	if (end < mgr->mm_start ||
>>>> +	    end >= (mgr->mm_start + mgr->mm_range))
>>>> +		return NULL;
>>>> +
>>>> +	return mas_next(&mas, end);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_find_next);
>>>> +
>>>> +static int
>>>> +__drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>>>> +			  struct drm_gpuva_region *reg)
>>>> +{
>>>> +	u64 addr = reg->va.addr;
>>>> +	u64 range = reg->va.range;
>>>> +	MA_STATE(mas, &mgr->region_mt, addr, addr + range - 1);
>>>> +	int ret;
>>>> +
>>>> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	ret = mas_store_gfp(&mas, reg, GFP_KERNEL);
>>>> +	if (unlikely(ret))
>>>> +		return ret;
>>>> +
>>>> +	reg->mgr = mgr;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
>>>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>>>> + * @reg: the &drm_gpuva_region to insert
>>>> + * @addr: the start address of the GPU VA
>>>> + * @range: the range of the GPU VA
>>>> + *
>>>> + * Insert a &drm_gpuva_region with a given address and range into a
>>>> + * &drm_gpuva_manager.
>>>> + *
>>>> + * Returns: 0 on success, negative error code on failure.
>>>> + */
>>>> +int
>>>> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>>>> +			struct drm_gpuva_region *reg)
>>>> +{
>>>> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	return __drm_gpuva_region_insert(mgr, reg);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_region_insert);
>>>> +
>>>> +static void
>>>> +__drm_gpuva_region_remove(struct drm_gpuva_region *reg)
>>>> +{
>>>> +	struct drm_gpuva_manager *mgr = reg->mgr;
>>>> +	MA_STATE(mas, &mgr->region_mt, reg->va.addr, 0);
>>>> +
>>>> +	mas_erase(&mas);
>>>> +}
>>>> +
>>>> +/**
>>>> + * drm_gpuva_region_remove - remove a &drm_gpuva_region
>>>> + * @reg: the &drm_gpuva to remove
>>>> + *
>>>> + * This removes the given &reg from the underlaying tree.
>>>> + */
>>>> +void
>>>> +drm_gpuva_region_remove(struct drm_gpuva_region *reg)
>>>> +{
>>>> +	struct drm_gpuva_manager *mgr = reg->mgr;
>>>> +
>>>> +	if (unlikely(!(mgr->flags & DRM_GPUVA_MANAGER_REGIONS)))
>>>> +		return;
>>>> +
>>>> +	if (unlikely(reg == &mgr->kernel_alloc_region)) {
>>>> +		WARN(1, "Can't destroy kernel reserved region.\n");
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (unlikely(!drm_gpuva_region_empty(reg)))
>>>> +		WARN(1, "GPU VA region should be empty on destroy.\n");
>>>> +
>>>> +	__drm_gpuva_region_remove(reg);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_region_remove);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_region_empty - indicate whether a &drm_gpuva_region is empty
>>>> + * @reg: the &drm_gpuva to destroy
>>>> + *
>>>> + * Returns: true if the &drm_gpuva_region is empty, false otherwise
>>>> + */
>>>> +bool
>>>> +drm_gpuva_region_empty(struct drm_gpuva_region *reg)
>>>> +{
>>>> +	DRM_GPUVA_ITER(it, reg->mgr);
>>>> +
>>>> +	drm_gpuva_iter_for_each_range(it, reg->va.addr,
>>>> +				      reg->va.addr +
>>>> +				      reg->va.range)
>>>> +		return false;
>>>> +
>>>> +	return true;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_region_empty);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_region_find_first - find the first &drm_gpuva_region in the given
>>>> + * range
>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>> + * @addr: the &drm_gpuva_regions address
>>>> + * @range: the &drm_gpuva_regions range
>>>> + *
>>>> + * Returns: the first &drm_gpuva_region within the given range
>>>> + */
>>>> +struct drm_gpuva_region *
>>>> +drm_gpuva_region_find_first(struct drm_gpuva_manager *mgr,
>>>> +			    u64 addr, u64 range)
>>>> +{
>>>> +	MA_STATE(mas, &mgr->region_mt, addr, 0);
>>>> +
>>>> +	return mas_find(&mas, addr + range - 1);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_region_find_first);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_region_find - find a &drm_gpuva_region
>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>> + * @addr: the &drm_gpuva_regions address
>>>> + * @range: the &drm_gpuva_regions range
>>>> + *
>>>> + * Returns: the &drm_gpuva_region at a given &addr and with a given &range
>>>
>>> again, I'm not sure you want to find first or walk here.. It sounds like
>>> you want exactly addr to addr + range VMA?
>>
>> Exactly, same as above.
> 
> MA_STATE(mas, &mgr->region_mt, addr, addr);
> 
> reg = mas_walk(&mas);
> if (!reg)
> 	return reg;
> 
> if ((mas.index != addr) | (mas.last != range - 1))
> 	return NULL;
> 
> return reg;
> 
>>
>>>
>>>> + */
>>>> +struct drm_gpuva_region *
>>>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>>>> +		      u64 addr, u64 range)
>>>> +{
>>>> +	struct drm_gpuva_region *reg;
>>>> +
>>>> +	reg = drm_gpuva_region_find_first(mgr, addr, range);
> 
> mas_find() will keep searching, so you may get a VMA that starts higher
> than addr.
> 
>>>> +	if (!reg)
>>>> +		goto out;
>>>> +
>>>> +	if (reg->va.range != range)
>>>> +		goto out;
>>>> +
>>>> +	return reg;
>>>> +
>>>> +out:
>>>> +	return NULL;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_region_find);
>>>> +
>>>
>>> ...
>>>
>>
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-21 18:31               ` Matthew Wilcox
  2023-02-22 16:11                 ` Danilo Krummrich
@ 2023-02-27 17:39                 ` Danilo Krummrich
  2023-02-27 18:36                   ` Matthew Wilcox
  1 sibling, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-27 17:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On 2/21/23 19:31, Matthew Wilcox wrote:
> On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote:
>> On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
>>> On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
>>>> On 2/20/23 16:10, Matthew Wilcox wrote:
>>>>> This is why we like people to use the spinlock embedded in the tree.
>>>>> There's nothing for the user to care about.  If the access really is
>>>>> serialised, acquiring/releasing the uncontended spinlock is a minimal
>>>>> cost compared to all the other things that will happen while modifying
>>>>> the tree.
>>>>
>>>> I think as for the users of the GPUVA manager we'd have two cases:
>>>>
>>>> 1) Accesses to the manager (and hence the tree) are serialized, no lock
>>>> needed.
>>>>
>>>> 2) Multiple operations on the tree must be locked in order to make them
>>>> appear atomic.
>>>
>>> Could you give an example here of what you'd like to do?  Ideally
>>> something complicated so I don't say "Oh, you can just do this" when
>>> there's a more complex example for which "this" won't work.  I'm sure
>>> that's embedded somewhere in the next 20-odd patches, but it's probably
>>> quicker for you to describe in terms of tree operations that have to
>>> appear atomic than for me to try to figure it out.
>>>
>>
>> Absolutely, not gonna ask you to read all of that. :-)
>>
>> One thing the GPUVA manager does is to provide drivers the (sub-)operations
>> that need to be processed in order to fulfill a map or unmap request from
>> userspace. For instance, when userspace asks the driver to map some memory
>> the GPUVA manager calculates which existing mappings must be removed, split up
>> or can be merged with the newly requested mapping.
>>
>> A driver has two ways to fetch those operations from the GPUVA manager. It can
>> either obtain a list of operations or receive a callback for each operation
>> generated by the GPUVA manager.
>>
>> In both cases the GPUVA manager walks the maple tree, which keeps track of
>> existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
>> the map case, since the unmap case is a subset basically). For each mapping
>> found in the given range the driver, as mentioned, either receives a callback or
>> a list entry is added to the list of operations.
>>
>> Typically, for each operation / callback one entry within the maple tree is
>> removed and, optionally at the beginning and end of a new mapping's range, a
>> new entry is inserted. An of course, as the last operation, there is the new
>> mapping itself to insert.
>>
>> The GPUVA manager delegates locking responsibility to the drivers. Typically,
>> a driver either serializes access to the VA space managed by the GPUVA manager
>> (no lock needed) or need to lock the processing of a full set of operations
>> generated by the GPUVA manager.
> 
> OK, that all makes sense.  It does make sense to have the driver use its
> own mutex and then take the spinlock inside the maple tree code.  It
> shouldn't ever be contended.
> 
>>>> In either case the embedded spinlock wouldn't be useful, we'd either need an
>>>> external lock or no lock at all.
>>>>
>>>> If there are any internal reasons why specific tree operations must be
>>>> mutually excluded (such as those you explain below), wouldn't it make more
>>>> sense to always have the internal lock and, optionally, allow users to
>>>> specify an external lock additionally?
>>>
>>> So the way this works for the XArray, which is a little older than the
>>> Maple tree, is that we always use the internal spinlock for
>>> modifications (possibly BH or IRQ safe), and if someone wants to
>>> use an external mutex to make some callers atomic with respect to each
>>> other, they're free to do so.  In that case, the XArray doesn't check
>>> the user's external locking at all, because it really can't know.
>>>
>>> I'd advise taking that approach; if there's really no way to use the
>>> internal spinlock to make your complicated updates appear atomic
>>> then just let the maple tree use its internal spinlock, and you can
>>> also use your external mutex however you like.
>>>
>>
>> That sounds like the right thing to do.
>>
>> However, I'm using the advanced API of the maple tree (and that's the reason
>> why the above example appears a little more detailed than needed) because I
>> think with the normal API I can't insert / remove tree entries while walking
>> the tree, right?
> 
> Right.  The normal API is for simple operations while the advanced API
> is for doing compound operations.
> 
>> As by the documentation the advanced API, however, doesn't take care of locking
>> itself, hence just letting the maple tree use its internal spinlock doesn't
>> really work - I need to take care of that myself, right?
> 
> Yes; once you're using the advanced API, you get to compose the entire
> operation yourself.
> 
>> It feels a bit weird that I, as a user of the API, would need to lock certain
>> (or all?) mas_*() functions with the internal spinlock in order to protect
>> (future) internal features of the tree, such as the slab cache defragmentation
>> you mentioned. Because from my perspective, as the generic component that tells
>> it's users (the drivers) to take care of locking VA space operations (and hence
>> tree operations) I don't have an own purpose of this internal spinlock, right?
> 
> You don't ... but we can't know that.
> 
>> Also I'm a little confused how I'd know where to take the spinlock? E.g. for
>> inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.
> 
> Lockdep will shout at you if you get it wrong ;-)  But you can safely
> take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
> mas_nomem() knows to drop the lock before doing a sleeping allocation.
> Essentially you're open-coding mtree_store_range() but doing your own
> thing in addition to the store.
> 

As already mentioned, I went with your advice to just take the maple 
tree's internal spinlock within the GPUVA manager and leave all the 
other locking to the drivers as intended.

However, I run into the case that lockdep shouts at me for not taking 
the spinlock before calling mas_find() in the iterator macros.

Now, I definitely don't want to let the drivers take the maple tree's 
spinlock before they use the iterator macro. Of course, drivers 
shouldn't even know about the underlying maple tree of the GPUVA manager.

One way to make lockdep happy in this case seems to be taking the 
spinlock right before mas_find() and drop it right after for each iteration.

What do you advice to do in this case?

Thanks,
Danilo


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-27 17:39                 ` Danilo Krummrich
@ 2023-02-27 18:36                   ` Matthew Wilcox
  2023-02-27 18:59                     ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Matthew Wilcox @ 2023-02-27 18:36 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On Mon, Feb 27, 2023 at 06:39:33PM +0100, Danilo Krummrich wrote:
> On 2/21/23 19:31, Matthew Wilcox wrote:
> > Lockdep will shout at you if you get it wrong ;-)  But you can safely
> > take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
> > mas_nomem() knows to drop the lock before doing a sleeping allocation.
> > Essentially you're open-coding mtree_store_range() but doing your own
> > thing in addition to the store.
> 
> As already mentioned, I went with your advice to just take the maple tree's
> internal spinlock within the GPUVA manager and leave all the other locking
> to the drivers as intended.
> 
> However, I run into the case that lockdep shouts at me for not taking the
> spinlock before calling mas_find() in the iterator macros.
> 
> Now, I definitely don't want to let the drivers take the maple tree's
> spinlock before they use the iterator macro. Of course, drivers shouldn't
> even know about the underlying maple tree of the GPUVA manager.
> 
> One way to make lockdep happy in this case seems to be taking the spinlock
> right before mas_find() and drop it right after for each iteration.

While we don't have any lockdep checking of this, you really shouldn't be
using an iterator if you're going to drop the lock between invocations.
The iterator points into the tree, so you need to invalidate the iterator
any time you drop the lock.

You don't have to use a spinlock to do a read iteration.  You can just
take the rcu_read_lock() around your iteration, as long as you can
tolerate the mild inconsistencies that RCU permits.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
  2023-02-27 18:36                   ` Matthew Wilcox
@ 2023-02-27 18:59                     ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-27 18:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: matthew.brost, bagasdotme, linux-doc, nouveau, ogabbay, corbet,
	linux-kernel, dri-devel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Liam.Howlett, christian.koenig, jason

On 2/27/23 19:36, Matthew Wilcox wrote:
> On Mon, Feb 27, 2023 at 06:39:33PM +0100, Danilo Krummrich wrote:
>> On 2/21/23 19:31, Matthew Wilcox wrote:
>>> Lockdep will shout at you if you get it wrong ;-)  But you can safely
>>> take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
>>> mas_nomem() knows to drop the lock before doing a sleeping allocation.
>>> Essentially you're open-coding mtree_store_range() but doing your own
>>> thing in addition to the store.
>>
>> As already mentioned, I went with your advice to just take the maple tree's
>> internal spinlock within the GPUVA manager and leave all the other locking
>> to the drivers as intended.
>>
>> However, I run into the case that lockdep shouts at me for not taking the
>> spinlock before calling mas_find() in the iterator macros.
>>
>> Now, I definitely don't want to let the drivers take the maple tree's
>> spinlock before they use the iterator macro. Of course, drivers shouldn't
>> even know about the underlying maple tree of the GPUVA manager.
>>
>> One way to make lockdep happy in this case seems to be taking the spinlock
>> right before mas_find() and drop it right after for each iteration.
> 
> While we don't have any lockdep checking of this, you really shouldn't be
> using an iterator if you're going to drop the lock between invocations.
> The iterator points into the tree, so you need to invalidate the iterator
> any time you drop the lock.

The tree can't change either way in my case. Changes to the DRM GPUVA 
manager (and hence the tree) are protected by drivers, either by 
serializing tree accesses or by having another external lock ensuring 
mutual exclusion. Just as a reminder, in the latter case drivers usually 
lock multiple transactions to the manager (and hence the tree) to ensure 
they appear atomic.

So, really the only purpose for me taking the internal lock is to ensure 
I satisfy lockdep and the maple tree's internal requirements on locking 
for future use cases you mentioned (e.g. slab cache defragmentation).

It's the rcu_dereference_check() in mas_root() that triggers in my case:

[   28.745706] lib/maple_tree.c:851 suspicious rcu_dereference_check() 
usage!

                stack backtrace:
[   28.746057] CPU: 8 PID: 1518 Comm: nouveau_dma_cop Not tainted 
6.2.0-rc6-vmbind-0.2+ #104
[   28.746061] Hardware name: ASUS System Product Name/PRIME Z690-A, 
BIOS 2103 09/30/2022
[   28.746064] Call Trace:
[   28.746067]  <TASK>
[   28.746070]  dump_stack_lvl+0x5b/0x77
[   28.746077]  mas_walk+0x16d/0x1b0
[   28.746082]  mas_find+0xf7/0x300
[   28.746088]  drm_gpuva_in_region+0x63/0xa0
[   28.746099]  __drm_gpuva_sm_map.isra.0+0x465/0x9f0
[   28.746103]  ? lock_acquire+0xbf/0x2b0
[   28.746111]  ? __pfx_drm_gpuva_sm_step+0x10/0x10
[   28.746114]  ? lock_is_held_type+0xe3/0x140
[   28.746121]  ? mark_held_locks+0x49/0x80
[   28.746125]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[   28.746138]  drm_gpuva_sm_map_ops_create+0x80/0xc0
[   28.746145]  uvmm_bind_job_submit+0x3c2/0x470 [nouveau]
[   28.746272]  nouveau_job_submit+0x60/0x450 [nouveau]
[   28.746393]  nouveau_uvmm_ioctl_vm_bind+0x179/0x1e0 [nouveau]
[   28.746510]  ? __pfx_nouveau_uvmm_ioctl_vm_bind+0x10/0x10 [nouveau]
[   28.746622]  drm_ioctl_kernel+0xa9/0x160
[   28.746629]  drm_ioctl+0x1f7/0x4b0

> 
> You don't have to use a spinlock to do a read iteration.  You can just
> take the rcu_read_lock() around your iteration, as long as you can
> tolerate the mild inconsistencies that RCU permits.
>

Doing that would mean that the driver needs to do it. However, the 
driver either needs to serialize accesses or use it's own mutex for 
protection for the above reasons. Hence, that should not be needed.



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-21 18:20   ` Liam R. Howlett
  2023-02-22 18:13     ` Danilo Krummrich
@ 2023-02-28  2:17     ` Danilo Krummrich
  2023-02-28 16:24       ` Liam R. Howlett
  1 sibling, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-02-28  2:17 UTC (permalink / raw)
  To: Liam R. Howlett, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, dri-devel,
	nouveau, linux-doc, linux-mm, linux-kernel, Dave Airlie

[-- Attachment #1: Type: text/plain, Size: 4539 bytes --]

On Tue, Feb 21, 2023 at 01:20:50PM -0500, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
> > Add infrastructure to keep track of GPU virtual address (VA) mappings
> > with a decicated VA space manager implementation.
> > 
> > New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> > start implementing, allow userspace applications to request multiple and
> > arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> > intended to serve the following purposes in this context.
> > 
> > 1) Provide infrastructure to track GPU VA allocations and mappings,
> >    making use of the maple_tree.
> > 
> > 2) Generically connect GPU VA mappings to their backing buffers, in
> >    particular DRM GEM objects.
> > 
> > 3) Provide a common implementation to perform more complex mapping
> >    operations on the GPU VA space. In particular splitting and merging
> >    of GPU VA mappings, e.g. for intersecting mapping requests or partial
> >    unmap requests.
> > 
> > Suggested-by: Dave Airlie <airlied@redhat.com>
> > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > ---
> >  Documentation/gpu/drm-mm.rst    |   31 +
> >  drivers/gpu/drm/Makefile        |    1 +
> >  drivers/gpu/drm/drm_gem.c       |    3 +
> >  drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
> >  include/drm/drm_drv.h           |    6 +
> >  include/drm/drm_gem.h           |   75 ++
> >  include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
> >  7 files changed, 2534 insertions(+)
> >  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> >  create mode 100644 include/drm/drm_gpuva_mgr.h
> > 
> > diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> > index a52e6f4117d6..c9f120cfe730 100644
> > --- a/Documentation/gpu/drm-mm.rst
> > +++ b/Documentation/gpu/drm-mm.rst
> > @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
> >  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
> >     :export:
> >  
> ...
> 
> > +
> > +/**
> > + * drm_gpuva_remove_iter - removes the iterators current element
> > + * @it: the &drm_gpuva_iterator
> > + *
> > + * This removes the element the iterator currently points to.
> > + */
> > +void
> > +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
> > +{
> > +	mas_erase(&it->mas);
> > +}
> > +EXPORT_SYMBOL(drm_gpuva_iter_remove);
> > +
> > +/**
> > + * drm_gpuva_insert - insert a &drm_gpuva
> > + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> > + * @va: the &drm_gpuva to insert
> > + * @addr: the start address of the GPU VA
> > + * @range: the range of the GPU VA
> > + *
> > + * Insert a &drm_gpuva with a given address and range into a
> > + * &drm_gpuva_manager.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> > +		 struct drm_gpuva *va)
> > +{
> > +	u64 addr = va->va.addr;
> > +	u64 range = va->va.range;
> > +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
> > +	struct drm_gpuva_region *reg = NULL;
> > +	int ret;
> > +
> > +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> > +		return -EINVAL;
> > +
> > +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
> > +		return -EINVAL;
> > +
> > +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
> > +		reg = drm_gpuva_in_region(mgr, addr, range);
> > +		if (unlikely(!reg))
> > +			return -EINVAL;
> > +	}
> > +
> 
> -----
> 
> > +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
> > +		return -EEXIST;
> > +
> > +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> 
> mas_walk() will set the internal maple state to the limits to what it
> finds.  So, instead of an iterator, you can use the walk function and
> ensure there is a large enough area in the existing NULL:
> 
> /*
>  * Nothing at addr, mas now points to the location where the store would
>  * happen
>  */
> if (mas_walk(&mas))
> 	return -EEXIST;
> 

For some reason mas_walk() finds an entry and hence this function returns
-EEXIST for the following sequence of insertions.

A = [0xc0000 - 0xfffff]
B = [0x0 - 0xbffff]

Interestingly, inserting B before A works fine.

I attached a test module that reproduces the issue. I hope its just a stupid
mistake I just can't spot though.

> /* The NULL entry ends at mas.last, make sure there is room */
> if (mas.last < (addr + range - 1))
> 	return -EEXIST;
> 
> /* Limit the store size to the correct end address, and store */
>  mas.last = addr + range - 1;
>  ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> 

[-- Attachment #2: maple.c --]
[-- Type: text/x-c, Size: 1954 bytes --]

/* SPDX-License-Identifier: GPL-2.0 */
#if 1
#include <linux/init.h>
#include <linux/ioctl.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/maple_tree.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/mutex.h>
#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/slab.h>
#include <linux/types.h>
#endif

struct maple_tree mt;

struct va {
	u64 addr;
	u64 range;
};

static int va_store(struct va *va)
{
	void *entry = NULL;
	u64 addr = va->addr;
	u64 range = va->range;
	u64 last = addr + range - 1;
	MA_STATE(mas, &mt, addr, addr);
	int ret;

	mas_lock(&mas);

	if ((entry = mas_walk(&mas))) {
		pr_err("addr=%llx, range=%llx, last=%llx, mas.index=%lx, mas.last=%lx, entry=%px - exists\n",
		       addr, range, last, mas.index, mas.last, entry);
		ret = -EEXIST;
		goto err_unlock;
	}

	if (mas.last < last) {
		pr_err("addr=%llx, range=%llx, last=%llx, mas.index=%lx, mas.last%lx, va=%px - not enough space\n",
		       addr, range, last, mas.index, mas.last, va);
		ret = -EEXIST;
		goto err_unlock;
	}

	mas.last = last;
	ret = mas_store_gfp(&mas, &va, GFP_KERNEL);
	if (ret) {
		pr_err("mas_store_gfp() failed\n");
		goto err_unlock;
	}

	mas_unlock(&mas);

	pr_info("addr=%llx, range=%llx, last=%llx, mas.index=%lx, mas.last=%lx, va=%px - insert\n",
		addr, range, last, mas.index, mas.last, va);

	return 0;

err_unlock:
	mas_unlock(&mas);
	return ret;
}

static int __init maple_init(void)
{
	struct va kernel_node = { .addr = 0xc0000, .range = 0x40000 };
	struct va node = { .addr = 0x0, .range = 0xc0000 };

	mt_init(&mt);

	va_store(&kernel_node);
	va_store(&node);

	return 0;
}

static void __exit maple_exit(void)
{
	mtree_lock(&mt);
	__mt_destroy(&mt);
	mtree_unlock(&mt);
}

module_init(maple_init);
module_exit(maple_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Danilo Krummrich");
MODULE_DESCRIPTION("Maple Tree example.");
MODULE_VERSION("0.1");

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-28  2:17     ` Danilo Krummrich
@ 2023-02-28 16:24       ` Liam R. Howlett
  2023-03-06 13:39         ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Liam R. Howlett @ 2023-02-28 16:24 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230227 21:17]:
> On Tue, Feb 21, 2023 at 01:20:50PM -0500, Liam R. Howlett wrote:
> > * Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
> > > Add infrastructure to keep track of GPU virtual address (VA) mappings
> > > with a decicated VA space manager implementation.
> > > 
> > > New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> > > start implementing, allow userspace applications to request multiple and
> > > arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> > > intended to serve the following purposes in this context.
> > > 
> > > 1) Provide infrastructure to track GPU VA allocations and mappings,
> > >    making use of the maple_tree.
> > > 
> > > 2) Generically connect GPU VA mappings to their backing buffers, in
> > >    particular DRM GEM objects.
> > > 
> > > 3) Provide a common implementation to perform more complex mapping
> > >    operations on the GPU VA space. In particular splitting and merging
> > >    of GPU VA mappings, e.g. for intersecting mapping requests or partial
> > >    unmap requests.
> > > 
> > > Suggested-by: Dave Airlie <airlied@redhat.com>
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > >  Documentation/gpu/drm-mm.rst    |   31 +
> > >  drivers/gpu/drm/Makefile        |    1 +
> > >  drivers/gpu/drm/drm_gem.c       |    3 +
> > >  drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
> > >  include/drm/drm_drv.h           |    6 +
> > >  include/drm/drm_gem.h           |   75 ++
> > >  include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
> > >  7 files changed, 2534 insertions(+)
> > >  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> > >  create mode 100644 include/drm/drm_gpuva_mgr.h
> > > 
> > > diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> > > index a52e6f4117d6..c9f120cfe730 100644
> > > --- a/Documentation/gpu/drm-mm.rst
> > > +++ b/Documentation/gpu/drm-mm.rst
> > > @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
> > >  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
> > >     :export:
> > >  
> > ...
> > 
> > > +
> > > +/**
> > > + * drm_gpuva_remove_iter - removes the iterators current element
> > > + * @it: the &drm_gpuva_iterator
> > > + *
> > > + * This removes the element the iterator currently points to.
> > > + */
> > > +void
> > > +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
> > > +{
> > > +	mas_erase(&it->mas);
> > > +}
> > > +EXPORT_SYMBOL(drm_gpuva_iter_remove);
> > > +
> > > +/**
> > > + * drm_gpuva_insert - insert a &drm_gpuva
> > > + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> > > + * @va: the &drm_gpuva to insert
> > > + * @addr: the start address of the GPU VA
> > > + * @range: the range of the GPU VA
> > > + *
> > > + * Insert a &drm_gpuva with a given address and range into a
> > > + * &drm_gpuva_manager.
> > > + *
> > > + * Returns: 0 on success, negative error code on failure.
> > > + */
> > > +int
> > > +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> > > +		 struct drm_gpuva *va)
> > > +{
> > > +	u64 addr = va->va.addr;
> > > +	u64 range = va->va.range;
> > > +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
> > > +	struct drm_gpuva_region *reg = NULL;
> > > +	int ret;
> > > +
> > > +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> > > +		return -EINVAL;
> > > +
> > > +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
> > > +		return -EINVAL;
> > > +
> > > +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
> > > +		reg = drm_gpuva_in_region(mgr, addr, range);
> > > +		if (unlikely(!reg))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > 
> > -----
> > 
> > > +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
> > > +		return -EEXIST;
> > > +
> > > +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> > 
> > mas_walk() will set the internal maple state to the limits to what it
> > finds.  So, instead of an iterator, you can use the walk function and
> > ensure there is a large enough area in the existing NULL:
> > 
> > /*
> >  * Nothing at addr, mas now points to the location where the store would
> >  * happen
> >  */
> > if (mas_walk(&mas))
> > 	return -EEXIST;
> > 
> 
> For some reason mas_walk() finds an entry and hence this function returns
> -EEXIST for the following sequence of insertions.
> 
> A = [0xc0000 - 0xfffff]
> B = [0x0 - 0xbffff]
> 
> Interestingly, inserting B before A works fine.
> 
> I attached a test module that reproduces the issue. I hope its just a stupid
> mistake I just can't spot though.

This is probably my fault in how I explained things, I seem to have had
a bug in my code.

Let me try again.

mas_walk(&mas) will go to the range of mas.index
	It will set mas.index = range_start
	It will set mas.last = range_end
	It will return entry in that range.

Your code is walking to addr (0xc0000, say)
You get NULL
and the range is now: mas.index = 0, mas.last = ULONG_MAX

You set mas.last = 0xc0000 + 0x40000 -1
You store your va in the range of 0 - 0xfffff - This isn't what you want
to do and this is why you are seeing it exists when done in this order.

In the reverse order, your lower limit is fine so it works out.

Try adding a check to ensure the lower range is still accurate as well:
        if (mas.index < addr)                                                                                           
                mas.index = addr;

If you compile with CONFIG_DEBUG_MAPLE_TREE, you can use mt_dump() to
dump the tree for debugging.

I also have some quality of life patches I'm developing to configure the
format of the dump (hex/dec) and a mas_dump() for more information as
well.

> 
> > /* The NULL entry ends at mas.last, make sure there is room */
> > if (mas.last < (addr + range - 1))
> > 	return -EEXIST;
> > 
> > /* Limit the store size to the correct end address, and store */
> >  mas.last = addr + range - 1;
> >  ret = mas_store_gfp(&mas, va, GFP_KERNEL);
> > 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-27 12:23         ` Danilo Krummrich
@ 2023-03-02  2:38           ` Liam R. Howlett
  2023-03-06 15:46             ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Liam R. Howlett @ 2023-03-02  2:38 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230227 08:17]:

...
> > > Would this variant be significantly more efficient?
> > 
> > Well, what you are doing is walking the tree to see if there's anything
> > there... then re-walking the tree to store it.  So, yes, it's much more
> > efficient..  However, writing is heavier.  How much of the time is spent
> > walking vs writing depends on the size of the tree, but it's rather easy
> > to do this in a single walk of the tree so why wouldn't you?
> 
> I will, I was just curious about how much of an impact it has.
> 
> > 
> > > 
> > > Also, would this also work while already walking the tree?
> > 
> > Yes, to an extent.  If you are at the correct location in the tree, you
> > can write to that location.  If you are not in the correct location and
> > try to write to the tree then things will go poorly..  In this scenario,
> > we are very much walking the tree and writing to it in two steps.
> > 
> > > 
> > > To remove an entry while walking the tree I have a separate function
> > > drm_gpuva_iter_remove(). Would I need something similar for inserting
> > > entries?
> > 
> > I saw that.  Your remove function uses the erase operation which is
> > implemented as a walk to that location and a store of a null over the
> > range that is returned.  You do not need a function to insert an entry
> > if the maple state is at the correct location, and that doesn't just
> > mean setting mas.index/mas.last to the correct value.  There is a node &
> > offset saved in the maple state that needs to be in the correct
> > location.  If you store to that node then the node may be replaced, so
> > other iterators that you have may become stale, but the one you used
> > execute the store operation will now point to the new node with the new
> > entry.
> > 
> > > 
> > > I already provided this example in a separate mail thread, but it may makes
> > > sense to move this to the mailing list:
> > > 
> > > In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
> > > given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
> > > invokes a callback for each sub-operation that needs to be taken in order to
> > > fulfill this mapping request. In most cases such a callback just creates a
> > > drm_gpuva_op object and stores it in a list.
> > > 
> > > However, drivers can also implement the callback, such that they directly
> > > execute this operation within the callback.
> > > 
> > > Let's have a look at the following example:
> > > 
> > >       0     a     2
> > > old: |-----------|       (bo_offset=n)
> > > 
> > >             1     b     3
> > > req:       |-----------| (bo_offset=m)
> > > 
> > >       0  a' 1     b     3
> > > new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> > > 
> > > This would result in the following operations.
> > > 
> > > __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> > > suggesting to re-map "a" with the new size. The driver removes entry "a"
> > > from the tree and adds "a'"
> > 
> > What you have here won't work.  The driver will cause your iterators
> > maple state to point to memory that is freed.  You will either need to
> > pass through your iterator so that the modifications can occur with that
> > maple state so it remains valid, or you will need to invalidate the
> > iterator on every modification by the driver.
> > 
> > I'm sure the first idea you have will be to invalidate the iterator, but
> > that is probably not the way to proceed.  Even ignoring the unclear
> > locking of two maple states trying to modify the tree, this is rather
> > inefficient - each invalidation means a re-walk of the tree.  You may as
> > well not use an iterator in this case.
> > 
> > Depending on how/when the lookups occur, you could still iterate over
> > the tree and let the driver modify the ending of "a", but leave the tree
> > alone and just store b over whatever - but the failure scenarios may
> > cause you grief.
> > 
> > If you pass the iterator through, then you can just use it to do your
> > writes and keep iterating as if nothing changed.
> 
> Passing through the iterater clearly seems to be the way to go.
> 
> I assume that if the entry to insert isn't at the location of the iterator
> (as in the following example) we can just keep walking to this location my
> changing the index of the mas and calling mas_walk()?

no.  You have to mas_set() to the value and walk from the top of the
tree.  mas_walk() walks down, not from side to side - well, it does go
forward within a node (increasing offset), but if you hit the node limit
then you have gotten yourself in trouble.

> This would also imply
> that the "outer" tree walk continues after the entry we just inserted,
> right?

I don't understand the "outer" tree walk statement.

> 
>            1     a     3
> old:       |-----------| (bo_offset=n)
> 
>      0     b     2
> req: |-----------|       (bo_offset=m)
> 
>      0     b     2  a' 3
> new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> 
> Again, after finding "a", we want to remove it and insert "a'" instead.

Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:

        0     1  a   2  a' 3
broken: |-----|------|-----| (a is broken in this 1/2 step)

mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
mas_store(&mas, b);
        0     b     2  a' 3
new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)


You can *probably* also get away with this:

walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
from 1 - 3, write "a'" from 2 - 3:

        0     1  a   2  a' 3
broken: |-----|------|-----| (a is broken in this 1/2 step)

mas_prev(&mas, 0); /* Looking at broken a from 1-2.
mas_store(&mas, NULL); /* NULL is expanded on write to 0-2.
            0    NULL   2  a' 3
broken':    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)

mas_store(&mas, b);
        0     b     2  a' 3
new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)

You may want to iterate backwards and do the writes as you go until you
have enough room.. it really depends how you want to go about doing
things.

> 
> > 
> > > 
> > > __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
> > > starting from the end of "a" (which is 2) till the end of the requested
> > > mapping "b" (which is 3). Since it doesn't find any other mapping within
> > > this range it calls back into the driver suggesting to finally map "b".
> > > 
> > > If there would have been another mapping between 2 and 3 it would have
> > > called back into the driver asking to unmap this mapping beforehand.
> > > 
> > > So, it boils down to re-mapping as described at the beginning (and
> > > analogously at the end) of a new mapping range and removing of entries that
> > > are enclosed by the new mapping range.
> > 
> > I assume the unmapped area is no longer needed, and the 're-map' is
> > really a removal of information?  Otherwise I'd suggest searching for a
> > gap which fits your request.  What you have here is a lot like
> > "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
> > be like your __drm_gpuva_sm_map() and the drm mm range allocator with
> > DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
> > 
> > Why can these split/unmappings fail?  Is it because they are still
> > needed?
> > 
> 
> You mean the check before the mas_*() operations in drm_gpuva_insert()?

Yes, the callbacks.

> 
> Removing entries should never fail, inserting entries should fail when the
> caller tries to store to an area outside of the VA space (it doesn't
> necessarily span the whole 64-bit space), a kernel reserved area of the VA
> space, is not in any pre-allocated range of the VA space (if regions are
> enabled) or an entry already exists at that location.

In the mmap code, I have to deal with splitting the start/end VMA and
removing any VMAs in the way.  I do this by making a 'detached' tree
that is dealt with later, then just overwriting the area with one
mas_store() operation.  Would something like that work for you?

> 
> > > 
> > > > > +	if (unlikely(ret))
> > > > > +		return ret;
> > > > > +
> > > > > +	va->mgr = mgr;
> > > > > +	va->region = reg;
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_insert);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_remove - remove a &drm_gpuva
> > > > > + * @va: the &drm_gpuva to remove
> > > > > + *
> > > > > + * This removes the given &va from the underlaying tree.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuva_remove(struct drm_gpuva *va)
> > > > > +{
> > > > > +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> > > > > +
> > > > > +	mas_erase(&mas);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_remove);
> > > > > +
> > > > ...
> > > > 
> > > > > +/**
> > > > > + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @addr: the &drm_gpuvas address
> > > > > + * @range: the &drm_gpuvas range
> > > > > + *
> > > > > + * Returns: the first &drm_gpuva within the given range
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> > > > > +		     u64 addr, u64 range)
> > > > > +{
> > > > > +	MA_STATE(mas, &mgr->va_mt, addr, 0);
> > > > > +
> > > > > +	return mas_find(&mas, addr + range - 1);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_find_first);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_find - find a &drm_gpuva
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @addr: the &drm_gpuvas address
> > > > > + * @range: the &drm_gpuvas range
> > > > > + *
> > > > > + * Returns: the &drm_gpuva at a given &addr and with a given &range
> > > > 
> > > > Note that mas_find() will continue upwards in the address space if there
> > > > isn't anything at @addr.  This means that &drm_gpuva may not be at
> > > > &addr.  If you want to check just at &addr, use mas_walk().
> > > 
> > > Good catch. drm_gpuva_find() should then either also check for 'va->va.addr
> > > == addr' as well or, alternatively, use mas_walk(). As above, any reason to
> > > prefer mas_walk()?

I think I missed this question last time..

Internally, mas_find() is just a mas_walk() on the first call, then
mas_next() for each call after that.  If, during the mas_walk(), there
is no value at addr, it immediately calls mas_next() to get a value to
return.  It will continue upwards until the limit is reached (addr +
range - 1 in your case).

So if you only want to know if there is something at addr, then it's
best to use mas_walk() and keep things a bit more efficient.  Then you
can check mas.last for your end value.

If you do want the first VMA within the range passed in, then mas_find()
is the function you want.

> > > 
> > > > 
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> > > > > +	       u64 addr, u64 range)
> > > > > +{
> > > > > +	struct drm_gpuva *va;
> > > > > +
> > > > > +	va = drm_gpuva_find_first(mgr, addr, range);
> > > > > +	if (!va)
> > > > > +		goto out;
> > > > > +
> > > > > +	if (va->va.range != range)
> > > > > +		goto out;
> > > > > +
> > > > > +	return va;
> > > > > +
> > > > > +out:
> > > > > +	return NULL;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_find);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @start: the given GPU VA's start address
> > > > > + *
> > > > > + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> > > > > + *
> > > > > + * Note that if there is any free space between the GPU VA mappings no mapping
> > > > > + * is returned.
> > > > > + *
> > > > > + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> > > > 
> > > > find_prev() usually continues beyond 1 less than the address. I found
> > > > this name confusing.
> > > 
> > > Don't really get that, mind explaining?
> > 
> > When I ask for the previous one in a list or tree, I think the one
> > before.. but since you are limiting your search from start to start - 1,
> > you may as well walk to start - 1 and see if one exists.
> > 
> > Is that what you meant to do here?
> 
> Yes, I want to know whether there is a previous entry which ends right
> before the current entry, without a gap between the two.
> 
> > 
> > > 
> > > > You may as well use mas_walk(), it would be faster.
> > > 
> > > How would I use mas_walk() for that? If I understand it correctly,
> > > mas_walk() requires me to know that start address, which I don't know for
> > > the previous entry.
> > 
> > mas_walk() walks to the value you specify and returns the entry at that
> > address, not necessarily the start address, but any address in the
> > range.
> > 
> > If you have a tree and store A = [0x1000 - 0x2000] and set your maple
> > state to walk to 0x1500, mas_walk() will return A, and the maple state
> > will have mas.index = 0x1000 and mas.last = 0x2000.
> > 
> > You have set the maple state to start at "start" and called
> > mas_prev(&mas, start - 1).  start - 1 is the lower limit, so the
> > internal implementation will walk to start then go to the previous entry
> > until start - 1.. it will stop at start - 1 and return NULL if there
> > isn't one there.
> 
> Thanks for the clarification and all the other very helpful comments and
> explanations!
> 

Always glad to help.  The more users the tree has, the more I can see
where we may need to expand the interface to help others.

...


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-02-28 16:24       ` Liam R. Howlett
@ 2023-03-06 13:39         ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-03-06 13:39 UTC (permalink / raw)
  To: Liam R. Howlett, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, dri-devel,
	nouveau, linux-doc, linux-mm, linux-kernel, Dave Airlie

On 2/28/23 17:24, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230227 21:17]:
>> On Tue, Feb 21, 2023 at 01:20:50PM -0500, Liam R. Howlett wrote:
>>> * Danilo Krummrich <dakr@redhat.com> [230217 08:45]:
>>>> Add infrastructure to keep track of GPU virtual address (VA) mappings
>>>> with a decicated VA space manager implementation.
>>>>
>>>> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
>>>> start implementing, allow userspace applications to request multiple and
>>>> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>>>> intended to serve the following purposes in this context.
>>>>
>>>> 1) Provide infrastructure to track GPU VA allocations and mappings,
>>>>     making use of the maple_tree.
>>>>
>>>> 2) Generically connect GPU VA mappings to their backing buffers, in
>>>>     particular DRM GEM objects.
>>>>
>>>> 3) Provide a common implementation to perform more complex mapping
>>>>     operations on the GPU VA space. In particular splitting and merging
>>>>     of GPU VA mappings, e.g. for intersecting mapping requests or partial
>>>>     unmap requests.
>>>>
>>>> Suggested-by: Dave Airlie <airlied@redhat.com>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>> ---
>>>>   Documentation/gpu/drm-mm.rst    |   31 +
>>>>   drivers/gpu/drm/Makefile        |    1 +
>>>>   drivers/gpu/drm/drm_gem.c       |    3 +
>>>>   drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++++++++++++++++++++++++++++++
>>>>   include/drm/drm_drv.h           |    6 +
>>>>   include/drm/drm_gem.h           |   75 ++
>>>>   include/drm/drm_gpuva_mgr.h     |  714 +++++++++++++
>>>>   7 files changed, 2534 insertions(+)
>>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>>>
>>>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>>>> index a52e6f4117d6..c9f120cfe730 100644
>>>> --- a/Documentation/gpu/drm-mm.rst
>>>> +++ b/Documentation/gpu/drm-mm.rst
>>>> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>>>>   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>>>>      :export:
>>>>   
>>> ...
>>>
>>>> +
>>>> +/**
>>>> + * drm_gpuva_remove_iter - removes the iterators current element
>>>> + * @it: the &drm_gpuva_iterator
>>>> + *
>>>> + * This removes the element the iterator currently points to.
>>>> + */
>>>> +void
>>>> +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
>>>> +{
>>>> +	mas_erase(&it->mas);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_gpuva_iter_remove);
>>>> +
>>>> +/**
>>>> + * drm_gpuva_insert - insert a &drm_gpuva
>>>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>>>> + * @va: the &drm_gpuva to insert
>>>> + * @addr: the start address of the GPU VA
>>>> + * @range: the range of the GPU VA
>>>> + *
>>>> + * Insert a &drm_gpuva with a given address and range into a
>>>> + * &drm_gpuva_manager.
>>>> + *
>>>> + * Returns: 0 on success, negative error code on failure.
>>>> + */
>>>> +int
>>>> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>>>> +		 struct drm_gpuva *va)
>>>> +{
>>>> +	u64 addr = va->va.addr;
>>>> +	u64 range = va->va.range;
>>>> +	MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
>>>> +	struct drm_gpuva_region *reg = NULL;
>>>> +	int ret;
>>>> +
>>>> +	if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
>>>> +		reg = drm_gpuva_in_region(mgr, addr, range);
>>>> +		if (unlikely(!reg))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>
>>> -----
>>>
>>>> +	if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
>>>> +		return -EEXIST;
>>>> +
>>>> +	ret = mas_store_gfp(&mas, va, GFP_KERNEL);
>>>
>>> mas_walk() will set the internal maple state to the limits to what it
>>> finds.  So, instead of an iterator, you can use the walk function and
>>> ensure there is a large enough area in the existing NULL:
>>>
>>> /*
>>>   * Nothing at addr, mas now points to the location where the store would
>>>   * happen
>>>   */
>>> if (mas_walk(&mas))
>>> 	return -EEXIST;
>>>
>>
>> For some reason mas_walk() finds an entry and hence this function returns
>> -EEXIST for the following sequence of insertions.
>>
>> A = [0xc0000 - 0xfffff]
>> B = [0x0 - 0xbffff]
>>
>> Interestingly, inserting B before A works fine.
>>
>> I attached a test module that reproduces the issue. I hope its just a stupid
>> mistake I just can't spot though.
> 
> This is probably my fault in how I explained things, I seem to have had
> a bug in my code.
> 
> Let me try again.
> 
> mas_walk(&mas) will go to the range of mas.index
> 	It will set mas.index = range_start
> 	It will set mas.last = range_end
> 	It will return entry in that range.
> 
> Your code is walking to addr (0xc0000, say)
> You get NULL
> and the range is now: mas.index = 0, mas.last = ULONG_MAX
> 
> You set mas.last = 0xc0000 + 0x40000 -1
> You store your va in the range of 0 - 0xfffff - This isn't what you want
> to do and this is why you are seeing it exists when done in this order.
> 
> In the reverse order, your lower limit is fine so it works out.
> 
> Try adding a check to ensure the lower range is still accurate as well:
>          if (mas.index < addr)
>                  mas.index = addr;
> 

Thanks for the clarification, this indeed fixes the issue.

> If you compile with CONFIG_DEBUG_MAPLE_TREE, you can use mt_dump() to
> dump the tree for debugging.

That's pretty handy.

> 
> I also have some quality of life patches I'm developing to configure the
> format of the dump (hex/dec) and a mas_dump() for more information as
> well.

Great, I tried it out and really missed a hex option. Are they in any 
git repository already, such that I could fetch them?

> 
>>
>>> /* The NULL entry ends at mas.last, make sure there is room */
>>> if (mas.last < (addr + range - 1))
>>> 	return -EEXIST;
>>>
>>> /* Limit the store size to the correct end address, and store */
>>>   mas.last = addr + range - 1;
>>>   ret = mas_store_gfp(&mas, va, GFP_KERNEL);
>>>
> 
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-03-02  2:38           ` Liam R. Howlett
@ 2023-03-06 15:46             ` Danilo Krummrich
  2023-03-07 22:43               ` Liam R. Howlett
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-03-06 15:46 UTC (permalink / raw)
  To: Liam R. Howlett, airlied, daniel, tzimmermann, mripard, corbet,
	christian.koenig, bskeggs, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, dri-devel,
	nouveau, linux-doc, linux-mm, linux-kernel, Dave Airlie

On 3/2/23 03:38, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230227 08:17]:
> 
> ...
>>>> Would this variant be significantly more efficient?
>>>
>>> Well, what you are doing is walking the tree to see if there's anything
>>> there... then re-walking the tree to store it.  So, yes, it's much more
>>> efficient..  However, writing is heavier.  How much of the time is spent
>>> walking vs writing depends on the size of the tree, but it's rather easy
>>> to do this in a single walk of the tree so why wouldn't you?
>>
>> I will, I was just curious about how much of an impact it has.
>>
>>>
>>>>
>>>> Also, would this also work while already walking the tree?
>>>
>>> Yes, to an extent.  If you are at the correct location in the tree, you
>>> can write to that location.  If you are not in the correct location and
>>> try to write to the tree then things will go poorly..  In this scenario,
>>> we are very much walking the tree and writing to it in two steps.
>>>
>>>>
>>>> To remove an entry while walking the tree I have a separate function
>>>> drm_gpuva_iter_remove(). Would I need something similar for inserting
>>>> entries?
>>>
>>> I saw that.  Your remove function uses the erase operation which is
>>> implemented as a walk to that location and a store of a null over the
>>> range that is returned.  You do not need a function to insert an entry
>>> if the maple state is at the correct location, and that doesn't just
>>> mean setting mas.index/mas.last to the correct value.  There is a node &
>>> offset saved in the maple state that needs to be in the correct
>>> location.  If you store to that node then the node may be replaced, so
>>> other iterators that you have may become stale, but the one you used
>>> execute the store operation will now point to the new node with the new
>>> entry.
>>>
>>>>
>>>> I already provided this example in a separate mail thread, but it may makes
>>>> sense to move this to the mailing list:
>>>>
>>>> In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
>>>> given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
>>>> invokes a callback for each sub-operation that needs to be taken in order to
>>>> fulfill this mapping request. In most cases such a callback just creates a
>>>> drm_gpuva_op object and stores it in a list.
>>>>
>>>> However, drivers can also implement the callback, such that they directly
>>>> execute this operation within the callback.
>>>>
>>>> Let's have a look at the following example:
>>>>
>>>>        0     a     2
>>>> old: |-----------|       (bo_offset=n)
>>>>
>>>>              1     b     3
>>>> req:       |-----------| (bo_offset=m)
>>>>
>>>>        0  a' 1     b     3
>>>> new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>>>>
>>>> This would result in the following operations.
>>>>
>>>> __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
>>>> suggesting to re-map "a" with the new size. The driver removes entry "a"
>>>> from the tree and adds "a'"
>>>
>>> What you have here won't work.  The driver will cause your iterators
>>> maple state to point to memory that is freed.  You will either need to
>>> pass through your iterator so that the modifications can occur with that
>>> maple state so it remains valid, or you will need to invalidate the
>>> iterator on every modification by the driver.
>>>
>>> I'm sure the first idea you have will be to invalidate the iterator, but
>>> that is probably not the way to proceed.  Even ignoring the unclear
>>> locking of two maple states trying to modify the tree, this is rather
>>> inefficient - each invalidation means a re-walk of the tree.  You may as
>>> well not use an iterator in this case.
>>>
>>> Depending on how/when the lookups occur, you could still iterate over
>>> the tree and let the driver modify the ending of "a", but leave the tree
>>> alone and just store b over whatever - but the failure scenarios may
>>> cause you grief.
>>>
>>> If you pass the iterator through, then you can just use it to do your
>>> writes and keep iterating as if nothing changed.
>>
>> Passing through the iterater clearly seems to be the way to go.
>>
>> I assume that if the entry to insert isn't at the location of the iterator
>> (as in the following example) we can just keep walking to this location my
>> changing the index of the mas and calling mas_walk()?
> 
> no.  You have to mas_set() to the value and walk from the top of the
> tree.  mas_walk() walks down, not from side to side - well, it does go
> forward within a node (increasing offset), but if you hit the node limit
> then you have gotten yourself in trouble.
> 
>> This would also imply
>> that the "outer" tree walk continues after the entry we just inserted,
>> right?
> 
> I don't understand the "outer" tree walk statement.

I think I could have phrased this better. I just mean "my" iterator 
walking each tree entry rather than an internal tree walk, as it happens 
in e.g. mas_walk() or mas_find().

> 
>>
>>             1     a     3
>> old:       |-----------| (bo_offset=n)
>>
>>       0     b     2
>> req: |-----------|       (bo_offset=m)
>>
>>       0     b     2  a' 3
>> new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>>
>> Again, after finding "a", we want to remove it and insert "a'" instead.
> 
> Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
> mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:
> 
>          0     1  a   2  a' 3
> broken: |-----|------|-----| (a is broken in this 1/2 step)
> 
> mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
> mas_store(&mas, b);
>          0     b     2  a' 3
> new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> 
> 
> You can *probably* also get away with this:
> 
> walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
> from 1 - 3, write "a'" from 2 - 3:
> 
>          0     1  a   2  a' 3
> broken: |-----|------|-----| (a is broken in this 1/2 step)
> 
> mas_prev(&mas, 0); /* Looking at broken a from 1-2.
> mas_store(&mas, NULL); /* NULL is expanded on write to 0-2.
>              0    NULL   2  a' 3
> broken':    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> 
> mas_store(&mas, b);
>          0     b     2  a' 3
> new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> 
> You may want to iterate backwards and do the writes as you go until you
> have enough room.. it really depends how you want to go about doing
> things.

I see, again thanks for explaining.

I think I would prefer to either (1) have generic insert() function with 
a similar behavior as when iterating through a list or (2) have a 
function dedicated to the "split" use case.

1) When iterating the tree inserting entries at arbitrary locations 
should not influence the next iteration step. Unless the new entry 
really is the next entry, but that'd be optional. I don't see a use case 
for that.

2) Similar to how you broke it down above I could imagine a function 
dedicated to the split operation. This would be similar to what you 
mention for mmap below. However, it wouldn't be a single operation.

The GPUVA manager provides sub-operations to the driver for a single 
mapping request. Those can be an arbitrary amount of unmaps (for 
mappings "in the way", as you say below), one or two remaps (for splits 
at the beginning or end or both) and exactly one map (which is the last 
sub-operation adding the newly requested mapping).

Remaps consist out of the mapping to unmap and one or two new mappings 
to map. The only case where a remap sub-op has two new mappings to map 
is when the newly requested mapping is enclosed by a single existing 
mapping. If we overlap a mapping at the beginning and another one at the 
end this would be two separate remap sub-ops. Of course, between the two 
remaps there could be an arbitrary amount of unmap sub-ops.

Unmap sub-ops are simple, I just need to remove a single entry in the 
tree. drm_gpuva_iter_remove() should be fine for that.

For remap sub-ops, I would need a function that removes an entry and 
then adds one or two new entries within the range of the removed one. 
The next loop iteration should then continue at the entry (is any) after 
the range of the removed one.

However, I'm unsure how to implement this. Would I need to just do a 
mas_store() of the new entry/entries (since the nodes should already be 
allocated) and then clean up the nodes that are left with mas_erase()?

Let's say there is an entry A = [0 - 5] and I want to replace it with B 
= [0 - 1] and C = [4 - 5].

Could I just store B and C and then somehow clean up the range [2 - 3]?

Maybe 1) would be the most flexible way, however, if 2) can be 
implemented more efficiently that's perfectly fine too.

> 
>>
>>>
>>>>
>>>> __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
>>>> starting from the end of "a" (which is 2) till the end of the requested
>>>> mapping "b" (which is 3). Since it doesn't find any other mapping within
>>>> this range it calls back into the driver suggesting to finally map "b".
>>>>
>>>> If there would have been another mapping between 2 and 3 it would have
>>>> called back into the driver asking to unmap this mapping beforehand.
>>>>
>>>> So, it boils down to re-mapping as described at the beginning (and
>>>> analogously at the end) of a new mapping range and removing of entries that
>>>> are enclosed by the new mapping range.
>>>
>>> I assume the unmapped area is no longer needed, and the 're-map' is
>>> really a removal of information?  Otherwise I'd suggest searching for a
>>> gap which fits your request.  What you have here is a lot like
>>> "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
>>> be like your __drm_gpuva_sm_map() and the drm mm range allocator with
>>> DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
>>>
>>> Why can these split/unmappings fail?  Is it because they are still
>>> needed?
>>>
>>
>> You mean the check before the mas_*() operations in drm_gpuva_insert()?
> 
> Yes, the callbacks.
> 
>>
>> Removing entries should never fail, inserting entries should fail when the
>> caller tries to store to an area outside of the VA space (it doesn't
>> necessarily span the whole 64-bit space), a kernel reserved area of the VA
>> space, is not in any pre-allocated range of the VA space (if regions are
>> enabled) or an entry already exists at that location.
> 
> In the mmap code, I have to deal with splitting the start/end VMA and
> removing any VMAs in the way.  I do this by making a 'detached' tree
> that is dealt with later, then just overwriting the area with one
> mas_store() operation.  Would something like that work for you?

I think this is pretty much the same thing I want to do, hence this 
should work. However, this would require more state keeping for the 
whole iteration, I guess. Drivers shouldn't know how the GPUVA manager 
keeps track of mappings internally (and hence they shouldn't know about 
the maple tree). If I could get away with something similar to what I 
wrote above, I think I'd probably not add this extra complexity, unless 
there are relevant performance reasons to do so.

> 
>>
>>>>
>>>>>> +	if (unlikely(ret))
>>>>>> +		return ret;
>>>>>> +
>>>>>> +	va->mgr = mgr;
>>>>>> +	va->region = reg;
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_gpuva_insert);
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_gpuva_remove - remove a &drm_gpuva
>>>>>> + * @va: the &drm_gpuva to remove
>>>>>> + *
>>>>>> + * This removes the given &va from the underlaying tree.
>>>>>> + */
>>>>>> +void
>>>>>> +drm_gpuva_remove(struct drm_gpuva *va)
>>>>>> +{
>>>>>> +	MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
>>>>>> +
>>>>>> +	mas_erase(&mas);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_gpuva_remove);
>>>>>> +
>>>>> ...
>>>>>
>>>>>> +/**
>>>>>> + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
>>>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>>>> + * @addr: the &drm_gpuvas address
>>>>>> + * @range: the &drm_gpuvas range
>>>>>> + *
>>>>>> + * Returns: the first &drm_gpuva within the given range
>>>>>> + */
>>>>>> +struct drm_gpuva *
>>>>>> +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
>>>>>> +		     u64 addr, u64 range)
>>>>>> +{
>>>>>> +	MA_STATE(mas, &mgr->va_mt, addr, 0);
>>>>>> +
>>>>>> +	return mas_find(&mas, addr + range - 1);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_gpuva_find_first);
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_gpuva_find - find a &drm_gpuva
>>>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>>>> + * @addr: the &drm_gpuvas address
>>>>>> + * @range: the &drm_gpuvas range
>>>>>> + *
>>>>>> + * Returns: the &drm_gpuva at a given &addr and with a given &range
>>>>>
>>>>> Note that mas_find() will continue upwards in the address space if there
>>>>> isn't anything at @addr.  This means that &drm_gpuva may not be at
>>>>> &addr.  If you want to check just at &addr, use mas_walk().
>>>>
>>>> Good catch. drm_gpuva_find() should then either also check for 'va->va.addr
>>>> == addr' as well or, alternatively, use mas_walk(). As above, any reason to
>>>> prefer mas_walk()?
> 
> I think I missed this question last time..
> 
> Internally, mas_find() is just a mas_walk() on the first call, then
> mas_next() for each call after that.  If, during the mas_walk(), there
> is no value at addr, it immediately calls mas_next() to get a value to
> return.  It will continue upwards until the limit is reached (addr +
> range - 1 in your case).
> 
> So if you only want to know if there is something at addr, then it's
> best to use mas_walk() and keep things a bit more efficient.  Then you
> can check mas.last for your end value.
> 
> If you do want the first VMA within the range passed in, then mas_find()
> is the function you want.
> 
>>>>
>>>>>
>>>>>> + */
>>>>>> +struct drm_gpuva *
>>>>>> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
>>>>>> +	       u64 addr, u64 range)
>>>>>> +{
>>>>>> +	struct drm_gpuva *va;
>>>>>> +
>>>>>> +	va = drm_gpuva_find_first(mgr, addr, range);
>>>>>> +	if (!va)
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	if (va->va.range != range)
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	return va;
>>>>>> +
>>>>>> +out:
>>>>>> +	return NULL;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_gpuva_find);
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>>>>>> + * @mgr: the &drm_gpuva_manager to search in
>>>>>> + * @start: the given GPU VA's start address
>>>>>> + *
>>>>>> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
>>>>>> + *
>>>>>> + * Note that if there is any free space between the GPU VA mappings no mapping
>>>>>> + * is returned.
>>>>>> + *
>>>>>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>>>>>> + */
>>>>>> +struct drm_gpuva *
>>>>>> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
>>>>>
>>>>> find_prev() usually continues beyond 1 less than the address. I found
>>>>> this name confusing.
>>>>
>>>> Don't really get that, mind explaining?
>>>
>>> When I ask for the previous one in a list or tree, I think the one
>>> before.. but since you are limiting your search from start to start - 1,
>>> you may as well walk to start - 1 and see if one exists.
>>>
>>> Is that what you meant to do here?
>>
>> Yes, I want to know whether there is a previous entry which ends right
>> before the current entry, without a gap between the two.
>>
>>>
>>>>
>>>>> You may as well use mas_walk(), it would be faster.
>>>>
>>>> How would I use mas_walk() for that? If I understand it correctly,
>>>> mas_walk() requires me to know that start address, which I don't know for
>>>> the previous entry.
>>>
>>> mas_walk() walks to the value you specify and returns the entry at that
>>> address, not necessarily the start address, but any address in the
>>> range.
>>>
>>> If you have a tree and store A = [0x1000 - 0x2000] and set your maple
>>> state to walk to 0x1500, mas_walk() will return A, and the maple state
>>> will have mas.index = 0x1000 and mas.last = 0x2000.
>>>
>>> You have set the maple state to start at "start" and called
>>> mas_prev(&mas, start - 1).  start - 1 is the lower limit, so the
>>> internal implementation will walk to start then go to the previous entry
>>> until start - 1.. it will stop at start - 1 and return NULL if there
>>> isn't one there.
>>
>> Thanks for the clarification and all the other very helpful comments and
>> explanations!
>>
> 
> Always glad to help.  The more users the tree has, the more I can see
> where we may need to expand the interface to help others.
> 
> ...
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-03-06 15:46             ` Danilo Krummrich
@ 2023-03-07 22:43               ` Liam R. Howlett
  2023-03-13 23:46                 ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Liam R. Howlett @ 2023-03-07 22:43 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230306 10:46]:
> On 3/2/23 03:38, Liam R. Howlett wrote:
> > * Danilo Krummrich <dakr@redhat.com> [230227 08:17]:
> > 
> > ...
> > > > > Would this variant be significantly more efficient?
> > > > 
> > > > Well, what you are doing is walking the tree to see if there's anything
> > > > there... then re-walking the tree to store it.  So, yes, it's much more
> > > > efficient..  However, writing is heavier.  How much of the time is spent
> > > > walking vs writing depends on the size of the tree, but it's rather easy
> > > > to do this in a single walk of the tree so why wouldn't you?
> > > 
> > > I will, I was just curious about how much of an impact it has.
> > > 
> > > > 
> > > > > 
> > > > > Also, would this also work while already walking the tree?
> > > > 
> > > > Yes, to an extent.  If you are at the correct location in the tree, you
> > > > can write to that location.  If you are not in the correct location and
> > > > try to write to the tree then things will go poorly..  In this scenario,
> > > > we are very much walking the tree and writing to it in two steps.
> > > > 
> > > > > 
> > > > > To remove an entry while walking the tree I have a separate function
> > > > > drm_gpuva_iter_remove(). Would I need something similar for inserting
> > > > > entries?
> > > > 
> > > > I saw that.  Your remove function uses the erase operation which is
> > > > implemented as a walk to that location and a store of a null over the
> > > > range that is returned.  You do not need a function to insert an entry
> > > > if the maple state is at the correct location, and that doesn't just
> > > > mean setting mas.index/mas.last to the correct value.  There is a node &
> > > > offset saved in the maple state that needs to be in the correct
> > > > location.  If you store to that node then the node may be replaced, so
> > > > other iterators that you have may become stale, but the one you used
> > > > execute the store operation will now point to the new node with the new
> > > > entry.
> > > > 
> > > > > 
> > > > > I already provided this example in a separate mail thread, but it may makes
> > > > > sense to move this to the mailing list:
> > > > > 
> > > > > In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
> > > > > given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
> > > > > invokes a callback for each sub-operation that needs to be taken in order to
> > > > > fulfill this mapping request. In most cases such a callback just creates a
> > > > > drm_gpuva_op object and stores it in a list.
> > > > > 
> > > > > However, drivers can also implement the callback, such that they directly
> > > > > execute this operation within the callback.
> > > > > 
> > > > > Let's have a look at the following example:
> > > > > 
> > > > >        0     a     2
> > > > > old: |-----------|       (bo_offset=n)
> > > > > 
> > > > >              1     b     3
> > > > > req:       |-----------| (bo_offset=m)
> > > > > 
> > > > >        0  a' 1     b     3
> > > > > new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> > > > > 
> > > > > This would result in the following operations.
> > > > > 
> > > > > __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> > > > > suggesting to re-map "a" with the new size. The driver removes entry "a"
> > > > > from the tree and adds "a'"
> > > > 
> > > > What you have here won't work.  The driver will cause your iterators
> > > > maple state to point to memory that is freed.  You will either need to
> > > > pass through your iterator so that the modifications can occur with that
> > > > maple state so it remains valid, or you will need to invalidate the
> > > > iterator on every modification by the driver.
> > > > 
> > > > I'm sure the first idea you have will be to invalidate the iterator, but
> > > > that is probably not the way to proceed.  Even ignoring the unclear
> > > > locking of two maple states trying to modify the tree, this is rather
> > > > inefficient - each invalidation means a re-walk of the tree.  You may as
> > > > well not use an iterator in this case.
> > > > 
> > > > Depending on how/when the lookups occur, you could still iterate over
> > > > the tree and let the driver modify the ending of "a", but leave the tree
> > > > alone and just store b over whatever - but the failure scenarios may
> > > > cause you grief.
> > > > 
> > > > If you pass the iterator through, then you can just use it to do your
> > > > writes and keep iterating as if nothing changed.
> > > 
> > > Passing through the iterater clearly seems to be the way to go.
> > > 
> > > I assume that if the entry to insert isn't at the location of the iterator
> > > (as in the following example) we can just keep walking to this location my
> > > changing the index of the mas and calling mas_walk()?
> > 
> > no.  You have to mas_set() to the value and walk from the top of the
> > tree.  mas_walk() walks down, not from side to side - well, it does go
> > forward within a node (increasing offset), but if you hit the node limit
> > then you have gotten yourself in trouble.
> > 
> > > This would also imply
> > > that the "outer" tree walk continues after the entry we just inserted,
> > > right?
> > 
> > I don't understand the "outer" tree walk statement.
> 
> I think I could have phrased this better. I just mean "my" iterator walking
> each tree entry rather than an internal tree walk, as it happens in e.g.
> mas_walk() or mas_find().
> 
> > 
> > > 
> > >             1     a     3
> > > old:       |-----------| (bo_offset=n)
> > > 
> > >       0     b     2
> > > req: |-----------|       (bo_offset=m)
> > > 
> > >       0     b     2  a' 3
> > > new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > > 
> > > Again, after finding "a", we want to remove it and insert "a'" instead.
> > 
> > Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
> > mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:
> > 
> >          0     1  a   2  a' 3
> > broken: |-----|------|-----| (a is broken in this 1/2 step)
> > 
> > mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
> > mas_store(&mas, b);
> >          0     b     2  a' 3
> > new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > 
> > 
> > You can *probably* also get away with this:
> > 
> > walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
> > from 1 - 3, write "a'" from 2 - 3:
> > 
> >          0     1  a   2  a' 3
> > broken: |-----|------|-----| (a is broken in this 1/2 step)
> > 
> > mas_prev(&mas, 0); /* Looking at broken a from 1-2.
> > mas_store(&mas, NULL); /* NULL is expanded on write to 0-2.
> >              0    NULL   2  a' 3
> > broken':    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > 
> > mas_store(&mas, b);
> >          0     b     2  a' 3
> > new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > 
> > You may want to iterate backwards and do the writes as you go until you
> > have enough room.. it really depends how you want to go about doing
> > things.
> 
> I see, again thanks for explaining.
> 
> I think I would prefer to either (1) have generic insert() function with a
> similar behavior as when iterating through a list or (2) have a function
> dedicated to the "split" use case.
> 
> 1) When iterating the tree inserting entries at arbitrary locations should
> not influence the next iteration step. Unless the new entry really is the
> next entry, but that'd be optional. I don't see a use case for that.
> 
> 2) Similar to how you broke it down above I could imagine a function
> dedicated to the split operation. This would be similar to what you mention
> for mmap below. However, it wouldn't be a single operation.
> 
> The GPUVA manager provides sub-operations to the driver for a single mapping
> request. Those can be an arbitrary amount of unmaps (for mappings "in the
> way", as you say below), one or two remaps (for splits at the beginning or
> end or both) and exactly one map (which is the last sub-operation adding the
> newly requested mapping).
> 
> Remaps consist out of the mapping to unmap and one or two new mappings to
> map. The only case where a remap sub-op has two new mappings to map is when
> the newly requested mapping is enclosed by a single existing mapping. If we
> overlap a mapping at the beginning and another one at the end this would be
> two separate remap sub-ops. Of course, between the two remaps there could be
> an arbitrary amount of unmap sub-ops.
> 
> Unmap sub-ops are simple, I just need to remove a single entry in the tree.
> drm_gpuva_iter_remove() should be fine for that.
> 
> For remap sub-ops, I would need a function that removes an entry and then
> adds one or two new entries within the range of the removed one. The next
> loop iteration should then continue at the entry (is any) after the range of
> the removed one.
> 
> However, I'm unsure how to implement this. Would I need to just do a
> mas_store() of the new entry/entries (since the nodes should already be
> allocated) and then clean up the nodes that are left with mas_erase()?
> 
> Let's say there is an entry A = [0 - 5] and I want to replace it with B = [0
> - 1] and C = [4 - 5].
> 
> Could I just store B and C and then somehow clean up the range [2 - 3]?

The most efficient way:
mas_set(&mas, 0);
// Walk down to 0
mas_walk(&mas);
// We are now pointing at A (index = 0, last = 5)
mas.last = 1;
// No walk here.
mas_store(&mas, B);
// Going to the next entry is very fast.
mas_next(&mas)
// We are now pointing at a fragment of A (index = 2, last = 5)
mas.last = 3;
// No walk here.
mas_store(&mas, NULL);
// Going to the next entry is very fast
mas_next(&mas);
// We are now pointing at a fragment of A (index = 4, last = 5)
mas_store(&mas, C);

Less efficient, but still fine:
// Walk down to 0 and store
mas_set_range(&mas, 0, 1);
mas_store(&mas, B);
// Reset to the top of the tree
mas_set_range(&mas, 4, 5);
// Walk down to 4 and store
mas_store(&mas, C);
// Reset to the top of the tree
mas_set_range(&mas, 2, 3);
// Walk down to 2 and store
mas_store(&mas, NULL);


> 
> Maybe 1) would be the most flexible way, however, if 2) can be implemented
> more efficiently that's perfectly fine too.

You can do anything you want, but the more you can use the same maple
state and save walking from the top the more efficient it will be.
Every level is another dereference down the tree..  We do have a
branching factor of 16 here, so I don't know the size of your tree and
how worth the effort it is for you.

> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
> > > > > starting from the end of "a" (which is 2) till the end of the requested
> > > > > mapping "b" (which is 3). Since it doesn't find any other mapping within
> > > > > this range it calls back into the driver suggesting to finally map "b".
> > > > > 
> > > > > If there would have been another mapping between 2 and 3 it would have
> > > > > called back into the driver asking to unmap this mapping beforehand.
> > > > > 
> > > > > So, it boils down to re-mapping as described at the beginning (and
> > > > > analogously at the end) of a new mapping range and removing of entries that
> > > > > are enclosed by the new mapping range.
> > > > 
> > > > I assume the unmapped area is no longer needed, and the 're-map' is
> > > > really a removal of information?  Otherwise I'd suggest searching for a
> > > > gap which fits your request.  What you have here is a lot like
> > > > "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
> > > > be like your __drm_gpuva_sm_map() and the drm mm range allocator with
> > > > DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
> > > > 
> > > > Why can these split/unmappings fail?  Is it because they are still
> > > > needed?
> > > > 
> > > 
> > > You mean the check before the mas_*() operations in drm_gpuva_insert()?
> > 
> > Yes, the callbacks.
> > 
> > > 
> > > Removing entries should never fail, inserting entries should fail when the
> > > caller tries to store to an area outside of the VA space (it doesn't
> > > necessarily span the whole 64-bit space), a kernel reserved area of the VA
> > > space, is not in any pre-allocated range of the VA space (if regions are
> > > enabled) or an entry already exists at that location.
> > 
> > In the mmap code, I have to deal with splitting the start/end VMA and
> > removing any VMAs in the way.  I do this by making a 'detached' tree
> > that is dealt with later, then just overwriting the area with one
> > mas_store() operation.  Would something like that work for you?
> 
> I think this is pretty much the same thing I want to do, hence this should
> work. However, this would require more state keeping for the whole
> iteration, I guess. Drivers shouldn't know how the GPUVA manager keeps track
> of mappings internally (and hence they shouldn't know about the maple tree).
> If I could get away with something similar to what I wrote above, I think
> I'd probably not add this extra complexity, unless there are relevant
> performance reasons to do so.
> 

Well maybe you can tell your drivers that there's something in the way
and they can remove it from their end but not alter the tree.  Sort of
like a "Untracked" callback.

If you have a "This range has changed to X-Y" then you can use this
along with the "Untracked" to implement the above in a single
write.  Iterate through the area , call back to the driver to alter the
start range, then keep "Untracing" until you alter the end range.

...

Cheers,
Liam

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (15 preceding siblings ...)
  2023-02-17 13:48 ` [PATCH drm-next v2 16/16] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
@ 2023-03-09  9:12 ` Boris Brezillon
  2023-03-09  9:48   ` Boris Brezillon
  16 siblings, 1 reply; 64+ messages in thread
From: Boris Brezillon @ 2023-03-09  9:12 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

Hi Danilo,

On Fri, 17 Feb 2023 14:44:06 +0100
Danilo Krummrich <dakr@redhat.com> wrote:

> Changes in V2:
> ==============
>   Nouveau:
>     - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
>       signalling critical sections. Updates to the VA space are split up in three
>       separate stages, where only the 2. stage executes in a fence signalling
>       critical section:
> 
>         1. update the VA space, allocate new structures and page tables

Sorry for the silly question, but I didn't find where the page tables
pre-allocation happens. Mind pointing it to me? It's also unclear when
this step happens. Is this at bind-job submission time, when the job is
not necessarily ready to run, potentially waiting for other deps to be
signaled. Or is it done when all deps are met, as an extra step before
jumping to step 2. If that's the former, then I don't see how the VA
space update can happen, since the bind-job might depend on other
bind-jobs modifying the same portion of the VA space (unbind ops might
lead to intermediate page table levels disappearing while we were
waiting for deps). If it's the latter, I wonder why this is not
considered as an allocation in the fence signaling path (for the
bind-job out-fence to be signaled, you need these allocations to
succeed, unless failing to allocate page-tables is considered like a HW
misbehavior and the fence is signaled with an error in that case).

Note that I'm not familiar at all with Nouveau or TTM, and it might
be something that's solved by another component, or I'm just
misunderstanding how the whole thing is supposed to work. This being
said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
the gpuva_manager infra you're proposing here, so please bare with me
:-).

>         2. (un-)map the requested memory bindings
>         3. free structures and page tables
> 
>     - Separated generic job scheduler code from specific job implementations.
>     - Separated the EXEC and VM_BIND implementation of the UAPI.
>     - Reworked the locking parts of the nvkm/vmm RAW interface, such that
>       (un-)map operations can be executed in fence signalling critical sections.
> 

Regards,

Boris


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-03-09  9:12 ` [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Boris Brezillon
@ 2023-03-09  9:48   ` Boris Brezillon
  2023-03-10 16:45     ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Boris Brezillon @ 2023-03-09  9:48 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

On Thu, 9 Mar 2023 10:12:43 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> Hi Danilo,
> 
> On Fri, 17 Feb 2023 14:44:06 +0100
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
> > Changes in V2:
> > ==============
> >   Nouveau:
> >     - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
> >       signalling critical sections. Updates to the VA space are split up in three
> >       separate stages, where only the 2. stage executes in a fence signalling
> >       critical section:
> > 
> >         1. update the VA space, allocate new structures and page tables  
> 
> Sorry for the silly question, but I didn't find where the page tables
> pre-allocation happens. Mind pointing it to me? It's also unclear when
> this step happens. Is this at bind-job submission time, when the job is
> not necessarily ready to run, potentially waiting for other deps to be
> signaled. Or is it done when all deps are met, as an extra step before
> jumping to step 2. If that's the former, then I don't see how the VA
> space update can happen, since the bind-job might depend on other
> bind-jobs modifying the same portion of the VA space (unbind ops might
> lead to intermediate page table levels disappearing while we were
> waiting for deps). If it's the latter, I wonder why this is not
> considered as an allocation in the fence signaling path (for the
> bind-job out-fence to be signaled, you need these allocations to
> succeed, unless failing to allocate page-tables is considered like a HW
> misbehavior and the fence is signaled with an error in that case).

Ok, so I just noticed you only have one bind queue per drm_file
(cli->sched_entity), and jobs are executed in-order on a given queue,
so I guess that allows you to modify the VA space at submit time
without risking any modifications to the VA space coming from other
bind-queues targeting the same VM. And, if I'm correct, synchronous
bind/unbind ops take the same path, so no risk for those to modify the
VA space either (just wonder if it's a good thing to have to sync
bind/unbind operations waiting on async ones, but that's a different
topic).

> 
> Note that I'm not familiar at all with Nouveau or TTM, and it might
> be something that's solved by another component, or I'm just
> misunderstanding how the whole thing is supposed to work. This being
> said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
> the gpuva_manager infra you're proposing here, so please bare with me
> :-).
> 
> >         2. (un-)map the requested memory bindings
> >         3. free structures and page tables
> > 
> >     - Separated generic job scheduler code from specific job implementations.
> >     - Separated the EXEC and VM_BIND implementation of the UAPI.
> >     - Reworked the locking parts of the nvkm/vmm RAW interface, such that
> >       (un-)map operations can be executed in fence signalling critical sections.
> >   
> 
> Regards,
> 
> Boris
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-03-09  9:48   ` Boris Brezillon
@ 2023-03-10 16:45     ` Danilo Krummrich
  2023-03-10 17:25       ` Boris Brezillon
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-03-10 16:45 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

Hi Boris,

On 3/9/23 10:48, Boris Brezillon wrote:
> On Thu, 9 Mar 2023 10:12:43 +0100
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
> 
>> Hi Danilo,
>>
>> On Fri, 17 Feb 2023 14:44:06 +0100
>> Danilo Krummrich <dakr@redhat.com> wrote:
>>
>>> Changes in V2:
>>> ==============
>>>    Nouveau:
>>>      - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
>>>        signalling critical sections. Updates to the VA space are split up in three
>>>        separate stages, where only the 2. stage executes in a fence signalling
>>>        critical section:
>>>
>>>          1. update the VA space, allocate new structures and page tables
>>
>> Sorry for the silly question, but I didn't find where the page tables
>> pre-allocation happens. Mind pointing it to me? It's also unclear when
>> this step happens. Is this at bind-job submission time, when the job is
>> not necessarily ready to run, potentially waiting for other deps to be
>> signaled. Or is it done when all deps are met, as an extra step before
>> jumping to step 2. If that's the former, then I don't see how the VA
>> space update can happen, since the bind-job might depend on other
>> bind-jobs modifying the same portion of the VA space (unbind ops might
>> lead to intermediate page table levels disappearing while we were
>> waiting for deps). If it's the latter, I wonder why this is not
>> considered as an allocation in the fence signaling path (for the
>> bind-job out-fence to be signaled, you need these allocations to
>> succeed, unless failing to allocate page-tables is considered like a HW
>> misbehavior and the fence is signaled with an error in that case).
> 
> Ok, so I just noticed you only have one bind queue per drm_file
> (cli->sched_entity), and jobs are executed in-order on a given queue,
> so I guess that allows you to modify the VA space at submit time
> without risking any modifications to the VA space coming from other
> bind-queues targeting the same VM. And, if I'm correct, synchronous
> bind/unbind ops take the same path, so no risk for those to modify the
> VA space either (just wonder if it's a good thing to have to sync
> bind/unbind operations waiting on async ones, but that's a different
> topic).

Yes, that's all correct.

The page table allocation happens through nouveau_uvmm_vmm_get() which 
either allocates the corresponding page tables or increases the 
reference count, in case they already exist, accordingly.
The call goes all the way through nvif into the nvkm layer (not the 
easiest to follow the call chain) and ends up in nvkm_vmm_ptes_get().

There are multiple reasons for updating the VA space at submit time in 
Nouveau.

1) Subsequent EXEC ioctl() calls would need to wait for the bind jobs 
they depend on within the ioctl() rather than in the scheduler queue, 
because at the point of time where the ioctl() happens the VA space 
wouldn't be up-to-date.

2) Let's assume a new mapping is requested and within it's range other 
mappings already exist. Let's also assume that those existing mappings 
aren't contiguous, such that there are gaps between them. In such a case 
I need to allocate page tables only for the gaps between the existing 
mappings, or alternatively, allocate them for the whole range of the new 
mapping, but free / decrease the reference count of the page tables for 
the ranges of the previously existing mappings afterwards.
In the first case I need to know the gaps to allocate page tables for 
when submitting the job, which means the VA space must be up-to-date. In 
the latter one I must save the ranges of the previously existing 
mappings somewhere in order to clean them up, hence I need to allocate 
memory to store this information. Since I can't allocate this memory in 
the jobs run() callback (fence signalling critical section) I need to do 
it when submitting the job already and hence the VA space must be 
up-to-date again.
However, this is due to how page table management currently works in 
Nouveau and we might change that in the future.

Synchronous binds/unbinds taking the same path through the scheduler is 
a downside of this approach.

- Danilo

> 
>>
>> Note that I'm not familiar at all with Nouveau or TTM, and it might
>> be something that's solved by another component, or I'm just
>> misunderstanding how the whole thing is supposed to work. This being
>> said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
>> the gpuva_manager infra you're proposing here, so please bare with me
>> :-).
>>
>>>          2. (un-)map the requested memory bindings
>>>          3. free structures and page tables
>>>
>>>      - Separated generic job scheduler code from specific job implementations.
>>>      - Separated the EXEC and VM_BIND implementation of the UAPI.
>>>      - Reworked the locking parts of the nvkm/vmm RAW interface, such that
>>>        (un-)map operations can be executed in fence signalling critical sections.
>>>    
>>
>> Regards,
>>
>> Boris
>>
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-03-10 16:45     ` Danilo Krummrich
@ 2023-03-10 17:25       ` Boris Brezillon
  2023-03-10 20:06         ` Danilo Krummrich
  0 siblings, 1 reply; 64+ messages in thread
From: Boris Brezillon @ 2023-03-10 17:25 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, bskeggs, tzimmermann,
	Liam.Howlett, bagasdotme, christian.koenig, jason

Hi Danilo,

On Fri, 10 Mar 2023 17:45:58 +0100
Danilo Krummrich <dakr@redhat.com> wrote:

> Hi Boris,
> 
> On 3/9/23 10:48, Boris Brezillon wrote:
> > On Thu, 9 Mar 2023 10:12:43 +0100
> > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> >   
> >> Hi Danilo,
> >>
> >> On Fri, 17 Feb 2023 14:44:06 +0100
> >> Danilo Krummrich <dakr@redhat.com> wrote:
> >>  
> >>> Changes in V2:
> >>> ==============
> >>>    Nouveau:
> >>>      - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
> >>>        signalling critical sections. Updates to the VA space are split up in three
> >>>        separate stages, where only the 2. stage executes in a fence signalling
> >>>        critical section:
> >>>
> >>>          1. update the VA space, allocate new structures and page tables  
> >>
> >> Sorry for the silly question, but I didn't find where the page tables
> >> pre-allocation happens. Mind pointing it to me? It's also unclear when
> >> this step happens. Is this at bind-job submission time, when the job is
> >> not necessarily ready to run, potentially waiting for other deps to be
> >> signaled. Or is it done when all deps are met, as an extra step before
> >> jumping to step 2. If that's the former, then I don't see how the VA
> >> space update can happen, since the bind-job might depend on other
> >> bind-jobs modifying the same portion of the VA space (unbind ops might
> >> lead to intermediate page table levels disappearing while we were
> >> waiting for deps). If it's the latter, I wonder why this is not
> >> considered as an allocation in the fence signaling path (for the
> >> bind-job out-fence to be signaled, you need these allocations to
> >> succeed, unless failing to allocate page-tables is considered like a HW
> >> misbehavior and the fence is signaled with an error in that case).  
> > 
> > Ok, so I just noticed you only have one bind queue per drm_file
> > (cli->sched_entity), and jobs are executed in-order on a given queue,
> > so I guess that allows you to modify the VA space at submit time
> > without risking any modifications to the VA space coming from other
> > bind-queues targeting the same VM. And, if I'm correct, synchronous
> > bind/unbind ops take the same path, so no risk for those to modify the
> > VA space either (just wonder if it's a good thing to have to sync
> > bind/unbind operations waiting on async ones, but that's a different
> > topic).  
> 
> Yes, that's all correct.
> 
> The page table allocation happens through nouveau_uvmm_vmm_get() which 
> either allocates the corresponding page tables or increases the 
> reference count, in case they already exist, accordingly.
> The call goes all the way through nvif into the nvkm layer (not the 
> easiest to follow the call chain) and ends up in nvkm_vmm_ptes_get().
> 
> There are multiple reasons for updating the VA space at submit time in 
> Nouveau.
> 
> 1) Subsequent EXEC ioctl() calls would need to wait for the bind jobs 
> they depend on within the ioctl() rather than in the scheduler queue, 
> because at the point of time where the ioctl() happens the VA space 
> wouldn't be up-to-date.

Hm, actually that's what explicit sync is all about, isn't it? If you
have async binding ops, you should retrieve the bind-op out-fences and
pass them back as in-fences to the EXEC call, so you're sure all the
memory mappings you depend on are active when you execute those GPU
jobs. And if you're using sync binds, the changes are guaranteed to be
applied before the ioctl() returns. Am I missing something?

> 
> 2) Let's assume a new mapping is requested and within it's range other 
> mappings already exist. Let's also assume that those existing mappings 
> aren't contiguous, such that there are gaps between them. In such a case 
> I need to allocate page tables only for the gaps between the existing 
> mappings, or alternatively, allocate them for the whole range of the new 
> mapping, but free / decrease the reference count of the page tables for 
> the ranges of the previously existing mappings afterwards.
> In the first case I need to know the gaps to allocate page tables for 
> when submitting the job, which means the VA space must be up-to-date. In 
> the latter one I must save the ranges of the previously existing 
> mappings somewhere in order to clean them up, hence I need to allocate 
> memory to store this information. Since I can't allocate this memory in 
> the jobs run() callback (fence signalling critical section) I need to do 
> it when submitting the job already and hence the VA space must be 
> up-to-date again.

Yep that makes perfect sense, and that explains how the whole thing can
work. When I initially read the patch series, I had more complex use
cases in mind, with multiple bind queues targeting the same VM, and
synchronous bind taking a fast path (so they don't have to wait on
async binds which can in turn wait on external deps). This model makes
it hard to predict what the VA space will look like when an async bind
operation gets to be executed, thus making page table allocation more
complex, or forcing us to over-estimate the amount of pages we need for
this update (basically one page per MMU level, except maybe the top
level, plus the number of pages you'll always need for the bind
operation itself).

> However, this is due to how page table management currently works in 
> Nouveau and we might change that in the future.

I'm curious to hear about that if you have a bit of time. I'm starting
from scratch with pancsf, and I might consider going for something
similar to what you plan to do next.

> 
> Synchronous binds/unbinds taking the same path through the scheduler is 
> a downside of this approach.

Indeed. I mean, I can probably live with this limitation, but I'm
curious to know if the pg table management changes you're considering
for the future would solve that problem.

Anyway, thanks for taking the time to answer my question, things are
much clearer now.

Boris

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-03-10 17:25       ` Boris Brezillon
@ 2023-03-10 20:06         ` Danilo Krummrich
  0 siblings, 0 replies; 64+ messages in thread
From: Danilo Krummrich @ 2023-03-10 20:06 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: matthew.brost, bagasdotme, corbet, nouveau, ogabbay, linux-doc,
	linux-kernel, dri-devel, linux-mm, willy, tzimmermann,
	Liam.Howlett, christian.koenig, jason, bskeggs

On 3/10/23 18:25, Boris Brezillon wrote:
> Hi Danilo,
> 
> On Fri, 10 Mar 2023 17:45:58 +0100
> Danilo Krummrich <dakr@redhat.com> wrote:
>> Hi Boris,
>>
>>> On Thu, 9 Mar 2023 10:12:43 +0100
>>> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>>>
>>> Ok, so I just noticed you only have one bind queue per drm_file
>>> (cli->sched_entity), and jobs are executed in-order on a given queue,
>>> so I guess that allows you to modify the VA space at submit time
>>> without risking any modifications to the VA space coming from other
>>> bind-queues targeting the same VM. And, if I'm correct, synchronous
>>> bind/unbind ops take the same path, so no risk for those to modify the
>>> VA space either (just wonder if it's a good thing to have to sync
>>> bind/unbind operations waiting on async ones, but that's a different
>>> topic).
>>
>> Yes, that's all correct.
>>
>> The page table allocation happens through nouveau_uvmm_vmm_get() which
>> either allocates the corresponding page tables or increases the
>> reference count, in case they already exist, accordingly.
>> The call goes all the way through nvif into the nvkm layer (not the
>> easiest to follow the call chain) and ends up in nvkm_vmm_ptes_get().
>>
>> There are multiple reasons for updating the VA space at submit time in
>> Nouveau.
>>
>> 1) Subsequent EXEC ioctl() calls would need to wait for the bind jobs
>> they depend on within the ioctl() rather than in the scheduler queue,
>> because at the point of time where the ioctl() happens the VA space
>> wouldn't be up-to-date.
> 
> Hm, actually that's what explicit sync is all about, isn't it? If you
> have async binding ops, you should retrieve the bind-op out-fences and
> pass them back as in-fences to the EXEC call, so you're sure all the
> memory mappings you depend on are active when you execute those GPU
> jobs. And if you're using sync binds, the changes are guaranteed to be
> applied before the ioctl() returns. Am I missing something?
> 

No, you're right and this is exactly how I implemented it. The 
difference is where to wait for the bind jobs out-fences.

In the EXEC ioctl() we need to validate the GEM objects backing the 
dependent mappings and add the jobs fence to the GEMs DMA reservation. 
If the VA space isn't up-to-date we might not be able to look up the 
relevant GEMs and miss them.

If the VA space change happens in the bind jobs submit path (ioctl()), 
it is guaranteed that the view of the VA space is up-to-date (actually 
it might even be ahead of the actual current state) when the EXEC 
ioctl() is called. Hence, I can just pass the out-fences of the binds 
jobs the EXEC depends on to the job scheduler and return from the 
ioctl(). The job scheduler will then wait for the actual mappings being 
populated before executing the EXEC job.

If the VA space change is done when the bind job executes on the 
scheduler we would need to wait for the bind jobs out-fences in the EXEC 
ioctl() itself.

>>
>> 2) Let's assume a new mapping is requested and within it's range other
>> mappings already exist. Let's also assume that those existing mappings
>> aren't contiguous, such that there are gaps between them. In such a case
>> I need to allocate page tables only for the gaps between the existing
>> mappings, or alternatively, allocate them for the whole range of the new
>> mapping, but free / decrease the reference count of the page tables for
>> the ranges of the previously existing mappings afterwards.
>> In the first case I need to know the gaps to allocate page tables for
>> when submitting the job, which means the VA space must be up-to-date. In
>> the latter one I must save the ranges of the previously existing
>> mappings somewhere in order to clean them up, hence I need to allocate
>> memory to store this information. Since I can't allocate this memory in
>> the jobs run() callback (fence signalling critical section) I need to do
>> it when submitting the job already and hence the VA space must be
>> up-to-date again.
> 
> Yep that makes perfect sense, and that explains how the whole thing can
> work. When I initially read the patch series, I had more complex use
> cases in mind, with multiple bind queues targeting the same VM, and
> synchronous bind taking a fast path (so they don't have to wait on
> async binds which can in turn wait on external deps). This model makes
> it hard to predict what the VA space will look like when an async bind
> operation gets to be executed, thus making page table allocation more
> complex, or forcing us to over-estimate the amount of pages we need for
> this update (basically one page per MMU level, except maybe the top
> level, plus the number of pages you'll always need for the bind
> operation itself).
> 
>> However, this is due to how page table management currently works in
>> Nouveau and we might change that in the future.
> 
> I'm curious to hear about that if you have a bit of time. I'm starting
> from scratch with pancsf, and I might consider going for something
> similar to what you plan to do next.

There is no concrete plan yet. However, with the current implementation 
there are a few shortcomings (also in handling sparse ranges) that I'd 
like to address in the future.

> 
>>
>> Synchronous binds/unbinds taking the same path through the scheduler is
>> a downside of this approach.
> 
> Indeed. I mean, I can probably live with this limitation, but I'm
> curious to know if the pg table management changes you're considering
> for the future would solve that problem.

As mentioned above, I have a few ideas, but I did not think through them 
entirely yet.

A few thoughts though: If running synchronous binds/unbinds through the 
job scheduler is a concern I think it could be beneficial to 
(pre-)allocate page tables for newly requested mappings without the need 
to know whether there are existing mappings within this range already 
(ideally without tracking page table allocations separate from GPUVAs), 
such that we can update the VA space at job execution time. Same thing 
for freeing page tables for a range that only partially contains 
mappings at all. For that, reference counting page tables per mapping 
wouldn't really work.

On the other hand we need to consider that freeing page tables for a 
given range and allocating new page tables for the same or an 
overlapping range would need to be ordered in order to avoid races.

> 
> Anyway, thanks for taking the time to answer my question, things are
> much clearer now.

I'm happy to discuss this. Feel free to also reach out in IRC, my nick 
is 'dakr'.

> 
> Boris
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-03-07 22:43               ` Liam R. Howlett
@ 2023-03-13 23:46                 ` Danilo Krummrich
  2023-03-20 19:16                   ` Liam R. Howlett
  0 siblings, 1 reply; 64+ messages in thread
From: Danilo Krummrich @ 2023-03-13 23:46 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

On 3/7/23 23:43, Liam R. Howlett wrote:
> * Danilo Krummrich <dakr@redhat.com> [230306 10:46]:
>> On 3/2/23 03:38, Liam R. Howlett wrote:
>>> * Danilo Krummrich <dakr@redhat.com> [230227 08:17]:
>>>
>>> ...
>>>>>> Would this variant be significantly more efficient?
>>>>>
>>>>> Well, what you are doing is walking the tree to see if there's anything
>>>>> there... then re-walking the tree to store it.  So, yes, it's much more
>>>>> efficient..  However, writing is heavier.  How much of the time is spent
>>>>> walking vs writing depends on the size of the tree, but it's rather easy
>>>>> to do this in a single walk of the tree so why wouldn't you?
>>>>
>>>> I will, I was just curious about how much of an impact it has.
>>>>
>>>>>
>>>>>>
>>>>>> Also, would this also work while already walking the tree?
>>>>>
>>>>> Yes, to an extent.  If you are at the correct location in the tree, you
>>>>> can write to that location.  If you are not in the correct location and
>>>>> try to write to the tree then things will go poorly..  In this scenario,
>>>>> we are very much walking the tree and writing to it in two steps.
>>>>>
>>>>>>
>>>>>> To remove an entry while walking the tree I have a separate function
>>>>>> drm_gpuva_iter_remove(). Would I need something similar for inserting
>>>>>> entries?
>>>>>
>>>>> I saw that.  Your remove function uses the erase operation which is
>>>>> implemented as a walk to that location and a store of a null over the
>>>>> range that is returned.  You do not need a function to insert an entry
>>>>> if the maple state is at the correct location, and that doesn't just
>>>>> mean setting mas.index/mas.last to the correct value.  There is a node &
>>>>> offset saved in the maple state that needs to be in the correct
>>>>> location.  If you store to that node then the node may be replaced, so
>>>>> other iterators that you have may become stale, but the one you used
>>>>> execute the store operation will now point to the new node with the new
>>>>> entry.
>>>>>
>>>>>>
>>>>>> I already provided this example in a separate mail thread, but it may makes
>>>>>> sense to move this to the mailing list:
>>>>>>
>>>>>> In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
>>>>>> given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
>>>>>> invokes a callback for each sub-operation that needs to be taken in order to
>>>>>> fulfill this mapping request. In most cases such a callback just creates a
>>>>>> drm_gpuva_op object and stores it in a list.
>>>>>>
>>>>>> However, drivers can also implement the callback, such that they directly
>>>>>> execute this operation within the callback.
>>>>>>
>>>>>> Let's have a look at the following example:
>>>>>>
>>>>>>         0     a     2
>>>>>> old: |-----------|       (bo_offset=n)
>>>>>>
>>>>>>               1     b     3
>>>>>> req:       |-----------| (bo_offset=m)
>>>>>>
>>>>>>         0  a' 1     b     3
>>>>>> new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>>>>>>
>>>>>> This would result in the following operations.
>>>>>>
>>>>>> __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
>>>>>> suggesting to re-map "a" with the new size. The driver removes entry "a"
>>>>>> from the tree and adds "a'"
>>>>>
>>>>> What you have here won't work.  The driver will cause your iterators
>>>>> maple state to point to memory that is freed.  You will either need to
>>>>> pass through your iterator so that the modifications can occur with that
>>>>> maple state so it remains valid, or you will need to invalidate the
>>>>> iterator on every modification by the driver.
>>>>>
>>>>> I'm sure the first idea you have will be to invalidate the iterator, but
>>>>> that is probably not the way to proceed.  Even ignoring the unclear
>>>>> locking of two maple states trying to modify the tree, this is rather
>>>>> inefficient - each invalidation means a re-walk of the tree.  You may as
>>>>> well not use an iterator in this case.
>>>>>
>>>>> Depending on how/when the lookups occur, you could still iterate over
>>>>> the tree and let the driver modify the ending of "a", but leave the tree
>>>>> alone and just store b over whatever - but the failure scenarios may
>>>>> cause you grief.
>>>>>
>>>>> If you pass the iterator through, then you can just use it to do your
>>>>> writes and keep iterating as if nothing changed.
>>>>
>>>> Passing through the iterater clearly seems to be the way to go.
>>>>
>>>> I assume that if the entry to insert isn't at the location of the iterator
>>>> (as in the following example) we can just keep walking to this location my
>>>> changing the index of the mas and calling mas_walk()?
>>>
>>> no.  You have to mas_set() to the value and walk from the top of the
>>> tree.  mas_walk() walks down, not from side to side - well, it does go
>>> forward within a node (increasing offset), but if you hit the node limit
>>> then you have gotten yourself in trouble.
>>>
>>>> This would also imply
>>>> that the "outer" tree walk continues after the entry we just inserted,
>>>> right?
>>>
>>> I don't understand the "outer" tree walk statement.
>>
>> I think I could have phrased this better. I just mean "my" iterator walking
>> each tree entry rather than an internal tree walk, as it happens in e.g.
>> mas_walk() or mas_find().
>>
>>>
>>>>
>>>>              1     a     3
>>>> old:       |-----------| (bo_offset=n)
>>>>
>>>>        0     b     2
>>>> req: |-----------|       (bo_offset=m)
>>>>
>>>>        0     b     2  a' 3
>>>> new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>>>>
>>>> Again, after finding "a", we want to remove it and insert "a'" instead.
>>>
>>> Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
>>> mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:
>>>
>>>           0     1  a   2  a' 3
>>> broken: |-----|------|-----| (a is broken in this 1/2 step)
>>>
>>> mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
>>> mas_store(&mas, b);
>>>           0     b     2  a' 3
>>> new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>>>
>>>
>>> You can *probably* also get away with this:
>>>
>>> walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
>>> from 1 - 3, write "a'" from 2 - 3:
>>>
>>>           0     1  a   2  a' 3
>>> broken: |-----|------|-----| (a is broken in this 1/2 step)
>>>
>>> mas_prev(&mas, 0); /* Looking at broken a from 1-2.
>>> mas_store(&mas, NULL); /* NULL is expanded on write to 0-2.
>>>               0    NULL   2  a' 3
>>> broken':    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>>>
>>> mas_store(&mas, b);
>>>           0     b     2  a' 3
>>> new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>>>
>>> You may want to iterate backwards and do the writes as you go until you
>>> have enough room.. it really depends how you want to go about doing
>>> things.
>>
>> I see, again thanks for explaining.
>>
>> I think I would prefer to either (1) have generic insert() function with a
>> similar behavior as when iterating through a list or (2) have a function
>> dedicated to the "split" use case.
>>
>> 1) When iterating the tree inserting entries at arbitrary locations should
>> not influence the next iteration step. Unless the new entry really is the
>> next entry, but that'd be optional. I don't see a use case for that.
>>
>> 2) Similar to how you broke it down above I could imagine a function
>> dedicated to the split operation. This would be similar to what you mention
>> for mmap below. However, it wouldn't be a single operation.
>>
>> The GPUVA manager provides sub-operations to the driver for a single mapping
>> request. Those can be an arbitrary amount of unmaps (for mappings "in the
>> way", as you say below), one or two remaps (for splits at the beginning or
>> end or both) and exactly one map (which is the last sub-operation adding the
>> newly requested mapping).
>>
>> Remaps consist out of the mapping to unmap and one or two new mappings to
>> map. The only case where a remap sub-op has two new mappings to map is when
>> the newly requested mapping is enclosed by a single existing mapping. If we
>> overlap a mapping at the beginning and another one at the end this would be
>> two separate remap sub-ops. Of course, between the two remaps there could be
>> an arbitrary amount of unmap sub-ops.
>>
>> Unmap sub-ops are simple, I just need to remove a single entry in the tree.
>> drm_gpuva_iter_remove() should be fine for that.
>>
>> For remap sub-ops, I would need a function that removes an entry and then
>> adds one or two new entries within the range of the removed one. The next
>> loop iteration should then continue at the entry (is any) after the range of
>> the removed one.
>>
>> However, I'm unsure how to implement this. Would I need to just do a
>> mas_store() of the new entry/entries (since the nodes should already be
>> allocated) and then clean up the nodes that are left with mas_erase()?
>>
>> Let's say there is an entry A = [0 - 5] and I want to replace it with B = [0
>> - 1] and C = [4 - 5].
>>
>> Could I just store B and C and then somehow clean up the range [2 - 3]?
> 
> The most efficient way:
> mas_set(&mas, 0);
> // Walk down to 0
> mas_walk(&mas);
> // We are now pointing at A (index = 0, last = 5)
> mas.last = 1;
> // No walk here.
> mas_store(&mas, B);
> // Going to the next entry is very fast.
> mas_next(&mas)
> // We are now pointing at a fragment of A (index = 2, last = 5)
> mas.last = 3;
> // No walk here.
> mas_store(&mas, NULL);
> // Going to the next entry is very fast
> mas_next(&mas);
> // We are now pointing at a fragment of A (index = 4, last = 5)
> mas_store(&mas, C);
> 
> Less efficient, but still fine:
> // Walk down to 0 and store
> mas_set_range(&mas, 0, 1);
> mas_store(&mas, B);
> // Reset to the top of the tree
> mas_set_range(&mas, 4, 5);
> // Walk down to 4 and store
> mas_store(&mas, C);
> // Reset to the top of the tree
> mas_set_range(&mas, 2, 3);
> // Walk down to 2 and store
> mas_store(&mas, NULL);
> 
> 
>>
>> Maybe 1) would be the most flexible way, however, if 2) can be implemented
>> more efficiently that's perfectly fine too.
> 
> You can do anything you want, but the more you can use the same maple
> state and save walking from the top the more efficient it will be.
> Every level is another dereference down the tree..  We do have a
> branching factor of 16 here, so I don't know the size of your tree and
> how worth the effort it is for you.

I think it could be worth taking the first approach and providing 
functions that are tied specifically to the use cases of the GPUVA 
manager, rather than generalizing them too much and re-walk the tree 
more than necessary. I think the size of the tree can be up to a couple 
100k.

Since some operations may be executed from dma-fence signalling critical 
sections I have a use case for mas_preallocate(). I was wondering if I 
can ignore the "entry" argument of mas_preallocate() and just pass NULL, 
since it's actually never used. What's the purpose of this argument? Or 
is it bug?

> 
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
>>>>>> starting from the end of "a" (which is 2) till the end of the requested
>>>>>> mapping "b" (which is 3). Since it doesn't find any other mapping within
>>>>>> this range it calls back into the driver suggesting to finally map "b".
>>>>>>
>>>>>> If there would have been another mapping between 2 and 3 it would have
>>>>>> called back into the driver asking to unmap this mapping beforehand.
>>>>>>
>>>>>> So, it boils down to re-mapping as described at the beginning (and
>>>>>> analogously at the end) of a new mapping range and removing of entries that
>>>>>> are enclosed by the new mapping range.
>>>>>
>>>>> I assume the unmapped area is no longer needed, and the 're-map' is
>>>>> really a removal of information?  Otherwise I'd suggest searching for a
>>>>> gap which fits your request.  What you have here is a lot like
>>>>> "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
>>>>> be like your __drm_gpuva_sm_map() and the drm mm range allocator with
>>>>> DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
>>>>>
>>>>> Why can these split/unmappings fail?  Is it because they are still
>>>>> needed?
>>>>>
>>>>
>>>> You mean the check before the mas_*() operations in drm_gpuva_insert()?
>>>
>>> Yes, the callbacks.
>>>
>>>>
>>>> Removing entries should never fail, inserting entries should fail when the
>>>> caller tries to store to an area outside of the VA space (it doesn't
>>>> necessarily span the whole 64-bit space), a kernel reserved area of the VA
>>>> space, is not in any pre-allocated range of the VA space (if regions are
>>>> enabled) or an entry already exists at that location.
>>>
>>> In the mmap code, I have to deal with splitting the start/end VMA and
>>> removing any VMAs in the way.  I do this by making a 'detached' tree
>>> that is dealt with later, then just overwriting the area with one
>>> mas_store() operation.  Would something like that work for you?
>>
>> I think this is pretty much the same thing I want to do, hence this should
>> work. However, this would require more state keeping for the whole
>> iteration, I guess. Drivers shouldn't know how the GPUVA manager keeps track
>> of mappings internally (and hence they shouldn't know about the maple tree).
>> If I could get away with something similar to what I wrote above, I think
>> I'd probably not add this extra complexity, unless there are relevant
>> performance reasons to do so.
>>
> 
> Well maybe you can tell your drivers that there's something in the way
> and they can remove it from their end but not alter the tree.  Sort of
> like a "Untracked" callback.
> 
> If you have a "This range has changed to X-Y" then you can use this
> along with the "Untracked" to implement the above in a single
> write.  Iterate through the area , call back to the driver to alter the
> start range, then keep "Untracing" until you alter the end range.
> 
> ...
> 
> Cheers,
> Liam
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
  2023-03-13 23:46                 ` Danilo Krummrich
@ 2023-03-20 19:16                   ` Liam R. Howlett
  0 siblings, 0 replies; 64+ messages in thread
From: Liam R. Howlett @ 2023-03-20 19:16 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: matthew.brost, willy, dri-devel, corbet, nouveau, ogabbay,
	linux-doc, linux-kernel, linux-mm, boris.brezillon, bskeggs,
	tzimmermann, Dave Airlie, bagasdotme, christian.koenig, jason

* Danilo Krummrich <dakr@redhat.com> [230313 19:46]:
> On 3/7/23 23:43, Liam R. Howlett wrote:
> > * Danilo Krummrich <dakr@redhat.com> [230306 10:46]:
> > > On 3/2/23 03:38, Liam R. Howlett wrote:
> > > > * Danilo Krummrich <dakr@redhat.com> [230227 08:17]:
> > > > 
> > > > ...
> > > > > > > Would this variant be significantly more efficient?
> > > > > > 
> > > > > > Well, what you are doing is walking the tree to see if there's anything
> > > > > > there... then re-walking the tree to store it.  So, yes, it's much more
> > > > > > efficient..  However, writing is heavier.  How much of the time is spent
> > > > > > walking vs writing depends on the size of the tree, but it's rather easy
> > > > > > to do this in a single walk of the tree so why wouldn't you?
> > > > > 
> > > > > I will, I was just curious about how much of an impact it has.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Also, would this also work while already walking the tree?
> > > > > > 
> > > > > > Yes, to an extent.  If you are at the correct location in the tree, you
> > > > > > can write to that location.  If you are not in the correct location and
> > > > > > try to write to the tree then things will go poorly..  In this scenario,
> > > > > > we are very much walking the tree and writing to it in two steps.
> > > > > > 
> > > > > > > 
> > > > > > > To remove an entry while walking the tree I have a separate function
> > > > > > > drm_gpuva_iter_remove(). Would I need something similar for inserting
> > > > > > > entries?
> > > > > > 
> > > > > > I saw that.  Your remove function uses the erase operation which is
> > > > > > implemented as a walk to that location and a store of a null over the
> > > > > > range that is returned.  You do not need a function to insert an entry
> > > > > > if the maple state is at the correct location, and that doesn't just
> > > > > > mean setting mas.index/mas.last to the correct value.  There is a node &
> > > > > > offset saved in the maple state that needs to be in the correct
> > > > > > location.  If you store to that node then the node may be replaced, so
> > > > > > other iterators that you have may become stale, but the one you used
> > > > > > execute the store operation will now point to the new node with the new
> > > > > > entry.
> > > > > > 
> > > > > > > 
> > > > > > > I already provided this example in a separate mail thread, but it may makes
> > > > > > > sense to move this to the mailing list:
> > > > > > > 
> > > > > > > In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
> > > > > > > given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
> > > > > > > invokes a callback for each sub-operation that needs to be taken in order to
> > > > > > > fulfill this mapping request. In most cases such a callback just creates a
> > > > > > > drm_gpuva_op object and stores it in a list.
> > > > > > > 
> > > > > > > However, drivers can also implement the callback, such that they directly
> > > > > > > execute this operation within the callback.
> > > > > > > 
> > > > > > > Let's have a look at the following example:
> > > > > > > 
> > > > > > >         0     a     2
> > > > > > > old: |-----------|       (bo_offset=n)
> > > > > > > 
> > > > > > >               1     b     3
> > > > > > > req:       |-----------| (bo_offset=m)
> > > > > > > 
> > > > > > >         0  a' 1     b     3
> > > > > > > new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> > > > > > > 
> > > > > > > This would result in the following operations.
> > > > > > > 
> > > > > > > __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> > > > > > > suggesting to re-map "a" with the new size. The driver removes entry "a"
> > > > > > > from the tree and adds "a'"
> > > > > > 
> > > > > > What you have here won't work.  The driver will cause your iterators
> > > > > > maple state to point to memory that is freed.  You will either need to
> > > > > > pass through your iterator so that the modifications can occur with that
> > > > > > maple state so it remains valid, or you will need to invalidate the
> > > > > > iterator on every modification by the driver.
> > > > > > 
> > > > > > I'm sure the first idea you have will be to invalidate the iterator, but
> > > > > > that is probably not the way to proceed.  Even ignoring the unclear
> > > > > > locking of two maple states trying to modify the tree, this is rather
> > > > > > inefficient - each invalidation means a re-walk of the tree.  You may as
> > > > > > well not use an iterator in this case.
> > > > > > 
> > > > > > Depending on how/when the lookups occur, you could still iterate over
> > > > > > the tree and let the driver modify the ending of "a", but leave the tree
> > > > > > alone and just store b over whatever - but the failure scenarios may
> > > > > > cause you grief.
> > > > > > 
> > > > > > If you pass the iterator through, then you can just use it to do your
> > > > > > writes and keep iterating as if nothing changed.
> > > > > 
> > > > > Passing through the iterater clearly seems to be the way to go.
> > > > > 
> > > > > I assume that if the entry to insert isn't at the location of the iterator
> > > > > (as in the following example) we can just keep walking to this location my
> > > > > changing the index of the mas and calling mas_walk()?
> > > > 
> > > > no.  You have to mas_set() to the value and walk from the top of the
> > > > tree.  mas_walk() walks down, not from side to side - well, it does go
> > > > forward within a node (increasing offset), but if you hit the node limit
> > > > then you have gotten yourself in trouble.
> > > > 
> > > > > This would also imply
> > > > > that the "outer" tree walk continues after the entry we just inserted,
> > > > > right?
> > > > 
> > > > I don't understand the "outer" tree walk statement.
> > > 
> > > I think I could have phrased this better. I just mean "my" iterator walking
> > > each tree entry rather than an internal tree walk, as it happens in e.g.
> > > mas_walk() or mas_find().
> > > 
> > > > 
> > > > > 
> > > > >              1     a     3
> > > > > old:       |-----------| (bo_offset=n)
> > > > > 
> > > > >        0     b     2
> > > > > req: |-----------|       (bo_offset=m)
> > > > > 
> > > > >        0     b     2  a' 3
> > > > > new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > > > > 
> > > > > Again, after finding "a", we want to remove it and insert "a'" instead.
> > > > 
> > > > Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
> > > > mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:
> > > > 
> > > >           0     1  a   2  a' 3
> > > > broken: |-----|------|-----| (a is broken in this 1/2 step)
> > > > 
> > > > mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
> > > > mas_store(&mas, b);
> > > >           0     b     2  a' 3
> > > > new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > > > 
> > > > 
> > > > You can *probably* also get away with this:
> > > > 
> > > > walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
> > > > from 1 - 3, write "a'" from 2 - 3:
> > > > 
> > > >           0     1  a   2  a' 3
> > > > broken: |-----|------|-----| (a is broken in this 1/2 step)
> > > > 
> > > > mas_prev(&mas, 0); /* Looking at broken a from 1-2.
> > > > mas_store(&mas, NULL); /* NULL is expanded on write to 0-2.
> > > >               0    NULL   2  a' 3
> > > > broken':    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > > > 
> > > > mas_store(&mas, b);
> > > >           0     b     2  a' 3
> > > > new:    |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> > > > 
> > > > You may want to iterate backwards and do the writes as you go until you
> > > > have enough room.. it really depends how you want to go about doing
> > > > things.
> > > 
> > > I see, again thanks for explaining.
> > > 
> > > I think I would prefer to either (1) have generic insert() function with a
> > > similar behavior as when iterating through a list or (2) have a function
> > > dedicated to the "split" use case.
> > > 
> > > 1) When iterating the tree inserting entries at arbitrary locations should
> > > not influence the next iteration step. Unless the new entry really is the
> > > next entry, but that'd be optional. I don't see a use case for that.
> > > 
> > > 2) Similar to how you broke it down above I could imagine a function
> > > dedicated to the split operation. This would be similar to what you mention
> > > for mmap below. However, it wouldn't be a single operation.
> > > 
> > > The GPUVA manager provides sub-operations to the driver for a single mapping
> > > request. Those can be an arbitrary amount of unmaps (for mappings "in the
> > > way", as you say below), one or two remaps (for splits at the beginning or
> > > end or both) and exactly one map (which is the last sub-operation adding the
> > > newly requested mapping).
> > > 
> > > Remaps consist out of the mapping to unmap and one or two new mappings to
> > > map. The only case where a remap sub-op has two new mappings to map is when
> > > the newly requested mapping is enclosed by a single existing mapping. If we
> > > overlap a mapping at the beginning and another one at the end this would be
> > > two separate remap sub-ops. Of course, between the two remaps there could be
> > > an arbitrary amount of unmap sub-ops.
> > > 
> > > Unmap sub-ops are simple, I just need to remove a single entry in the tree.
> > > drm_gpuva_iter_remove() should be fine for that.
> > > 
> > > For remap sub-ops, I would need a function that removes an entry and then
> > > adds one or two new entries within the range of the removed one. The next
> > > loop iteration should then continue at the entry (is any) after the range of
> > > the removed one.
> > > 
> > > However, I'm unsure how to implement this. Would I need to just do a
> > > mas_store() of the new entry/entries (since the nodes should already be
> > > allocated) and then clean up the nodes that are left with mas_erase()?
> > > 
> > > Let's say there is an entry A = [0 - 5] and I want to replace it with B = [0
> > > - 1] and C = [4 - 5].
> > > 
> > > Could I just store B and C and then somehow clean up the range [2 - 3]?
> > 
> > The most efficient way:
> > mas_set(&mas, 0);
> > // Walk down to 0
> > mas_walk(&mas);
> > // We are now pointing at A (index = 0, last = 5)
> > mas.last = 1;
> > // No walk here.
> > mas_store(&mas, B);
> > // Going to the next entry is very fast.
> > mas_next(&mas)
> > // We are now pointing at a fragment of A (index = 2, last = 5)
> > mas.last = 3;
> > // No walk here.
> > mas_store(&mas, NULL);
> > // Going to the next entry is very fast
> > mas_next(&mas);
> > // We are now pointing at a fragment of A (index = 4, last = 5)
> > mas_store(&mas, C);
> > 
> > Less efficient, but still fine:
> > // Walk down to 0 and store
> > mas_set_range(&mas, 0, 1);
> > mas_store(&mas, B);
> > // Reset to the top of the tree
> > mas_set_range(&mas, 4, 5);
> > // Walk down to 4 and store
> > mas_store(&mas, C);
> > // Reset to the top of the tree
> > mas_set_range(&mas, 2, 3);
> > // Walk down to 2 and store
> > mas_store(&mas, NULL);
> > 
> > 
> > > 
> > > Maybe 1) would be the most flexible way, however, if 2) can be implemented
> > > more efficiently that's perfectly fine too.
> > 
> > You can do anything you want, but the more you can use the same maple
> > state and save walking from the top the more efficient it will be.
> > Every level is another dereference down the tree..  We do have a
> > branching factor of 16 here, so I don't know the size of your tree and
> > how worth the effort it is for you.
> 
> I think it could be worth taking the first approach and providing functions
> that are tied specifically to the use cases of the GPUVA manager, rather
> than generalizing them too much and re-walk the tree more than necessary. I
> think the size of the tree can be up to a couple 100k.

A couple 100k VMAs?  As in 2 trees of 100k VMAs or 200k VMAs in a single
tree?  So that's 5 dereferences to walk from the root to the VMA.

> 
> Since some operations may be executed from dma-fence signalling critical
> sections I have a use case for mas_preallocate(). I was wondering if I can
> ignore the "entry" argument of mas_preallocate() and just pass NULL, since
> it's actually never used. What's the purpose of this argument? Or is it bug?

It existed to optimize the preallocations, but that functionality was
never completed.  It is slated to be dropped by a patch [1] in the
mm-unstable branch.  I am not sure it's worth doing the optimization
after the zeroing fix [2] of the maple nodes.  If you find the
preallocations are too large and causing issues, we can revisit.. but
with a 5 level tree, we will allocate 16 nodes and almost always have
extras - we get 16 nodes per page.

If you have sparse data, then I would start to get concerned after ~524K
VMAs, then we'd be looking for 2 pages.  More compact data can run up to
~1.04M before needing 2 pages.  Then again, two pages doesn't seem like
a lot for such a large task.

How sparse is your data, on average?

[1] https://lore.kernel.org/all/20230110154211.1758562-1-vernon2gm@gmail.com/T/#u
[2] https://lore.kernel.org/all/20230105160427.2988454-1-Liam.Howlett@oracle.com/ 

Thanks,
Liam

...

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2023-03-20 19:17 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-17 13:44 [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
2023-02-17 13:44 ` [PATCH drm-next v2 01/16] drm: execution context for GEM buffers Danilo Krummrich
2023-02-17 16:00   ` Christian König
2023-02-21 14:56     ` Danilo Krummrich
2023-02-17 13:44 ` [PATCH drm-next v2 02/16] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
2023-02-17 13:44 ` [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro Danilo Krummrich
2023-02-17 18:34   ` Liam R. Howlett
2023-02-20 13:48     ` Danilo Krummrich
2023-02-21 16:52       ` Liam R. Howlett
2023-02-17 19:45   ` Matthew Wilcox
2023-02-20 13:48     ` Danilo Krummrich
2023-02-17 13:44 ` [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE Danilo Krummrich
2023-02-17 18:18   ` Liam R. Howlett
2023-02-17 19:38   ` Matthew Wilcox
2023-02-20 14:00     ` Danilo Krummrich
2023-02-20 15:10       ` Matthew Wilcox
2023-02-20 17:06         ` Danilo Krummrich
2023-02-20 20:33           ` Matthew Wilcox
2023-02-21 14:37             ` Danilo Krummrich
2023-02-21 18:31               ` Matthew Wilcox
2023-02-22 16:11                 ` Danilo Krummrich
2023-02-22 16:32                   ` Matthew Wilcox
2023-02-22 17:28                     ` Danilo Krummrich
2023-02-27 17:39                 ` Danilo Krummrich
2023-02-27 18:36                   ` Matthew Wilcox
2023-02-27 18:59                     ` Danilo Krummrich
2023-02-17 13:44 ` [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
2023-02-18  1:05   ` kernel test robot
2023-02-21 18:20   ` Liam R. Howlett
2023-02-22 18:13     ` Danilo Krummrich
2023-02-23 19:09       ` Liam R. Howlett
2023-02-27 12:23         ` Danilo Krummrich
2023-03-02  2:38           ` Liam R. Howlett
2023-03-06 15:46             ` Danilo Krummrich
2023-03-07 22:43               ` Liam R. Howlett
2023-03-13 23:46                 ` Danilo Krummrich
2023-03-20 19:16                   ` Liam R. Howlett
2023-02-28  2:17     ` Danilo Krummrich
2023-02-28 16:24       ` Liam R. Howlett
2023-03-06 13:39         ` Danilo Krummrich
2023-02-22 10:25   ` Christian König
2023-02-22 15:07     ` Danilo Krummrich
2023-02-22 15:14       ` Christian König
2023-02-22 16:40         ` Danilo Krummrich
2023-02-23  7:06           ` Christian König
2023-02-23 14:12             ` Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 06/16] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
2023-02-18  2:47   ` kernel test robot
2023-02-17 13:48 ` [PATCH drm-next v2 07/16] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 08/16] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 09/16] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 10/16] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 11/16] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 12/16] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 13/16] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
2023-02-18  1:16   ` kernel test robot
2023-02-17 13:48 ` [PATCH drm-next v2 14/16] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 15/16] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
2023-02-17 13:48 ` [PATCH drm-next v2 16/16] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
2023-03-09  9:12 ` [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Boris Brezillon
2023-03-09  9:48   ` Boris Brezillon
2023-03-10 16:45     ` Danilo Krummrich
2023-03-10 17:25       ` Boris Brezillon
2023-03-10 20:06         ` Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).